
Chapter 2

Probability Models

2.1 Classical Definition of Probability
In this chapter we shall discuss the three definitions of probability, viz., (i) the Classical definition (pioneered by Laplace & Bernoulli), (ii) the Statistical definition (due to Von Mises) and (iii) the Axiomatic definition (due to Kolmogorov).
First of all, let us take up the classical definition, which bases itself mainly on intuition. Before embarking on our main discussion, let us explain at the outset a set of terminologies that act as prerequisites: (a) Event and Elementary event, (b) Mutually exclusive elementary outcomes, (c) Exhaustive elementary outcomes, (d) Mutually symmetric or equally likely elementary outcomes.
We define any subset of the sample space to be an event and a singleton subset to be an elementary event. Mind that this definition of event will no longer survive when we pass on to the Axiomatic definition (the compulsion behind this change-over to a modified version of event will become clear in the course of our treatment).
Any two elementary outcomes are said to be mutually exclusive if the occurrence of one of them excludes the occurrence of the other. A set of elementary outcomes is said to be exhaustive if their union is the full sample space Ω.
If the sample space associated with a random experiment be finite, then its elementary outcomes are equally likely if no outcome is more likely to occur than another.
For instance, if we look into coin-tossing, we observe that there are only two possible outcomes, viz., head and tail, which are mutually exclusive and exhaustive. If we now wish to incorporate equal likelihood of occurrence of these elementary outcomes, then we have to presume that the coin is symmetrical (unbiased) and made entirely of a homogeneous material.
The classical definition was enunciated by Laplace & Bernoulli. If there are n exhaustive, mutually exclusive and equally likely cases associated with a random experiment E, and of these, m are found to be favourable to a given event A, then the relative frequency of A in n repetitions of the same experiment E will be m/n. According to Laplace, this ratio is the probability of event A:

    P(A) = m/n.
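As a side illustration (a sketch of ours, not part of the original treatment; the function name classical_probability is hypothetical), the classical definition amounts to a counting computation over a finite sample space of equally likely outcomes:

```python
from fractions import Fraction

def classical_probability(sample_space, event):
    # Classical (Laplace) probability: favourable cases over total cases,
    # assuming all outcomes of the finite sample space are equally likely.
    favourable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favourable, len(sample_space))

# e.g. the probability of an even face on a fair die
print(classical_probability(range(1, 7), {2, 4, 6}))  # 1/2
```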

Drawbacks or Limitations of the Classical Definition


Although the classical definition looks rather simple and intuitive, it suffers from a few serious drawbacks. A priori knowledge that the elementary outcomes are all equally likely is a basic requirement of this definition. Let us zoom into what this phrase "equally likely" stands for. A vague answer is: elementary outcomes are equally likely if they are mutually symmetrical. Now what kind of mutual symmetry is implied? The very idea of symmetry is subjective and contextual. When we talked of a symmetrical (unbiased) coin, we implied merely geometrical symmetries and not the kinetic symmetries. However, once we note that the labels (or engravings) on the two faces of the coin must differ, the possibility of perfect geometrical symmetry is ruled out! If we wish to rely on kinetic symmetry, at least two criteria need to be satisfied:
(i) Coincidence of the c.g. (centre of gravity) of the coin with the geometrical centre of the coin.
(ii) The parallel and perpendicular axis theorems for the moments of inertia of rigid bodies must hold good for the coin.
Even if we presume the existence of the mutual symmetry of the coin, despite all these ifs and buts we are still not out of trouble, as there is no means to prescribe appropriate conditions of the so-called mutual symmetry for the process of throwing a coin.
Summing up all these discussions, it transpires that the very idea of equally likely outcomes is iffy and hypothetical. For instance, if the coin is previously known to be biased in favour of heads, the classical definition stands helpless to interpret the probability of head or tail.
Even if we accept this equally likely criterion, we may not be able to enumerate all the cases in practice. The counting method fails miserably when the number of possible outcomes becomes countably infinite or even uncountable. For example, if the random experiment E consists of tossing a coin (supposed unbiased) until a head appears, then Ω would be countably infinite, as Ω = {H, TH, TTH, TTTH, . . . ad inf}. Then to count the elementary outcomes of Ω we have no combinatorial clues. So the classical definition has a very small compass of applicability: a limited class of random experiments for which the sample space is finite. From the mathematical formula it also follows that the classical definition can yield only rational numbers for the probability of an event, a limitation which is very much unacceptable.
The use of the classical definition is limited to the games of chance. The concept of equally likely might perhaps sound natural in throwing a coin or a pair of dice, where there is some symmetry inherent in the system, but what about systems where any such symmetry is lacking? For instance, can anybody proclaim any symmetry in the sex-distribution of newborns, which depends fully on the biology of the organisms? The answer is in the negative. Thus, overviewing the various drawbacks, it seems that a refinement of the definition of probability is on the cards.

2.2 Statistical Definition of Probability

Once we abandon the classical definition based on intuition, we should like to build up the concept of probability from an entirely new premise. Leaving aside intuitions, the layman's next guiding factor is common experience.
If we repeat a random experiment under the same system of causes (i.e., under uniform conditions), we would expect the same outcome in each of the repetitions. But in practice we do not obtain the same result on all trials. For instance, consider the tossing of a fair coin. In a single throw of the coin, none can foretell what the outcome will be, head or tail. If we repeatedly throw the coin, head and tail occur very irregularly or chaotically, but the total number of heads is roughly half the total number of trials. There underlies a multitude of uncontrollable factors that collectively account for this apparent irregularity or chaos in the outcomes of the successive trials. But when the outcomes of a large number of repetitions are looked upon, a striking regularity is felt even in this chaos. This is what is loosely interpreted as statistical regularity.
We sum up these empirical facts in the following:
Let some random experiment E be repeated under identical (invariable) conditions as many as n times. Let an event A, connected with this random experiment E, be observed to occur n(A) times. The ratio n(A)/n is a rational number which denotes the relative frequency of event A. Prolonged observations show that under the invariable set of conditions, the number of occurrences or non-occurrences of A obeys stable laws. For a majority of series of observations it is experienced that this relative frequency will be more or less a constant, large deviations being progressively rarer as the total number of trials is increased. (This kind of stability of the relative frequencies was first noted in demography.) This phenomenon of stabilisation of the relative frequency of an event for long sequences of repetitions of a random experiment is known as Statistical Regularity.
Von Mises, who is nowadays regarded as the propounder of this Statistical definition of probability on the basis of experimental observations, postulated in his theory that the rational sequence {n(A)/n} is convergent and the limit to which {n(A)/n} converges is to be defined as the probability of event A. Mathematically,

    P(A) = lim_{n→∞} n(A)/n.
We shall therefore conclude that the event A has a probability, say P(A), if it possesses the following peculiarities:
(i) It is plausible, at least conceptually, to repeat the random experiment an indefinitely large number of times under uniform conditions, in each of which the event A may or may not occur.
(ii) As a result of a sufficiently large number of trials it is to be presumed that the frequency of event A for nearly every large run of trials departs only slightly from a certain constant.
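The following small simulation (our own sketch; the seed and toss counts are arbitrary) illustrates the statistical regularity postulated above: the relative frequency of heads in repeated tosses of a fair coin settles near 1/2:

```python
import random

random.seed(1)
heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5  # one toss of a fair coin
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}, relative frequency = {heads / n:.4f}")
```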

Advantages and Disadvantages of Von Mises' Definition

Advantages: (i) Unlike the classical definition, Von Mises' statistical definition has a relatively wide domain of applicability, since it applies well to random experiments for which the elementary outcomes are not equally likely and/or are countably infinite.
(ii) The statistical definition of probability given here is descriptive rather than formally mathematical in character. The frequency interpretation assigns an empirical meaning to probability.
Disadvantages: Von Mises' definition, although far more efficient than Laplace's classical definition, suffers from the following deficiencies:
(i) It claims stabilisation of the relative frequency of an event near a definite value purely on the basis of practical experience and fails to account for the traits of those phenomena for which the relative frequency stabilises.
(ii) Suppose that from some source we come to know that the probability of an event A is p. However, through a series of independent trials we observe that the relative frequency of A deviates substantially from the proposed value p. In this case we shall not doubt the existence of a definite probability of the event A (i.e., we shall not be circumspect about the convergence of the rational sequence {n(A)/n} or the correctness of the a priori information), but will be sceptical about our premises concerning the proper organisation of the trials. This belief, on which the statistical definition bases itself, has insuperable logical difficulties.
(iii) Von Mises' definition is clueless when the sample space associated with a random experiment is uncountably infinite. For example, if the experiment involves choosing a point at random from the unit interval [0, 1], then the sample space is uncountable and so the statistical definition fails to apply.
(iv) The greatest weakness of the statistical definition is inherent in its mathematical formalism. It tries to blend the empirical concept of relative frequency with the more subtle analytic concept of convergence of a sequence by claiming statistical regularity.
A Few Important Results Deduced from the Statistical Definition
In the following discussion, Ω is the sample space associated with a random experiment E, ∅ stands for the impossible event and Ω itself is the certain event.

(a) Since the relative frequency of the impossible event ∅ is 0,

    P(∅) = lim_{n→∞} n(∅)/n = 0

(n(∅) standing for the frequency of ∅).

(b) Since the relative frequency of the certain event Ω is unity,

    P(Ω) = lim_{n→∞} n(Ω)/n = 1

(n(Ω) standing for the frequency of Ω).
(c) Additive Rule for Probability:
Let us prove the following additive rule for just two mutually exclusive events A1 and A2. With conventional symbols,

    P(A1) = lim_{n→∞} n(A1)/n;  P(A2) = lim_{n→∞} n(A2)/n;  P(A1 ∪ A2) = lim_{n→∞} n(A1 ∪ A2)/n.

We prove that P(A1 ∪ A2) = P(A1) + P(A2). Observe that

    P(A1) + P(A2) = lim_{n→∞} n(A1)/n + lim_{n→∞} n(A2)/n = lim_{n→∞} (n(A1) + n(A2))/n = lim_{n→∞} n(A1 ∪ A2)/n = P(A1 ∪ A2),

where in the penultimate step we used the result n(A1 ∪ A2) = n(A1) + n(A2), by virtue of the Inclusion-Exclusion principle (see art. 1.3), since A1 and A2 are mutually exclusive events.
By the principle of Mathematical Induction, let us extend this result to any finite number of pairwise mutually exclusive events. Suppose A1, A2, . . . , Am be m pairwise mutually exclusive events, that is,

    Ai ∩ Aj = ∅ for i ≠ j, i, j = 1, 2, . . . , m.

We are to prove that

    Σ_{i=1}^{m} P(Ai) = P(∪_{i=1}^{m} Ai).

Associate with each m ≥ 2 a proposition

    P(m) : Σ_{i=1}^{m} P(Ai) = P(∪_{i=1}^{m} Ai), Ai ∩ Aj = ∅ for i ≠ j.

Induction step: Let P(k) be true, i.e., let

    Σ_{i=1}^{k} P(Ai) = P(∪_{i=1}^{k} Ai)

hold true. Again we have

    Σ_{i=1}^{k+1} P(Ai) = Σ_{i=1}^{k} P(Ai) + P(A_{k+1}) = P(∪_{i=1}^{k} Ai) + P(A_{k+1}) = P((∪_{i=1}^{k} Ai) ∪ A_{k+1}) = P(∪_{i=1}^{k+1} Ai),

where, in the deduction of the above chain of equalities, we used the assumed truth of P(k) and also the result directly shown to be true for any two mutually exclusive events (note that ∪_{i=1}^{k} Ai and A_{k+1} are themselves mutually exclusive). Hence P(k + 1) is found true.
As the proposition was asserted to be true for m = 2, by the method of induction it follows that P(m) is true for any finite m.
(d) In case A1, A2, . . . , Am are mutually exclusive and exhaustive, i.e., Ai ∩ Aj = ∅ for i ≠ j and ∪_{i=1}^{m} Ai = Ω, then from (b) and (c) it follows that

    Σ_{i=1}^{m} P(Ai) = P(∪_{i=1}^{m} Ai) = P(Ω) = 1.

If, in particular, Ω is a finite sample space, say Ω = {a1, a2, . . . , am}, then Ω may be looked upon as a union of disjoint singletons {ai}, so that

    Σ_{i=1}^{m} P({ai}) = 1.

However, for the classical definition of probability to hold good, the outcomes are equally likely, i.e., the P({ai}) are all equal. Then

    1 = Σ_{i=1}^{m} P({ai}) = m P({ai}),

implying

    P({ai}) = 1/m.

(e) As a passing remark, we state that the non-negativity criterion satisfied by the probability of any event follows automatically, since the relative frequency, on which the statistical definition is based, is a non-negative rational number. Hence for any event A, P(A) ≥ 0.
The salient deductions from Von Mises' definition therefore yield the following features:
(i) Non-negativity of the probability of any event A.
(ii) The probability of the certain event is unity and that of the impossible event is zero. As Ω and ∅ are mutually exclusive and exhaustive, P(Ω) = 1 implies P(∅) = 0 and vice versa. Thus we need not mention both the results separately, as they are interrelated.
(iii) Probabilities of pairwise mutually exclusive events are finitely additive.
(iv) Under specific restrictions, the classical definition follows from the statistical definition.

Remark: (a) The classical definition being retrievable from the statistical definition, and many other important deductions having been made from this definition, we are naturally inclined to improve the statistical definition in such a way that all these deductions could be transmitted straightforwardly to the new theory. The Axiomatic theory of probability was propounded by Kolmogorov in order to fulfill this aim of not losing what we have gained already and at the same time overcoming the logical difficulties. Indeed, if we are to bypass the several logical impediments faced in the statistical definition, then the Axiomatic theory is the best alternative.
(b) Axiomatic treatment of a subject is, however, not new to an undergraduate student, as he/she has had a number of occasions to lay hands on it. The previous instances of axiomatic development may be shortlisted as:
(i) In modern theories of geometry, where the fundamental ingredients like points, straight lines and planes are left uninterpreted but whose existence is axiomatised at the onset of the development of the subject.
(ii) In the development of set theory, where a precise definition of the fundamental concept of set itself is lacking.
(iii) In modern analysis, where the number system is developed from Peano's axioms, followed by the Axioms of Integers, which appear in three equivalent forms, viz., the well-ordering principle, the first principle of mathematical induction and the second principle of mathematical induction. Here also no one is concerned with the interpretation of the axioms themselves.
Once we embark on the axiomatic development, it is irrelevant to ponder over the interpretation of probability. It is better that we develop the mathematical theory of probability from a handful of axioms and look upon the preceding relative frequency interpretation of probabilities as barely an intuitive motivation behind the upcoming definitions and theorems.
(c) The Axiomatic theory of probability should be processed in such a way that it is logically sound, free from the shortcomings of the preceding theories and expansive enough to accommodate all kinds of random experiments, no matter whether the associated sample space is finite, countably infinite or even uncountable (continuous). But once infinite sample spaces get into our discussion, we are compelled to update the concept of event itself, as the crude definition "any subset of the sample space is an event" is no longer sensible.

2.3 Kolmogorov's Axiomatic Theory

The Axiomatic Theory of probability, dating back to 1933, is based on the mathematical modeling of a random experiment that is either hypothetical or realistic. In this approach, the concept of random events is not primary and is derived from more elementary notions. Once we decide on all the possible outcomes associated with a random experiment, we formulate a set Ω whose elements symbolise these outcomes. For the logical development of the axiomatic theory we need not bother about the nature of the elements of Ω; rather we treat it, from a strictly mathematical viewpoint, as just an abstract point set. This Ω is referred to as the sample space associated with the random experiment.
The next goal is to consider a certain family F of subsets of Ω; the elements of F will be called random events, or simply events, to which we intend to assign probabilities. But what should the collection F be? It is quite reasonable

to demand that F be closed under finite union, finite intersection and complementation. For instance, if A and B be two events, then A ∪ B occurs if the outcome of our experiment is representable either by a point in A or by a point in B. Clearly then, if it is going to be meaningful to talk about the probabilities that A and B occur, it should also be meaningful to talk about the probability that either A or B occurs, i.e., that the event A ∪ B occurs. Since only sets in F would be assigned probabilities, we should demand that A ∪ B ∈ F whenever A, B ∈ F. Again, to say that the event A does not occur is to say that the outcome of our experiment is not represented by a point in A, so that it must be represented by a point in Aᶜ. It sounds rather idiotic that we could talk about the probability of A and not of Aᶜ. Thus, to be more realistic, we demand that F is closed under complementation, i.e., Aᶜ ∈ F whenever A ∈ F. By the principle of induction one can establish that if A1, A2, . . . , An ∈ F, then ∪_{k=1}^{n} Ak ∈ F. By De Morgan's law it follows that ∩_{k=1}^{n} Ak = (∪_{k=1}^{n} Akᶜ)ᶜ ∈ F whenever Ak ∈ F for k = 1, 2, . . . , n. Finally we observe that for any set A ∈ F, Aᶜ ∈ F, and hence Ω = A ∪ Aᶜ and ∅ = A ∩ Aᶜ belong to F as well. Hence F must contain the empty set (impossible event) ∅ and the universal set (certain event) Ω. Until now, F is seen to be a field of subsets of Ω, or a Boolean algebra of subsets of Ω. The demand that F be a Boolean algebra implies that the elementary set-theoretic operations cannot take us beyond the limits of the collection F of random events.
It turns out, however, that for certain mathematical reasons, just taking F to be a Boolean algebra of subsets of Ω is insufficient. Again, in many important problems that we encounter in the real world, the need for F being something more than a Boolean algebra is felt. To meet both ends, we make the stringent demand that F be closed not only under finite set-theoretic operations (like finite union, finite intersection & complementation) but also under countable union, countable intersection and complementation. This inclusion of countable set-theoretic operations extends the Boolean algebra to a σ-algebra (or σ-field). Even in this extended version, the elements of F are labeled as events. Given a non-empty set Ω, a very trivial example of a σ-algebra of subsets of Ω is its power set. In the theory of probability, the choice of F is of paramount importance. In case Ω is at most countable (i.e., finite or countably infinite), one usually takes F to be equal to the power set P(Ω). Here each singleton is a member of F and is our fundamental object of interest, as assigning probability to each such elementary event serves our purpose. In case Ω is uncountably infinite, P(Ω), although a σ-algebra by nature, is not chosen for F since it is too large to work with. So how to choose F when Ω is uncountable? The clue can be had from the fact that given any non-empty class C of subsets of Ω, there exists a unique smallest σ-algebra F that contains C; in this case F is referred to as the σ-algebra generated by C. Hence the working F depends solely on what non-empty class C of subsets of Ω we choose in a specific context.
Once the σ-algebra F relative to Ω gets fixed up, we have an algebraic structure (Ω, F), known as a measurable space, and the members of F are known as measurable sets. In the language of probability theory, (Ω, F) is known as the Event space and the elements of F are known as events. In particular, when Ω = R and F = B, the measurable space (Ω, F) becomes the one-dimensional Borel space (R, B). This measurable space lies at the heart of the distribution functions of random variables. In the above discussion we have not yet stated what B consists of: B indeed is the σ-field generated by the class of all semi-infinite intervals of the form (−∞, a], a ∈ R. B contains all the singletons and all types of intervals, finite or infinite, because of the following relations:

    (i)    (−∞, a) = ∪_{n=1}^{∞} (−∞, a − 1/n];  a ∈ R and n ∈ N
    (ii)   (a, b] = (−∞, b] − (−∞, a];  a, b ∈ R
    (iii)  {a} = ∩_{n=1}^{∞} (a − 1/n, a + 1/n];  a ∈ R
    (iv)   (a, +∞) = ((−∞, a])ᶜ;  a ∈ R
    (v)    [a, b] = (a, b] ∪ {a}
    (vi)   [a, b) = [a, b] − {b} = ((a, b] ∪ {a}) − {b}
    (vii)  R = ∪_{n=1}^{∞} (−∞, n]
    (viii) ∅ = ∩_{n=1}^{∞} (−∞, −n]

Remark: (i) The family of all open intervals, the family of all closed intervals and the family of all open-closed or closed-open intervals are not σ-algebras.
(ii) A non-trivial σ-algebra (i.e., other than P(Ω)) was required for a non-enumerable Ω because it is impossible to define probabilities consistently for all subsets of a non-enumerable Ω like R or any interval in R (since any interval in R is numerically equivalent to R itself, vide the Schroeder-Bernstein theorem).
(iii) Some texts define the σ-algebra F as a collection of subsets of Ω that is closed under countable union & countable intersection and contains the sample space Ω itself. This definition is equivalent to our definition of σ-algebra (check!).
Definition of σ-algebra: A non-empty collection F of subsets of a set Ω is called a σ-algebra or a σ-field of subsets of Ω provided it satisfies the following properties:

    (i)   If A ∈ F, then Aᶜ ∈ F (closure under complementation).
    (ii)  If A1, A2, . . . , An, . . . ∈ F, then ∪_{n=1}^{∞} An ∈ F (closure under countable union).
    (iii) If A1, A2, . . . , An, . . . ∈ F, then ∩_{n=1}^{∞} An ∈ F (closure under countable intersection).

From this it transpires that every σ-field is a field, but the converse is not always true.
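For a finite Ω the countable operations reduce to finite ones, so the definition can be checked mechanically. A minimal sketch (function name ours) testing closure under complementation and union for a candidate family:

```python
def is_sigma_algebra(omega, family):
    # For finite omega, check: omega in family, closure under complement,
    # and closure under (finite, hence countable) union.
    family = {frozenset(s) for s in family}
    if frozenset(omega) not in family:
        return False
    for a in family:
        if frozenset(omega) - a not in family:
            return False
        for b in family:
            if a | b not in family:
                return False
    return True

omega = {1, 2, 3, 4}
f = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
print(is_sigma_algebra(omega, f))  # True: the sigma-algebra generated by {1, 2}
```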
Probability Space
To complete the construction of a decent mathematical model for a random experiment there remains the job of specification or assignment of probabilities. The completed model is conventionally called a Probability space, which contains, together with a measurable space (Ω, F), a set function P(·). A probability space is often denoted as (Ω, F, P(·)), where Ω is the sample space of the random experiment under discussion, F is the σ-algebra of events of possible interest and P(·) is the probability measure that assigns to each set A ∈ F a real number P(A).
The probability P(A) in the mathematical model is something that is assumed to exist as an intrinsic feature of the random experiment, intended to represent the proportion of trials in which A occurs, no matter whether a long sequence of trials is carried out in practice or not. For a given A ∈ F, P(A) is intended to embody what is commonly deemed as A's chances of occurrence. Since a proportion is a real number between 0 and 1, 0 ≤ P(A) ≤ 1 is a must. Again, to reflect the realistic nature of the mathematical model, the event Ω representing "every possible outcome of the experiment" should be assigned probability unity, i.e., P(Ω) = 1 is mandatory. We recall that the need for an axiomatic theory of probability was felt to overcome the drawbacks of the classical and statistical definitions, and so the new model we are trying to build up must involve the extracts of those predated theories. Non-negativity of the probability of all events and the fact P(Ω) = 1 were already involved in the earlier theories.

The other thing that appears as an inheritance from the preceding theories is the finite additivity of the probability measure. It states that if A1, A2, . . . , An be any finite sequence of pairwise mutually exclusive events, i.e., if Ai ∩ Aj = ∅ whenever i ≠ j, then

    P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak).

But mind that our model promises to be valid for all kinds of sample spaces and our F is a σ-algebra of events. Hence we have to extend the finite additivity criterion to countable additivity, because otherwise we cannot account for the probability of events that decompose into a countably infinite number of more elementary events. In fact, unless the extended axiom of addition is incorporated, a series of mathematical predicaments may crop up in the new model. Hence the updated model involves a probability measure which is a real-valued set function whose domain is the σ-algebra F and which satisfies the properties:

    (i)   P(A) ≥ 0 for all A ∈ F.
    (ii)  P(Ω) = 1.
    (iii) If {An} be a sequence of pairwise mutually exclusive sets in F, then ∪_{n=1}^{∞} An ∈ F and, moreover, P(∪_{n=1}^{∞} An) = Σ_{n=1}^{∞} P(An).

The last axiom of countable additivity holds the key in our model and includes finite additivity as a special case.
Remarks: (i) Kolmogorov's set of axioms is incomplete in the sense that, even for one and the same sample space Ω, we can choose the probabilities of the events in F in more than one way. For instance, let us consider throwing a die, the associated sample space being Ω = {1, 2, 3, 4, 5, 6}. Had this die been fair, i.e., symmetrical in geometrical appearance and made of a homogeneous material, we would have P(k) = 1/6 for all k = 1, 2, 3, 4, 5, 6. Had the die not been fair, the probability distribution would be quite different, say, P(1) = P(3) = P(5) = 1/4; P(2) = P(4) = P(6) = 1/12. In both cases, the axioms of Kolmogorov are satisfied.
Thus dice throwing is a phenomenon whose study demands consideration of identical random events but with different probabilities. It is an indication that there may exist more than one probability space which faithfully represents the same physical experiment.
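A small numeric sketch of this remark (code and function name ours; the biased masses are those quoted above): both assignments satisfy Kolmogorov's axioms on the same Ω.

```python
from fractions import Fraction as F

def satisfies_axioms(masses):
    # For a discrete measure given by point masses: non-negativity and total
    # mass one; additivity is automatic for point-mass assignments.
    return all(m >= 0 for m in masses.values()) and sum(masses.values()) == 1

fair   = {k: F(1, 6) for k in range(1, 7)}
biased = {1: F(1, 4), 2: F(1, 12), 3: F(1, 4), 4: F(1, 12), 5: F(1, 4), 6: F(1, 12)}
print(satisfies_axioms(fair), satisfies_axioms(biased))  # True True
```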
(ii) In constructing a probability model it is very much desirable to know the correct probabilities of each event, but since, to have that knowledge, one has to perform infinitely many trials, one has no scope of asserting whether the associated probability space represents the random experiment faithfully. This shortcoming is a general feature of any mathematical model.
(iii) If the sample space Ω is discrete and has n elements, it is possible to assign to every singleton {ω} a definite probability. If A ∈ F, then P(A) = Σ_{ω∈A} P({ω}). In particular, one may assign equal probability to all the elementary events of Ω. If the cardinality of A be m, P(A) = m/n.
If the sample space Ω be countably infinite, it suffices to make the assignment for each elementary event; if A ∈ F, then P(A) = Σ_{ω∈A} P({ω}). However, unlike the finite sample space, an assignment of equal probability is an absurdity, as the series Σ_ω P({ω}) = c Σ 1 is not convergent.
If the sample space Ω be uncountable, each singleton is an elementary event, but one cannot make an equally likely assignment of probabilities; indeed, one cannot assign positive probability to every elementary event without violating P(Ω) = 1. In this case one is compelled to assign probabilities to compound events.
(iv) Kolmogorov's axioms instigate us to develop the theory of probability by using the abstract probability space as a basis of operation. No doubt, from a mathematician's point of view, having a model in possession is very comfortable, but we should be aware of the fact that the consistency, adequacy and certain salient features of the model can be explored only through down-to-earth iterative performance of the experiment. When a random experiment is performed a large number of times under identical or uniform conditions, the proportion of trials in which a given event A occurs (the frequency ratio of the event A) is observed to stabilise near a limiting value; that is, the frequency ratio f(A) of event A will be approximately equal to its probability P(A). This implies that the probability P(A) of the event A is the idealised non-negative real number whose experimentally measured value is the frequency ratio, the degree of precision increasing with the increasing length of the sequence of iterations.
Important Deductions from the Axiomatic Definition
From the three axioms of Kolmogorov we may deduce the following important rules in probability theory.
(i) Since for any event A, A ∪ Aᶜ = Ω and A ∩ Aᶜ = ∅, the axiom of countable additivity asserts that 1 = P(A) + P(Aᶜ), which implies that

    P(Aᶜ) = 1 − P(A).

(ii) Using the second axiom of Kolmogorov and also (i), we have:

    P(∅) = P(Ωᶜ) = 1 − P(Ω) = 1 − 1 = 0,

i.e., the probability of an impossible event is 0.
Remark: If the probability P(A) of any event A be zero, we cannot conclude that A = ∅. The events A for which P(A) = 0 are said to be stochastically impossible. Impossible events are stochastically impossible, but the converse is not always true. The best example of a non-trivial stochastically impossible event can be had in the case of continuous distributions.
The complement of a stochastically impossible event is called a stochastically certain event. Here also, for any event B for which P(B) = 1, B is not necessarily the certain event Ω. The stochastically certain events are independent of any other event, the proof of which will be furnished when we introduce the idea of independence of events (see worked example (25)).
(iii) P is bounded above by 1, i.e., P(A) ≤ 1 for all A ∈ F, because P(Aᶜ) = 1 − P(A) ≥ 0 (vide the first axiom).
(iv) P is monotone and subtractive, i.e., if A, B ∈ F and A ⊆ B, then P(A) ≤ P(B) and P(B − A) = P(B) − P(A).
Proof: Since A ⊆ B, we can write B = A ∪ (B − A), where A ∩ (B − A) = ∅. From countable additivity it follows that P(B) = P(A) + P(B − A) and, as P(B − A) ≥ 0, P(A) ≤ P(B) holds true. Moreover, P(B − A) = P(B) − P(A).
If B were Ω, then monotonicity would assert that P(A) ≤ P(Ω) and, as P(Ω) = 1, P(A) ≤ 1. So (iv) includes (iii) as a special case.
(v) Addition Rule: If A, B are two events in F,

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof: From the juxtaposed figure (Figure 2.1: Venn diagram of A ∪ B) it follows that

    A ∪ B = (A − B) ∪ (B − A) ∪ (A ∩ B),
    A = (A − B) ∪ (A ∩ B),
    B = (B − A) ∪ (A ∩ B).

Using the countable additivity of P(·) we have:

    P(A ∪ B) = P(A − B) + P(B − A) + P(A ∩ B),
    P(A) = P(A − B) + P(A ∩ B),
    P(B) = P(B − A) + P(A ∩ B),

so that

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
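A quick numeric sanity check of the addition rule on a finite equally likely space (the sets below are arbitrary; sketch ours):

```python
import random

random.seed(3)
omega = set(range(100))
A = set(random.sample(sorted(omega), 30))
B = set(random.sample(sorted(omega), 40))
P = lambda E: len(E) / len(omega)
print(P(A | B), P(A) + P(B) - P(A & B))  # the two values coincide
```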
(vi) Extended or General Addition Rule: If A1, A2, . . . , An be n random events, then

    P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai) − Σ_{i<j} P(Ai ∩ Aj) + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak) − . . . + (−1)^{n−1} P(A1 ∩ A2 ∩ . . . ∩ An).

In (v) we have already seen our result to be true for n = 2, i.e.,

    P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2).

Assume the result to be true for n = t in particular:

    P(∪_{i=1}^{t} Ai) = Σ_{i=1}^{t} P(Ai) − Σ_{i<j} P(Ai ∩ Aj) + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak) − . . . + (−1)^{t−1} P(A1 ∩ A2 ∩ . . . ∩ At).

Consider A_{t+1} and expand:

    P(∪_{i=1}^{t+1} Ai) = P((∪_{i=1}^{t} Ai) ∪ A_{t+1})
                        = P(∪_{i=1}^{t} Ai) + P(A_{t+1}) − P((∪_{i=1}^{t} Ai) ∩ A_{t+1})
                        = P(∪_{i=1}^{t} Ai) + P(A_{t+1}) − P(∪_{i=1}^{t} (Ai ∩ A_{t+1}))
                        = P(∪_{i=1}^{t} Ai) + P(A_{t+1}) − P(∪_{i=1}^{t} Bi)   (writing Bi ≡ Ai ∩ A_{t+1}).

Again we have

    P(∪_{i=1}^{t} Bi) = Σ_{i=1}^{t} P(Bi) − Σ_{i<j} P(Bi ∩ Bj) + Σ_{i<j<k} P(Bi ∩ Bj ∩ Bk) − . . . + (−1)^{t−1} P(B1 ∩ B2 ∩ . . . ∩ Bt).

Now,

    Σ_{i=1}^{t} P(Bi) = Σ_{i=1}^{t} P(Ai ∩ A_{t+1}),  the number of summands of type P(Ai ∩ A_{t+1}) being t = tC1,

and

    Σ_{i<j} P(Bi ∩ Bj) = Σ_{i<j} P(Ai ∩ Aj ∩ A_{t+1}),  the number of summands of type P(Ai ∩ Aj ∩ A_{t+1}) being tC2,

while in Σ_{i<j}^{t} P(Ai ∩ Aj) the number of summands of type P(Ai ∩ Aj) is tC2. The ordered-pair-involving terms are therefore tC2 + tC1 = (t+1)C2 in number and hence can be written as Σ_{i<j}^{t+1} P(Ai ∩ Aj) in the abridged form.
Similarly, ordered triplets appear in Σ_{i<j<k}^{t} P(Ai ∩ Aj ∩ Ak) and in Σ_{i<j} P(Bi ∩ Bj), the total number of summands being tC3 in the first case and tC2 in the second, so that there are tC3 + tC2 = (t+1)C3 ordered-triplet-involving terms, which can be expressed in the condensed form Σ_{i<j<k}^{t+1} P(Ai ∩ Aj ∩ Ak).
Proceeding in this way and employing grouping of terms, one may write

    P(∪_{i=1}^{t+1} Ai) = Σ_{i=1}^{t+1} P(Ai) − Σ_{i<j}^{t+1} P(Ai ∩ Aj) + Σ_{i<j<k}^{t+1} P(Ai ∩ Aj ∩ Ak) − . . . + (−1)^{t} P(A1 ∩ A2 ∩ . . . ∩ A_{t+1}),

i.e., the proposition we set out to prove is valid for n = t + 1 on the basis of the assumption that it is true for n = t. However, since the proposition is known to be valid for n = 2, by the Principle of Mathematical Induction (first form) it follows that the result is valid for every finite natural number n.
Often this result is quoted as the Principle of Inclusion-Exclusion.
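A numeric sketch of the inclusion-exclusion identity for three concrete events (divisibility classes chosen arbitrarily for illustration):

```python
from itertools import combinations
from functools import reduce

omega = set(range(60))
events = [{x for x in omega if x % 2 == 0},
          {x for x in omega if x % 3 == 0},
          {x for x in omega if x % 5 == 0}]
P = lambda E: len(E) / len(omega)

lhs = P(reduce(set.union, events))
rhs = sum((-1) ** (r - 1) *
          sum(P(reduce(set.intersection, c)) for c in combinations(events, r))
          for r in range(1, len(events) + 1))
print(lhs, rhs)  # both 0.7333...
```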
(vii) Boole's Inequality: For any finite sequence of events A1, A2, . . . , An belonging to F, we have the inequality:

    P(∪_{k=1}^{n} Ak) ≤ Σ_{k=1}^{n} P(Ak).

Proof: From the addition rule it follows that P(A1 ∪ A2) ≤ P(A1) + P(A2). Using this result as the basis of induction, we can prove the above result through the principle of induction.
Alternative form: For any two events A1 and A2 there holds the inequality

    P(A1 ∩ A2) ≥ 1 − P(A1ᶜ) − P(A2ᶜ).

The proof is again based on the addition rule (v):

    P(A1 ∩ A2) = P(A1) + P(A2) − P(A1 ∪ A2)
               = [1 − P(A1ᶜ)] + [1 − P(A2ᶜ)] − [1 − P((A1 ∪ A2)ᶜ)]
               = 1 − P(A1ᶜ) − P(A2ᶜ) + P((A1 ∪ A2)ᶜ)
               ≥ 1 − P(A1ᶜ) − P(A2ᶜ)

(since probability is a non-negative set function).
Extension: For n events A1, A2, . . . , An ∈ F,

    P(∩_{i=1}^{n} Ai) ≥ 1 − Σ_{i=1}^{n} P(Aiᶜ).

Suppose that for n = t the proposition is true:

    P(∩_{i=1}^{t} Ai) ≥ 1 − Σ_{i=1}^{t} P(Aiᶜ).

We prove, on the basis of this supposition, that the proposition is true for n = t + 1:

    P(∩_{i=1}^{t+1} Ai) = P((∩_{i=1}^{t} Ai) ∩ A_{t+1})
                        ≥ 1 − P((∩_{i=1}^{t} Ai)ᶜ) − P(A_{t+1}ᶜ)   (by the two-event form)
                        = P(∩_{i=1}^{t} Ai) − P(A_{t+1}ᶜ)
                        ≥ 1 − Σ_{i=1}^{t} P(Aiᶜ) − P(A_{t+1}ᶜ) = 1 − Σ_{i=1}^{t+1} P(Aiᶜ).

Since the proposition was shown to be true already for n = 2, by the induction principle the result is established for every finite n.
(viii) Boundedness Criterion: For any two events A, B ∈ F, A ∩ B ⊆ A ⊆ A ∪ B and A ∩ B ⊆ B ⊆ A ∪ B, so that P(A ∩ B) ≤ P(A), P(B) ≤ P(A ∪ B). This is given as:

    P(A ∩ B) ≤ min{P(A), P(B)} ≤ P(A ∪ B).

In general, if {An} be a sequence of events, then

    P(∩_{n=1}^{∞} An) ≤ P(Ak) ≤ P(∪_{n=1}^{∞} An)  for any k = 1, 2, . . . .
(ix) Deduction of the Classical Definition of Probability
In Laplace's classical definition, events (i.e., possible outcomes) are mutually exclusive, exhaustive and equally likely.
A1, A2, . . . , An are said to be pairwise mutually exclusive if Ai ∩ Aj = ∅ for i ≠ j; A1, A2, . . . , An are exhaustive if ∪_{i=1}^{n} Ai = Ω; and A1, A2, . . . , An are equally likely if the P(Ak) are all the same. If P(Ak) = c for k = 1, 2, . . . , n, then

    1 = P(Ω) = P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai) = cn,

so that

    P(Ak) = c = 1/n,  k = 1, 2, . . . , n.

Taking B as an event which is the union of m simple events, say {A_{i1}, A_{i2}, . . . , A_{im}}, we have B = ∪_{k=1}^{m} A_{ik}, so that, due to the countable additivity axiom,

    P(B) = P(∪_{k=1}^{m} A_{ik}) = Σ_{k=1}^{m} P(A_{ik}) = m/n,

a result that corresponds to the classical definition.
(x) Axiom of Continuity: If {An} be a monotone sequence of events, no matter whether it is expanding or contracting, then

    P(lim_{n→∞} An) = lim_{n→∞} P(An).

Proof: Let {An} be an expanding sequence of events, i.e.,

    A1 ⊆ A2 ⊆ A3 ⊆ . . . ⊆ An ⊆ . . .

Define B1 = A1 and Bn = An − A_{n−1} for n ≥ 2. Then

    An = ∪_{k=1}^{n} Bk and Bi ∩ Bj = ∅ when i ≠ j,

so that

    ∪_{n=1}^{∞} An = lim_{n→∞} ∪_{k=1}^{n} Bk = ∪_{k=1}^{∞} Bk = lim_{n→∞} An.

Now, due to the axiom of countable additivity,

    P(∪_{n=1}^{∞} An) = P(∪_{k=1}^{∞} Bk) = Σ_{k=1}^{∞} P(Bk) = lim_{n→∞} Σ_{k=1}^{n} P(Bk)
                      = lim_{n→∞} [ Σ_{k=2}^{n} P(Ak − A_{k−1}) + P(A1) ]   (since B1 = A1)
                      = lim_{n→∞} [ Σ_{k=2}^{n} {P(Ak) − P(A_{k−1})} + P(A1) ] = lim_{n→∞} P(An).

Hence P(lim_{n→∞} An) = lim_{n→∞} P(An) [since {An} is expanding, ∪_{n=1}^{∞} An = lim_{n→∞} An].
In case {An} be contracting, {Anᶜ} is expanding and hence, by virtue of the result just proved for an expanding sequence of events, we have

    P(∪_{n=1}^{∞} Anᶜ) = lim_{n→∞} P(Anᶜ),
or, 1 − P(∩_{n=1}^{∞} An) = 1 − lim_{n→∞} P(An),
or, P(∩_{n=1}^{∞} An) = lim_{n→∞} P(An),
or, P(lim_{n→∞} An) = lim_{n→∞} P(An) [since {An} is contracting, ∩_{n=1}^{∞} An = lim_{n→∞} An].

Recalling the definition of a continuous function in the language of the limit of a sequence, one therefore concludes that in either case P(·) is a continuous set function defined on F. Perhaps this is the underlying reason why this result is popular as the Axiom of Continuity.
Remark: The extended axiom of addition (or Kolmogorov's axiom of countable additivity) follows from the Axiom of Continuity. Details of the proof are given in Appendix A2.

2.4 Worked Examples on the General Theory

Example 1. Two cards are drawn from a deck of well-shuffled cards. What is the probability that both the extracted cards are aces?
Solution: As 52 cards form a complete deck, there exist 52 ways of selecting the first card. After the first card is drawn, the second card may be any one of the remaining 51 cards, and so the total number of ways of drawing a pair of cards is 52 × 51. All the cases may be regarded as equally likely. To find the number of favourable cases of drawing a pair of aces, we observe that there are 4 aces; therefore there exist 4 ways to get the first ace and 3 ways to have the second ace. Hence the total number of ways to draw a pair of aces is 4 × 3 = 12, and the required probability is

    12/(52 × 51) = 1/221.
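A brute-force confirmation over all ordered pairs of distinct cards (sketch ours; the encoding of the aces is arbitrary):

```python
from itertools import permutations

deck = list(range(52))
aces = {0, 1, 2, 3}            # four card indices encode the aces
pairs = list(permutations(deck, 2))
fav = sum(1 for a, b in pairs if a in aces and b in aces)
print(fav, len(pairs), fav / len(pairs))  # 12 2652 0.00452... (= 1/221)
```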

Example 2. If two fair dice are thrown simultaneously, what is the probability that the sum of the points exceeds 9?

[Figure 2.2: Diagrammatic presentation of outcomes.]

Solution: The sample space Ω = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}. For visualisation we arrange those ordered pairs in the form of a 6 × 6 matrix: the first die displays its points along the horizontal while the second die displays its points along the vertical. We need to identify those ordered pairs for which i + j > 9. From this it follows that there are in total 1 + 2 + 3 = 6 favourable cases, and hence, by the classical definition of probability, P(required event) = 6/36 = 1/6. Alternatively, one could have found out separately P({(i, j) : i + j = 10}), P({(i, j) : i + j = 11}) and P({(i, j) : i + j = 12}) and used Kolmogorov's axiom of additivity for pairwise mutually exclusive events.
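The enumeration can be confirmed in two lines (sketch ours):

```python
fav = sum(1 for i in range(1, 7) for j in range(1, 7) if i + j > 9)
print(fav, fav / 36)  # 6 0.1666...
```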
Example 3. A pair of true dice are thrown n times in succession. Find the probability of having a double six at least once. If we want to make this probability greater than 1/3, find the minimum number of throws.
Solution: Associated with each throw of the pair of fair dice there are 36 possible outcomes, and hence the total number of possible cases in n throws will be 36^n. The total number of unfavourable cases associated with each throw being 35, the total number of unfavourable cases in n throws will be 35^n, so that the total number of favourable cases is 36^n − 35^n. By the classical definition it therefore follows that the probability of having a double six at least once is

    p = (36^n − 35^n)/36^n = 1 − (35/36)^n.

For the second part, we need to determine the minimum value of n so that p > 1/3 holds. To this end, we must solve the inequality (35/36)^n < 2/3:

    n > (log 3 − log 2)/(log 36 − log 35) ≈ 14.393.

It means that in 15 throws there is more likelihood to obtain a double six at least once than not to obtain it at all.
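A numeric check of the threshold (sketch ours):

```python
import math

n = math.ceil(math.log(3 / 2) / math.log(36 / 35))
print(n, 1 - (35 / 36) ** n)  # 15 0.3445... > 1/3
```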
Example 4. A coin is tossed until the first head appears. The elementary events associated with this experiment are sequences each of whose last element is a head, i.e., Ω = {H, TH, TTH, . . .} is the sample space. If we assign probabilities to these outcomes according to the rule

    P(TT . . . TH) = 1/2^i,  i = 1, 2, . . .

(i being the number of tosses), then show that this indeed defines a probability measure and, moreover, that the probability that the first head appears on an even-numbered flip of the coin is 1/3.
Solution: The sample space Ω is countably infinite. Summing over all the elementary events,

    P(Ω) = Σ_{i=1}^{∞} 1/2^i = (1/2)/(1 − 1/2) = 1.

If one writes Ai to denote the event TT . . . TH with the head appearing at the i-th toss, then the required probability of having the first head on an even-numbered flip of the coin is given by

    p = 1/2² + 1/2⁴ + 1/2⁶ + . . . ad inf = (1/2²)/(1 − 1/2²) = 1/3.
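Partial sums confirm both series numerically (sketch ours; the truncation point is arbitrary):

```python
total = sum(2 ** -i for i in range(1, 200))
even = sum(2 ** -i for i in range(2, 200, 2))
print(total, even)  # ~1.0 and ~0.3333...
```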

Example 5. In a game of bridge, find the probability that each of the four players will hold an ace.
Solution: A bridge deck of 52 cards can be partitioned into four hands of 13 cards each in 52!/(13!)⁴ ways. The four aces can be distributed in 4! ways among the four contestants. The remaining 48 cards can be partitioned into four groups of 12 apiece in 48!/(12!)⁴ ways. Combining, we get that the number of favourable cases (i.e., where each player holds an ace) is 4! · 48!/(12!)⁴, so that the required probability is

    p = [4! · 48!/(12!)⁴] / [52!/(13!)⁴] = 4! · 48! · (13!)⁴ / ((12!)⁴ · 52!) = 2197/20825 ≈ 0.105.
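An exact evaluation (sketch ours):

```python
from math import factorial as f

p = f(4) * f(48) * f(13) ** 4 / (f(12) ** 4 * f(52))
print(p)  # 0.1054... (= 2197/20825)
```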

Example 6. Find the probability that a 3-digit natural number is divisible by either 6 or 8.
Solution: 3-digit natural numbers run from 100 to 999 and hence their count is 999 − 99 = 900. As worked out in example (6) following the Inclusion-Exclusion Principle of chapter 1, the total count of 3-digit natural numbers divisible by either 6 or 8 is 225. Hence, by the classical definition,

    Required probability = 225/900 = 1/4.

The same result could have been obtained from the addition rule of probability, viz.,

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B),

where
A = the set of integers that are divisible by 6,
B = the set of integers that are divisible by 8,
A ∩ B = the set of integers divisible by both 6 and 8, i.e., divisible by their l.c.m. 24.
From the workout of chapter 1, it follows that P(A) = 1/6, P(B) = 28/225 and P(A ∩ B) = 37/900, and hence by the addition rule P(A ∪ B) = 1/4.
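A direct enumeration confirms the count (sketch ours):

```python
count = sum(1 for k in range(100, 1000) if k % 6 == 0 or k % 8 == 0)
print(count, count / 900)  # 225 0.25
```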

Example 7. The coefficients a, b, c of the quadratic equation ax² + bx + c = 0 are determined by throwing three fair dice simultaneously. Find the probability that the roots are real and distinct.
Solution: Reality and distinctness of the roots of the quadratic is given by the condition b² − 4ac > 0, i.e., ac < b²/4. Each of a, b, c varies over the set {1, 2, . . . , 6}. Suppose that we write out the sample space for two dice, one corresponding to a and the other corresponding to c:

    (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
    (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
    (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
    (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
    (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
    (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

Let us combine it with the different values of b and enquire whether the condition ac < b²/4 of real & distinct roots is satisfied or not.
Case I: b = 1, and so ac < 1/4. There exists no admissible entry in the table to satisfy it.
Case II: b = 2, and hence ac < 1. Here also no admissible pair exists.
Case III: b = 3, yielding ac < 9/4. Here the admissible ordered pairs are (1,1), (1,2) and (2,1), i.e., 3 in number.
Case IV: b = 4, so that ac < 4. Here the number of admissible pairs is 5: (1,1), (1,2), (1,3), (2,1), (3,1).
Case V: b = 5, making ac < 25/4. Here the number of admissible pairs is 14: (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (3,1), (3,2), (4,1), (5,1), (6,1).
Case VI: b = 6, making ac < 9. Here, in addition to Case V, the ordered pairs (2,4) & (4,2) are to be enlisted, making the count 16.
For convenience, the favourable simple events are enlisted:

    b        1   2   3   4   5   6   Total
    # cases  0   0   3   5   14  16  38

Required probability = 38/6³ = 38/216 = 19/108.
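A brute-force count over all 216 triples (a, b, c) confirms the tally (sketch ours):

```python
from fractions import Fraction

fav = sum(1 for a in range(1, 7) for b in range(1, 7) for c in range(1, 7)
          if b * b > 4 * a * c)
print(fav, Fraction(fav, 216))  # 38 19/108
```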

Example 8. Let A and B be two random events. Find the probability of occurrence of exactly one of them.
Solution: We observe that, by the given condition, if A occurs B should not occur, and if B occurs A should not occur. Hence the desired event is the union of (A ∩ Bᶜ) and (Aᶜ ∩ B). We require P((A ∩ Bᶜ) ∪ (Aᶜ ∩ B)). Note that

    (A ∩ Bᶜ) ∩ (Aᶜ ∩ B) = (A ∩ Aᶜ) ∩ (B ∩ Bᶜ) = ∅,

and so, by Kolmogorov's axiom of countable additivity,

    P((A ∩ Bᶜ) ∪ (Aᶜ ∩ B)) = P(A ∩ Bᶜ) + P(Aᶜ ∩ B)
                            = P(A) − P(A ∩ B) + P(B) − P(A ∩ B)
                            = P(A) + P(B) − 2P(A ∩ B).

Remark: If we are to compute the probability of non-occurrence of a set of events A, B, C, . . ., that is, if we are interested in P(Aᶜ ∩ Bᶜ ∩ . . .), then we compute the probability of occurrence of at least one of them and subtract it from unity, and vice versa.
Example 9. In the Annual Examination of a college, 50% of the students passed in Mathematics, 40% in Physics and 30% in Chemistry; 35% passed in Mathematics as well as Physics, 20% in Physics as well as Chemistry, while 15% of the students passed in all three subjects. If 25% of the students failed in all three subjects, find the probability that a student selected at random passed in Chemistry as well as Mathematics.
Solution: Define the following events:
A : the event that a student passes in Mathematics.
B : the event that a student passes in Physics.
C : the event that a student passes in Chemistry.
Therefore the information supplied reads:

    P(A) = 1/2; P(B) = 2/5; P(C) = 3/10; P(A ∩ B) = 7/20;
    P(B ∩ C) = 1/5; P(A ∩ B ∩ C) = 3/20; P(Aᶜ ∩ Bᶜ ∩ Cᶜ) = 1/4;
    P(A ∪ B ∪ C) = 1 − P(Aᶜ ∩ Bᶜ ∩ Cᶜ) = 1 − 1/4 = 3/4.

Again, by the addition rule,

    P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C),

so that

    P(A ∩ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) + P(A ∩ B ∩ C) − P(A ∪ B ∪ C)
             = 1/2 + 2/5 + 3/10 − 7/20 − 1/5 + 3/20 − 3/4 = 1/20.

Thus the probability that a student selected at random passed in Chemistry as well as Mathematics is 0.05.
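The arithmetic can be verified with exact fractions (sketch ours):

```python
from fractions import Fraction as F

PA, PB, PC = F(1, 2), F(2, 5), F(3, 10)
PAB, PBC, PABC = F(7, 20), F(1, 5), F(3, 20)
P_union = 1 - F(1, 4)
print(PA + PB + PC - PAB - PBC + PABC - P_union)  # 1/20
```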
Example 10. A bakery produces 80 pieces of loaves daily; 10 of them are underweight. An inspector weighs 5 loaves at random. What is the probability that an underweight loaf be discovered?
Solution: 5 pieces of loaves can be chosen in as many as 80C5 ways. This is the total number of possible samples of size 5 that can be checked by the inspector. We are to find the probability of detecting an underweight loaf. Out of the 70 loaves that are either of correct weight or overweight, the number of possible samples of size 5 is 70C5. In other words, the number of samples of five that include no underweight loaf is 70C5. Therefore,

    P(at least one underweight loaf) = 1 − 70C5/80C5 ≈ 0.497.
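A quick check with exact binomial coefficients (sketch ours):

```python
from math import comb

print(1 - comb(70, 5) / comb(80, 5))  # 0.4965...
```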

Example 11. An urn contains n = n1 + n2 + . . . + nr balls, of which n1 are of type/colour 1, n2 of type/colour 2, . . . , nr of the r-th type/colour. A ball is blindly drawn from the urn. (i) What is the probability that the ball is of the k-th colour? (ii) If i balls are drawn, find the probability that among these, i1 balls are of the first type/colour, i2 of the second type/colour, . . . , ir of the r-th type or colour.
Solution: (i) The balls of the same colour are assumed to be distinguishable among themselves, and hence, when one is drawn from the urn, the total number of points in the sample space is n, of which n1 are of colour 1, n2 of colour 2, . . . , nr of colour r. Therefore the probability of drawing a ball of the k-th colour is nk/n (k = 1, 2, . . . , r).
(ii) The number of possible ways in which the n balls of the urn can be partitioned into r groups (on the basis of colour difference), with n1 balls of colour 1, n2 balls of colour 2, . . . , nr balls of colour r, is n!/(n1! n2! . . . nr!) (cf. rule IV, Permutations and Combinations, Chapter 1).
The number of balls selected is i, of which i1 are of colour 1, i2 of colour 2, . . . , ir of colour r; using the above rule, we get that the total number of ways in which the selected balls can be partitioned on the basis of colour difference is i!/(i1! i2! . . . ir!).
In the same vein, the total number of ways in which the unselected (n − i) balls can be colour-wise partitioned into r groups, of which (n1 − i1) balls are of colour 1, (n2 − i2) of colour 2, . . . , (nr − ir) of colour r, is

    (n − i)! / [(n1 − i1)! (n2 − i2)! . . . (nr − ir)!].

The required probability is p, where

    p = [i!/(i1! i2! . . . ir!)] · [(n − i)!/((n1 − i1)! . . . (nr − ir)!)] / [n!/(n1! n2! . . . nr!)]
      = C(n1, i1) C(n2, i2) . . . C(nr, ir) / C(n, i).

(Application): Finding the probability of having 4 clubs, 2 diamonds, 3 spades and 4 hearts in a bridge hand.
Here n = 52; i1 (# clubs) = 4; i2 (# diamonds) = 2; i3 (# spades) = 3; i4 (# hearts) = 4;

    p = C(13, 4) C(13, 2) C(13, 3) C(13, 4) / C(52, 13),  (observe n1 = n2 = n3 = n4 = 13)
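The application can be evaluated directly (sketch ours):

```python
from math import comb

p = comb(13, 4) * comb(13, 2) * comb(13, 3) * comb(13, 4) / comb(52, 13)
print(p)  # 0.01795...
```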

Example 12. Two numbers i and j are chosen without replacement from the set {1, 2, . . . , n}. Find the probability that |i − j| ≥ m, where m is a preassigned natural number.
Solution: Since the numbers i and j are chosen without replacement from the set {1, 2, . . . , n}, the total number of elementary events, i.e., ordered pairs (i, j) with i ≠ j and i, j = 1, 2, . . . , n, is nP2 = n(n − 1). These elementary events can be arranged in the form of an n × n matrix with its main diagonal removed:

    (1,1) (1,2) (1,3) . . . (1,n)
    (2,1) (2,2) (2,3) . . . (2,n)
    (3,1) (3,2) (3,3) . . . (3,n)
      .     .     .    .     .
    (n,1) (n,2) (n,3) . . . (n,n)

Our aim is to find the cardinality of the set A = {(i, j) : |i − j| ≥ m, i ≠ j, i, j = 1, 2, . . . , n}.
From this presentation it is clear that, due to the symmetry of the modulus function, working with the upper triangular matrix is sufficient. In general, the number of elements appearing in a diagonal parallel to the main one and lying at a distance of r units from it is (n − r). In the upper triangular matrix, the admissible elementary events appear in the parallel diagonals lying at distances m, (m + 1), . . . , (n − 1) from the main diagonal, making the total number of favourable elementary events in the upper triangular matrix Σ_{r=m}^{n−1} (n − r). Due to symmetry, the picture is exactly the same for the lower triangular matrix. Hence the total number of favourable elementary events is

    2 Σ_{r=m}^{n−1} (n − r) = (n − m)(n − m + 1).

So the cardinality of A is (n − m)(n − m + 1), and the required probability is

    p = (n − m)(n − m + 1) / (n(n − 1)).
Example 13. Find the probability of obtaining a given sum of points s with n dice.
Solution: There are as many as 6^n elementary events associated with the n dice. The number of favourable cases is the same as the total number of integer solutions of the Diophantine equation x1 + x2 + . . . + xn = s, where the xk are integers ranging from 1 to 6. This number can be found by the following device. Multiplying the polynomial (x + x² + . . . + x⁶) by itself, the product will consist of terms of the form x^{x1+x2}, where x1 and x2 independently assume all integral values from 1 to 6. Grouping all the terms with the same exponent s, the coefficient of x^s will yield the number of solutions of the equation x1 + x2 = s, x1, x2 being subject to the condition 1 ≤ xi ≤ 6. Similarly, multiplying the polynomial (x + x² + . . . + x⁶) thrice by itself and grouping all the terms with the same exponent s, the coefficient of x^s will yield the number of admissible integer solutions of x1 + x2 + x3 = s. In general, the total number of admissible integer solutions of the equation x1 + x2 + . . . + xn = s is the coefficient of x^s in the expansion of (x + x² + . . . + x⁶)^n. (Incidentally, in the language of combinatorics, this polynomial is called the generating function.)

    (x + x² + . . . + x⁶)^n = x^n (1 + x + . . . + x⁵)^n = x^n (1 − x⁶)^n (1 − x)^{−n}
                            = x^n · Σ_{j=0}^{n} (−1)^j C(n, j) x^{6j} · Σ_{k=0}^{∞} C(n + k − 1, k) x^k.

Using the product of series, we observe that the coefficient of x^s is

    Σ_{r=0}^{⌊(s−n)/6⌋} (−1)^r C(n, r) C(s − 6r − 1, n − 1),

where ⌊(s−n)/6⌋ is the greatest integer less than or equal to (s − n)/6.
This being the number of favourable elementary events, the required probability is given by the classical definition as

    p = 6^{−n} Σ_{r=0}^{⌊(s−n)/6⌋} (−1)^r C(n, r) C(s − 6r − 1, n − 1).

Remark: (i) The above proof is modeled on Uspensky's Mathematical Probability. The above problem is known as De Moivre's problem.
(ii) In the special circumstances where we are to find the integer-valued solutions of the Diophantine equation x1 + x2 + . . . + xn = s under the constraints 1 ≤ xk ≤ 6, k = 1, 2, . . . , n, and 6(n − 1) < s ≤ 6n, we can bypass the above approach of generating functions and use a tricky combinatorial result as follows. Introduce new variables yk = 6 − xk, k = 1, 2, . . . , n; the given equation yields a new equation, viz.,

    y1 + y2 + . . . + yn = Σ_{k=1}^{n} (6 − xk) = 6n − s.

The original problem is therefore equivalent to finding the number of non-negative integer solutions of the new equation. From the combinatorial results quoted in Chapter 1, it follows that this requisite solution-count is

    C(6n − s + n − 1, n − 1) = C(7n − s − 1, n − 1).

Hence the required probability is 6^{−n} C(7n − s − 1, n − 1).
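De Moivre's formula can be checked against direct enumeration (sketch ours; the function names are hypothetical):

```python
from itertools import product
from math import comb

def de_moivre(n, s):
    # probability of total s with n dice, per the formula above
    return sum((-1) ** r * comb(n, r) * comb(s - 6 * r - 1, n - 1)
               for r in range((s - n) // 6 + 1)) / 6 ** n

def brute(n, s):
    return sum(sum(xs) == s for xs in product(range(1, 7), repeat=n)) / 6 ** n

print(de_moivre(3, 14), brute(3, 14))  # both 0.06944... (= 5/72)
```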

Example 14. What is the probability of obtaining 14 points with 3 dice?
Solution: Method I. It is equivalent to finding the integer-valued solutions of the equation x1 + x2 + x3 = 14. Comparing with the theoretical setup, one finds that n = 3, s = 14, and so ⌊(s − n)/6⌋ = ⌊11/6⌋ = 1. The required probability is therefore

    p = 6^{−3} Σ_{r=0}^{1} (−1)^r C(3, r) C(13 − 6r, 2) = 6^{−3} [C(13, 2) − 3 C(7, 2)] = (78 − 63)/216 = 15/216 = 5/72.

Method II. With the same set of notations, it follows (since 12 < s < 18) that the required probability is

    p = 6^{−3} C(21 − 14 − 1, 3 − 1) = 6^{−3} C(6, 2) = 15/216 = 5/72.

Example 15. From an urn containing N1 white and N2 black balls (N = N1 + N2), balls are drawn successively without replacement. What is the probability that the first black ball will be preceded by i white balls?
Solution: Since balls are being drawn without replacement, the stock of balls in the urn goes on decreasing by one at each step. The first white ball can be any one of the N1 whites present in the urn, the second white ball can be any one of the remaining (N1 − 1) whites in the urn, and so on. Thus the first i whites can be drawn out of the urn in as many as N1(N1 − 1) . . . (N1 − i + 1) ways. The ball drawn at the next step is mandatorily black, and it can be any one of the N2 blacks present in the urn. Now there are (N − i − 1) balls left in the urn and there is no a priori restriction over their draws; hence they can be chosen one by one in (N − i − 1)! ways. Combining all this, we have the total number of favourable ways of drawing as N1(N1 − 1) . . . (N1 − i + 1) N2 (N − i − 1)!. The total number of ways in which all the balls can be drawn one by one, irrespective of their colour, is N!. This gives us the required probability

    p = N1(N1 − 1) . . . (N1 − i + 1) N2 (N − i − 1)! / N! = N2 N1(N1 − 1) . . . (N1 − i + 1) / [N(N − 1) . . . (N − i)].

Example 16. Prove the following inequality due to Bonferroni:

    P(∪_{i=1}^{n} Ai) ≥ Σ_{i=1}^{n} P(Ai) − Σ_{i<j} P(Ai ∩ Aj)    (1)

Solution: For n = 2 this result is satisfied (with equality) by the addition rule, since

    P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2).

For n = k we presume this inequality to hold true:

    P(∪_{i=1}^{k} Ai) ≥ Σ_{i=1}^{k} P(Ai) − Σ_{i<j}^{k} P(Ai ∩ Aj).

Again,

    P(∪_{i=1}^{k+1} Ai) = P((∪_{i=1}^{k} Ai) ∪ A_{k+1})
                        = P(∪_{i=1}^{k} Ai) + P(A_{k+1}) − P(A_{k+1} ∩ (∪_{i=1}^{k} Ai))
                        ≥ Σ_{i=1}^{k} P(Ai) − Σ_{i<j}^{k} P(Ai ∩ Aj) + P(A_{k+1}) − P(∪_{i=1}^{k} (Ai ∩ A_{k+1}))
                        ≥ Σ_{i=1}^{k+1} P(Ai) − Σ_{i<j}^{k} P(Ai ∩ Aj) − Σ_{i=1}^{k} P(Ai ∩ A_{k+1})
                        = Σ_{i=1}^{k+1} P(Ai) − Σ_{i<j}^{k+1} P(Ai ∩ Aj),

proving the validity of the proposition for n = k + 1. Note that in the penultimate step we made use of Boole's inequality. Hence, by the principle of induction, the result is true for any finite n.
Remark: In the same vein one may also prove that

    P(∪_{i=1}^{n} Ai) ≤ Σ_{i=1}^{n} P(Ai) − Σ_{i<j} P(Ai ∩ Aj) + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak)    (2)

provided we make use of Boole's inequality and the Bonferroni inequality proved in the above example.
Observe that while Boole's inequality provides us with an upper bound on the probability of occurrence of at least one event, Bonferroni's inequality (1) provides us with a lower bound for the same. Further, note that inequality (2) provides us with a better upper bound for the probability of occurrence of at least one event.
It is rather interesting to observe that if we retain only an odd number of terms in the expression of the extended addition rule, what we get works as an upper bound for P(∪_{i=1}^{n} Ai). On the other hand, had we retained an even number of terms in the expression of the same addition rule, what we get works as a lower bound for P(∪_{i=1}^{n} Ai).
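A numeric illustration of the bounds (1) and (2) for three events (the sets are random; sketch ours):

```python
import random
from itertools import combinations
from functools import reduce

random.seed(7)
omega = set(range(100))
A = [set(random.sample(sorted(omega), 40)) for _ in range(3)]
P = lambda E: len(E) / len(omega)
S = lambda r: sum(P(reduce(set.intersection, c)) for c in combinations(A, r))

exact = P(reduce(set.union, A))
print(S(1) - S(2), "<=", exact, "<=", S(1) - S(2) + S(3))
```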

2.5 Conditional Probability

Let (Ω, F, P(·)) be a probability space associated with a random experiment E. Let A be an event which is non-trivial (i.e., A is neither ∅ nor Ω) and not stochastically impossible, i.e., P(A) > 0. Define a new set function P_A(·) on F by the rule

    P_A(B) = P(A ∩ B)/P(A),  with P(A) > 0 and B ∈ F.

We observe the following features of P_A(·):
(a) P_A(B) ≥ 0 for all B ∈ F (follows from the non-negativity of P(·)).
(b) P_A(A) = P(A ∩ A)/P(A) = 1 (this ensures that P_A(·) acts as a restriction on F, as it restricts the effective sample space to A).
(c) If {Bn : n = 1, 2, . . .} be a pairwise mutually exclusive sequence of events associated with the random experiment E, i.e., members of F with Bi ∩ Bj = ∅ for i ≠ j, then

    (A ∩ Bi) ∩ (A ∩ Bj) = A ∩ (Bi ∩ Bj) = A ∩ ∅ = ∅, for i ≠ j.

By definition,

    P_A(∪_{n=1}^{∞} Bn) = P(A ∩ (∪_{n=1}^{∞} Bn))/P(A) = P(∪_{n=1}^{∞} (A ∩ Bn))/P(A)
                        = Σ_{n=1}^{∞} P(A ∩ Bn)/P(A) = Σ_{n=1}^{∞} P_A(Bn)

(where we used Kolmogorov's axiom of countable additivity, valid for the original probability measure P(·)).
This property (c) validates the countable additivity of the new set function P_A(·). From the above derivations it is clear that P_A(·) is itself a probability measure on F. (The advanced reader may look upon this aspect as similar to relative topology.) However, one should keep in mind that, given a probability space (Ω, F, P(·)), there are many conditional probability spaces (Ω, F, P_A(·)), one differing from another on the basis of the chosen A ∈ F. This is why we usually refer to P_A(B) (more conventionally written P(B|A)) as the conditional probability of an event B on the hypothesis that another event A has occurred.
Empirical Interpretation:
The interpretation of this new mathematical entity "conditional probability"
can be furnished as follows. In a long sequence of n repetitions of the random
experiment E under uniform conditions, there are n(A) repetitions in which
the event A occurs, and among these n(A) repetitions the event B occurs (together with A) in n(A ∩ B) instances. The ratio n(A ∩ B)/n(A) is known as
the conditional frequency ratio of B on the hypothesis that A has already
occurred, and is denoted by f_A(B). Assuming that lim_{n→∞} f_A(B) exists, we refer to
this sequential limit as the conditional probability of B on the hypothesis that
A has occurred and denote it by P_A(B) or P(B|A):

P_A(B) ≡ P(B|A) = lim_{n→∞} f_A(B) = lim_{n→∞} n(A ∩ B)/n(A)
= lim_{n→∞} [n(A ∩ B)/n] / [n(A)/n]
= lim_{n→∞} [n(A ∩ B)/n] / lim_{n→∞} [n(A)/n]
= P(A ∩ B)/P(A), provided P(A) > 0.

Note that the definition of conditional probability directly gives rise to the
multiplication rule in probability theory:

P(A ∩ B) = P(A)·P_A(B) = P(A)·P(B|A).
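The frequency interpretation is easy to watch in a simulation. The sketch below is our own illustrative experiment (two dice; A = "first die shows a six", B = "the total exceeds 9"); the conditional frequency ratio n(A ∩ B)/n(A) settles near P(A ∩ B)/P(A) = 1/2:

    import random

    n, n_A, n_AB = 200_000, 0, 0
    for _ in range(n):
        d1, d2 = random.randint(1, 6), random.randint(1, 6)
        in_A = (d1 == 6)                 # hypothesis event A
        in_B = (d1 + d2 > 9)             # event B
        n_A += in_A
        n_AB += in_A and in_B
    print("f_A(B) ~", n_AB / n_A)        # empirical conditional frequency ratio
    print("exact  =", (3/36) / (6/36))   # P(A ∩ B)/P(A) = 1/2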
An Extension Result: If A_1, A_2, ..., A_n be events in F such that P(A_i) > 0,
A_i ∩ A_j = ∅ for i ≠ j, and Ω = ⋃_{j=1}^{n} A_j, then for any B ∈ F we have

P(B) = Σ_{j=1}^{n} P(A_j)·P_{A_j}(B) = Σ_{j=1}^{n} P(A_j)·P(B|A_j).

Remarks: (a) Conditional probability was introduced from the mathematical
viewpoint of a new probability measure, and this has been endowed with an
empirical interpretation. However, it remains translucent what compelled us to
introduce it in the real-world framework. The relevant answer may briefly be put
as follows. It often happens that the sample space for an experiment needs
to be altered to take into account the availability of some limited information
about the outcomes of the random experiment. Such information may well
eliminate certain outcomes as impossible which were otherwise (i.e., without
the information) possible, and in such a case either the appropriate or reduced
sample space omits these outcomes, or the new probabilities assigned to them
in the revised model would be zero.
(b) Conditional probability gives rise to independence and dependence of
events, a feature that stamps the probability measure with a special status over
other measure-theoretic treatments.
(c) The definition of conditional probability is often useful in the form of a
multiplication law:

P(A ∩ B) = P(A)·P(B|A), provided P(A) > 0
         = P(B)·P(A|B), provided P(B) > 0.

The multiplication law states that the probability of the joint occurrence of two
events is equal to the product of the probability of one of the events and the conditional probability of the other, given that the first event has occurred. The
multiplication law remains applicable even in case one of the events A and B happens to be stochastically impossible, because P(A) = 0 or P(B) = 0 implies
P(A ∩ B) = 0 (due to monotonicity of P(·)), so that both sides vanish, a fact
that may appear rather queer at first sight.
Generalisation of the Multiplication Rule: For arbitrary events A_1, A_2, ..., A_n
associated with a random experiment E, show that

P(A_1 ∩ A_2 ∩ ... ∩ A_n) = P(A_1)·P(A_2|A_1)·P(A_3|A_1 ∩ A_2) ... P(A_n|A_1 ∩ A_2 ∩ ... ∩ A_{n−1}),

provided P(A_1 ∩ A_2 ∩ ... ∩ A_{n−1}) ≠ 0.
As for any events A_1, A_2, ..., A_n, (A_1 ∩ A_2 ∩ ... ∩ A_{n−1}) is a subset of A_1 ∩ A_2 ∩
... ∩ A_{n−2}, due to monotonicity of the probability measure,

P(A_1 ∩ A_2 ∩ ... ∩ A_{n−1}) ≤ P(A_1 ∩ A_2 ∩ ... ∩ A_{n−2}),

and so P(A_1 ∩ A_2 ∩ ... ∩ A_{n−2}) > 0 (why?). This positivity is propagated
through all the predecessors, viz., P(A_1 ∩ A_2 ∩ ... ∩ A_{n−3}), ..., P(A_1), ensuring that
all the given conditional probabilities are well-defined.
This generalised multiplication rule can be deduced by the method of induction.
For n = k, suppose the given proposition holds true, i.e.,

P(A_1 ∩ A_2 ∩ ... ∩ A_k) = P(A_1)·P(A_2|A_1) ... P(A_k|A_1 ∩ A_2 ∩ ... ∩ A_{k−1}).

Then

P(A_1 ∩ A_2 ∩ ... ∩ A_k ∩ A_{k+1}) = P((A_1 ∩ A_2 ∩ ... ∩ A_k) ∩ A_{k+1})
= P(A_{k+1}|A_1 ∩ A_2 ∩ ... ∩ A_k)·P(A_1 ∩ A_2 ∩ ... ∩ A_k)
= P(A_{k+1}|A_1 ∩ A_2 ∩ ... ∩ A_k)·P(A_1)·P(A_2|A_1) ... P(A_k|A_1 ∩ A_2 ∩ ... ∩ A_{k−1}).

(Here we made use, in the penultimate step, of the fact that by definition the
multiplication rule is true for n = 2.)
Thus, on the basis of the supposition, we have asserted that the generalised
multiplication rule is true for n = k + 1. By the Principle of Mathematical Induction
the result is found to be valid for all finite n ≥ 2.
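A concrete use of the chained product (our own small example, not taken from the text): the probability that the first four draws from a well-shuffled 52-card deck are the four aces factors through the rule as (4/52)(3/51)(2/50)(1/49).

    from fractions import Fraction

    # A_k = "the kth card drawn is an ace", drawing without replacement
    p, aces, cards = Fraction(1), 4, 52
    for k in range(4):
        p *= Fraction(aces - k, cards - k)   # P(A_{k+1} | A_1 ∩ ... ∩ A_k)
    print(p)                                 # 1/270725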
Useful Corollary:

P((B_1 ∪ B_2)|A) = P(B_1|A) + P(B_2|A) − P((B_1 ∩ B_2)|A).

The proof starts ab initio:

L.H.S. = P((B_1 ∪ B_2)|A)
= P((B_1 ∩ A) ∪ (B_2 ∩ A))/P(A)
= [P(B_1 ∩ A) + P(B_2 ∩ A) − P(B_1 ∩ B_2 ∩ A)]/P(A)
= P(B_1 ∩ A)/P(A) + P(B_2 ∩ A)/P(A) − P(B_1 ∩ B_2 ∩ A)/P(A) = R.H.S.

Bayes Theorem - an Application of the Extension Result

There are various situations in which the outcome of an experiment depends
on what happens at intermediate stages. The following is a simple illustration
where the effect is unique but there are various alternative underlying causes.
Suppose a car met with an accident; there were four possible reasons behind
it, viz., the engine was faulty, the driver was drunk, the weather was foul, or
the road condition was poor. Thus a branching of the causes and the allocation of
a non-zero probability to each is a must. If, in general, there are n alternative causes
A_1, A_2, ..., A_n that all lead to the effect B, then the extension rule of conditional
probability comes into play. In this context we refer to the extension rule as
the Theorem of Total Probability.
If the events A_1, A_2, ..., A_n ∈ F constitute a partition of the sample space Ω
and P(A_j) > 0 ∀ j = 1, 2, ..., n, then for any event B ∈ F we have

P(B) = Σ_{j=1}^{n} P(A_j)·P(B|A_j).

In the language of probability theory one often says that the A_j's are n mutually
exclusive and exhaustive causing events and B is the effect. The quantity P(A_j)
is called the a priori probability of the jth cause A_j, while the quantity P(B|A_j)
is called the conditional probability of the effect B due to the jth cause A_j. The
tree diagram in the figure shows the situation.

Figure 2.3: Tree diagram of the causes and effect.

Bayes' rule may be deemed as focusing on the reverse problem: given that the
effect B has taken place, what is the contribution (as far as probability is concerned) of the jth cause A_j? The symbolic answer is P(A_j|B), which is the
a posteriori probability of the jth cause A_j. Bayes' theorem relates the a
priori and the a posteriori probabilities of the causes:
If A_1, A_2, ..., A_n constitute a partition of the sample space Ω and P(A_j) >
0 ∀ j = 1, 2, ..., n, then for any event B ∈ F which is not stochastically impossible,

P(A_k|B) = P(A_k)·P(B|A_k) / Σ_{j=1}^{n} P(A_j)·P(B|A_j), k = 1, 2, ..., n.

Remark: (a) This result is extensible to any countably infinite number of
events A_1, A_2, ..., A_n, ..., each of which has a positive probability and which
together exhaust Ω. If B ∈ F with P(B) > 0, then

P(A_k|B) = P(A_k)·P(B|A_k) / Σ_{j=1}^{∞} P(A_j)·P(B|A_j), k = 1, 2, ....
Proof:

P(A_k ∩ B) = P(B)·P(A_k|B) = P(A_k)·P(B|A_k)

∴ P(A_k|B) = P(A_k)·P(B|A_k)/P(B).

Since the A_k's are pairwise disjoint and exhaust Ω, B = ⋃_{k=1}^{∞} (A_k ∩ B), and so (using Kolmogorov's axiom of countable additivity)

P(B) = P(⋃_{k=1}^{∞} (A_k ∩ B)) = Σ_{k=1}^{∞} P(A_k ∩ B) = Σ_{k=1}^{∞} P(A_k)·P(B|A_k).

Hence

P(A_k|B) = P(A_k)·P(B|A_k) / Σ_{j=1}^{∞} P(A_j)·P(B|A_j).

For the proof of the original theorem the steps are the same in letter and spirit.
(b) The formula reassesses the a priori probabilities of the causes into a posteriori probabilities and has a close resemblance with the standard rule for a change of base in
logarithms. Bayes' rule is of great importance in Decision Theory, specially
in Bayesian Estimation, which is beyond the scope of the present text.
The situation of a partitioned sample space, so that exactly one causing event occurs, can
be pictured as follows:

Figure 2.4: Partitioned sample space with Ω = ⋃_{k=1}^{n} A_k, where B is an effect to
whose occurrence all the causes A_k, k = 1, 2, ..., n have possible roles.

(c) The quantities P(B|A_j) are also known as likelihoods.
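Seen in code, the theorem of total probability supplies the denominator of Bayes' rule. The helper below is a generic sketch of our own (the function name and the sample numbers are hypothetical, not from the text): it turns priors P(A_j) and likelihoods P(B|A_j) into posteriors P(A_j|B).

    from fractions import Fraction

    def posteriors(priors, likelihoods):
        # Bayes' rule: P(A_j|B) = P(A_j) P(B|A_j) / sum_k P(A_k) P(B|A_k)
        total = sum(p * l for p, l in zip(priors, likelihoods))   # P(B), by total probability
        return [p * l / total for p, l in zip(priors, likelihoods)]

    pri = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]        # a priori probabilities
    lik = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10)]     # likelihoods P(B|A_j)
    print(posteriors(pri, lik))   # [3/11, 4/11, 4/11]; they sum to 1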

2.6 Stochastic or Statistical Independence of Events

Informally speaking, two events A and B associated with the same random
experiment E are independent if the occurrence or non-occurrence of either
one does not affect the probability of occurrence of the other. In mathematical
language, A is independent of B if the conditional probability P(A|B) equals the
unconditional probability P(A), i.e., if P(A) = P(A|B). If this is so and P(A) > 0,
then according to Bayes' rule,

P(B|A) = P(B)·P(A|B)/P(A) = P(B)·P(A)/P(A) = P(B),

indicating that the conditional probability P(B|A) equals the unconditional
probability P(B). The multiplication rule therefore produces the result P(A ∩ B) = P(A)·P(B).
From this it follows that, for events with positive probability, independence of A
and B is characterised by the fact that the probability of their joint occurrence
is equal to the product of the probabilities of the individual occurrences. As per this formulation, stochastically impossible events are statistically independent of
any other event.
For the purpose of generalisation, we may call the above-mentioned independence of events independence at level 2. Three events A, B, C are said to
have level-3 independence if P(A ∩ B ∩ C) = P(A)·P(B)·P(C) holds true.
Three events A, B, C are said to be mutually independent if A, B, C are pairwise independent and moreover enjoy level-3 independence. In symbols, A, B
and C are said to be mutually independent if

P(A ∩ B) = P(A)·P(B); P(B ∩ C) = P(B)·P(C); P(C ∩ A) = P(C)·P(A)

and

P(A ∩ B ∩ C) = P(A)·P(B)·P(C).

Observe that there are four equations altogether to define mutual independence
of 3 events.
In general, n events A_1, A_2, ..., A_n (n > 2) are said to be mutually independent
if

P(A_i ∩ A_j) = P(A_i)·P(A_j), where i < j, i, j being any combination of 1, 2, ..., n taken two at a time;
P(A_i ∩ A_j ∩ A_k) = P(A_i)·P(A_j)·P(A_k), where i < j < k, i, j, k being any combination of 1, 2, ..., n taken three at a time;
..................
P(A_1 ∩ A_2 ∩ ... ∩ A_n) = P(A_1)·P(A_2) ... P(A_n).

The constraint relations are C(n, 2) + C(n, 3) + ... + C(n, n) = 2^n − n − 1 in number.
Trivial observation: Mutual independence of events implies pairwise independence of events. However, by the following counterexample we show that
pairwise independence of events does not necessarily imply mutual independence
of events.
Let the equally likely outcomes of an experiment be one of the four points in R^3
with Cartesian co-ordinates (1, 0, 0), (0, 1, 0), (0, 0, 1) and (1, 1, 1). Let A, B, C
denote the following events:

A : the event that the x co-ordinate is 1;
B : the event that the y co-ordinate is 1;
C : the event that the z co-ordinate is 1.

By Laplace's classical definition it follows that P(A) = P(B) = P(C) = 1/2, and

P(A ∩ B) = 1/4 = P(A)·P(B); P(B ∩ C) = 1/4 = P(B)·P(C); P(C ∩ A) = 1/4 = P(C)·P(A).

However, P(A ∩ B ∩ C) = 1/4 as well, and so P(A ∩ B ∩ C) ≠ P(A)·P(B)·P(C) = 1/8.
This indicates that pairwise independence (level-2 independence) is weaker than
mutual independence of events in general.
The relation P(A ∩ B ∩ C) = P(A)·P(B)·P(C) does not always indicate mutual
independence of the events A, B, C, because level-3 independence does not always
ensure level-2 independence. The following illustration focuses on this point.
Let two dice be tossed. Obviously the sample space Ω associated with it is
the set of all admissible ordered pairs (i, j); i, j = 1(1)6. Each ordered pair is
equally likely and has probability 1/36. Consider now the events:

A = { first die turns up 1, 2, or 3 };
B = { first die turns up 3, 4, or 5 };
C = { sum of the two upturned faces is 9 }.

Obviously we have

A ∩ B = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)};
A ∩ C = {(3, 6)};
B ∩ C = {(3, 6), (4, 5), (5, 4)};
A ∩ B ∩ C = {(3, 6)}.

Hence

P(A) = 1/2; P(B) = 1/2; P(C) = 1/9;
P(A ∩ B ∩ C) = 1/36 = (1/2)·(1/2)·(1/9) = P(A)·P(B)·P(C),

indicating that the events A, B, C have level-3 independence.
Nevertheless, the events A, B, C have no level-2 independence, because

P(A ∩ B) = 1/6 ≠ 1/4 = P(A)·P(B);
P(A ∩ C) = 1/36 ≠ 1/18 = P(A)·P(C);
P(B ∩ C) = 1/12 ≠ 1/18 = P(B)·P(C).

Hence we draw the conclusion that level-3 independence does not imply level-2
independence of random events.
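Both counterexamples are finite and can be verified mechanically. The sketch below is our own checking code for the two-dice illustration; the four-point example in R^3 can be checked the same way.

    from fractions import Fraction
    from itertools import product

    omega = list(product(range(1, 7), repeat=2))     # 36 equally likely ordered pairs
    P = lambda ev: Fraction(len(ev), len(omega))

    A = {w for w in omega if w[0] in (1, 2, 3)}
    B = {w for w in omega if w[0] in (3, 4, 5)}
    C = {w for w in omega if sum(w) == 9}

    print(P(A & B & C) == P(A) * P(B) * P(C))        # True : level-3 independence holds
    print(P(A & B) == P(A) * P(B))                   # False: level-2 fails
    print(P(A & C) == P(A) * P(C))                   # False
    print(P(B & C) == P(B) * P(C))                   # False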

Figure 2.5: Venn diagram for the example illustrating that level-3 independence does not imply level-2 independence.

Another important result regarding independence of events states that pairwise
independence is non-transitive, i.e., independence of A and B together with
independence of A and C does not imply that B is independent of C.

Figure 2.6: Venn diagram representing the non-transitivity of pairwise independence of events.

As seen from the diagram,

P(A) = 0.6; P(B) = 0.8; P(C) = 0.5;
P(A ∩ B) = 0.48 = (0.6)(0.8) = P(A)·P(B);
P(A ∩ C) = 0.30 = (0.6)(0.5) = P(A)·P(C);
P(B ∩ C) = 0.38 ≠ P(B)·P(C).


Remark: (a) Disjoint events are not independent unless at least one of them is
stochastically impossible. In fact, two events having nonzero probabilities cannot
be both mutually exclusive and independent.
(b) The pairwise or mutual dependence is always meant with respect to the
probability measure P(·) in the probability space (Ω, F, P(·)).
(c) There are other concepts, viz., independent random experiments and conditional independence of events, to which we shall turn in Chapter 3 on
Compound Experiments.

2.7 Worked Examples on Conditional Probability and Independence

Example 17. In manufacturing automobile spare parts, machine A averages 5%
defectives, while machine B averages 3% and machine C 2% defectives. If a
single spare part, selected at random from a tumbler containing 100 products of
machine A, 80 products of machine B and 70 products of machine C, is found
to be defective, what is the probability that it was a machine-A product?
Solution: The total number of spare parts in the tumbler being (100 + 80 + 70),
i.e., 250, we have P(A) = 2/5; P(B) = 8/25; P(C) = 7/25. Declaring D as the event that a
product is defective, we have

P(D|A) = 5/100 = 1/20; P(D|B) = 3/100; P(D|C) = 2/100 = 1/50.

By the theorem of total probability it follows that

P(D) = P(D|A)·P(A) + P(D|B)·P(B) + P(D|C)·P(C)
= (1/20)·(2/5) + (3/100)·(8/25) + (1/50)·(7/25) = 44/1250.

We are to find the conditional probability P(A|D) (i.e., given that the randomly
selected product is defective, the probability of its being produced by machine A). To this
end we have Bayes' rule:

P(A|D) = P(A)·P(D|A)/P(D) = (1/50)/(44/1250) = 25/44.
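Example 17 is a convenient place to check the machinery numerically. The sketch below (our own verification code) reproduces P(A|D) = 25/44 exactly and then cross-checks it by simulation:

    import random
    from fractions import Fraction

    priors = {'A': Fraction(100, 250), 'B': Fraction(80, 250), 'C': Fraction(70, 250)}
    defect = {'A': Fraction(5, 100), 'B': Fraction(3, 100), 'C': Fraction(2, 100)}

    pD = sum(priors[m] * defect[m] for m in priors)   # total probability, 44/1250
    print(priors['A'] * defect['A'] / pD)             # 25/44

    hits = total = 0
    for _ in range(500_000):                          # Monte Carlo cross-check
        m = random.choices(['A', 'B', 'C'], weights=[100, 80, 70])[0]
        if random.random() < float(defect[m]):        # the chosen part is defective
            total += 1
            hits += (m == 'A')
    print(hits / total)                               # close to 25/44 = 0.568...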

Example 18. Urn 1 contains one white and two black marbles. Urn 2 contains
one black and two white marbles, while Urn 3 contains three black and three
white marbles. A die is rolled. If the die shows up 1, 2 or 3, urn 1 is selected; if
the die shows up 4, urn 2 is selected; if the die shows up 5 or 6, urn 3 is selected.
A marble is then drawn at random from the selected urn. Let A be the event
that the marble drawn is white. If U, V, W respectively denote the events that
the urn selected is 1, 2, 3, find the probability P(V|A).
Solution: We have

P(A ∩ U) = P(U)·P(A|U) = (3/6)·(1/3) = 1/6;
P(A ∩ V) = P(V)·P(A|V) = (1/6)·(2/3) = 1/9;
P(A ∩ W) = P(W)·P(A|W) = (2/6)·(3/6) = 1/6.

Since A is the disjoint union of A ∩ U, A ∩ V and A ∩ W, P(A) = 1/6 + 1/9 + 1/6 = 4/9.
By Bayes' theorem, the required probability is

P(V|A) = P(V)·P(A|V) / [P(U)·P(A|U) + P(V)·P(A|V) + P(W)·P(A|W)] = (1/9)/(4/9) = 1/4.

Example 19. A box contains three coins; two of them are fair and one two-headed. A coin is selected at random and tossed. If a head appears, the coin is
tossed again; if a tail appears, then another coin is selected at random from
the two remaining coins and tossed.
(i) Find the probability that heads appear twice.
(ii) If the same coin is tossed twice, find the probability that it is the two-headed coin.
(iii) Find the probability that tails appear twice.
Solution:

A_1 denotes the event that coin I is chosen;
A_2 denotes the event that coin II is chosen;
A_3 denotes the event that coin III (the two-headed one) is chosen;
H denotes the event that a head occurs on the first trial;
T denotes the event that a tail occurs on the first trial.

Initially the coins being chosen randomly, P(A_1) = P(A_2) = P(A_3) = 1/3.

P(H|A_1) = P(head occurs with the first coin) = 1/2;
P(H|A_2) = P(head occurs with the second coin) = 1/2;
P(H|A_3) = P(head occurs with the third coin) = 1.

Since P(H|A_k) + P(T|A_k) = 1 ∀ k = 1, 2, 3,

P(T|A_1) = 1/2; P(T|A_2) = 1/2 and P(T|A_3) = 0.

By the theorem of total probability,

P(H) = P(H|A_1)P(A_1) + P(H|A_2)P(A_2) + P(H|A_3)P(A_3) = (1/2)·(1/3) + (1/2)·(1/3) + 1·(1/3) = 2/3;
P(T) = 1 − P(H) = 1 − 2/3 = 1/3.

With this set-up, let us work out (i), (ii) and (iii).
(i) P(two heads) = P(HH). By the condition of the problem, as two heads are required, the first trial must yield a
head, and the coin used is retained for the second trial.

P(HH) = P(HH|A_1)P(A_1) + P(HH|A_2)P(A_2) + P(HH|A_3)P(A_3) = (1/3)(1/4 + 1/4 + 1) = 1/2.

(ii) P(the coin is the two-headed one, given that the same coin was tossed twice) = P(A_3|H),
since the same coin is tossed twice precisely when a head occurs on the first trial; otherwise
the coin, as per the rules, would be left out.
The required probability is

P(A_3|H) = P(A_3 ∩ H)/P(H) = P(H|A_3)·P(A_3)/P(H) = (1·(1/3))/(2/3) = 1/2.

(iii) P(two tails) = P(TT) = P(T|T)·P(T) = (1/3)·P(T|T), where P(T|T) denotes the
probability of a tail on the second trial given that the first trial yields a tail.
Since a tail occurred on the first trial, the coin with which the first trial was made
must be a fair one, viz., coin I or coin II (each with conditional probability 1/2, given T).
If coin I was used in the first trial, the coin of the second trial
is either coin II or coin III;
if coin II was used in the first trial, the coin of the second trial
is either coin I or coin III.
Thus, for the second trial,

P(A_1 used on trial II) = 1/4; P(A_2 used on trial II) = 1/4; P(A_3 used on trial II) = 1/2.

P(T|T) = P(tail | A_1 used on trial II)·(1/4) + P(tail | A_2 used on trial II)·(1/4) + P(tail | A_3 used on trial II)·(1/2)
= (1/2)(1/4) + (1/2)(1/4) + 0·(1/2) = 1/4.

∴ P(TT) = (1/3)·(1/4) = 1/12.
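Because the rules of Example 19 (keep the coin after a head, switch to a randomly chosen remaining coin after a tail) are easy to mis-read, a direct simulation is a useful sanity check. A sketch of our own, with coin III the two-headed one:

    import random

    def two_tosses():
        heads_prob = [0.5, 0.5, 1.0]        # coins I, II and the two-headed coin III
        i = random.randrange(3)
        h1 = random.random() < heads_prob[i]
        j = i if h1 else random.choice([k for k in range(3) if k != i])
        h2 = random.random() < heads_prob[j]
        return h1, h2

    n = 400_000
    runs = [two_tosses() for _ in range(n)]
    print("P(HH) ~", sum(a and b for a, b in runs) / n)            # about 1/2
    print("P(TT) ~", sum(not a and not b for a, b in runs) / n)    # about 1/12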
Example 20. Five letters are selected at random one after another from the 26
letters of the English alphabet, (a) with replacement, (b) without replacement.
Find, for each of the cases (a) and (b), the probability that the word formed (i)
contains an 'a', (ii) consists of only vowels, (iii) is the word 'woman'.
Solution: Part (a). Left to the reader as an exercise.
Part (b). All the letters being equally likely, P('a' in the first draw) = 1/26.
(i) If 'a' occurs in the five-lettered word, the remaining 4 letters must be chosen
from the 25 letters of the residual English alphabet. So C(25, 4) is the number
of favourable cases, while C(26, 5) is the number of total outcomes. This shows
that C(25, 4)/C(26, 5) is the required probability.
(ii) Left to the reader.
(iii) For the formation of the word 'woman', the first letter is w, followed by o, m, a
and n in succession.

P(w drawn first) = 1/26;
P(o drawn | w was drawn and not replaced after the first draw) = 1/25;
P(m drawn | w and o were drawn in the 1st and 2nd draws without replacement) = 1/24;
P(a drawn | w, o and m were drawn in the previous three draws and not replaced) = 1/23;
P(n drawn | w, o, m and a were drawn in the previous four draws without replacement) = 1/22.

By the multiplication rule for compound events,

P(woman) = P(w)·P(o|w)·P(m|wo)·P(a|wom)·P(n|woma) = (1/26)(1/25)(1/24)(1/23)(1/22) = 1/(26·25·24·23·22).

Example 21. You heard that an old friend of yours has two children, one of
them a girl, but you do not know whether the other child is a boy or a girl.
How likely is it that the other child is a boy?
The sample space is Ω = {BB, BG, GB, GG} with uniform probability. (Here
we denote a boy by B and a girl by G.) Consider the events X = {BB, BG, GB} and
Y = {BG, GB, GG}. Obviously X occurs if at least one child is a boy and Y
occurs if at least one child is a girl. Hence

P(X|Y) = P(X ∩ Y)/P(Y) = #{BG, GB}/#{BG, GB, GG} = 2/3.

Example 22. There are a white and b black balls in an urn. Two players draw
one ball each in turn, replacing it each time and stirring the balls of the urn.
The player who first draws a white ball wins the game. Find the probability
that the player who begins the game is the winner.
Solution: The first player can obviously win the game either in the first or in
the third or in any subsequent odd-numbered draw. In the first draw, the probability
of drawing a white ball is a/(a+b). In the third draw, the probability of drawing the
white ball is (b/(a+b))^2 · (a/(a+b)). [Observe that the game is extended up to the third
draw only if the first two draws yield black balls and the third draw produces a
white ball; moreover these events are statistically independent.] In general,
the (2m+1)th draw can yield the white ball with probability (b/(a+b))^{2m} · (a/(a+b)).
Using Kolmogorov's axiom of countable additivity, we have

p_1 = a/(a+b) + (b/(a+b))^2 · a/(a+b) + ... + (b/(a+b))^{2m} · a/(a+b) + ... to ∞
= (a/(a+b)) Σ_{m=0}^{∞} (b/(a+b))^{2m} = (a/(a+b)) · 1/(1 − b^2/(a+b)^2) = (a+b)/(a+2b).

Note: (i) The geometric series is convergent, as b/(a+b) < 1.
(ii) p_1 > 1/2 obviously, no matter what a and b are. This is indirect evidence
that the game heavily leans on who initiates it. Had there been a toss
to decide which of the two players would set the game rolling, it is a foregone
conclusion that in the long run the starter has the greater probability of winning.
A generalisation of the previous problem is the following problem due to Huygens:
Example 23. A and B throw alternately a pair of dice, in that order. A wins
if he scores a total of 6 points before B scores a total of 7 points, in which case
B wins. If A starts the game, what is his probability of winning?

Figure 2.7: Diagrammatic representation of the outcomes. Cross-marks lying
on the lines denote favourable elementary events.
Solution:

p_1 = P(A scoring 6) = 5/36;
q_1 = P(A not scoring 6) = 31/36;
p_2 = P(B scoring 7) = 6/36;
q_2 = P(B not scoring 7) = 30/36.

P(A) = Σ_{k=0}^{∞} P(A winning at his (k+1)th throw of the dice) = Σ_{k=0}^{∞} q_1^k q_2^k p_1 (why?)
= p_1 Σ_{k=0}^{∞} (q_1 q_2)^k = p_1/(1 − q_1 q_2) = (5/36)/(1 − (31/36)(30/36)) = 30/61.

P(B) = 1 − P(A) = 1 − 30/61 = 31/61.

Remark. If the set-up of the problem were such that q_2 = q_1, then as a corollary
we would get back the previous problem. Actually the generalisation owes to the
lack of symmetry of this problem.
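A short simulation, our own sketch, confirms the value 30/61 = 0.4918...; replacing B's target 7 by A's target 6 (so that q_2 = q_1) reproduces the structure of the previous problem.

    import random

    def throw_total():
        return random.randint(1, 6) + random.randint(1, 6)

    def a_wins():
        while True:
            if throw_total() == 6:    # A's throw: scores 6 and wins
                return True
            if throw_total() == 7:    # B's throw: scores 7 and wins
                return False

    n = 200_000
    print(sum(a_wins() for _ in range(n)) / n)   # about 30/61 = 0.4918...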
Example 24. Pólya's Urn Model: Consider an urn that initially contains r
red balls and b black balls. At each trial one ball is drawn; it is replaced, and
c (> 0) balls of the colour drawn are added to the urn. Let A_j denote the event
that the ball drawn in the jth step is black. Show that P(A_j) = b/(b+r) for every j = 1, 2, ....
If this process is repeated n times, find the probability of a complete run of
n black balls.
Solution:

P(A_1) = b/(b+r) (∵ # balls in the urn = b + r).

P(A_2) = P(A_2|A_1)·P(A_1) + P(A_2|A_1^c)·P(A_1^c) (using the theorem of total probability)
= (b+c)/(b+r+c) · b/(b+r) + b/(b+r+c) · r/(b+r)
= [b(b+c) + br]/[(b+r)(b+r+c)] = b(b+r+c)/[(b+r)(b+r+c)] = b/(b+r).

(Here we used the fact that if A_1 occurred, the first ball drawn is black and hence at
the onset of the second draw the urn contains r red and (b+c) black balls, so that
P(A_2|A_1) = (b+c)/(r+b+c). If, on the other hand, A_1^c occurred, the first ball drawn is red and
so at the onset of the second draw the urn contains (r+c) red and b black balls, so
that P(A_2|A_1^c) = b/(b+r+c).)
By the principle of mathematical induction (the details of which are skipped here) we
have the immediate conclusion P(A_j) = b/(b+r) for all finite j.
To compute the second part, we need the multiplication rule of conditional probability:

P(A_1 ∩ A_2 ∩ ... ∩ A_n) = P(A_1)·P(A_2|A_1)·P(A_3|A_1 ∩ A_2) ... P(A_n|A_1 ∩ A_2 ∩ ... ∩ A_{n−1}).

Observe that

P(A_1 ∩ A_2) = P(A_2|A_1)·P(A_1) = ((b+c)/(r+b+c)) · (b/(r+b)),

and so

P(A_3|A_1 ∩ A_2) = (b+2c)/(r+b+2c) (why?), ..., P(A_n|A_1 ∩ A_2 ∩ ... ∩ A_{n−1}) = (b+(n−1)c)/(r+b+(n−1)c).

∴ P(A_1 ∩ A_2 ∩ ... ∩ A_n) = [b(b+c)(b+2c) ... (b+(n−1)c)] / [(r+b)(r+b+c) ... (r+b+(n−1)c)].
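The invariance P(A_j) = b/(b+r), surprising at first sight, can be watched empirically. The following sketch is our own simulation (the parameter values are arbitrary):

    import random

    def polya_draws(r, b, c, steps):
        # returns a list of booleans, True where the draw was black
        red, black, out = r, b, []
        for _ in range(steps):
            is_black = random.random() < black / (red + black)
            out.append(is_black)
            if is_black:
                black += c               # replace and add c balls of the drawn colour
            else:
                red += c
        return out

    r, b, c, n = 3, 2, 5, 100_000
    runs = [polya_draws(r, b, c, 4) for _ in range(n)]
    for j in range(4):
        print(f"P(A_{j + 1}) ~", sum(run[j] for run in runs) / n)   # each about b/(b+r) = 0.4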

Example 25. Enroute problem or multichannel problem.
(a) There are two roads from A to B and two roads from B to C. Each of
the four roads has a probability p of being blocked by a traffic jam, independently
of the others. What is the probability that there is an open road from A to C?
Solution:

P(open road) = P((open road from A to B) ∩ (open road from B to C))
= P(open road from A to B) × P(open road from B to C),

where we have made use of the independence of the events. Since p, the probability of a
road being blocked by a traffic jam, is the same for all roads,

P(open road) = [1 − P(no open road from A to B)]^2
= [1 − P(first road blocked and second road blocked)]^2
= [1 − P(first road blocked)·P(second road blocked)]^2
= (1 − p^2)^2.

Further, suppose that there is also a direct road from A to C, which is independently blocked by a traffic jam with probability p. Then

P(open road) = P(open road | direct road blocked)·p + P(open road | direct road open)·(1 − p)
= (1 − p^2)^2 p + 1·(1 − p).
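Part (a) is a miniature series-parallel reliability calculation. The sketch below is our own code (p = 0.3 is an arbitrary choice); it cross-checks both closed forms by enumerating every blocked/open pattern of the roads:

    from itertools import product

    def open_ac(roads, direct=None):
        # roads = (r1, r2, r3, r4), 1 = blocked; A->C is open via B, or via the direct road
        r1, r2, r3, r4 = roads
        via_b = (not r1 or not r2) and (not r3 or not r4)
        return via_b or (direct == 0)

    p = 0.3
    weight = lambda pat: p ** sum(pat) * (1 - p) ** (len(pat) - sum(pat))

    four = sum(weight(pat) for pat in product([0, 1], repeat=4) if open_ac(pat))
    print(four, (1 - p**2) ** 2)                      # identical

    five = sum(weight(pat) for pat in product([0, 1], repeat=5)
               if open_ac(pat[:4], pat[4]))
    print(five, (1 - p**2) ** 2 * p + (1 - p))        # identical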

(b) Three urns contain respectively 1 white and 2 black balls; 2 white and 1
black ball; 2 white and 2 black balls. One ball is transferred from the first to
the second urn; then one ball is transferred from the second to the third urn;
finally one ball is drawn from the third urn. Find the probability that the ball
drawn from the third urn is white.

Figure 2.8: Schematic diagram.

Solution: Let the three urns be U_1, U_2, U_3. We denote by W and B the
elementary events of drawing a white and a black ball respectively from an urn.
For this given problem there are four possible channels in which this chain of
ball-transfer can take place. These channels C_1, C_2, C_3, C_4 are displayed below:

C_1 : U_1 --W--> U_2 --W--> U_3, white drawn;
C_2 : U_1 --W--> U_2 --B--> U_3, white drawn;
C_3 : U_1 --B--> U_2 --W--> U_3, white drawn;
C_4 : U_1 --B--> U_2 --B--> U_3, white drawn.

Observe that P(W|U_1) = 1/3 and P(B|U_1) = 2/3.
Now P(white from U_2 | W was transferred from U_1) = 3/4, and P(white from U_3 | W was
transferred from U_1 to U_2 and W was transferred from U_2 to U_3) = 3/5.
By the multiplication rule,

P(C_1 channel followed) = (1/3)·(3/4)·(3/5) = 9/60.    (i)

Further, P(black from U_2 | W was transferred from U_1) = 1/4, and P(white from U_3 | W was
transferred from U_1 to U_2 and B was transferred from U_2 to U_3) = 2/5.

P(C_2 channel followed) = (1/3)·(1/4)·(2/5) = 2/60.    (ii)

P(white from U_2 | B was transferred from U_1) = 2/4, and P(white from U_3 | W was transferred from U_2 to U_3 while B was transferred from U_1 to U_2) = 3/5.

P(C_3 channel followed) = (2/3)·(2/4)·(3/5) = 12/60.    (iii)

P(black from U_2 | B was transferred from U_1) = 2/4, and P(white from U_3 | B was transferred
from U_2 to U_3 while B was transferred from U_1 to U_2) = 2/5.

P(C_4 channel followed) = (2/3)·(2/4)·(2/5) = 8/60.    (iv)

Further, the channels C_1, C_2, C_3, C_4 are mutually exclusive and exhaustive in the
sense that via exactly one of these four channels the incident will occur in reality. So,
by finite additivity, P(ball drawn from U_3 is white) = 9/60 + 2/60 + 12/60 + 8/60 = 31/60.

Example 26. Matching or Rencontre Problem: Consider a random
permutation of n distinct objects. We say that a match occurs at the kth
position if the kth object is placed in the kth position. Let A_k denote the
event that there is a match at the kth position.
Find the probability that
(i) there is at least one match;
(ii) there is no match;
(iii) there are exactly r matches.
Solution: (i) By the general addition rule, or the so-called Poincaré theorem,

P(⋃_{k=1}^{n} A_k) = Σ_{k=1}^{n} P(A_k) − Σ_{1≤i_1<i_2≤n} P(A_{i_1} ∩ A_{i_2}) + Σ_{1≤i_1<i_2<i_3≤n} P(A_{i_1} ∩ A_{i_2} ∩ A_{i_3})
− ... + (−1)^{n−1} P(A_1 ∩ A_2 ∩ ... ∩ A_n)
= S_1 − S_2 + S_3 − ... + (−1)^{n−1} S_n = Σ_{k=1}^{n} (−1)^{k−1} S_k,

where

S_k = Σ_{1≤i_1<i_2<...<i_k≤n} P(A_{i_1} ∩ A_{i_2} ∩ ... ∩ A_{i_k})

is the kth sum, which contains C(n, k) terms, and moreover P(A_{i_1} ∩ A_{i_2} ∩ ... ∩ A_{i_k})
= P(k matches at the specified positions labelled i_1, i_2, ..., i_k) = (n−k)!/n!. [Note
that the total number of ways in which n distinct objects can be placed into
n positions is n!, and when matches at the specified k positions take place, the
remaining (n − k) objects can be placed in the remaining (n − k) positions in
(n − k)! ways.] Hence S_k = C(n, k)·(n−k)!/n! = 1/k!, and so

P(⋃_{k=1}^{n} A_k) = Σ_{k=1}^{n} (−1)^{k−1} S_k = Σ_{k=1}^{n} (−1)^{k−1}/k!.

(ii)

P(no matches) = P((⋃_{k=1}^{n} A_k)^c) = 1 − P(⋃_{k=1}^{n} A_k) = 1 − Σ_{k=1}^{n} (−1)^{k−1}/k! = Σ_{k=0}^{n} (−1)^k/k!.

Note that the R.H.S. is the sum of the first (n + 1) terms of the Taylor series of
1/e, and so, for moderate values of n, 1/e works as a remarkably good approximation
to the probability of no matches. It seems from this simple observation that
the probability of at least one match among n randomly permuted objects is
practically independent of n.
(iii) To solve the problem of exactly r matches, all that is necessary is to realise
that the events "exactly r matches occurring at the positions i_1, i_2, ..., i_r" are
disjoint events for the various choices of i_1, i_2, ..., i_r, the total number of such
choices being C(n, r). Thus, using the definition of conditional probability,

P(exactly r matches) = C(n, r)·P(A_1 ∩ A_2 ∩ ... ∩ A_r ∩ B)
= C(n, r)·P(A_1 ∩ A_2 ∩ ... ∩ A_r)·P(B|A_1 ∩ A_2 ∩ ... ∩ A_r),

where B denotes the event "no matching in the last (n − r) positions".
Again, P(B|A_1 ∩ A_2 ∩ ... ∩ A_r) = P(no matches in the last (n − r) positions
given that matches occur at the first r positions) = Σ_{k=0}^{n−r} (−1)^k/k!,

∴ P(exactly r matches) = C(n, r)·((n−r)!/n!)·Σ_{k=0}^{n−r} (−1)^k/k! = (1/r!) Σ_{k=0}^{n−r} (−1)^k/k!.
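For small n the three formulas can be validated against exhaustive enumeration over all n! permutations. A sketch of our own, with n = 7:

    from itertools import permutations
    from math import e, factorial

    def p_exactly(n, r):
        # (1/r!) * sum_{k=0}^{n-r} (-1)^k / k!, as derived above
        return sum((-1) ** k / factorial(k) for k in range(n - r + 1)) / factorial(r)

    n = 7
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        counts[sum(perm[i] == i for i in range(n))] += 1   # number of matches

    for r in range(n + 1):
        assert abs(counts[r] / factorial(n) - p_exactly(n, r)) < 1e-12
    print("P(no match) =", counts[0] / factorial(n), "  1/e =", 1 / e)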

Practical Instances where the Matching Problem Occurs:

(a) A man has to mail n letters. If he has n envelopes bearing the requisite
mailing addresses and manages to put the n letters randomly into the n
envelopes, then the natural questions that strike one are: will
all the letters go to wrong destinations, or will at least one letter reach
the correct addressee, etc.? It is just the matching problem in a new
mould.
(b) n students, while attending a class, dump their wet shoes in the rack.
After the class is over, each randomly picks up one pair of shoes. What
is the probability that there are all mismatches, or that there is at least one
correct match, or that there are exactly r instances of students getting back
their own pairs? It is also a problem of matching.
(c) Another variety of problem where matching is involved is the following:
two equivalent decks of cards are well-shuffled and matched against each other.
What is the probability of exactly r matches?
Example 27. Show that independence of any one of the four pairs (A, B), (A, B^c), (A^c, B)
and (A^c, B^c) implies the independence of the other pairs.
Suppose A and B are statistically independent, i.e., P(A ∩ B) = P(A)·P(B).
Now

P(A) = P(A ∩ Ω) = P(A ∩ (B ∪ B^c)) = P(A ∩ B) + P(A ∩ B^c) = P(A)·P(B) + P(A ∩ B^c)

∴ P(A ∩ B^c) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)·P(B^c),

establishing that A and B^c are independent.
Again,

P(B) = P(B ∩ Ω) = P(B ∩ (A ∪ A^c)) = P(B ∩ A) + P(B ∩ A^c) = P(A)·P(B) + P(B ∩ A^c)

∴ P(B ∩ A^c) = P(B) − P(A)P(B) = P(B)(1 − P(A)) = P(B)·P(A^c),

establishing that A^c and B are independent.
Finally,

P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − P(A) − P(B) + P(A ∩ B)
= 1 − P(A) − P(B) + P(A)P(B) = (1 − P(A))(1 − P(B)) = P(A^c)·P(B^c),

indicating that A^c and B^c are independent.
Example 28. "Any event A is independent of a stochastically certain event B;
i.e., if P(B) = 1, then P(A ∩ B) = P(A)·P(B)." Prove or disprove.
Solution: By monotonicity of the probability function, P(B) ≤ P(A ∪ B). Since
P(B) = 1 by hypothesis and P(A ∪ B) ≤ 1 for any event A, we have
1 ≤ P(A ∪ B) ≤ 1, i.e., P(A ∪ B) = 1. Using the addition rule,

P(A) + P(B) − P(A ∩ B) = 1.

Hence

P(A ∩ B) = P(A) = P(A)·P(B),

ascertaining the claim.
Example 29. If A, B, C are mutually independent events, show that A^c, B^c
and C^c are mutually independent events.
Solution: From the definition of mutual independence of A, B, C it follows
that

P(A ∩ B) = P(A)·P(B); P(B ∩ C) = P(B)·P(C); P(C ∩ A) = P(C)·P(A),

and again P(A ∩ B ∩ C) = P(A)·P(B)·P(C). To prove that A^c, B^c and C^c are
mutually independent we are to show both level-2 and level-3 independence.
But

P(A^c ∩ B^c) = P(A^c)·P(B^c); P(B^c ∩ C^c) = P(B^c)·P(C^c); P(C^c ∩ A^c) = P(C^c)·P(A^c),

because A, B, C are pairwise independent (cf. Example 27 above). Finally, for
level-3 independence,

P(A^c ∩ B^c ∩ C^c) = 1 − P(A ∪ B ∪ C)
= 1 − [P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(C ∩ A) + P(A ∩ B ∩ C)]
= (1 − P(A))(1 − P(B))(1 − P(C)) (using mutual independence of A, B and C)
= P(A^c)·P(B^c)·P(C^c).

2.8 Exercises

1. Prove the following relations:
(a) P(A′ ∪ B′) = 1 − P(A ∩ B);
(b) P(A′ ∪ B) = 1 − P(A) + P(A ∩ B).
2. Bonferroni's inequality: Prove that if A_1, A_2, ..., A_n be any n events
connected with a random experiment E, then P(A_1 ∩ A_2 ∩ ... ∩ A_n) ≥ 1 − Σ_{i=1}^{n} P(A_i^c).
3. A box contains 10 balls labelled 1, 2, ..., 10. A ball is taken from the box
at random and then a second ball is picked at random from the remaining
9 balls. Find the probability that the numbers labelled on the two selected
balls differ from each other by two or more.
4. Suppose two dice are rolled once and that the 36 possible outcomes are
equally likely. Find the probability that the sum of the numbers on the
two faces is even.
5. If n dice are thrown at a time, what is the probability of having each of
the points 1, 2, . . . , 6 appear at least once ?
6. In a game of poker a hand of cards means 5 cards randomly selected
(without replacement) from a deck of 52 cards. What is the probability
that a hand of cards
(a) consists of ace, queen, jack, king, and ten of the same suit ?
(b) contains 4 cards of the same denomination, i.e., either all aces or all
kings etc.

7. A man seeks advice regarding one of the possible two courses of action
from three seniors who arrive at their recommendations independently.
He follows the recommendations of the majority. The probabilities of the
individual advisers being wrong are 0.1, 0.05, 0.05 respectively. What is
the probability that the man takes an incorrect advice ?
8. The numbers 1, 2, 3, 4, 5 are written on five cards. Three cards are drawn
in succession. If the resulting digits are arranged from left to right, what is
the probability that the three-digit number formed will be even?
9. De Méré's Paradox: Which is more probable, to get an ace with four dice
or to get one double ace in 24 throws of two dice?
10. (a) An experiment has four possible outcomes A, B, C, D which are mutually exclusive. Is the following assignment of probabilities a feasible one?

P(A) = 3/10; P(B) = 7/30; P(C) = 11/20; P(D) = 1/5.

Explain your answer.

(b) Explain the inconsistency in the following probability assignment :


The probability that the home team will win the upcoming cricket match
is 0.77, the probability that it will tie the match is 0.08 and the probability
of its losing or making a tie is 0.4.
11. From the numbers 1, 2, ..., (2n + 1), 3 numbers are chosen at random.
Prove that the probability of these being in A.P. is 3n/(4n² − 1).

12. Repeated drawings are made with replacement from the set of five letters
A, B, C, D, E. What is the probability that B will not occur?
13. From an urn containing N_1 white and N_2 black balls (N = N_1 + N_2), balls
are drawn one by one without replacement until only those of the same
colour are left. Prove that the probability that the balls left are white is
N_1/N.

14. Find the probability of obtaining a total of 21 points with 5 dice.


15. One person is to be chosen at random from a committee composed of 4
men and 2 women. If it is a man, 2 women are added to those remaining;
if a woman, there is no replacement. Then a second person is chosen and
it is found to be a woman. What is the probability that our first choice
was a man?
16. Three urns contain respectively 1 white and 2 black balls; 2 white and 1
black balls; 2 white and 2 black balls. One ball is transferred from the first
urn into the second, then one from the latter is transferred into the third;
finally, one ball is drawn from the third urn. What is the probability of
its being white ?
17. In a competitive examination, a student has to undergo a set of multiple
choice questions in which each question has 4 possible alternative answers,
exactly one of which is correct. If the student knows the answer, he picks
the correct alternative and otherwise, he selects one answer randomly out
of the 4 alternatives provided. If the student knows 60% answers correctly,
(a) what is the probability that for a given question, the student gets the
correct answer ?
(b) if the student chooses the correct alternative answer to a given question, what is the probability that he applied guesswork ?
18. For any three events A, B, C defined on the same probability space, if
B ⊆ C and P(A) > 0, then prove that P(B|A) ≤ P(C|A).
19. Two persons throw a pair of dice once each. What is the probability that
the outcomes of the two throws are equal?
20. There are three identical boxes, each provided with two drawers. In the
first, each drawer contains a gold coin; in the second, one drawer contains
a gold and the other a silver coin; in the third, each drawer contains a silver
coin. A box is selected at random and one of the drawers opened. If a
gold coin is found, find the probability that the box chosen was the second
one.
21. There are n urns each containing N balls, of which N_1 are white and N_2
are black. One ball is transferred from the first to the second urn; then
one ball is transferred from the second to the third, and so on; finally one
ball is drawn from the nth urn. Prove that the probability of the last ball
being white is N_1/N.

22. If the probabilities of n mutually independent events are p_1, p_2, ..., p_n,
then show that the probability that at least one of these events will occur
is 1 − (1 − p_1)(1 − p_2) ... (1 − p_n). In technical terms, this probability is
referred to as the reliability of a system having n components connected
in parallel.
23. A player randomly chooses one of the coins A and B. The probability of
coin A showing up heads is 3/4 while the probability of heads for coin B
is 1/4. He tosses the chosen coin twice.
(a) Find the probability that he obtains (i) two heads, (ii) one head.
(b) Instead of the above strategy, suppose the player chooses an unbiased
coin and tosses it twice. What procedure or strategy should the player
adopt to maximise the probability of at least one head?
24. In Chennai 75% of the population are Tamils and the rest non-Tamils. 20% of
the Tamils and 10% of the non-Tamils speak English. A stranger to Chennai meets a local resident who can speak English. What is the probability
that the local is a non-Tamil?
25. A person wrote letters to n addresses, put one letter in each envelope, and
then at random wrote one of the n addresses on each envelope. What is
the probability that no letter reached its proper destination ?
26. If the events A, B, C are mutually independent, then prove that the pairs
(A, B ∩ C), (B, A ∩ C), (C, A ∩ B) also consist of independent events.
27. An experiment can result in one of five outcomes, the assigned probabilities of which are as follows: w_1 with probability 1/8; w_2, w_3, w_4 each with
probability 3/16; and w_5 with probability 5/16. Define E, F and G as follows:

E = {w_1, w_2, w_3}; F = {w_1, w_2, w_4}; G = {w_1, w_3, w_4}.

Show that E, F, G are not pairwise independent, although P(E ∩ F ∩ G) =
P(E)·P(F)·P(G), i.e., they have level-3 independence.
28. If n objects are distributed at random among a men and b (b < a) women,
show that the probability that the women get an odd number of objects
is

(1/2)·[(a + b)^n − (a − b)^n]/(a + b)^n.

29. A parent particle can be split up into 0, 1 or 2 particles with probabilities
1/4, 1/2 and 1/4 respectively. Beginning with a single particle, the progenitor,
let X_i denote the number of particles in the ith generation. Find (a)
P(X_2 > 0) and (b) the probability that X_1 = 2, given that X_2 = 1.
30. It is suspected that a patient has one of the diseases A_1, A_2, A_3. Suppose
that the population percentages suffering from these illnesses are in the
ratio 2 : 1 : 1. The patient is given a test which turns out to be positive in
25% of the cases of A_1, 50% of A_2, and 90% of A_3. Given that out of three
tests taken by the patient two were positive, find the probability for each
of the three illnesses.
31. Laplace's law of succession: Assuming that it was hot on n consecutive days,
what is the probability that it will be hot during the next m days? Try
to formulate an example to show that this law of succession yields absurd
results at times.
32. In a box there are 10 cut-up alphabet cards with the letters: three A's, four M's and three N's.
We draw three cards one after another and place the letters on the table
in the order they have been drawn. What's the probability that the word
MAN will appear? If instead the letters are drawn out simultaneously,
then what's the probability that the word MAN can be formed from the
letters drawn?
33. A man goes to his office following one of the three routes A1 , A2 and
A3 . His choice of route is quite independent of the weather. If it rains, his
probabilities of arriving late, following routes A1 , A2 , A3 are 0.06, 0.15, 0.12
respectively. The corresponding probabilities, if it does not rain, are
0.05, 0.10, 0.15.
(a) Given that on a sunny day he arrives late, what is the probability that
he used route A3 ? (Assume that two in every five days are rainy).
(b) Given that on a day he arrives late, what is the probability that it is
a rainy day ?
34. If A, B, C are any three events associated with a random experiment, show
that P(A ∩ B|C) = P(B|A ∩ C)·P(A|C).

35. The integers x and y are chosen at random with replacement from the first
nine natural numbers {1, 2, ..., 9}. Find the probability that |x² − y²| is
divisible by 2.
36. Most pairs of independent events that we deal with are distinct events.
However, this need not be the case, as there are events that are independent
of themselves. An event A is said to be independent of itself if P(A) =
(P(A))². Identify these events associated with a given random experiment.
37. A coin is tossed (m + n) times (m > n). Show that the probability
of exactly m consecutive heads is (n + 3)/2^{m+2} and that of at least m
consecutive heads is (n + 2)/2^{m+1}.
38. You have two coins in your pocket: a fair one and an unfair one with
probability of heads 1/3, but otherwise identical. A coin is selected at
random and tossed, falling heads up. How likely is it that it is the fair one?
39. Cards are dealt from a well-shuffled pack until the first heart appears.
(a) What is the probability that exactly 5 deals are required?
(b) What is the probability that 5 or fewer deals are required?
(c) What is the probability that exactly 3 deals were required, given the
information that 5 or fewer were required?
40. The four major blood groups are present approximately in the following
proportions among Indians:

Type     O   A   B   AB
Percent  24  28  30  18

(a) If two people are picked at random from this population, what's the
chance that their blood groups are the same? different?
(b) If four people are picked at random, and P(k) denotes the chance of
these four people having exactly k different blood types among them, find
P(k) for k = 1, 2, 3, 4.
41. Suppose that a laboratory test on a blood sample yields one of two
results, positive or negative. It is found that 95% of people with a particular disease produce a positive result. But 2% of the people without the
disease will also produce a positive result (a false positive). Suppose that
1% of the population actually has the disease. What is the probability
that a person chosen at random from the population has the said
disease, given that the laboratory test yields a positive result?

Appendix to Chapter II
A1. σ-algebra construction in R
In remark (i) of Section 2.3 we stated that none of the family of all open intervals
in R, the family of all half-open intervals in R, and the family of all closed intervals
in R is a σ-algebra in its own right. However, the σ-algebras which
these families generate are all identical. We shall now prove a theorem which is
more general, viz., that the σ-algebras generated by each of the following families
of subsets of R are all identical.
[In what follows, the σ-algebra generated by the family A_k of subsets of R is
denoted by F(A_k).]
Theorem: Show that F(A_1) = F(A_2) = ... = F(A_8), provided

A_1 : the family of open intervals of the form (a, b);
A_2 : the family of half-open intervals of the form (a, b];
A_3 : the family of half-open intervals of the form [a, b);
A_4 : the family of closed intervals of the form [a, b];
A_5 : the family of semi-infinite intervals of the form (−∞, a);
A_6 : the family of semi-infinite intervals of the form (a, +∞);
A_7 : the family of open subsets of R;
A_8 : the family of closed subsets of R.

Proof: We shall prove this theorem in the following sequence:

F(A_1) ⊆ F(A_2) ⊆ F(A_3) ⊆ F(A_4) ⊆ F(A_5) ⊆ F(A_6) ⊆ F(A_7) ⊆ F(A_8) ⊆ F(A_1).

Since

(a, b) = ⋃_{n=1}^{∞} (a, b − 1/n],

every (a, b) ∈ A_1 belongs to F(A_2). So F(A_1) ⊆ F(A_2).
Now

(a, b) = ⋃_{n=1}^{∞} [a + 1/n, b) ∈ F(A_3) and {b} = ⋂_{n=1}^{∞} [b − 1/n, b + 1/n) ∈ F(A_3)

∴ (a, b] = (a, b) ∪ {b} ∈ F(A_3)

∴ F(A_2) ⊆ F(A_3).

Again, since

[a, b) = ⋃_{n=1}^{∞} [a, b − 1/n],

[a, b) ∈ F(A_4), and so

F(A_3) ⊆ F(A_4).

Further,

[a, b] = (−∞, b] ∩ ((−∞, a))^c = (⋂_{n=1}^{∞} (−∞, b + 1/n)) ∩ ((−∞, a))^c ∈ F(A_5),

since, being a σ-algebra, F(A_5) is closed under countable intersection and complementation.

∴ F(A_4) ⊆ F(A_5).

We observe that F(A_5) ⊆ F(A_6), as for any real a,

(−∞, a) = ([a, ∞))^c = (⋂_{n=1}^{∞} (a − 1/n, ∞))^c ∈ F(A_6).

Since each open interval is an open set in R, any interval of the form (a, ∞) is
an open set and so belongs to A_7.

∴ F(A_6) ⊆ F(A_7).

As the complement of any closed set in R is an open set in R and F(A_8) is closed
under complementation, every open set in R is contained in F(A_8).

∴ F(A_7) ⊆ F(A_8).

Finally, due to the Lindelöf covering theorem, every open set in R is expressible
as a countable union of open intervals. Hence every open set in R belongs to
F(A_1), and so also does every closed set in R. This ensures that F(A_8) ⊆ F(A_1).
All these prove that the σ-algebras F(A_k), k = 1, 2, ..., 8 are identical. We
therefore denote them by a common symbol, viz., F(R) or B, and refer to it as
the σ-algebra of Borel subsets of R. The elements of B are called Borel sets
in R.
Note: If A_k be a family of subsets of R, then F(A_k), the σ-algebra generated
by A_k, is the intersection of all σ-algebras (of subsets of R) that contain A_k:

F(A_k) = ⋂ {F : F is a σ-algebra containing A_k}.

It is very interesting to see that N and Z are Borel sets in R. (For each k ∈ N, {k}
belongs to F(A_3) and hence belongs to B;

∴ N = ⋃_{k=1}^{∞} {k} ∈ B.)

In the same vein one can ensure that Z is a Borel set in R. One can also show
that the Cantor set is a Borel set in R. Since N and Z are Borel sets, we can develop
the theory of discrete random variables. The Cantor set is useful in constructing the
Cantor function, which is a singular continuous distribution function (see A3,
Chapter 4).
A few remarks on the idea of independence of two or more events
Two events A and B are said to be independent if both the relations P(B|A) =
P(B) and P(B|A^c) = P(B) hold true. But why two? The answer can be found
provided we look into the basic definition of conditional probability. Note
that P(B) = P(B|A) is meaningful only if P(A) ≠ 0; if P(A) = 0, this
relation is undefined. Again, P(A) = 0 means P(A^c) = 1 and hence P(B|A^c)
is well-defined. So the two relations put up for characterising independence
supplement each other, and both of them lead us to the multiplication rule, viz.,
P(A ∩ B) = P(A)P(B). Indeed, since

P(B|A^c) = P(B ∩ A^c)/P(A^c) and P(B ∩ A^c) = P(B) − P(B ∩ A),

P(B|A^c) = P(B) implies P(A ∩ B) = P(A)P(B).

The pair of relations defining independence of events is intuitively very appealing,
as it pronounces that the event B is independent of A provided the knowledge
of occurrence or non-occurrence of A does not anyhow determine the probability
of B.
The idea of independence of several events is a natural extension of the idea
of independence of two events. For instance, three events A, B, C are called
independent if
(i) the probability of B does not anyhow depend on whether or not A occurs;
(ii) the probability of C is not anyhow influenced by the a priori information as to which
two of the four events A, B, A^c, B^c did occur, i.e.,

P(C) = P(C|A ∩ B) = P(C|A^c ∩ B) = P(C|A ∩ B^c) = P(C|A^c ∩ B^c).

All these, however, lead us to the set of four relations characterising mutual
independence of three events A, B, C.
Since A and B are independent, P(A ∩ B) = P(A)P(B), so

P(C) = P(C|A ∩ B) = P(A ∩ B ∩ C)/P(A ∩ B) = P(A ∩ B ∩ C)/(P(A)P(B)),

i.e.,

P(A ∩ B ∩ C) = P(A)P(B)P(C)

[level-3 independence]. Again,

P(C) = P(C|A^c ∩ B) = P((B ∩ C) ∩ A^c)/P(A^c ∩ B)
= [P(B ∩ C) − P(A ∩ B ∩ C)]/(P(A^c)P(B)) = [P(B ∩ C) − P(A)P(B)P(C)]/(P(B)(1 − P(A)))

∴ P(B ∩ C) = P(B)P(C).

Similarly,

P(C) = P(C|A ∩ B^c) implies P(C ∩ A) = P(C)P(A), and
P(C) = P(C|A^c ∩ B^c) implies P(A ∩ B) = P(A)P(B).

These characterise level-2 independence.


A2. Derivation of Kolmogorov's Axiom of Countable Additivity
We now show that the Axiom of Continuity, together with the axiom of finite additivity of the probability measure, implies Kolmogorov's axiom of countable
additivity.
For this, let {A_n} be a sequence of pairwise mutually exclusive events, i.e.,
A_i ∩ A_j = ∅ ∀ i, j = 1, 2, ... with i ≠ j.
Define B_n = ⋃_{k=n+1}^{∞} A_k = ⋃_{k=1}^{∞} A_k − ⋃_{k=1}^{n} A_k, so that B_{n+1} = B_n − A_{n+1} ⊆ B_n ∀ n ≥ 1.
Clearly {B_n} is a nested (decreasing) sequence of events with ⋂_{n=1}^{∞} B_n = ∅; this
emptiness is ensured by the pairwise disjoint nature of the events A_n.
Now,

lim_{n→∞} B_n = ⋂_{n=1}^{∞} B_n = ∅ [since {B_n} is nested]

∴ P(lim_{n→∞} B_n) = P(∅) = 0, and so P(B_n) → 0 [due to the Axiom of Continuity].

Again, by the axiom of finite additivity,

P(⋃_{k=1}^{∞} A_k) = P((⋃_{k=1}^{n} A_k) ∪ B_n) = Σ_{k=1}^{n} P(A_k) + P(B_n).

Letting n → ∞,

P(⋃_{k=1}^{∞} A_k) = lim_{n→∞} Σ_{k=1}^{n} P(A_k) + lim_{n→∞} P(B_n) = Σ_{k=1}^{∞} P(A_k).

A3. A New Outlook on the Rule of Total Probability

The rule of total probability, or the so-called rule of elimination, states
that if the sample space Ω is partitioned into a finite number of sets {A_i / i =
1, 2, ..., n}, then

P(B) = Σ_{j=1}^{n} P(A_j)·P(B|A_j).

This relation reminds us of the linear span of a vector in a vector space in terms
of a set of preassigned basis vectors. Here the a priori probabilities P(A_j),
which partition unity, work as the basis vectors, while the likelihoods P(B|A_j)
work like the coefficients appearing in the linear span. However, if you are too
much of a professional, you may treat this barely as a fiddlestick.
