Sie sind auf Seite 1von 591

1

TABLE OF CONTENTS
PROBABILITY THEORY
Lecture 1 Basics
Lecture 2 Independence and Bernoulli Trials
Lecture 3 Random Variables
Lecture 4 Binomial Random Variable Applications and Conditional
Probability Density Function
Lecture 5 Function of a Random Variable
Lecture 6 Mean, Variance, Moments and Characteristic Functions
Lecture 7 Two Random Variables
Lecture 8 One Function of Two Random Variables
Lecture 9 Two Functions of Two Random Variables
Lecture 10 Joint Moments and Joint Characteristic Functions
Lecture 11 Conditional Density Functions and Conditional Expected Values
Lecture 12 Principles of Parameter Estimation
Lecture 13 The Weak Law and the Strong Law of Large numbers
2
STOCHASTIC PROCESSES
Lecture 14 Stochastic Processes - Introduction
Lecture 15 Poisson Processes
Lecture 16 Mean square Estimation
Lecture 17 Long Term Trends and Hurst Phenomena
Lecture 18 Power Spectrum
Lecture 19 Series Representation of Stochastic processes
Lecture 20 Extinction Probability for Queues and Martingales
Note: These lecture notes are revised periodically with new materials
and examples added from time to time. Lectures 1 11 are
used at Polytechnic for a first level graduate course on Probability
theory and Random Variables. Parts of lectures 14 19 are used at
Polytechnic for a Stochastic Processes course. These notes are intended
for unlimited worldwide use. Any feedback may be addressed to
pillai@hora.poly.edu

S. UNNIKRISHNA PILLAI
3
PROBABILITY THEORY
1. Basics
Probability theory deals with the study of random
phenomena, which under repeated experiments yield
different outcomes that have certain underlying patterns
about them. The notion of an experiment assumes a set of
repeatable conditions that allow any number of identical
repetitions. When an experiment is performed under these
conditions, certain elementary events occur in different
but completely uncertain ways. We can assign nonnegative
number as the probability of the event in various
ways:
), (
i
P
i

PILLAI
4
Laplaces Classical Definition: The Probability of an
event A is defined a-priori without actual experimentation
as
provided all these outcomes are equally likely.
Consider a box with n white and m red balls. In this case,
there are two elementary outcomes: white ball or red ball.
Probability of selecting a white ball
,
outcomes possible of number Total
to favorable outcomes of Number
) (
A
A P =
.
m n
n
+
=
(1-1)
PILLAI
5
Relative Frequency Definition: The probability of an
event A is defined as
where n
A
is the number of occurrences of A and n is the
total number of trials.
The axiomatic approach to probability, due to
Kolmogorov, developed through a set of axioms (below) is
generally recognized as superior to the above definitions, as
it provides a solid foundation for complicated applications.
n
n
A P
A
n
lim ) (

=
(1-2)
PILLAI
6
The totality of all known a priori, constitutes a set ,
the set of all experimental outcomes.
has subsets Recall that if A is a subset of
, then implies From A and B, we can
generate other related subsets etc.
,
i

{ } , , , ,
2 1 k
=
(1-3)
A
.
. , , , C B A
, , , , B A B A B A
{ }
{ } B A B A
B A B A
=
=


and |
or |
and
{ } A A = |
(1-4)
PILLAI
7
A B
B A
A
A
A
A B
B A
If the empty set, then A and B are
said to be mutually exclusive (M.E).
A partition of is a collection of mutually exclusive
subsets of such that their union is .
, = B A
. and ,
1
= =
=

i
i j i
A A A
B
A
= B A
1
A
2
A
n
A
i
A
(1-5)
j
A
Fig.1.1
Fig. 1.2
PILLAI
8
De-Morgans Laws:
(1-6)
B A B A B A B A = = ;
A B
A B A B
A
B
B A
B A
B A
Often it is meaningful to talk about at least some of the
subsets of as events, for which we must have mechanism
to compute their probabilities.
Example 1.1: Consider the experiment where two coins are
simultaneously tossed. The various elementary events are
Fig.1.3
PILLAI
9
) , ( ), , ( ), , ( ), , (
4 3 2 1
T T H T T H H H = = = =
and
{ }. , , ,
4 3 2 1
=
The subset is the same as Head
has occurred at least once and qualifies as an event.
Suppose two subsets A and B are both events, then
consider
Does an outcome belong to A or B
Does an outcome belong to A and B
Does an outcome fall outside A?
{ } , ,
3 2 1
= A
B A =
B A =
PILLAI
10
Thus the sets etc., also qualify as
events. We shall formalize this using the notion of a Field.
Field: A collection of subsets of a nonempty set forms
a field F if
Using (i) - (iii), it is easy to show that etc.,
also belong to F. For example, from (ii) we have
and using (iii) this gives
applying (ii) again we get where we
have used De Morgans theorem in (1-6).
, , , , B A B A B A
. then , and If (iii)
then , If (ii)
(i)
F B A F B F A
F A F A
F



, , B A B A
, , F B F A ; F B A
, F B A B A =
(1-7)
PILLAI
11
Thus if then
From here on wards, we shall reserve the term event
only to members of F.
Assuming that the probability of elementary
outcomes of are apriori defined, how does one
assign probabilities to more complicated events such as
A, B, AB, etc., above?
The three axioms of probability defined below can be
used to achieve that goal.
, , F B F A
{ }. , , , , , , , , B A B A B A B A B A F =
) (
i i
P p =
i

(1-8)
PILLAI
12
Axioms of Probability
For any event A, we assign a number P(A), called the
probability of the event A. This number satisfies the
following three conditions that act the axioms of
probability.
(Note that (iii) states that if A and B are mutually
exclusive (M.E.) events, the probability of their union
is the sum of their probabilities.)
). ( ) ( ) ( then , If (iii)
unity) is set whole the of ty (Probabili 1 ) ( (ii)
number) e nonnegativ a is ty (Probabili 0 ) ( (i)
B P A P B A P B A
P
A P
+ = =
=

(1-9)
PILLAI
13
The following conclusions follow from these axioms:
a. Since we have using (ii)
But and using (iii),
b. Similarly, for any A,
Hence it follows that
But and thus
c. Suppose A and B are not mutually exclusive (M.E.)?
How does one compute
, = A A
. 1 ) ( ) P( = = P A A
, A A
). ( 1 ) or P( 1 ) P( ) ( ) P( A P A A A P A A = = + =
(1-10)
{ } { }. = A
{ } ( ) . ) ( ) ( P A P A P + =
{ } , A A = { } . 0 = P
(1-11)
? ) ( = B A P
PILLAI
14
To compute the above probability, we should re-express
in terms of M.E. sets so that we can make use of
the probability axioms. From Fig.1.4 we have
where A and are clearly M.E. events.
Thus using axiom (1-9-iii)
To compute we can express B as
Thus
since and are M.E. events.
B A
, B A A B A =
(1-12)
). ( ) ( ) ( ) ( B A P A P B A A P B A P + = =
), ( B A P
A B BA A B A B
A A B B B
= =
= =
) ( ) (
) (
), ( ) ( ) ( A B P BA P B P + =
AB BA =
B A A B =
B A
(1-13)
(1-14)
(1-15)
A
B A
B A
Fig.1.4
PILLAI
15
From (1-15),
and using (1-16) in (1-13)
Question: Suppose every member of a denumerably
infinite collection A
i
of pair wise disjoint sets is an
event, then what can we say about their union
i.e., suppose all what about A? Does it
belong to F?
Further, if A also belongs to F, what about P(A)?
) ( ) ( ) ( AB P B P B A P =
). ( ) ( ) ( ) ( AB P B P A P B A P + =
?
1

=
=
i
i
A A
, F A
i

(1-18)
(1-16)
(1-17)
(1-19)
(1-20)
PILLAI
16
The above questions involving infinite sets can only be
settled using our intuitive experience from plausible
experiments. For example, in a coin tossing experiment,
where the same coin is tossed indefinitely, define
A = head eventually appears.
Is A an event? Our intuitive experience surely tells us that
A is an event. Let
Clearly Moreover the above A is . =
j i
A A
(1-21)
{ }
} , , , , , {
toss th the on 1st time for the appears head
1
h t t t t
n A
n
n

=
=
(1-22)
(1-23)
.
3 2 1
=
i
A A A A A
PILLAI
17
We cannot use probability axiom (1-9-iii) to compute
P(A), since the axiom only deals with two (or a finite
number) of M.E. events.
To settle both questions above (1-19)-(1-20), extension of
these notions must be done based on our intuition as new
axioms.
-Field (Definition):
A field F is a -field if in addition to the three conditions
in (1-7), we have the following:
For every sequence of pair wise disjoint
events belonging to F, their union also belongs to F, i.e.,
, 1 , = i A
i

.
1
F A A
i
i
=
=

(1-24)
PILLAI
18
In view of (1-24), we can add yet another axiom to the
set of probability axioms in (1-9).
(iv) If A
i
are pair wise mutually exclusive, then
Returning back to the coin tossing experiment, from
experience we know that if we keep tossing a coin,
eventually, a head must show up, i.e.,
But and using the fourth probability axiom
in (1-25),
). (
1 1

=
=

n
n
n
n
A P A P

(1-25)
. 1 ) ( = A P
(1-26)

=
=
1
,
n
n
A A
). ( ) (
1 1

=
=

=
n
n
n
n
A P A P A P

(1-27)
PILLAI
19
From (1-22), for a fair coin since only one in 2
n
outcomes
is in favor of A
n
, we have
which agrees with (1-26), thus justifying the
reasonableness of the fourth axiom in (1-25).
In summary, the triplet (, F, P) composed of a nonempty
set of elementary events, a -field F of subsets of , and
a probability measure P on the sets in F subject the four
axioms ((1-9) and (1-25)) form a probability model.
The probability of more complicated events must follow
from this framework by deduction.
, 1
2
1
) ( and
2
1
) (
1 1
= = =


=

= n
n
n
n
n
n
A P A P (1-28)
PILLAI
20
Conditional Probability and Independence
In N independent trials, suppose N
A
, N
B
, N
AB
denote the
number of times events A, B and AB occur respectively.
According to the frequency interpretation of probability,
for large N
Among the N
A
occurrences of A, only N
AB
of them are also
found among the N
B
occurrences of B. Thus the ratio
. ) ( , ) ( , ) (
N
N
AB P
N
N
B P
N
N
A P
AB B A

(1-29)
) (
) (
/
/
B P
AB P
N N
N N
N
N
B
AB
B
AB
= =
(1-30)
PILLAI
21
is a measure of the event A given that B has already
occurred. We denote this conditional probability by
P(A|B) = Probability of the event A given
that B has occurred.
We define
provided As we show below, the above definition
satisfies all probability axioms discussed earlier.
,
) (
) (
) | (
B P
AB P
B A P =
. 0 ) ( B P
(1-31)
PILLAI
22
We have
(i)
(ii) since B = B.
(iii) Suppose Then
But hence
satisfying all probability axioms in (1-9). Thus (1-31)
defines a legitimate probability measure.
, 0
0 ) (
0 ) (
) | (
>

=
B P
AB P
B A P
.
) (
) (
) (
) ) ((
) | (
B P
CB AB P
B P
B C A P
B C A P

=

=
, 1
) (
) (
) (
) (
) | ( = =

=
B P
B P
B P
B P
B P
), | ( ) | (
) (
) (
) (
) (
) | ( B C P B A P
B P
CB P
B P
AB P
B C A P + = + =
. 0 = C A
, = AC AB
). ( ) ( ) ( CB P AB P CB AB P + =
(1-35)
(1-33)
(1-32)
(1-34)
PILLAI
23
Properties of Conditional Probability:
a. If and
since if then occurrence of B implies automatic
occurrence of the event A. As an example, but
in a dice tossing experiment. Then and
b. If and
, , B AB A B =
1
) (
) (
) (
) (
) | ( = = =
B P
B P
B P
AB P
B A P
(1-36)
, A B
). (
) (
) (
) (
) (
) | ( A P
B P
A P
B P
AB P
B A P > = =
, , A AB B A =
(1-37)
, A B
. 1 ) | ( = B A P
{outcome is even}, ={outcome is 2}, A B =
PILLAI
24
(In a dice experiment,
so that The statement that B has occurred (outcome
is even) makes the odds for outcome is 2 greater than
without that information).
c. We can use the conditional probability to express the
probability of a complicated event in terms of simpler
related events.
Let are pair wise disjoint and their union is .
Thus and
Thus
. B A
.
1
=
=

n
i
i
A
n
A A A , , ,
2 1

, =
j i
A A
. ) (
2 1 2 1 n n
BA BA BA A A A B B = =
(1-38)
(1-39)
PILLAI
{outcome is 2}, ={outcome is even}, A B =
25
But so that from (1-39)
With the notion of conditional probability, next we
introduce the notion of independence of events.
Independence: A and B are said to be independent events,
if
Notice that the above definition is a probabilistic statement,
not a set theoretic notion such as mutually exclusiveness.
). ( ) ( ) ( B P A P AB P = (1-41)
, = =
j i j i
BA BA A A

= =
= =
n
i
i i
n
i
i
A P A B P BA P B P
1 1
). ( ) | ( ) ( ) (
(1-40)
PILLAI
26
Suppose A and B are independent, then
Thus if A and B are independent, the event that B has
occurred does not shed any more light into the event A. It
makes no difference to A whether B has occurred or not.
An example will clarify the situation:
Example 1.2: A box contains 6 white and 4 black balls.
Remove two balls at random without replacement. What
is the probability that the first one is white and the second
one is black?
Let W
1
= first ball removed is white
B
2
= second ball removed is black
). (
) (
) ( ) (
) (
) (
) | ( A P
B P
B P A P
B P
AB P
B A P = = =
(1-42)
PILLAI
27
We need We have
Using the conditional probability rule,
But
and
and hence
? ) (
2 1
= B W P
). ( ) | ( ) ( ) (
1 1 2 1 2 2 1
W P W B P W B P B W P = =
,
5
3
10
6
4 6
6
) (
1
= =
+
= W P
,
9
4
4 5
4
) | (
1 2
=
+
= W B P
. 25 . 0
81
20
9
4
9
5
) (
2 1
= = B W P
.
1 2 2 1 2 1
W B B W B W = =
(1-43)
PILLAI
28
Are the events W
1
and B
2
independent? Our common sense
says No. To verify this we need to compute P(B
2
). Of course
the fate of the second ball very much depends on that of the
first ball. The first ball has two options: W
1
= first ball is
white or B
1
= first ball is black. Note that
and Hence W
1
together with B
1
form a partition.
Thus (see (1-38)-(1-40))
and
As expected, the events W
1
and B
2
are dependent.
,
1 1
= B W
.
1 1
= B W
,
5
2
15
2 4
5
2
3
1
5
3
9
4
10
4
3 6
3
5
3
4 5
4

) ( ) | ( ) ( ) | ( ) (
1 1 2 1 1 2 2
=
+
= + =
+
+
+
=
+ = B P R B P W P W B P B P
.
81
20
) (
5
3
5
2
) ( ) (
1 2 1 2
= = W B P W P B P
PILLAI
29
From (1-31),
Similarly, from (1-31)
or
From (1-44)-(1-45), we get
or
Equation (1-46) is known as Bayes theorem.
). ( ) | ( ) ( B P B A P AB P =
(1-44)
,
) (
) (
) (
) (
) | (
A P
AB P
A P
BA P
A B P = =
(1-45)
). ( ) | ( ) ( A P A B P AB P =
). ( ) | ( ) ( ) | ( A P A B P B P B A P =
) (
) (
) | (
) | ( A P
B P
A B P
B A P =
(1-46)
PILLAI
30
Although simple enough, Bayes theorem has an interesting
interpretation: P(A) represents the a-priori probability of the
event A. Suppose B has occurred, and assume that A and B
are not independent. How can this new information be used
to update our knowledge about A? Bayes rule in (1-46)
take into account the new information (B has occurred)
and gives out the a-posteriori probability of A given B.
We can also view the event B as new knowledge obtained
from a fresh experiment. We know something about A as
P(A). The new information is available in terms of B. The
new information should be used to improve our
knowledge/understanding of A. Bayes theorem gives the
exact mechanism for incorporating such new information.
PILLAI
31
A more general version of Bayes theorem involves
partition of . From (1-46)
where we have made use of (1-40). In (1-47),
represent a set of mutually exclusive events with
associated a-priori probabilities With the
new information B has occurred, the information about
A
i
can be updated by the n conditional probabilities
,
) ( ) | (
) ( ) | (
) (
) ( ) | (
) | (
1

=
= =
n
i
i i
i i i i
i
A P A B P
A P A B P
B P
A P A B P
B A P
(1-47)
, 1 , n i A
i
=
. 1 ), ( n i A P
i
=
47). - (1 using , 1 ), | ( n i A B P
i
=
PILLAI
32
Example 1.3: Two boxes B
1
and B
2
contain 100 and 200
light bulbs respectively. The first box (B
1
) has 15 defective
bulbs and the second 5. Suppose a box is selected at
random and one bulb is picked out.
(a) What is the probability that it is defective?
Solution: Note that box B
1
has 85 good and 15 defective
bulbs. Similarly box B
2
has 195 good and 5 defective
bulbs. Let D = Defective bulb is picked out.
Then
. 025 . 0
200
5
) | ( , 15 . 0
100
15
) | (
2 1
= = = = B D P B D P
PILLAI
33
Since a box is selected at random, they are equally likely.
Thus B
1
and B
2
form a partition as in (1-39), and using
(1-40) we obtain
Thus, there is about 9% probability that a bulb picked at
random is defective.
.
2
1
) ( ) (
2 1
= = B P B P
. 0875 . 0
2
1
025 . 0
2
1
15 . 0
) ( ) | ( ) ( ) | ( ) (
2 2 1 1
= + =
+ = B P B D P B P B D P D P
PILLAI
34
(b) Suppose we test the bulb and it is found to be defective.
What is the probability that it came from box 1?
Notice that initially then we picked out a box
at random and tested a bulb that turned out to be defective.
Can this information shed some light about the fact that we
might have picked up box 1?
From (1-48), and indeed it is more
likely at this point that we must have chosen box 1 in favor
of box 2. (Recall box1 has six times more defective bulbs
compared to box2).
. 8571 . 0
0875 . 0
2 / 1 15 . 0
) (
) ( ) | (
) | (
1 1
1
=

= =
D P
B P B D P
D B P
? ) | (
1
= D B P
(1-48)
; 5 . 0 ) (
1
= B P
, 5 . 0 857 . 0 ) | (
1
> = D B P
PILLAI
1
2. Independence and Bernoulli Trials
Independence: Events A and B are independent if
It is easy to show that A, B independent implies
are all independent pairs. For example,
and so that
or
i.e., and B are independent events.
). ( ) ( ) ( B P A P AB P =
(2-1)
; , B A
B A B A , ; ,
B A AB B A A B = = ) (
, = B A AB
), ( ) ( ) ( )) ( 1 ( ) ( ) ( ) ( ) ( B P A P B P A P B P A P B P B A P = = =
) ( ) ( ) ( ) ( ) ( ) ( ) ( B A P B P A P B A P AB P B A AB P B P + = + = =
A
PILLAI
2
If P(A) = 0, then since the event always, we have
and (2-1) is always satisfied. Thus the event of zero
probability is independent of every other event!
Independent events obviously cannot be mutually
exclusive, since and A, B independent
implies Thus if A and B are independent,
the event AB cannot be the null set.
More generally, a family of events are said to be
independent, if for every finite sub collection
we have
A AB
, 0 ) ( 0 ) ( ) ( = = AB P A P AB P
0 ) ( , 0 ) ( > > B P A P
. 0 ) ( > AB P
, , , ,
2 1 n
i i i
A A A

= =
=
|
|
.
|

\
|
n
k
i
n
k
i
k k
A P A P
1 1
). (

{ }
i
A
(2-2)
PILLAI
3
Let
a union of n independent events. Then by De-Morgans
law
and using their independence
Thus for any A as in (2-3)
a useful result.
,
3 2 1 n
A A A A A =
(2-3)
n
A A A A
2 1
=
(2-4)
. )) ( 1 ( ) ( ) ( ) (
1 1
2 1

= =
= = =
n
i
i
n
i
i n
A P A P A A A P A P
(2-5)
, )) ( 1 ( 1 ) ( 1 ) (
1

=
= =
n
i
i
A P A P A P
(2-6)
PILLAI
4
, ) ( p A P
i
=
. 3 1 = i
Example 2.1: Three switches connected in parallel operate
independently. Each switch remains closed with probability
p. (a) Find the probability of receiving an input signal at the
output. (b) Find the probability that switch S
1
is open given
that an input signal is received at the output.
Solution: a. Let A
i
= Switch S
i
is closed. Then
Since switches operate independently, we have
Fig.2.1
Input Output
). ( ) ( ) ( ) ( ); ( ) ( ) (
3 2 1 3 2 1
A P A P A P A A A P A P A P A A P
j i j i
= =
PILLAI
5
Let R = input signal is received at the output. For the
event R to occur either switch 1 or switch 2 or switch 3
must remain closed, i.e.,
(2-7)
.
3 2 1
A A A R =
. 3 3 ) 1 ( 1 ) ( ) (
3 2 3
3 2 1
p p p p A A A P R P + = = =
). ( ) | ( ) ( ) | ( ) (
1 1
1 1
A P A R P A P A R P R P + =
, 1 ) | (
1
= A R P
2
3 2
1
2 ) ( ) | ( p p A A P A R P = =
Using (2-3) - (2-6),
We can also derive (2-8) in a different manner. Since any
event and its compliment form a trivial partition, we can
always write
But and
(2-8)
(2-9)
and using these in (2-9) we obtain
, 3 3 ) 1 )( 2 ( ) (
3 2 2
p p p p p p p R P + = + =
(2-10)
which agrees with (2-8).
PILLAI
6
Note that the events A
1
, A
2
, A
3
do not form a partition, since
they are not mutually exclusive. Obviously any two or all
three switches can be closed (or open) simultaneously.
Moreover,
b. We need From Bayes theorem
Because of the symmetry of the switches, we also have
. 1 ) ( ) ( ) (
3 2 1
+ + A P A P A P
). | (
1
R A P
.
3 3
2 2
3 3
) 1 )( 2 (
) (
) ( ) | (
) | (
3 2
2
3 2
2
1 1
1
p p p
p p
p p p
p p p
R P
A P A R P
R A P
+
+
=
+

= =
(2-11)
). | ( ) | ( ) | (
3 2 1
R A P R A P R A P = =
PILLAI
7
Repeated Trials
Consider two independent experiments with associated
probability models (
1
, F
1
, P
1
) and (
2
, F
2
, P
2
). Let

1
,
2
represent elementary events. A joint
performance of the two experiments produces an
elementary events = (, ). How to characterize an
appropriate probability to this combined event ?
Towards this, consider the Cartesian product space
=
1

2
generated from
1
and
2
such that if

1
and
2
, then every in is an ordered pair
of the form = (, ). To arrive at a probability model
we need to define the combined trio (, F, P).
PILLAI
8
Suppose AF
1
and B F
2
. Then A B is the set of all pairs
(, ), where A and B. Any such subset of
appears to be a legitimate event for the combined
experiment. Let F denote the field composed of all such
subsets A B together with their unions and compliments.
In this combined experiment, the probabilities of the events
A
2
and
1
B are such that
Moreover, the events A
2
and
1
B are independent for
any A F
1
and B F
2
. Since
we conclude using (2-12) that
). ( ) ( ), ( ) (
2 1 1 2
B P B P A P A P = = (2-12)
(2-13)
, ) ( ) (
1 2
B A B A =
PILLAI
9
) ( ) ( ) ( ) ( ) (
2 1 1 2
B P A P B P A P B A P = =
for all A F
1
and B F
2
. The assignment in (2-14) extends
to a unique probability measure on the sets in F
and defines the combined trio (, F, P).
Generalization: Given n experiments and
their associated let
represent their Cartesian product whose elementary events
are the ordered n-tuples where Events
in this combined space are of the form
where and their unions an intersections.
) (
2 1
P P P
, , , ,
2 1 n

, 1 , and n i P F
i i
=
, , , ,
2 1 n
.
i i

n
A A A
2 1
n
=
2 1
(2-15)
(2-16)
,
i i
F A
PILLAI
(2-14)
10
If all these n experiments are independent, and is the
probability of the event in then as before
Example 2.2: An event A has probability p of occurring in a
single trial. Find the probability that A occurs exactly k times,
k n in n trials.
Solution: Let (, F, P) be the probability model for a single
trial. The outcome of n experiments is an n-tuple
where every and as in (2-15).
The event A occurs at trial # i , if Suppose A occurs
exactly k times in .
) (
i i
A P
i
A
i
F
). ( ) ( ) ( ) (
2 2 1 1 2 1 n n
A P A P A P A A A P =
(2-17)
{ } , , , ,
0 2 1
=
n
(2-18)

i

=
0
. A
i

PILLAI
11
Then k of the belong to A, say and the
remaining are contained in its compliment in
Using (2-17), the probability of occurrence of such an is
given by
However the k occurrences of A can occur in any particular
location inside . Let represent all such
events in which A occurs exactly k times. Then
But, all these s are mutually exclusive, and equiprobable.
i

, , , ,
2 1 k
i i i

k n
. A
. ) ( ) ( ) ( ) ( ) ( ) (
}) ({ }) ({ }) ({ }) ({ }) , , , , , ({ ) (
2 1 2 1
0
k n k
k n k
i i i i i i i i
q p A P A P A P A P A P A P
P P P P P P
n k n k

= =
= =





(2-19)
N
, , ,
2 1

. trials" in times exactly occurs "
2 1 N
n k A =
(2-20)
i

PILLAI
12
Thus
where we have used (2-19). Recall that, starting with n
possible choices, the first object can be chosen n different
ways, and for every such choice the second one in
ways, and the kth one ways, and this gives the
total choices for k objects out of n to be
But, this includes the choices among the k objects that
are indistinguishable for identical objects. As a result
, ) ( ) (
) trials" in times exactly occurs ("
0
1
0
k n k
N
i
i
q Np NP P
n k A P

=
= = =


(2-21)
(2-22)
) 1 ( n
) 1 ( + k n
). 1 ( ) 1 ( + k n n n
! k
|
|
.
|

\
|
=

=
+
=
k
n
k k n
n
k
k n n n
N
! )! (
!
!
) 1 ( ) 1 (
PILLAI
13
, , , 2 , 1 , 0 ,
) trials" in times exactly occurs (" ) (
n k q p
k
n
n k A P k P
k n k
n
=
|
|
.
|

\
|
=
=

(2-23)
) ( A =
) ( A =
represents the number of combinations, or choices of n
identical objects taken k at a time. Using (2-22) in (2-21),
we get
a formula, due to Bernoulli.
Independent repeated experiments of this nature, where the
outcome is either a success or a failure
are characterized as Bernoulli trials, and the probability of
k successes in n trials is given by (2-23), where p
represents the probability of success in any one trial.
PILLAI
14
Example 2.3: Toss a coin n times. Obtain the probability of
getting k heads in n trials ?
Solution: We may identify head with success (A) and
let In that case (2-23) gives the desired
probability.
Example 2.4: Consider rolling a fair die eight times. Find
the probability that either 3 or 4 shows up five times ?
Solution: In this case we can identify
Thus
and the desired probability is given by (2-23) with
and Notice that this is similar to a biased coin
problem.
). (H P p =
{ } { }. } 4 or 3 either { success" "
4 3
f f A = = =
,
3
1
6
1
6
1
) ( ) ( ) (
4 3
= + = + = f P f P A P
5 , 8 = = k n
. 3 / 1 = p
PILLAI
15
Bernoulli trial: consists of repeated independent and
identical experiments each of which has only two outcomes A
or with and The probability of exactly
k occurrences of A in n such trials is given by (2-23).
Let
Since the number of occurrences of A in n trials must be an
integer either must
occur in such an experiment. Thus
But are mutually exclusive. Thus
A
. ) ( q A P = , ) ( p A P =
, , , 2 , 1 , 0 n k =
. trials" in s occurrence exactly " n k X
k
= (2-24)
n
X X X X or or or or
2 1 0

. 1 ) (
1 0
=
n
X X X P (2-25)
j i
X X ,
PILLAI
16

= =

|
|
.
|

\
|
= =
n
k
n
k
k n k
k n
q p
k
n
X P X X X P
0 0
1 0
. ) ( ) (
(2-26)
From the relation
(2-26) equals and it agrees with (2-25).
For a given n and p what is the most likely value of k ?
From Fig.2.2, the most probable value of k is that number
which maximizes in (2-23). To obtain this value,
consider the ratio
, ) (
0
k n k
n
k
n
b a
k
n
b a

=

|
|
.
|

\
|
= + (2-27)
, 1 ) ( = +
n
q p
) ( k P
n
. 2 / 1 , 12 = = p n
Fig. 2.2
) (k P
n
k
PILLAI
17
.
1 !
! )! (
)! 1 ( )! 1 (
!
) (
) 1 (
1 1
p
q
k n
k
q p n
k k n
k k n
q p n
k P
k P
k n k
k n k
n
n
+
=

+
=

+
(2-28)
Thus if or
Thus as a function of k increases until
if it is an integer, or the largest integer less than
and (2-29) represents the most likely number of successes
(or heads) in n trials.
Example 2. 5: In a Bernoulli experiment with n trials, find
the probability that the number of occurrences of A is
between and
), 1 ( ) ( k P k P
n n
p k n p k ) 1 ( ) 1 ( + . ) 1 ( p n k +
) ( k P
n
p n k ) 1 ( + =
, ) 1 ( p n +
(2-29)
1
k .
2
k
max
k
PILLAI
18
Solution: With as defined in (2-24),
clearly they are mutually exclusive events. Thus
Example 2. 6: Suppose 5,000 components are ordered. The
probability that a part is defective equals 0.1. What is the
probability that the total number of defective parts does not
exceed 400 ?
Solution: Let
, , , 2 , 1 , 0 , n i X
i
=
. ) ( ) (
) " and between is of s Occurrence ("
2
1
2
1
2 1 1
1
2 1

=

=
+
|
|
.
|

\
|
= = =
k
k k
k n k
k
k k
k k k k
q p
k
n
X P X X X P
k k A P

(2-30)
". components 5,000 among defective are parts " k Y
k
=
PILLAI
19
Using (2-30), the desired probability is given by
Equation (2-31) has too many terms to compute. Clearly,
we need a technique to compute the above term in a more
efficient manner.
From (2-29), the most likely number of successes in n
trials, satisfy
or
. ) 9 . 0 ( ) 1 . 0 (
5000

) ( ) (
5000
400
0
400
0
400 1 0
k k
k
k
k
k
Y P Y Y Y P

=
=

|
|
.
|

\
|
=
=
(2-31)
max
k
p n k p n ) 1 ( 1 ) 1 (
max
+ +
(2-32)
,
max
n
p
p
n
k
n
q
p +
(2-33)
PILLAI
20
so that
From (2-34), as the ratio of the most probable
number of successes (A) to the total number of trials in a
Bernoulli experiment tends to p, the probability of
occurrence of A in a single trial. Notice that (2-34) connects
the results of an actual experiment ( ) to the axiomatic
definition of p. In this context, it is possible to obtain a more
general result as follows:
Bernoullis theorem: Let A denote an event whose
probability of occurrence in a single trial is p. If k denotes
the number of occurrences of A in n independent trials, then
. lim p
n
k
m
n
=

(2-34)
, n
n k
m
/
.
2

n
pq
p
n
k
P <
|
|
.
|

\
|
)
`

>
(2-35)
PILLAI
21
Equation (2-35) states that the frequency definition of
probability of an event and its axiomatic definition ( p)
can be made compatible to any degree of accuracy.
Proof: To prove Bernoullis theorem, we need two identities.
Note that with as in (2-23), direct computation gives
Proceeding in a similar manner, it can be shown that
n
k
) ( k P
n
. ) (
! )! 1 (
)! 1 (
! )! 1 (
!

)! 1 ( )! (
!
! )! (
!
) (
1
1
1
0
1 1
1
0
1
1
1 0
np q p np
q p
i i n
n
np q p
i i n
n
q p
k k n
n
q p
k k n
n
k k P k
n
i n i
n
i
i n i
n
i
k n k
n
k
n
k
k n k
n
k
n
= + =


=

=

=

=
+

=


(2-36)
.
)! 1 ( )! (
!

)! 2 ( )! (
!
)! 1 ( )! (
!
) (
2 2
1
2 1 0
2
npq p n q p
k k n
n
q p
k k n
n
q p
k k n
n
k k P k
k n k
n
k
k n k
n
k
n
k
k n k
n
k
n
+ =

+

=

=

= =


(2-37)
PILLAI
22
Returning to (2-35), note that
which in turn is equivalent to
Using (2-36)-(2-37), the left side of (2-39) can be expanded
to give
Alternatively, the left side of (2-39) can be expressed as
, ) ( to equivalent is
2 2 2
n np k p
n
k
> >
(2-38)
. ) ( ) ( ) (
2 2
0
2 2
0
2
n k P n k P np k
n
n
k
n
n
k
= >

= =
(2-39)
. 2
) ( 2 ) ( ) ( ) (
2 2 2 2
2 2
0 0
2
0
2
npq p n np np npq p n
p n k P k np k P k k P np k
n
n
k
n
n
k
n
n
k
= + + =
+ =

= = =
(2-40)
n
{ }.
) ( ) ( ) (
) ( ) ( ) ( ) ( ) ( ) (
2 2
2 2 2
2 2
0
2



n np k P n
k P n k P np k
k P np k k P np k k P np k
n
n np k
n
n np k
n
n np k
n
n np k
n
k
> =
>
+ =


> >
> =
(2-41)
PILLAI
23
Using (2-40) in (2-41), we get the desired result
Note that for a given can be made arbitrarily
small by letting n become large. Thus for very large n, we
can make the fractional occurrence (relative frequency)
of the event A as close to the actual probability p of the
event A in a single trial. Thus the theorem states that the
probability of event A from the axiomatic framework can be
computed from the relative frequency definition quite
accurately, provided the number of experiments are large
enough. Since is the most likely value of k in n trials,
from the above discussion, as the plots of tends
to concentrate more and more around in (2-32).
.
2

n
pq
p
n
k
P <
|
|
.
|

\
|
)
`

>
(2-42)
2
/ , 0 n pq >
n
k
max
k
, n
) ( k P
n
max
k
PILLAI
24
Next we present an example that illustrates the usefulness of
simple textbook examples to practical problems of interest:
Example 2.7 : Day-trading strategy : A box contains n
randomly numbered balls (not 1 through n but arbitrary
numbers including numbers greater than n). Suppose
a fraction of those balls are initially
drawn one by one with replacement while noting the numbers
on those balls. The drawing is allowed to continue until
a ball is drawn with a number larger than the first m numbers.
Determine the fraction p to be initially drawn, so as to
maximize the probability of drawing the largest among the
n numbers using this strategy.
Solution: Let drawn ball has the largest
number among all n balls, and the largest among the
say ; 1 m np p = <
st
k
k X ) 1 ( + =
PILLAI
25
first k balls is in the group of first m balls, k > m. (2.43)
Note that is of the form
where
A = largest among the first k balls is in the group of first
m balls drawn
and
B = (k+1)
st
ball has the largest number among all n balls.
Notice that A and B are independent events, and hence
Where m = np represents the fraction of balls to be initially
drawn. This gives
P (selected ball has the largest number among all balls)
k
X , A B
.
1

1
) ( ) ( ) (
k
p
k
np
n k
m
n
B P A P X P
k
= = = =
(2-44)
1 1


1 1
( ) ln
ln .
n n
n
n
k
np
np
k m k m
P X p p p k
k k
p p

= =
= = =
=


(2-45)
26
Maximization of the desired probability in (2-45) with
respect to p gives
or
From (2-45), the maximum value for the desired probability
of drawing the largest number equals 0.3679 also.
Interestingly the above strategy can be used to play the
stock market.
Suppose one gets into the market and decides to stay
up to 100 days. The stock values fluctuate day by day, and
the important question is when to get out?
According to the above strategy, one should get out
0 ) ln 1 ( ) ln ( = + = p p p
dp
d
1
0.3679. p e

= (2-46)
PILLAI
27
at the first opportunity after 37 days, when the stock value
exceeds the maximum among the first 37 days. In that case
the probability of hitting the top value over 100 days for the
stock is also about 37%. Of course, the above argument
assumes that the stock values over the period of interest are
randomly fluctuating without exhibiting any other trend.
Interestingly, such is the case if we consider shorter time
frames such as inter-day trading.
In summary if one must day-trade, then a possible strategy
might be to get in at 9.30 AM, and get out any time after
12 noon (9.30 AM + 0.3679 6.5 hrs = 11.54 AM to be
precise) at the first peak that exceeds the peak value between
9.30 AM and 12 noon. In that case chances are about 37%
that one hits the absolute top value for that day! (disclaimer :
Trade at your own risk)
PILLAI

28
PILLAI
We conclude this lecture with a variation of the Game of
craps discussed in Example 3-16, Text.
Example 2.8: Game of craps using biased dice:
From Example 3.16, Text, the probability of
winning the game of craps is 0.492929 for the player.
Thus the game is slightly advantageous to the house. This
conclusion of course assumes that the two dice in question
are perfect cubes. Suppose that is not the case.
Let us assume that the two dice are slightly loaded in such
a manner so that the faces 1, 2 and 3 appear with probability
and faces 4, 5 and 6 appear with probability
for each dice. If T represents the combined
total for the two dice (following Text notation), we get

1
6
0 , + >
1
6

29
2
4
2 2
5
2 2
6
7
1
6
1 1
36 6
1 1
36 6
{ 4} {(1, 3), (2, 2), (1, 3)} 3( )
{ 5} {(1, 4), (2, 3), (3, 2), (4,1)} 2( ) 2( )
{ 6} {(1, 5), (2, 4), (3, 3), (4, 2), (5,1)} 4( ) ( )
{ 7} {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6,1
p P T P
p P T P
p P T P
p P T P



= = = =
= = = = +
= = = = +
= = =
2
2 2
8
2 2
9
2
10
11
1
36
1 1
36 6
1 1
36 6
1
6
1
6
)} 6( )
{ 8} {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)} 4( ) ( )
{ 9} {(3, 6), (4, 5), (5, 4), (6, 3)} 2( ) 2( )
{ 10} {(4, 6), (5, 5), (6, 4)} 3( )
{ 11} {(5, 6), (6, 5)} 2(
p P T P
p P T P
p P T P
p P T P

=
= = = = + +
= = = = + +
= = = = +
= = = = +
2
) .
(Note that (1,3) above represents the event the first dice
shows face 1, and the second dice shows face 3 etc.)
For we get the following Table:
0.01, =
PILLAI
30
0.0624 0.0936 0.1178 0.1419 0.1661 0.1353 0.1044 0.0706 p
k
= P{T = k}
11 10 9 8 7 6 5 4 T = k
PILLAI
This gives the probability of win on the first throw to be
(use (3-56), Text)
and the probability of win by throwing a carry-over to be
(use (3-58)-(3-59), Text)
Thus
Although perfect dice gives rise to an unfavorable game,
1
( 7) ( 11) 0.2285 P P T P T = = + = =
(2-47)

2
10
2
4
7
7
0.2717
k
k
k
k
p
p p
P
=

+
= =

(2-48)
1 2
{winning the game} 0.5002 P P P = + =
(2-49)
31
PILLAI
a slight loading of the dice turns the fortunes around in
favor of the player! (Not an exciting conclusion as far as
the casinos are concerned).
Even if we let the two dice to have different loading
factors and (for the situation described above), similar
conclusions do follow. For example,
gives (show this)
Once again the game is in favor of the player!
Although the advantage is very modest in each play, from
Bernoullis theorem the cumulative effect can be quite
significant when a large number of game are played.
All the more reason for the casinos to keep the dice in
perfect shape.
1

1 2
0.01 and 0.005 = =
{winning the game} 0.5015. P =
(2-50)
32
In summary, small chance variations in each game
of craps can lead to significant counter-intuitive changes
when a large number of games are played. What appears
to be a favorable game for the house may indeed become
an unfavorable game, and when played repeatedly can lead
to unpleasant outcomes.
1
3. Random Variables
Let (, F, P) be a probability model for an experiment,
and X a function that maps every to a unique
point the set of real numbers. Since the outcome
is not certain, so is the value Thus if B is some
subset of R, we may want to determine the probability of
. To determine this probability, we can look at
the set that contains all that maps
into B under the function X.
,
, R x

. ) ( x X =
B X ) (
=

) (
1
B X A

R
) ( X
x
A
B
Fig. 3.1
PILLAI
2
Obviously, if the set also belongs to the
associated field F, then it is an event and the probability of
A is well defined; in that case we can say
However, may not always belong to F for all B, thus
creating difficulties. The notion of random variable (r.v)
makes sure that the inverse mapping always results in an
event so that we are able to determine the probability for
any
Random Variable (r.v): A finite single valued function
that maps the set of all experimental outcomes into the
set of real numbers R is said to be a r.v, if the set
is an event for every x in R.
) (
1
B X A

=
)). ( ( " ) ( " event the of y Probabilit
1
B X P B X

=
(3-1)
) (
1
B X

. R B
) ( X

{ } ) ( | x X
) ( F
PILLAI
3
Alternatively X is said to be a r.v, if where B
represents semi-definite intervals of the form
and all other sets that can be constructed from these sets by
performing the set operations of union, intersection and
negation any number of times. The Borel collection B of
such subsets of R is the smallest -field of subsets of R that
includes all semi-infinite intervals of the above form. Thus
if X is a r.v, then
is an event for every x. What about
Are they also events ? In fact with since
and are events, is an event and
hence is also an event.
} { a x <
a b >
{ } { }? , a X b X a = <
{ } b X
{ } { } } { b X a b X a X < = >
{ } { }

a X a X
c
> =
{ } { } ) ( | x X x X =
F B X

) (
1
} { a X
(3-2)
PILLAI
4
Thus, is an event for every n.
Consequently
is also an event. All events have well defined probability.
Thus the probability of the event must
depend on x. Denote
The role of the subscript X in (3-4) is only to identify the
actual r.v. is said to the Probability Distribution
Function (PDF) associated with the r.v X.
)
`

<
1
a X
n
a

=
= =
)
`

<
1
} {
1

n
a X a X
n
a
{ } ) ( | x X
{ } . 0 ) ( ) ( | = x F x X P
X

(3-4)
) (x F
X
(3-3)
PILLAI
5
Distribution Function: Note that a distribution function
g(x) is nondecreasing, right-continuous and satisfies
i.e., if g(x) is a distribution function, then
(i)
(ii) if then
and
(iii) for all x.
We need to show that defined in (3-4) satisfies all
properties in (3-6). In fact, for any r.v X,
, 0 ) ( , 1 ) ( = = + g g
, 0 ) ( , 1 ) ( = = + g g
,
2 1
x x <
), ( ) (
2 1
x g x g
), ( ) ( x g x g =
+
(3-6)
) (x F
X
(3-5)
PILLAI
6
{ } 1 ) ( ) ( | ) ( = = + = + P X P F
X

{ } . 0 ) ( ) ( | ) ( = = = P X P F
X
(i)
and
(ii) If then the subset
Consequently the event
since implies As a result
implying that the probability distribution function is
nonnegative and monotone nondecreasing.
(iii) Let and consider the event
since
,
2 1
x x <
). , ( ) , (
2 1
x x
{ } { }, ) ( | ) ( |
2 1
x X x X
1
) ( x X
. ) (
2
x X
( ) ( ) ), ( ) ( ) ( ) (
2 2 1 1
x F x X P x X P x F
X X
= =
(3-9)


(3-7)
(3-8)
,
1 2 1
x x x x x
n n
< < < < <


{ }. ) ( |
k k
x X x A < =
(3-10)
{ } { } { }, ) ( ) ( ) (
k k
x X x X x X x = < (3-11)
PILLAI
7
using mutually exclusive property of events we get
But and hence
Thus
But the right limit of x, and hence
i.e., is right-continuous, justifying all properties of a
distribution function.
( ) ). ( ) ( ) ( ) ( x F x F x X x P A P
X k X k k
= < =
(3-12)
,
1 1

+

k k k
A A A
. 0 ) ( lim hence and lim
1
= = =

=

k
k
k
k k
k
A P A A

(3-13)
. 0 ) ( ) ( lim ) ( lim = =

x F x F A P
X k X
k
k
k
), ( ) ( x F x F
X X
=
+
) (x F
X
(3-14)
, lim
+

= x x
k
k
PILLAI
8
Additional Properties of a PDF
(iv) If for some then
This follows, since implies
is the null set, and for any will be a subset
of the null set.
(v)
We have and since the two events
are mutually exclusive, (16) follows.
(vi)
The events and are mutually
exclusive and their union represents the event
0 ) (
0
= x F
X
,
0
x . , 0 ) (
0
x x x F
X
=
(3-15)
( ) 0 ) ( ) (
0 0
= = x X P x F
X

{ }
0
) ( x X
{ } ) ( ,
0
x X x x
{ } ). ( 1 ) ( x F x X P
X
= >
(3-16)
{ } { } , ) ( ) ( = > x X x X
{ } . ), ( ) ( ) (
1 2 1 2 2 1
x x x F x F x X x P
X X
> = <
(3-17)
} ) ( {
2 1
x X x <
{ } ) (
1
x X
{ }. ) (
2
x X
PILLAI
9
(vii)
Let and From (3-17)
or
According to (3-14), the limit of as
from the right always exists and equals However the
left limit value need not equal Thus
need not be continuous from the left. At a discontinuity
point of the distribution, the left and right limits are
different, and from (3-20)
( ) ). ( ) ( ) (

= = x F x F x X P
X X
(3-18)
, 0 ,
1
> = x x
.
2
x x =
{ } ), ( lim ) ( ) ( lim
0 0


= <

x F x F x X x P
X X
(3-19)
{ } ). ( ) ( ) (

= = x F x F x X P
X X

(3-20)
), (
0
+
x F
X
) ( x F
X
0
x x
). (
0
x F
X
) (
0

x F
X
). (
0
x F
X
) ( x F
X
{ } . 0 ) ( ) ( ) (
0 0 0
> = =

x F x F x X P
X X

(3-21)
PILLAI
10
Thus the only discontinuities of a distribution function
are of the jump type, and occur at points where (3-21) is
satisfied. These points can always be enumerated as a
sequence, and moreover they are at most countable in
number.
Example 3.1: X is a r.v such that Find
Solution: For so that and
for so that (Fig.3.2)
Example 3.2: Toss a coin. Suppose the r.v X is
such that Find
) ( x F
X
0
x
. , ) ( = c X
). (x F
X
{ } { }, ) ( , = < x X c x , 0 ) ( = x F
X
{ }. , T H =
. 1 ) ( , 0 ) ( = = H X T X
) (x F
X
x
c
1
Fig. 3.2
). (x F
X
. 1 ) ( = x F
X
{ } , ) ( , = > x X c x
PILLAI
11
Solution: For so that
X is said to be a continuous-type r.v if its distribution
function is continuous. In that case for
all x, and from (3-21) we get
If is constant except for a finite number of jump
discontinuities(piece-wise constant; step-type), then X is
said to be a discrete-type r.v. If is such a discontinuity
point, then from (3-21)
{ } { }, ) ( , 0 = < x X x
. 0 ) ( = x F
X
{ } { } { }
{ } { } 3.3) (Fig. . 1 ) ( that so , , ) ( , 1
, 1 ) ( that so , ) ( , 1 0
= = =
= = = <
x F T H x X x
p T P x F T x X x
X
X

{ } ). ( ) (

= = =
i X i X i i
x F x F x X P p
(3-22)
) (x F
X
) ( ) ( x F x F
X X
=

{ } . 0 = = x X P
) ( x F
X
) (x F
X
x
Fig.3.3
1
q
1
i
x
PILLAI
12
From Fig.3.2, at a point of discontinuity we get
and from Fig.3.3,
Example:3.3 A fair coin is tossed twice, and let the r.v X
represent the number of heads. Find
Solution: In this case and
{ } . 1 0 1 ) ( ) ( = = = =

c F c F c X P
X X
{ } . 0 ) 0 ( ) 0 ( 0 q q F F X P
X X
= = = =

{ }, , , , TT TH HT HH =
). ( x F
X
. 0 ) ( , 1 ) ( , 1 ) ( , 2 ) ( = = = = TT X TH X HT X HH X
{ }
{ } { } { }
{ } { } { }
{ } 3.4) (Fig. . 1 ) ( ) ( , 2
,
4
3
, , ) ( , , ) ( , 2 1
,
4
1
) ( ) ( ) ( ) ( , 1 0
, 0 ) ( ) ( , 0
= =
= = = <
= = = = <
= = <
x F x X x
TH HT TT P x F TH HT TT x X x
T P T P TT P x F TT x X x
x F x X x
X
X
X
X


PILLAI
13
From Fig.3.4,
Probability density function (p.d.f)
The derivative of the distribution function is called
the probability density function of the r.v X. Thus
Since
from the monotone-nondecreasing nature of
{ } . 2 / 1 4 / 1 4 / 3 ) 1 ( ) 1 ( 1 = = = =

X X
F F X P
) (x F
X
) ( x f
X
.

) (
) (
dx
x dF
x f
X
X
=
, 0
) ( ) (
lim

) (
0

+
=

x
x F x x F
dx
x dF
X X
x
X
(3-23)
(3-24)
), ( x F
X
) (x F
X
x
Fig. 3.4
1
4 / 1
1
4 / 3
2

PILLAI
14
it follows that for all x. will be a
continuous function, if X is a continuous type r.v.
However, if X is a discrete type r.v as in (3-22), then its
p.d.f has the general form (Fig. 3.5)
where represent the jump-discontinuity points in
As Fig. 3.5 shows represents a collection of positive
discrete masses, and it is known as the probability mass
function (p.m.f ) in the discrete case. From (3-23), we
also obtain by integration
Since (3-26) yields
0 ) ( x f
X
) ( x f
X
, ) ( ) (

=
i
i i X
x x p x f
). ( x F
X
(3-25)
. ) ( ) ( du u f x F
x
x X


=
(3-26)
, 1 ) ( = +
X
F
, 1 ) ( =

+

dx x f
x
(3-27)
i
x
) (x f
X
x
i
x
i
p
Fig. 3.5
) ( x f
X
PILLAI
15
which justifies its name as the density function. Further,
from (3-26), we also get (Fig. 3.6b)
Thus the area under in the interval represents
the probability in (3-28).
Often, r.vs are referred by their specific density functions -
both in the continuous and discrete cases - and in what
follows we shall list a number of them in each category.
{ } . ) ( ) ( ) ( ) (
2
1
1 2 2 1
dx x f x F x F x X x P
x
x
X X X

= = < (3-28)
Fig. 3.6
) ( x f
X
) , (
2 1
x x
) (x f
X
(b)
x
1
x
2
x
) (x F
X
x
1
(a)
1
x
2
x
PILLAI
16
Continuous-type random variables
1. Normal (Gaussian): X is said to be normal or Gaussian
r.v, if
This is a bell shaped curve, symmetric around the
parameter and its distribution function is given by
where is often tabulated. Since
depends on two parameters and the notation
will be used to represent (3-29).
.
2
1
) (
2 2
2 / ) (
2


=
x
X
e x f (3-29)
,
,
2
1
) (
2 2
2 / ) (
2



|
.
|

\
|

= =
x
y
X
x
G dy e x F


(3-30)
dy e x G
y
x
2 /
2
2
1
) (

, (
2
N X
) (x f
X

) (x f
X
,
2

)
x

Fig. 3.7
PILLAI
17
2. Uniform: if (Fig. 3.8) , ), , ( b a b a U X <

=
otherwise. 0,
, ,
1
) (
b x a
a b
x f
X
(3.31)
3. Exponential: if (Fig. 3.9)
) ( X


=

otherwise. 0,
, 0 ,
1
) (
/
x e
x f
x
X

(3-32)
) (x f
X
x
Fig. 3.9
) (x f
X
x
a
b
a b
1
Fig. 3.8
PILLAI
18
4. Gamma: if (Fig. 3.10)
If an integer
5. Beta: if (Fig. 3.11)
where the Beta function is defined as
) , ( G X ) 0 , 0 ( > >

otherwise. 0,
, 0 ,
) (
) (
/
1
x e
x
x f
x
X


(3-33)
n = )!. 1 ( ) ( = n n
) , ( b a X ) 0 , 0 ( > > b a

< <
=

otherwise. 0,
, 1 0 , ) 1 (
) , (
1
) (
1 1
x x x
b a
x f
b a
X

(3-34)
) , ( b a


=
1
0
1 1
. ) 1 ( ) , ( du u u b a
b a

(3-35)
x
) ( x f
X
x
Fig. 3.11
1
0
) ( x f
X
Fig. 3.10
PILLAI
19
6. Chi-Square: if (Fig. 3.12)
Note that is the same as Gamma
7. Rayleigh: if (Fig. 3.13)
8. Nakagami m distribution:
), (
2
n X
) (
2
n
). 2 , 2 / (n


=

otherwise. 0,
, 0 ,
) (
2 2
2 /
2
x e
x
x f
x
X

(3-36)
(3-37)
, ) (
2
R X
x
) ( x f
X
Fig. 3.12
) ( x f
X
x
Fig. 3.13
PILLAI

=

otherwise. 0,
, 0 ,
) 2 / ( 2
1
) (
2 / 1 2 /
2 /
x e x
n
x f
x n
n
X
2
2 1 /
2
, 0
( )
( )
0 otherwise
X
m
m mx
m
x e x
f x
m

| |

|
=

\ .

(3-38)
20
9. Cauchy: if (Fig. 3.14)
10. Laplace: (Fig. 3.15)
11. Students t-distribution with n degrees of freedom (Fig 3.16)
( )
. , 1
) 2 / (
2 / ) 1 (
) (
2 / ) 1 (
2
+ < <
|
|
.
|

\
|
+

+
=
+
t
n
t
n n
n
t f
n
T

, ) , ( C X
. ,
) (
/
) (
2 2
+ < <
+
= x
x
x f
X


. ,
2
1
) (
/ | |
+ < < =

x e x f
x
X

) ( x f
X
x
Fig. 3.14

(3-41)
(3-40)
(3-39)
x
) ( x f
X
Fig. 3.15
t
( )
T
f t
Fig. 3.16
PILLAI
21
12. Fishers F-distribution
/ 2 / 2 / 2 1
( ) / 2

{( ) / 2}
, 0
( ) ( / 2) ( / 2) ( )
0 otherwise
m n m
m n
z
m n m n z
z
f z m n n mz

+
+

= +

(3-42)
PILLAI
22
Discrete-type random variables
1. Bernoulli: X takes the values (0,1), and
2. Binomial: if (Fig. 3.17)
3. Poisson: if (Fig. 3.18)
. ) 1 ( , ) 0 ( p X P q X P = = = =
(3-43)
), , ( p n B X
. , , 2 , 1 , 0 , ) ( n k q p
k
n
k X P
k n k
=
|
|
.
|

\
|
= =

(3-44)
, ) ( P X
. , , 2 , 1 , 0 ,
!
) ( = = =

k
k
e k X P
k

(3-45)
k
) ( k X P =
Fig. 3.17
1
2
n
) ( k X P =
Fig. 3.18
PILLAI
23
4. Hypergeometric:
5. Geometric: if
6. Negative Binomial: ~ if
7. Discrete-Uniform:
We conclude this lecture with a general distribution due
PILLAI
(3-49)
(3-48)
(3-47)
. , , 2 , 1 ,
1
) ( N k
N
k X P = = =
), , ( p r NB X
1
( ) , , 1, .
1
r k r
k
P X k p q k r r
r

| |
= = = +
|

\ .

. 1 , , , 2 , 1 , 0 , ) ( p q k pq k X P
k
= = = =
) ( p g X

, max(0, ) min( , ) ( )
m N m
k n k
N
n
m n N k m n P X k
| | | |
| |
| |
\ . \ .
| |
|
|
\ .

+ = =
(3-46)
24
PILLAI
to Polya that includes both binomial and hypergeometric as
special cases.
Polyas distribution: A box contains a white balls and b black
balls. A ball is drawn at random, and it is replaced along with
c balls of the same color. If X represents the number of white
balls drawn in n such draws, find the
probability mass function of X.
Solution: Consider the specific sequence of draws where k
white balls are first drawn, followed by n k black balls. The
probability of drawing k successive white balls is given by
Similarly the probability of drawing k white balls
0, 1, 2, , , X n =

2 ( 1)

2 ( 1)
W
a a c a c a k c
p
a b a b c a b c a b k c
+ + +
=
+ + + + + + +

(3-50)
25
PILLAI
followed by n k black balls is given by
Interestingly, p
k
in (3-51) also represents the probability of
drawing k white balls and (n k) black balls in any other
specific order (i.e., The same set of numerator and
denominator terms in (3-51) contribute to all other sequences
as well.) But there are such distinct mutually exclusive
sequences and summing over all of them, we obtain the Polya
distribution (probability of getting k white balls in n draws)
to be
n
k
| |
|
|
\ .

1 1
0 0
( )
( 1)

( 1) ( 1)
.
k w
k n k
i j
b jc
a ic
a b ic a b j k c
b b c b n k c
p p
a b kc a b k c a b n c

= =
+
+
+ + + + +
+ +
=
+ + + + + + +
=

(3-51)
1 1
0 0
( )
( ) , 0,1, 2, , .
k n k
k
i j
n n
k k
b jc
a ic
a b ic a b j k c
P X k p k n

| | | |
| |
| |
\ . \ .
= =
+
+
+ + + + +
= = = =


(3-52)
26
PILLAI
Both binomial distribution as well as the hypergeometric
distribution are special cases of (3-52).
For example if draws are done with replacement, then c = 0
and (3-52) simplifies to the binomial distribution
where
Similarly if the draws are conducted without replacement,
Then c = 1 in (3-52), and it gives
( ) , 0,1, 2, ,
k n k
n
k
P X k p q k n
| |

|
|
\ .
= = =
(3-53)
, 1 .
a b
p q p
a b a b
= = =
+ +

( 1)( 2) ( 1) ( 1) ( 1)
( )
( )( 1) ( 1) ( ) ( 1)
n
k
a a a a k b b b n k
P X k
a b a b a b k a b k a b n
| |
|
|
\ .
+ + +
= =
+ + + + + + +


27
which represents the hypergeometric distribution. Finally
c = +1 gives (replacements are doubled)
we shall refer to (3-55) as Polyas +1 distribution. the general
Polya distribution in (3-52) has been used to study the spread
of contagious diseases (epidemic modeling).




1 1

1

( 1)! ( 1)! ( 1)! ( 1)!
( )
( 1)! ( 1)! ( 1)! ( 1)!
= .
n
k
a k b n k
k n k
a b n
n
a k a b b n k a b k
P X k
a a b k b a b n
| |
|
|
\ .
| | | |
| |
| |
\ . \ .
| |
|
|
\ .
+ +

+ +
+ + + + + +
= =
+ + + +
(3-55)





! !( )! !( )!
( )
!( )! ( )!( )! ( )!( )!
a
b
k
n k
a b
n
n a a b k b a b n
P X k
k n k a k a b b n k a b k
| |
| |
|
|
|
|
\ .
\ .
| |
|
|
\ .

+
+ +
= = =
+ + +
(3-54)
PILLAI
1
4. Binomial Random Variable Approximations
and
Conditional Probability Density Functions
Let X represent a Binomial r.v as in (3-42). Then from (2-30)
Since the binomial coefficient grows quite
rapidly with n, it is difficult to compute (4-1) for large n. In
this context, two approximations are extremely useful.
4.1 The Normal Approximation (Demoivre-Laplace
Theorem) Suppose with p held fixed. Then for k in
the neighborhood of np, we can approximate
( )

=

=
|
|
.
|

\
|
= =
2
1
2
1
. ) (
2 1
k
k k
k n k
k
k k
n
q p
k
n
k P k X k P
(4-1)
! )! (
!
k k n
n
k
n

=
|
|
.
|

\
|
n
npq
PILLAI
2
(4-2)
.
2
1
2 / ) (
2
npq np k k n k
e
npq
q p
k
n

|
|
.
|

\
|

Thus if and in (4-1) are within or around the


neighborhood of the interval we can
approximate the summation in (4-1) by an integration. In
that case (4-1) reduces to
where
We can express (4-3) in terms of the normalized integral
that has been tabulated extensively (See Table 4.1).
1
k
2
k
( ), , npq np npq np +
( ) ,
2
1
2
1
2 /

2 / ) (

2 1
2
2
1
2
2
1
dy e dx e
npq
k X k P
y
x
x
npq np x
k
k


= =

(4-3)
) (
2
1
) (

0
2 /
2
x erf dy e x erf
x
y
= =

(4-4)
. ,
2
2
1
1
npq
np k
x
npq
np k
x

=

=
PILLAI
3
For example, if and are both positive ,we obtain
Example 4.1: A fair coin is tossed 5,000 times. Find the
probability that the number of heads is between 2,475 to
2,525.
Solution: We need Here n is large so
that we can use the normal approximation. In this case
so that and Since
and the approximation is valid for
and Thus
Here
( ) ). ( ) (
1 2 2 1
x erf x erf k X k P =
1
x
2
x
). 525 , 2 475 , 2 ( X P
(4-5)
,
2
1
= p
500 , 2 = np . 35 npq
, 465 , 2 = npq np
, 535 , 2 = + npq np
475 , 2
1
= k
. 525 , 2
2
= k
( )


=
2
1
2
.
2
1
2 /
2 1
x
x
y
dy e k X k P

.
7
5
,
7
5
2
2
1
1
=

= =

=
npq
np k
x
npq
np k
x
PILLAI
4
2
1
) (
2
1
) ( erf

0
2 /
2
= =


x G dy e x
x
y

x erf(x) x erf(x) x erf(x) x erf(x)


0.05
0.10
0.15
0.20
0.25
0.30
0.35
0.40
0.45
0.50
0.55
0.60
0.65
0.70
0.75
0.01994
0.03983
0.05962
0.07926
0.09871
0.11791
0.13683
0.15542
0.17364
0.19146
0.20884
0.22575
0.24215
0.25804
0.27337
0.80
0.85
0.90
0.95
1.00
1.05
1.10
1.15
1.20
1.25
1.30
1.35
1.40
1.45
1.50
0.28814
0.30234
0.31594
0.32894
0.34134
0.35314
0.36433
0.37493
0.38493
0.39435
0.40320
0.41149
0.41924
0.42647
0.43319
1.55
1.60
1.65
1.70
1.75
1.80
1.85
1.90
1.95
2.00
2.05
2.10
2.15
2.20
2.25
0.43943
0.44520
0.45053
0.45543
0.45994
0.46407
0.46784
0.47128
0.47441
0.47726
0.47982
0.48214
0.48422
0.48610
0.48778
2.30
2.35
2.40
2.45
2.50
2.55
2.60
2.65
2.70
2.75
2.80
2.85
2.90
2.95
3.00
0.48928
0.49061
0.49180
0.49286
0.49379
0.49461
0.49534
0.49597
0.49653
0.49702
0.49744
0.49781
0.49813
0.49841
0.49865
Table 4.1
PILLAI
5
Since from Fig. 4.1(b), the above probability is given
by
where we have used Table 4.1
4.2. The Poisson Approximation
As we have mentioned earlier, for large n, the Gaussian
approximation of a binomial r.v is valid only if p is fixed,
i.e., only if and what if np is small, or if it
does not increase with n?
, 0
1
< x
( )
, 516 . 0
7
5
erf 2
|) (| erf ) ( erf ) ( erf ) ( erf 525 , 2 475 , 2
1 2 1 2
=
|
.
|

\
|
=
+ = = x x x x X P
1 >> np . 1 >> npq
( ). 258 . 0 ) 7 . 0 ( erf =
Fig. 4.1
x
(a)
1
x
2
x
2 /
2
2
1
x
e

0 , 0
2 1
> > x x
x
(b)
1
x
2
x
2 /
2
2
1
x
e

0 , 0
2 1
> < x x
PILLAI
6
Obviously that is the case if, for example, as
such that is a fixed number.
Many random phenomena in nature in fact follow this
pattern. Total number of calls on a telephone line, claims in
an insurance company etc. tend to follow this type of
behavior. Consider random arrivals such as telephone calls
over a line. Let n represent the total number of calls in the
interval From our experience, as we have
so that we may assume Consider a small interval of
duration as in Fig. 4.2. If there is only a single call
coming in, the probability p of that single call occurring in
that interval must depend on its relative size with respect to
T.
0 p
, n
= np
T ). , 0 ( T
n
. T n =
"
1 2
n

0
T
Fig. 4.2
PILLAI
7
Hence we may assume Note that as
However in this case is a constant,
and the normal approximation is invalid here.
Suppose the interval in Fig. 4.2 is of interest to us. A call
inside that interval is a success (H), whereas one outside is
a failure (T ). This is equivalent to the coin tossing
situation, and hence the probability of obtaining k
calls (in any order) in an interval of duration is given by
the binomial p.m.f. Thus
and here as such that It is easy to
obtain an excellent approximation to (4-6) in that situation.
To see this, rewrite (4-6) as
.
T
p

=
0 p
. T
= =

=
T
T np
) ( k P
n
, ) 1 (
! )! (
!
) (
k n k
n
p p
k k n
n
k P

=
(4-6)
0 , p n
. = np
PILLAI
8
.
) / 1 (
) / 1 (
!
1
1
2
1
1
1
) / 1 (
!
) ( ) 1 ( ) 1 (
) (
k
n k
k n
k
k
n
n
n
k n
k
n n
n np
k
np
n
k n n n
k P

|
.
|

\
|

|
.
|

\
|

|
.
|

\
|
=

+
=

"
"
(4-7)
Thus
,
!
) ( lim
, 0 ,


=
= e
k
k P
k
n
np p n
(4-8)
since the finite products as well
as tend to unity as and
The right side of (4-8) represents the Poisson p.m.f and the
Poisson approximation to the binomial r.v is valid in
situations where the binomial r.v parameters n and p
diverge to two extremes such that their
product np is a constant.
|
.
|

\
|

|
.
|

\
|

|
.
|

\
|

n
k
n n
1
1
2
1
1
1 "
k
n
|
.
|

\
|


1
, n
. 1 lim



=
|
.
|

\
|
e
n
n
n
) 0 , ( p n
PILLAI
9
Example 4.2: Winning a Lottery: Suppose two million
lottery tickets are issued with 100 winning tickets among
them. (a) If a person purchases 100 tickets, what is the
probability of winning? (b) How many tickets should one
buy to be 95% confident of having a winning ticket?
Solution: The probability of buying a winning ticket
Here and the number of winning tickets X in the n
purchased tickets has an approximate Poisson distribution
with parameter Thus
and (a) Probability of winning
. 10 5
10 2
100
tickets of no. Total
tickets winning of No.
5
6

=

= = p
, 100 = n
. 005 . 0 10 5 100
5
= = =

np
,
!
) (
k
e k X P
k


= =
. 005 . 0 1 ) 0 ( 1 ) 1 ( = = = =

e X P X P
PILLAI
10
(b) In this case we need
But or Thus one needs to
buy about 60,000 tickets to be 95% confident of having a
winning ticket!
Example 4.3: A space craft has 100,000 components
The probability of any one component being defective
is The mission will be in danger if five or
more components become defective. Find the probability of
such an event.
Solution: Here n is large and p is small, and hence Poisson
approximation is valid. Thus
and the desired probability is given by
. 95 . 0 ) 1 ( X P
. 3 20 ln implies 95 . 0 1 ) 1 ( = =

e X P
3 10 5
5
= =

n np . 000 , 60 n
( ) n
). 0 ( 10 2
5


p
, 2 10 2 000 , 100
5
= = =

np
PILLAI
11
. 052 . 0
3
2
3
4
2 2 1 1
!
1
!
1 ) 4 ( 1 ) 5 (
2
4
0
2
4
0
=
|
.
|

\
|
+ + + + =
= = =


e
k
e
k
e X P X P
k
k k
k

Conditional Probability Density Function


For any two events A and B, we have defined the conditional
probability of A given B as
Noting that the probability distribution function is
given by
we may define the conditional distribution of the r.v X given
the event B as
. 0 ) ( ,
) (
) (
) | (

= B P
B P
B A P
B A P
(4-9)
) ( x F
X
{ }, ) ( ) ( x X P x F
X
=
(4-10)
PILLAI
    F_X(x \mid B) = P\{X(\xi) \le x \mid B\}
                  = \frac{P\{(X(\xi) \le x) \cap B\}}{P(B)}.               (4-11)

Thus the definition of the conditional distribution depends on conditional
probability, and since the latter obeys all probability axioms, it follows that
the conditional distribution has the same properties as any distribution
function. In particular

    F_X(+\infty \mid B) = \frac{P\{(X(\xi) \le +\infty)\cap B\}}{P(B)} = \frac{P(B)}{P(B)} = 1,

    F_X(-\infty \mid B) = \frac{P\{(X(\xi) \le -\infty)\cap B\}}{P(B)} = \frac{P(\emptyset)}{P(B)} = 0.   (4-12)

Further,

    P(x_1 < X(\xi) \le x_2 \mid B)
        = \frac{P\{(x_1 < X(\xi) \le x_2)\cap B\}}{P(B)}
        = F_X(x_2 \mid B) - F_X(x_1 \mid B),                               (4-13)
since for x_2 \ge x_1,

    (X(\xi) \le x_2) = (X(\xi) \le x_1) \cup (x_1 < X(\xi) \le x_2).       (4-14)

The conditional density function is the derivative of the conditional
distribution function. Thus

    f_X(x \mid B) = \frac{dF_X(x \mid B)}{dx},                             (4-15)

and proceeding as in (3-26) we obtain

    F_X(x \mid B) = \int_{-\infty}^{x} f_X(u \mid B)\, du.                 (4-16)

Using (4-16), we can also rewrite (4-13) as

    P(x_1 < X(\xi) \le x_2 \mid B) = \int_{x_1}^{x_2} f_X(x \mid B)\, dx.  (4-17)
Example 4.4: Refer to Example 3.2. Toss a coin, with X(T) = 0 and X(H) = 1.
Suppose B = {H}. Determine F_X(x | B).

Solution: From Example 3.2, F_X(x) has the form shown in Fig. 4.3(a). We need
F_X(x | B) for all x.

For x < 0, {X(\xi) \le x} = \emptyset, so that (X(\xi) \le x) \cap B = \emptyset
and F_X(x | B) = 0.

For 0 \le x < 1, {X(\xi) \le x} = {T}, so that
(X(\xi) \le x) \cap B = {T} \cap {H} = \emptyset and F_X(x | B) = 0.

For x \ge 1, {X(\xi) \le x} = \Omega, and
(X(\xi) \le x) \cap B = \Omega \cap B = B, so that
F_X(x | B) = P(B)/P(B) = 1 (see Fig. 4.3(b)).

[Fig. 4.3: (a) the unconditional F_X(x), with a jump of height q at x = 0 and a
jump to 1 at x = 1; (b) the conditional F_X(x | B), a single jump from 0 to 1 at x = 1.]

Example 4.5: Given F_X(x), suppose B = {X(\xi) \le a}. Find f_X(x | B).

Solution: We will first determine F_X(x | B). From (4-11) and B as given above,
we have

    F_X(x \mid B) = \frac{P\{(X \le x) \cap (X \le a)\}}{P(X \le a)}.      (4-18)

For x < a, (X \le x) \cap (X \le a) = (X \le x), so that

    F_X(x \mid B) = \frac{P(X \le x)}{P(X \le a)} = \frac{F_X(x)}{F_X(a)}. (4-19)

For x \ge a, (X \le x) \cap (X \le a) = (X \le a), so that F_X(x | B) = 1.
Thus

    F_X(x \mid B) = \begin{cases} F_X(x)/F_X(a), & x < a,\\ 1, & x \ge a, \end{cases}   (4-20)

and hence

    f_X(x \mid B) = \frac{d}{dx}F_X(x \mid B)
                  = \begin{cases} f_X(x)/F_X(a), & x < a,\\ 0, & \text{otherwise}. \end{cases}   (4-21)
[Fig. 4.4: (a) the conditional distribution F_X(x | B) compared with F_X(x);
(b) the conditional density f_X(x | B) compared with f_X(x), for B = {X(\xi) \le a}.]
Example 4.6: Let B represent the event {a < X(\xi) \le b} with b > a. For a given
F_X(x), determine F_X(x | B) and f_X(x | B).

Solution:

    F_X(x \mid B) = P\{X(\xi) \le x \mid B\}
        = \frac{P\{(X(\xi) \le x) \cap (a < X(\xi) \le b)\}}{P(a < X(\xi) \le b)}
        = \frac{P\{(X(\xi) \le x) \cap (a < X(\xi) \le b)\}}{F_X(b) - F_X(a)}.      (4-22)

For x < a, we have {X(\xi) \le x} \cap {a < X(\xi) \le b} = \emptyset, and hence

    F_X(x \mid B) = 0.                                                              (4-23)

For a \le x < b, we have {X(\xi) \le x} \cap {a < X(\xi) \le b} = {a < X(\xi) \le x},
and hence

    F_X(x \mid B) = \frac{P(a < X(\xi) \le x)}{F_X(b) - F_X(a)}
                  = \frac{F_X(x) - F_X(a)}{F_X(b) - F_X(a)}.                        (4-24)

For x \ge b, we have {X(\xi) \le x} \cap {a < X(\xi) \le b} = {a < X(\xi) \le b},
so that

    F_X(x \mid B) = 1.                                                              (4-25)

Using (4-23)-(4-25), we get (see Fig. 4.5)

    f_X(x \mid B) = \begin{cases} \dfrac{f_X(x)}{F_X(b) - F_X(a)}, & a < x \le b,\\
                                  0, & \text{otherwise}. \end{cases}               (4-26)

[Fig. 4.5: f_X(x | B) compared with f_X(x); the conditional density is the
restriction of f_X(x) to (a, b], rescaled to unit area.]
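As a concrete check of (4-26), the following sketch (not part of the notes) uses a
standard normal X and the illustrative interval (a, b) = (0, 1):

```python
# Conditional density (4-26) for X ~ N(0,1) given B = {a < X <= b}
from math import erf, exp, pi, sqrt

def f_X(x):                       # N(0,1) density
    return exp(-x**2 / 2) / sqrt(2 * pi)

def F_X(x):                       # N(0,1) distribution function
    return 0.5 * (1 + erf(x / sqrt(2)))

a, b = 0.0, 1.0
def f_X_given_B(x):               # (4-26)
    return f_X(x) / (F_X(b) - F_X(a)) if a < x <= b else 0.0

# The conditional density integrates to one over (a, b] (crude Riemann sum):
dx = 1e-4
print(sum(f_X_given_B(a + (i + 0.5) * dx) * dx for i in range(int((b - a) / dx))))
```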
We can use the conditional p.d.f together with Bayes' theorem to update our
a-priori knowledge about the probability of events in the presence of new
observations. Ideally, any new information should be used to update our
knowledge. As we see in the next example, the conditional p.d.f together with
Bayes' theorem allows systematic updating. For any two events A and B, Bayes'
theorem gives

    P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}.                           (4-27)

Let B = {x_1 < X(\xi) \le x_2} so that (4-27) becomes (see (4-13) and (4-17))

    P(A \mid x_1 < X(\xi) \le x_2)
      = \frac{P(x_1 < X(\xi) \le x_2 \mid A)\, P(A)}{P(x_1 < X(\xi) \le x_2)}
      = \frac{F_X(x_2 \mid A) - F_X(x_1 \mid A)}{F_X(x_2) - F_X(x_1)}\, P(A)
      = \frac{\int_{x_1}^{x_2} f_X(x \mid A)\, dx}{\int_{x_1}^{x_2} f_X(x)\, dx}\, P(A).   (4-28)
Further, let x_1 = x, x_2 = x + Δ, Δ > 0, so that in the limit as Δ → 0,

    \lim_{\Delta \to 0} P(A \mid x < X(\xi) \le x + \Delta)
      = P(A \mid X(\xi) = x) = \frac{f_X(x \mid A)}{f_X(x)}\, P(A),        (4-29)

or

    f_{X|A}(x \mid A) = \frac{P(A \mid X = x)\, f_X(x)}{P(A)}.             (4-30)

From (4-30), we also get

    1 = \int_{-\infty}^{+\infty} f_{X|A}(x \mid A)\, dx
      = \frac{1}{P(A)} \int_{-\infty}^{+\infty} P(A \mid X = x)\, f_X(x)\, dx,   (4-31)

or

    P(A) = \int_{-\infty}^{+\infty} P(A \mid X = x)\, f_X(x)\, dx,         (4-32)

and using this in (4-30), we get the desired result

    f_{X|A}(x \mid A) = \frac{P(A \mid X = x)\, f_X(x)}
                             {\int_{-\infty}^{+\infty} P(A \mid X = x)\, f_X(x)\, dx}.   (4-33)
To illustrate the usefulness of this formulation, let us reexamine the coin
tossing problem.

Example 4.7: Let p = P(H) represent the probability of obtaining a head in a
toss. For a given coin, a-priori p can possess any value in the interval (0, 1).
In the absence of any additional information, we may assume the a-priori p.d.f
f_P(p) to be a uniform distribution in that interval (Fig. 4.6). Now suppose we
actually perform an experiment of tossing the coin n times, and k heads are
observed. This is new information. How can we update f_P(p)?

[Fig. 4.6: the a-priori p.d.f f_P(p), uniform on (0, 1).]

Solution: Let A = "k heads in n specific tosses." Since these tosses result in a
specific sequence,

    P(A \mid P = p) = p^k q^{\,n-k},                                       (4-34)

and using (4-32) we get

    P(A) = \int_0^1 P(A \mid P = p)\, f_P(p)\, dp
         = \int_0^1 p^k (1-p)^{n-k}\, dp = \frac{(n-k)!\,k!}{(n+1)!}.      (4-35)

The a-posteriori p.d.f f_{P|A}(p | A) represents the updated information given
the event A, and from (4-30)

    f_{P|A}(p \mid A) = \frac{P(A \mid P = p)\, f_P(p)}{P(A)}
                      = \frac{(n+1)!}{(n-k)!\,k!}\, p^k q^{\,n-k}, \qquad 0 < p < 1.   (4-36)

[Fig. 4.7: the a-posteriori p.d.f f_{P|A}(p | A), concentrated around k/n.]

Notice that the a-posteriori p.d.f of p in (4-36) is not a uniform distribution,
but a beta distribution. We can use this a-posteriori p.d.f to make further
predictions. For example, in the light of the above experiment, what can we say
about the probability of a head occurring in the next (n+1)th toss?
Let B = "head occurring in the (n+1)th toss, given that k heads have occurred in
the n previous tosses." Clearly P(B | P = p) = p, and from (4-32)

    P(B) = \int_0^1 P(B \mid P = p)\, f_{P|A}(p \mid A)\, dp.              (4-37)

Notice that unlike (4-32), we have used the a-posteriori p.d.f in (4-37) to
reflect our knowledge about the experiment already performed. Using (4-36) in
(4-37), we get

    P(B) = \int_0^1 p \cdot \frac{(n+1)!}{(n-k)!\,k!}\, p^k q^{\,n-k}\, dp
         = \frac{k+1}{n+2}.                                                (4-38)

Thus, if n = 10 and k = 6, then

    P(B) = \frac{7}{12} \approx 0.58,

which is better than p = 0.5.
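The update in (4-36)-(4-38) can be reproduced numerically. The sketch below
(not part of the notes) assumes the same illustrative values n = 10 and k = 6:

```python
# Bayesian update of Example 4.7: uniform prior on p, k heads in n tosses
from math import factorial

n, k = 10, 6

def posterior(p):                 # a-posteriori p.d.f (4-36), a beta density
    return factorial(n + 1) / (factorial(n - k) * factorial(k)) * p**k * (1 - p)**(n - k)

# P(head on the (n+1)th toss) from (4-37)-(4-38): integrate p * posterior(p)
dp = 1e-5
p_next = sum(p * posterior(p) * dp for p in (i * dp for i in range(1, int(1 / dp))))
print(p_next, (k + 1) / (n + 2))  # both approx 0.5833 = 7/12
```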
To summarize, if the probability density of a r.v X is unknown, one should make a
noncommittal judgement about its a-priori p.d.f f_X(x). Usually the uniform
distribution is a reasonable assumption in the absence of any other information.
Then experimental results (A) are obtained, and our knowledge about X must be
updated to reflect this new information. Bayes' rule helps to obtain the
a-posteriori p.d.f f_{X|A}(x | A) of X given A. From that point on, this
a-posteriori p.d.f should be used to make further predictions and calculations.
5. Functions of a Random Variable

Let X be a r.v defined on the model (\Omega, F, P), and suppose g(x) is a
function of the variable x. Define

    Y = g(X).                                                              (5-1)

Is Y necessarily a r.v? If so, what are its PDF F_Y(y) and pdf f_Y(y)?

Clearly if Y is a r.v, then for every Borel set B, the set of \xi for which
Y(\xi) \in B must belong to F. Given that X is a r.v, this is assured if
g^{-1}(B) is also a Borel set, i.e., if g(x) is a Borel function. In that case
if X is a r.v, so is Y, and for every Borel set B

    P(Y \in B) = P(X \in g^{-1}(B)).                                       (5-2)

In particular

    F_Y(y) = P(Y(\xi) \le y) = P(g(X(\xi)) \le y) = P(X(\xi) \in g^{-1}(-\infty, y]).   (5-3)

Thus the distribution function as well as the density function of Y can be
determined in terms of that of X. To obtain the distribution function of Y, we
must determine the Borel set on the x-axis such that X(\xi) \in g^{-1}(-\infty, y]
for every given y, and the probability of that set. At this point, we shall
consider some of the following functions Y = g(X) to illustrate the technical
details:

    aX + b,   X^2,   |X|,   |X|\,U(x),   e^X,   \log X.
Example 5.1: Y = aX + b.                                                   (5-4)

Solution: Suppose a > 0. Then

    F_Y(y) = P(Y(\xi) \le y) = P(aX + b \le y)
           = P\left(X(\xi) \le \frac{y-b}{a}\right) = F_X\!\left(\frac{y-b}{a}\right),   (5-5)

and

    f_Y(y) = \frac{1}{a}\, f_X\!\left(\frac{y-b}{a}\right).                (5-6)

On the other hand, if a < 0, then

    F_Y(y) = P(Y(\xi) \le y) = P(aX + b \le y)
           = P\left(X(\xi) > \frac{y-b}{a}\right) = 1 - F_X\!\left(\frac{y-b}{a}\right),   (5-7)

and hence

    f_Y(y) = -\frac{1}{a}\, f_X\!\left(\frac{y-b}{a}\right).               (5-8)
From (5-6) and (5-8), we obtain (for all a \neq 0)

    f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y-b}{a}\right).              (5-9)

Example 5.2: Y = X^2.

    F_Y(y) = P(Y(\xi) \le y) = P(X^2(\xi) \le y).                          (5-10)

If y < 0, then the event \{X^2(\xi) \le y\} = \emptyset, and hence

    F_Y(y) = 0, \qquad y < 0.                                              (5-11, 5-12)

For y > 0, from Fig. 5.1 the event \{Y(\xi) \le y\} = \{X^2(\xi) \le y\} is
equivalent to \{x_1 < X(\xi) \le x_2\} with x_1 = -\sqrt{y}, x_2 = +\sqrt{y}.

[Fig. 5.1: the parabola Y = X^2; the level y cuts it at x_1 = -\sqrt{y} and x_2 = +\sqrt{y}.]
Hence

    F_Y(y) = P(x_1 < X(\xi) \le x_2) = F_X(x_2) - F_X(x_1)
           = F_X(\sqrt{y}) - F_X(-\sqrt{y}), \qquad y > 0.                 (5-13)

By direct differentiation, we get

    f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left( f_X(\sqrt{y}) + f_X(-\sqrt{y}) \right), & y > 0,\\
                           0, & \text{otherwise}. \end{cases}              (5-14)

If f_X(x) represents an even function, then (5-14) reduces to

    f_Y(y) = \frac{1}{\sqrt{y}}\, f_X(\sqrt{y})\, U(y).                    (5-15)

In particular if X ~ N(0, 1), so that

    f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},                             (5-16)
and substituting this into (5-14) or (5-15), we obtain the p.d.f of Y = X^2 to be

    f_Y(y) = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}\, U(y).                    (5-17)

On comparing this with (3-36), we notice that (5-17) represents a Chi-square r.v
with n = 1, since \Gamma(1/2) = \sqrt{\pi}. Thus, if X is a Gaussian r.v with
\mu = 0, then Y = X^2 represents a Chi-square r.v with one degree of freedom (n = 1).

Example 5.3: Let

    Y = g(X) = \begin{cases} X - c, & X > c,\\ 0, & -c < X \le c,\\ X + c, & X \le -c. \end{cases}
In this case

    P(Y = 0) = P(-c < X(\xi) \le c) = F_X(c) - F_X(-c).                    (5-18)

For y > 0 we have x > c, and Y(\xi) = X(\xi) - c, so that

    F_Y(y) = P(Y(\xi) \le y) = P(X(\xi) \le y + c) = F_X(y + c), \qquad y > 0.   (5-19)

Similarly, for y < 0 we have x < -c, and Y(\xi) = X(\xi) + c, so that

    F_Y(y) = P(Y(\xi) \le y) = P(X(\xi) \le y - c) = F_X(y - c), \qquad y < 0.   (5-20)

Thus

    f_Y(y) = \begin{cases} f_X(y + c), & y > 0,\\ f_X(y - c), & y < 0, \end{cases}   (5-21)

together with the point mass P(Y = 0) in (5-18) at y = 0.

[Fig. 5.2: (a) the clipping characteristic g(X); (b) F_X(x); (c) F_Y(y), which
has a jump of height F_X(c) - F_X(-c) at y = 0.]
Example 5.4: Half-wave rectifier

    Y = g(X); \qquad g(x) = \begin{cases} x, & x > 0,\\ 0, & x \le 0. \end{cases}   (5-22)

[Fig. 5.3: the half-wave rectifier characteristic Y = g(X).]

In this case

    P(Y = 0) = P(X(\xi) \le 0) = F_X(0),                                   (5-23)

and for y > 0, since Y = X,

    F_Y(y) = P(Y(\xi) \le y) = P(X(\xi) \le y) = F_X(y).                   (5-24)

Thus

    f_Y(y) = \begin{cases} f_X(y), & y > 0,\\ 0, & y < 0 \end{cases} = f_X(y)\, U(y),   (5-25)

together with the point mass F_X(0) at y = 0 given by (5-23).
Note: As a general approach, given Y = g(X), first sketch the graph y = g(x) and
determine the range space of y. Suppose a < y < b is the range space of
y = g(x). Then clearly F_Y(y) = 0 for y < a, and F_Y(y) = 1 for y > b, so that
f_Y(y) can be nonzero only in a < y < b. Next, determine whether there are
discontinuities in the range space of y. If so, evaluate P(Y(\xi) = y_i) at
these discontinuities. In the continuous region of y, use the basic approach

    F_Y(y) = P(g(X(\xi)) \le y)

and determine the appropriate events in terms of the r.v X for every y. Finally,
we must have F_Y(y) for -\infty < y < +\infty, and we obtain

    f_Y(y) = \frac{dF_Y(y)}{dy} \quad \text{in } a < y < b.
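The recipe above is easy to exercise by simulation. The following sketch (not part
of the notes) applies it to the half-wave rectifier of Example 5.4, assuming an
N(0,1) input for illustration:

```python
# Monte Carlo check of Example 5.4 with X ~ N(0,1)
import random
from math import sqrt, pi, exp

N = 200_000
samples_y = [max(random.gauss(0.0, 1.0), 0.0) for _ in range(N)]   # Y = g(X)

# P(Y = 0) should approach F_X(0) = 1/2  (the discontinuity of g at 0)
print(sum(1 for y in samples_y if y == 0.0) / N)

# For y > 0, f_Y(y) should approach f_X(y); compare a histogram bin near y = 1
width = 0.05
est = sum(1 for y in samples_y if 1.0 <= y < 1.0 + width) / (N * width)
print(est, exp(-0.5) / sqrt(2 * pi))
```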
However, if g(x) is a continuous function, it is easy to establish a direct
procedure to obtain f_Y(y) for Y = g(X), where X is of continuous type. A
continuous function g(x) with g'(x) nonzero at all but a finite number of points
has only a finite number of maxima and minima, and it eventually becomes
monotonic as |x| → ∞. Consider a specific y on the y-axis and a positive
increment Δy, as shown in Fig. 5.4.

[Fig. 5.4: the curve y = g(x); the horizontal band (y, y + Δy) is cut by the curve
above the three intervals (x_1, x_1 + Δx_1), (x_2 + Δx_2, x_2), (x_3, x_3 + Δx_3).]
Using (3-28) we can write

    P\{y < Y(\xi) \le y + \Delta y\} = \int_{y}^{y+\Delta y} f_Y(u)\, du \approx f_Y(y)\, \Delta y.   (5-26)

But the event \{y < Y(\xi) \le y + \Delta y\} can be expressed in terms of X(\xi)
as well. To see this, referring back to Fig. 5.4, we notice that the equation
y = g(x) has three solutions x_1, x_2, x_3 (for the specific y chosen there). As
a result, when \{y < Y(\xi) \le y + \Delta y\}, the r.v X could be in any one of
the three mutually exclusive intervals

    \{x_1 < X(\xi) \le x_1 + \Delta x_1\}, \quad \{x_2 + \Delta x_2 < X(\xi) \le x_2\},
    \quad \text{or} \quad \{x_3 < X(\xi) \le x_3 + \Delta x_3\}.

Hence the probability of the event in (5-26) is the sum of the probabilities of
the above three events, i.e.,

    P\{y < Y(\xi) \le y + \Delta y\}
      = P\{x_1 < X(\xi) \le x_1 + \Delta x_1\} + P\{x_2 + \Delta x_2 < X(\xi) \le x_2\}
        + P\{x_3 < X(\xi) \le x_3 + \Delta x_3\}.                          (5-27)

For small Δy and Δx_i, making use of the approximation in (5-26), we get

    f_Y(y)\, \Delta y = f_X(x_1)\, \Delta x_1 + f_X(x_2)(-\Delta x_2) + f_X(x_3)\, \Delta x_3.   (5-28)

In this case Δx_1 > 0, Δx_2 < 0 and Δx_3 > 0, so that (5-28) can be rewritten as

    f_Y(y) = \sum_i \frac{|\Delta x_i|}{\Delta y}\, f_X(x_i)
           = \sum_i \frac{1}{|\Delta y / \Delta x_i|}\, f_X(x_i),          (5-29)

and as Δy → 0, (5-29) can be expressed as

    f_Y(y) = \sum_i \frac{1}{|dy/dx|_{x_i}}\, f_X(x_i) = \sum_i \frac{1}{|g'(x_i)|}\, f_X(x_i).   (5-30)

The summation index i in (5-30) depends on y, and for every y the equation
y = g(x_i) must be solved to obtain the total number of solutions at that y, and
the actual solutions x_1, x_2, \cdots, all in terms of y.
For example, if Y = X^2, then for all y > 0, x_1 = -\sqrt{y} and x_2 = +\sqrt{y}
represent the two solutions for each y. Notice that the solutions are all in
terms of y, so that the right side of (5-30) is only a function of y. Referring
back to Example 5.2, here for each y > 0 there are two solutions x_1 = -\sqrt{y}
and x_2 = +\sqrt{y} (and f_Y(y) = 0 for y < 0). Moreover,

    \frac{dy}{dx} = 2x \quad \text{so that} \quad
    \left|\frac{dy}{dx}\right|_{x_i} = 2\sqrt{y},

and using (5-30) we get

    f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left( f_X(\sqrt{y}) + f_X(-\sqrt{y}) \right), & y > 0,\\
                           0, & \text{otherwise}, \end{cases}              (5-31)

which agrees with (5-14).

[Fig. 5.5: the parabola Y = X^2 with the two solutions x_1, x_2 of y = x^2.]
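Formula (5-30) is easy to wrap up as a small helper. The sketch below (not part of
the notes) applies it to Y = X^2 with X ~ N(0,1), and recovers the chi-square
density (5-17); the inputs are illustrative choices.

```python
# Formula (5-30) as a helper, applied to g(x) = x^2 with X ~ N(0,1)
from math import sqrt, exp, pi

def f_X(x):
    return exp(-x**2 / 2) / sqrt(2 * pi)

def f_Y(y, solutions, dg):
    # (5-30): sum f_X(x_i) / |g'(x_i)| over all solutions x_i of y = g(x)
    return sum(f_X(x) / abs(dg(x)) for x in solutions(y))

solutions = lambda y: [sqrt(y), -sqrt(y)] if y > 0 else []
dg = lambda x: 2 * x                       # g'(x) = 2x

y = 1.5
print(f_Y(y, solutions, dg))               # matches (5-17): e^{-y/2} / sqrt(2*pi*y)
print(exp(-y / 2) / sqrt(2 * pi * y))
```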
Example 5.5: Y = 1/X. Find f_Y(y).

Solution: Here for every y, x_1 = 1/y is the only solution, and

    \frac{dy}{dx} = -\frac{1}{x^2}, \quad \text{so that} \quad
    \left|\frac{dy}{dx}\right|_{x_1} = \frac{1}{x_1^2} = y^2,              (5-32)

and substituting this into (5-30), we obtain

    f_Y(y) = \frac{1}{y^2}\, f_X\!\left(\frac{1}{y}\right).                (5-33)

In particular, suppose X is a Cauchy r.v as in (3-38) with parameter \alpha, so that

    f_X(x) = \frac{\alpha/\pi}{\alpha^2 + x^2}, \qquad -\infty < x < +\infty.   (5-34)

In that case, from (5-33), Y = 1/X has the p.d.f

    f_Y(y) = \frac{1}{y^2}\cdot\frac{\alpha/\pi}{\alpha^2 + (1/y)^2}
           = \frac{(1/\alpha)/\pi}{(1/\alpha)^2 + y^2}, \qquad -\infty < y < +\infty.   (5-35)
But (5-35) represents the p.d.f of a Cauchy r.v with parameter 1/\alpha. Thus if
X ~ C(\alpha), then 1/X ~ C(1/\alpha).

Example 5.6: Suppose f_X(x) = 2x/\pi^2, 0 < x < \pi, and Y = \sin X. Determine f_Y(y).

Solution: Since X has zero probability of falling outside the interval (0, \pi),
y = \sin x has zero probability of falling outside the interval (0, 1). Clearly
f_Y(y) = 0 outside this interval. For any 0 < y < 1, from Fig. 5.6(b) the
equation y = \sin x has an infinite number of solutions \cdots, x_1, x_2, x_3, \cdots,
where x_1 = \sin^{-1} y is the principal solution. Moreover, using the symmetry
we also get x_2 = \pi - x_1, etc. Further,

    \frac{dy}{dx} = \cos x = \sqrt{1 - \sin^2 x} = \sqrt{1 - y^2},

so that

    \left|\frac{dy}{dx}\right|_{x_i} = \sqrt{1 - y^2}.

[Fig. 5.6: (a) the density f_X(x) on (0, \pi); (b) the curve y = \sin x with the
solutions x_1, x_2, x_3, \cdots of y = \sin x.]

Using this in (5-30), we obtain, for 0 < y < 1,

    f_Y(y) = \sum_{i=-\infty}^{+\infty} \frac{1}{\sqrt{1 - y^2}}\, f_X(x_i).   (5-36)

But from Fig. 5.6(a), in this case f_X(x_{-1}) = f_X(x_3) = f_X(x_4) = \cdots = 0
(except for f_X(x_1) and f_X(x_2), the rest are all zeros).
Thus (Fig. 5.7)

    f_Y(y) = \frac{1}{\sqrt{1-y^2}}\left( f_X(x_1) + f_X(x_2) \right)
           = \frac{1}{\sqrt{1-y^2}}\cdot\frac{2x_1 + 2(\pi - x_1)}{\pi^2}
           = \begin{cases} \dfrac{2}{\pi\sqrt{1-y^2}}, & 0 < y < 1,\\ 0, & \text{otherwise}. \end{cases}   (5-37)

[Fig. 5.7: the density f_Y(y) = 2/(\pi\sqrt{1-y^2}) on (0, 1), which blows up as y → 1.]

Example 5.7: Let Y = \tan X, where X ~ U(-\pi/2, \pi/2). Determine f_Y(y).

Solution: As x moves over (-\pi/2, \pi/2), y moves over (-\infty, +\infty). From
Fig. 5.8(b), the function Y = \tan X is one-to-one for -\pi/2 < x < \pi/2. For
any y, x_1 = \tan^{-1} y is the principal (and only) solution. Further,

    \frac{dy}{dx} = \frac{d(\tan x)}{dx} = \sec^2 x = 1 + \tan^2 x = 1 + y^2,

so that using (5-30)

    f_Y(y) = \frac{1}{|dy/dx|_{x_1}}\, f_X(x_1)
           = \frac{1/\pi}{1 + y^2}, \qquad -\infty < y < +\infty,          (5-38)

which represents a Cauchy density function with parameter equal to unity (Fig. 5.9).

[Fig. 5.8: (a) the uniform density f_X(x) on (-\pi/2, \pi/2); (b) the curve y = \tan x.]
[Fig. 5.9: the Cauchy density f_Y(y) = (1/\pi)/(1 + y^2).]
Functions of a discrete-type r.v

Suppose X is a discrete-type r.v with

    P(X = x_i) = p_i, \qquad x = x_1, x_2, \cdots, x_i, \cdots,            (5-39)

and Y = g(X). Clearly Y is also of discrete type, and when x = x_i,
y_i = g(x_i), and for those y_i

    P(Y = y_i) = P(X = x_i) = p_i, \qquad y = y_1, y_2, \cdots, y_i, \cdots.   (5-40)

Example 5.8: Suppose X ~ P(\lambda), so that

    P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \cdots.   (5-41)

Define Y = X^2 + 1. Find the p.m.f of Y.

Solution: X takes the values 0, 1, 2, \cdots, k, \cdots, so that Y only takes the
values 1, 2, 5, \cdots, k^2 + 1, \cdots, and

    P(Y = k^2 + 1) = P(X = k),

so that for j = k^2 + 1,

    P(Y = j) = P(X = \sqrt{j-1}) = e^{-\lambda}\frac{\lambda^{\sqrt{j-1}}}{(\sqrt{j-1})!},
    \qquad j = 1, 2, 5, \cdots, k^2 + 1, \cdots.                           (5-42)
6. Mean, Variance, Moments and Characteristic Functions

For a r.v X, its p.d.f f_X(x) represents complete information about it, and for
any Borel set B on the x-axis

    P(X(\xi) \in B) = \int_B f_X(x)\, dx.                                  (6-1)

Note that f_X(x) represents very detailed information, and quite often it is
desirable to characterize the r.v in terms of its average behavior. In this
context, we will introduce two parameters - mean and variance - that are
universally used to represent the overall properties of the r.v and its p.d.f.
Mean or the Expected Value of a r.v X is defined as

    \eta_X = E(X) = \overline{X} = \int_{-\infty}^{+\infty} x\, f_X(x)\, dx.   (6-2)

If X is a discrete-type r.v, then using (3-25) we get

    \eta_X = E(X) = \int x \sum_i p_i\, \delta(x - x_i)\, dx
           = \sum_i x_i\, p_i = \sum_i x_i\, P(X = x_i).                   (6-3)

Mean represents the average value of the r.v in a very large number of trials.
For example, if X ~ U(a, b), then using (3-31)

    E(X) = \int_a^b \frac{x}{b-a}\, dx = \frac{1}{b-a}\left.\frac{x^2}{2}\right|_a^b
         = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}                        (6-4)

is the midpoint of the interval (a, b).
On the other hand, if X is exponential with parameter λ as in (3-32), then

    E(X) = \int_0^{\infty} \frac{x}{\lambda}\, e^{-x/\lambda}\, dx
         = \lambda \int_0^{\infty} y\, e^{-y}\, dy = \lambda,              (6-5)

implying that the parameter λ in (3-32) represents the mean value of the
exponential r.v.

Similarly, if X is Poisson with parameter λ as in (3-43), using (6-3) we get

    E(X) = \sum_{k=0}^{\infty} k\, P(X = k)
         = \sum_{k=0}^{\infty} k\, e^{-\lambda}\frac{\lambda^k}{k!}
         = e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!}
         = \lambda\, e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!}
         = \lambda\, e^{-\lambda} e^{\lambda} = \lambda.                   (6-6)

Thus the parameter λ in (3-43) also represents the mean of the Poisson r.v.
In a similar manner, if X is binomial as in (3-42), then its mean is given by

    E(X) = \sum_{k=0}^{n} k\, P(X = k) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k}
         = \sum_{k=1}^{n} \frac{n!}{(n-k)!\,(k-1)!}\, p^k q^{n-k}
         = np \sum_{i=0}^{n-1} \frac{(n-1)!}{(n-1-i)!\, i!}\, p^i q^{n-1-i}
         = np\,(p+q)^{n-1} = np.                                           (6-7)

Thus np represents the mean of the binomial r.v in (3-42).

For the normal r.v in (3-29),

    E(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} x\, e^{-(x-\mu)^2/2\sigma^2}\, dx
         = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} (y+\mu)\, e^{-y^2/2\sigma^2}\, dy
         = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} y\, e^{-y^2/2\sigma^2}\, dy
           + \mu \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} e^{-y^2/2\sigma^2}\, dy
         = 0 + \mu \cdot 1 = \mu.                                          (6-8)
Thus the first parameter μ in X ~ N(\mu, \sigma^2) is in fact the mean of the
Gaussian r.v X. Given X with p.d.f f_X(x), suppose Y = g(X) defines a new r.v
with p.d.f f_Y(y). Then from the previous discussion, the new r.v Y has a mean
\eta_Y given by (see (6-2))

    \eta_Y = E(Y) = \int_{-\infty}^{+\infty} y\, f_Y(y)\, dy.              (6-9)

From (6-9), it appears that to determine E(Y) we need to determine f_Y(y).
However, this is not the case if only E(Y) is the quantity of interest. Recall
that for any y and Δy > 0,

    P\{y < Y(\xi) \le y + \Delta y\} = \sum_i P\{x_i < X(\xi) \le x_i + \Delta x_i\},   (6-10)

where the x_i represent the multiple solutions of the equation y = g(x_i). But
(6-10) can be rewritten as

    f_Y(y)\, \Delta y = \sum_i f_X(x_i)\, \Delta x_i,                      (6-11)

where the intervals (x_i, x_i + Δx_i) are nonoverlapping. Hence

    y\, f_Y(y)\, \Delta y = \sum_i y\, f_X(x_i)\, \Delta x_i = \sum_i g(x_i)\, f_X(x_i)\, \Delta x_i,   (6-12)

and as y covers the entire y-axis, the corresponding Δx_i's are nonoverlapping
and cover the entire x-axis. Hence, in the limit as Δy → 0, integrating both
sides of (6-12), we get the useful formula

    E(Y) = E[g(X)] = \int_{-\infty}^{+\infty} y\, f_Y(y)\, dy
         = \int_{-\infty}^{+\infty} g(x)\, f_X(x)\, dx.                    (6-13)

In the discrete case, (6-13) reduces to

    E(Y) = \sum_i g(x_i)\, P(X = x_i).                                     (6-14)

From (6-13)-(6-14), f_Y(y) is not required to evaluate E(Y) for Y = g(X). We can
use (6-14) to determine the mean of Y = X^2, where X is a Poisson r.v. Using (3-43),
    E(X^2) = \sum_{k=0}^{\infty} k^2\, P(X = k)
           = \sum_{k=0}^{\infty} k^2\, e^{-\lambda}\frac{\lambda^k}{k!}
           = e^{-\lambda} \sum_{k=1}^{\infty} \frac{k\,\lambda^k}{(k-1)!}
           = e^{-\lambda} \sum_{i=0}^{\infty} (i+1)\frac{\lambda^{i+1}}{i!}
           = \lambda\, e^{-\lambda}\left( \sum_{i=0}^{\infty} \frac{i\,\lambda^i}{i!}
                + \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} \right)
           = \lambda\, e^{-\lambda}\left( \lambda e^{\lambda} + e^{\lambda} \right)
           = \lambda^2 + \lambda.                                          (6-15)

In general, E(X^k) is known as the kth moment of the r.v X. Thus if X ~ P(\lambda),
its second moment is given by (6-15).
Mean alone will not be able to truly represent the p.d.f of any r.v. To
illustrate this, consider the following scenario: consider two Gaussian r.vs
X_1 ~ N(0, 1) and X_2 ~ N(0, 10). Both of them have the same mean μ = 0.
However, as Fig. 6.1 shows, their p.d.fs are quite different. One is more
concentrated around the mean, whereas the other one (σ² = 10) has a wider
spread. Clearly, we need at least one additional parameter to measure this
spread around the mean!

[Fig. 6.1: (a) the density of X_1 with σ² = 1; (b) the density of X_2 with σ² = 10.]
For a r.v X with mean μ, X - μ represents the deviation of the r.v from its
mean. Since this deviation can be either positive or negative, consider the
quantity (X - μ)^2; its average value E[(X - μ)^2] represents the average mean
square deviation of X around its mean. Define

    \sigma_X^2 = E[(X - \mu)^2] > 0.                                       (6-16)

With g(X) = (X - μ)^2 and using (6-13), we get

    \sigma_X^2 = \int_{-\infty}^{+\infty} (x - \mu)^2\, f_X(x)\, dx > 0.   (6-17)

\sigma_X^2 is known as the variance of the r.v X, and its square root
\sigma_X = \sqrt{E[(X-\mu)^2]} is known as the standard deviation of X. Note
that the standard deviation represents the root mean square spread of the r.v X
around its mean μ.
Expanding (6-17) and using the linearity of the integrals, we get

    Var(X) = \sigma_X^2 = \int_{-\infty}^{+\infty} (x^2 - 2x\mu + \mu^2)\, f_X(x)\, dx
           = \int_{-\infty}^{+\infty} x^2 f_X(x)\, dx - 2\mu \int_{-\infty}^{+\infty} x f_X(x)\, dx + \mu^2
           = E(X^2) - [E(X)]^2 = \overline{X^2} - \overline{X}^{\,2}.      (6-18)

Alternatively, we can use (6-18) to compute \sigma_X^2. Thus, for example,
returning to the Poisson r.v in (3-43), using (6-6) and (6-15), we get

    \sigma_X^2 = E(X^2) - [E(X)]^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda.   (6-19)

Thus for a Poisson r.v, mean and variance are both equal to its parameter λ.
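A two-line numerical check of (6-6), (6-15) and (6-19) (a sketch, with the
illustrative choice λ = 3):

```python
# Poisson mean and variance via the p.m.f
from math import exp, factorial

lam = 3.0
pmf = lambda k: exp(-lam) * lam**k / factorial(k)
ks = range(200)                                   # tail beyond 200 is negligible here

mean = sum(k * pmf(k) for k in ks)
second = sum(k**2 * pmf(k) for k in ks)
print(mean, second - mean**2)                     # both approx lambda = 3
```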
To determine the variance of the normal r.v N(\mu, \sigma^2), we can use (6-16).
Thus from (3-29),

    Var(X) = E[(X - \mu)^2]
           = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} (x-\mu)^2\, e^{-(x-\mu)^2/2\sigma^2}\, dx.   (6-20)

To simplify (6-20), we can make use of the identity

    \int_{-\infty}^{+\infty} f_X(x)\, dx
      = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} e^{-(x-\mu)^2/2\sigma^2}\, dx = 1

for a normal p.d.f. This gives

    \int_{-\infty}^{+\infty} e^{-(x-\mu)^2/2\sigma^2}\, dx = \sqrt{2\pi}\,\sigma.   (6-21)

Differentiating both sides of (6-21) with respect to σ, we get

    \int_{-\infty}^{+\infty} \frac{(x-\mu)^2}{\sigma^3}\, e^{-(x-\mu)^2/2\sigma^2}\, dx = \sqrt{2\pi},

or

    \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} (x-\mu)^2\, e^{-(x-\mu)^2/2\sigma^2}\, dx = \sigma^2,   (6-22)

which is exactly the Var(X) in (6-20). Thus for a normal r.v as in (3-29),

    Var(X) = \sigma^2,                                                     (6-23)

and the second parameter in N(\mu, \sigma^2) in fact represents the variance of
the Gaussian r.v. As Fig. 6.1 shows, the larger the σ, the larger the spread of
the p.d.f around its mean. Thus as the variance of a r.v tends to zero, it will
begin to concentrate more and more around the mean, ultimately behaving like a
constant.

Moments: As remarked earlier, in general the quantities

    m_n = \overline{X^n} = E(X^n), \qquad n \ge 1,                         (6-24)

are known as the moments of the r.v X, and
PILLAI
13
] ) [(
n
n
X E =
(6-25)
are known as the central moments of X. Clearly, the
mean and the variance It is easy to relate
and Infact
In general, the quantities
are known as the generalized moments of X about a, and
are known as the absolute moments of X.
,
1
m =
.
2
2
=
n
m
.
n

( ) . ) ( ) (
) ( ] ) [(
0 0
0
k n
k
n
k
k n k
n
k
k n k
n
k
n
n
m
k
n
X E
k
n
X
k
n
E X E

|
|
.
|

\
|
=
|
|
.
|

\
|
=
|
|
.
|

\
|

|
|
.
|

\
|
= =



(6-26)
] ) [(
n
a X E
(6-27)
] | [|
n
X E
(6-28)
PILLAI
14
For example, if then it can be shown that
Direct use of (6-2), (6-13) or (6-14) is often a tedious
procedure to compute the mean and variance, and in this
context, the notion of the characteristic function can be
quite helpful.
Characteristic Function
The characteristic function of a r.v X is defined as


=
even. , ) 1 ( 3 1
odd, , 0
) (
n n
n
X E
n
n
"

+ =

=
+
odd. ), 1 2 ( , / 2 ! 2
even, , ) 1 ( 3 1
) | (|
1 2
k n k
n n
X E
k k
n
n

"
(6-29)
(6-30)
), , 0 (
2
N X
PILLAI
15
    \Phi_X(\omega) = E\left(e^{j\omega X}\right)
                   = \int_{-\infty}^{+\infty} e^{j\omega x}\, f_X(x)\, dx.   (6-31)

Thus \Phi_X(0) = 1, and |\Phi_X(\omega)| \le 1 for all ω.

For discrete r.vs the characteristic function reduces to

    \Phi_X(\omega) = \sum_k e^{j\omega k}\, P(X = k).                      (6-32)

Thus, for example, if X ~ P(\lambda) as in (3-43), then its characteristic
function is given by

    \Phi_X(\omega) = \sum_{k=0}^{\infty} e^{j\omega k}\, e^{-\lambda}\frac{\lambda^k}{k!}
                   = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{j\omega})^k}{k!}
                   = e^{-\lambda} e^{\lambda e^{j\omega}}
                   = e^{\lambda (e^{j\omega} - 1)}.                        (6-33)

Similarly, if X is a binomial r.v as in (3-42), its characteristic function is
given by

    \Phi_X(\omega) = \sum_{k=0}^{n} e^{j\omega k} \binom{n}{k} p^k q^{n-k}
                   = \sum_{k=0}^{n} \binom{n}{k} (p e^{j\omega})^k q^{n-k}
                   = (p e^{j\omega} + q)^n.                                (6-34)
To illustrate the usefulness of the characteristic function of a r.v in
computing its moments, it is first necessary to derive the relationship between
them. Towards this, from (6-31),

    \Phi_X(\omega) = E\left(e^{j\omega X}\right)
        = E\left( \sum_{k=0}^{\infty} \frac{(j\omega X)^k}{k!} \right)
        = \sum_{k=0}^{\infty} \frac{(j\omega)^k}{k!}\, E(X^k)
        = 1 + j\omega E(X) + \frac{(j\omega)^2}{2!} E(X^2) + \cdots
          + \frac{(j\omega)^k}{k!} E(X^k) + \cdots.                        (6-35)

Taking the first derivative of (6-35) with respect to ω and setting ω = 0, we get

    \left.\frac{\partial \Phi_X(\omega)}{\partial \omega}\right|_{\omega=0} = j\,E(X)
    \quad \text{or} \quad
    E(X) = \frac{1}{j}\left.\frac{\partial \Phi_X(\omega)}{\partial \omega}\right|_{\omega=0}.   (6-36)

Similarly, the second derivative of (6-35) gives

    E(X^2) = \frac{1}{j^2}\left.\frac{\partial^2 \Phi_X(\omega)}{\partial \omega^2}\right|_{\omega=0},   (6-37)
and repeating this procedure k times, we obtain the kth moment of X to be

    E(X^k) = \frac{1}{j^k}\left.\frac{\partial^k \Phi_X(\omega)}{\partial \omega^k}\right|_{\omega=0},
    \qquad k \ge 1.                                                        (6-38)

We can use (6-36)-(6-38) to compute the mean, variance and other higher order
moments of any r.v X. For example, if X ~ P(\lambda), then from (6-33)

    \frac{\partial \Phi_X(\omega)}{\partial \omega}
      = \lambda j\, e^{j\omega}\, e^{\lambda(e^{j\omega}-1)},              (6-39)

so that from (6-36)

    E(X) = \lambda,                                                        (6-40)

which agrees with (6-6). Differentiating (6-39) one more time, we get

    \frac{\partial^2 \Phi_X(\omega)}{\partial \omega^2}
      = (\lambda j\, e^{j\omega})^2\, e^{\lambda(e^{j\omega}-1)}
        + \lambda j^2 e^{j\omega}\, e^{\lambda(e^{j\omega}-1)},            (6-41)

so that from (6-37)

    E(X^2) = \lambda^2 + \lambda,                                          (6-42)

which again agrees with (6-15). Notice that compared to the tedious calculations
in (6-6) and (6-15), the effort involved in (6-39) and (6-41) is very minimal.

We can use the characteristic function of the binomial r.v B(n, p) in (6-34) to
obtain its variance. Direct differentiation of (6-34) gives

    \frac{\partial \Phi_X(\omega)}{\partial \omega}
      = j n p\, e^{j\omega}\, (p e^{j\omega} + q)^{n-1},                   (6-43)

so that from (6-36), E(X) = np, as in (6-7).
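The differentiate-and-evaluate recipe (6-36)-(6-38) mechanizes nicely with a
symbolic package. The sketch below (not part of the notes) recovers the Poisson
moments from its characteristic function (6-33):

```python
# Moments from a characteristic function, Poisson case (6-33)
import sympy as sp

w, lam = sp.symbols('omega lambda_', real=True)
phi = sp.exp(lam * (sp.exp(sp.I * w) - 1))                   # (6-33)

m1 = sp.simplify(sp.diff(phi, w).subs(w, 0) / sp.I)          # E(X) = lambda
m2 = sp.simplify(sp.diff(phi, w, 2).subs(w, 0) / sp.I**2)    # E(X^2) = lambda^2 + lambda
print(m1, m2, sp.simplify(m2 - m1**2))                       # variance = lambda
```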
One more differentiation of (6-43) yields

    \frac{\partial^2 \Phi_X(\omega)}{\partial \omega^2}
      = j^2 np\, e^{j\omega}\left( (p e^{j\omega}+q)^{n-1}
        + (n-1)\, p\, e^{j\omega} (p e^{j\omega}+q)^{n-2} \right),         (6-44)

and using (6-37), we obtain the second moment of the binomial r.v to be

    E(X^2) = np\left( 1 + (n-1)p \right) = n^2 p^2 + npq.                  (6-45)

Together with (6-7), (6-18) and (6-45), we obtain the variance of the binomial
r.v to be

    \sigma_X^2 = E(X^2) - [E(X)]^2 = n^2 p^2 + npq - n^2 p^2 = npq.        (6-46)

To obtain the characteristic function of the Gaussian r.v, we can make use of
(6-31). Thus if X ~ N(\mu, \sigma^2), then
    \Phi_X(\omega) = \frac{1}{\sqrt{2\pi\sigma^2}}
        \int_{-\infty}^{+\infty} e^{j\omega x}\, e^{-(x-\mu)^2/2\sigma^2}\, dx

    (let x - \mu = y)
      = e^{j\mu\omega}\, \frac{1}{\sqrt{2\pi\sigma^2}}
        \int_{-\infty}^{+\infty} e^{j\omega y}\, e^{-y^2/2\sigma^2}\, dy
      = e^{j\mu\omega}\, \frac{1}{\sqrt{2\pi\sigma^2}}
        \int_{-\infty}^{+\infty} e^{-(y^2 - 2j\sigma^2\omega y)/2\sigma^2}\, dy

    (let y - j\sigma^2\omega = u, so that y = u + j\sigma^2\omega)
      = e^{j\mu\omega}\, e^{-\sigma^2\omega^2/2}\, \frac{1}{\sqrt{2\pi\sigma^2}}
        \int_{-\infty}^{+\infty} e^{-u^2/2\sigma^2}\, du
      = e^{\,j\mu\omega - \sigma^2\omega^2/2}.                             (6-47)

Notice that the characteristic function of a Gaussian r.v itself has the
Gaussian bell shape. Thus if X ~ N(0, \sigma^2), then

    f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2},           (6-48)

and

    \Phi_X(\omega) = e^{-\sigma^2\omega^2/2}.                              (6-49)
[Fig. 6.2: (a) the Gaussian density e^{-x^2/2\sigma^2}; (b) its characteristic
function e^{-\sigma^2\omega^2/2}.]

From Fig. 6.2, the reverse roles of σ² in f_X(x) and \Phi_X(\omega)
(σ² versus 1/σ²) are noteworthy.

In some cases, mean and variance may not exist. For example, consider the
Cauchy r.v defined in (3-38). With

    f_X(x) = \frac{\alpha/\pi}{\alpha^2 + x^2},

    E(X^2) = \frac{\alpha}{\pi} \int_{-\infty}^{+\infty} \frac{x^2}{\alpha^2 + x^2}\, dx
           = \frac{\alpha}{\pi} \int_{-\infty}^{+\infty}
             \left( 1 - \frac{\alpha^2}{\alpha^2 + x^2} \right) dx = \infty   (6-50)

clearly diverges to infinity. Similarly

    E(X) = \frac{\alpha}{\pi} \int_{-\infty}^{+\infty} \frac{x}{\alpha^2 + x^2}\, dx.   (6-51)

To compute (6-51), let us examine its one sided factor
\int_0^{\infty} \frac{x}{\alpha^2 + x^2}\, dx. With x = \alpha\tan\theta,

    \int_0^{\infty} \frac{x}{\alpha^2 + x^2}\, dx
      = \int_0^{\pi/2} \frac{\alpha\tan\theta}{\alpha^2\sec^2\theta}\,\alpha\sec^2\theta\, d\theta
      = \int_0^{\pi/2} \tan\theta\, d\theta
      = \int_0^{\pi/2} \frac{\sin\theta}{\cos\theta}\, d\theta
      = -\log\cos\theta\Big|_0^{\pi/2} = \infty,                           (6-52)

indicating that the double sided integral in (6-51) does not converge and is
undefined. From (6-50)-(6-52), the mean and variance of a Cauchy r.v are
undefined.

We conclude this section with a bound that estimates the dispersion of the r.v
beyond a certain interval centered around its mean. Since σ² measures the
dispersion of
the r.v X around its mean μ, we expect this bound to depend on σ² as well.

Chebychev Inequality

Consider an interval of width 2ε symmetrically centered around the mean μ as in
Fig. 6.3. What is the probability that X falls outside this interval? We need

    P(|X - \mu| \ge \varepsilon) = \;?                                     (6-53)

[Fig. 6.3: the interval (\mu - \varepsilon, \mu + \varepsilon) of width 2ε centered at μ.]

To compute this probability, we can start with the definition of σ²:

    \sigma^2 = \int_{-\infty}^{+\infty} (x-\mu)^2 f_X(x)\, dx
             \ge \int_{|x-\mu|\ge\varepsilon} (x-\mu)^2 f_X(x)\, dx
             \ge \varepsilon^2 \int_{|x-\mu|\ge\varepsilon} f_X(x)\, dx
             = \varepsilon^2\, P(|X - \mu| \ge \varepsilon).               (6-54)

From (6-54), we obtain the desired probability to be

    P(|X - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2},       (6-55)

and (6-55) is known as the Chebychev inequality. Interestingly, to compute the
above probability bound the knowledge of f_X(x) is not necessary. We only need
σ², the variance of the r.v. In particular, with ε = kσ in (6-55) we obtain

    P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.                            (6-56)

Thus with k = 3, we get the probability of X being outside the 3σ interval
around its mean to be at most 0.111 for any r.v. Obviously this cannot be a
tight bound, as it must hold for all r.vs. For example, in the case of a
Gaussian r.v, from Table 4.1 (μ = 0, σ = 1),

    P(|X - \mu| \ge 3\sigma) = 0.0027,                                     (6-57)

which is much smaller than the bound in (6-56). The Chebychev inequality always
overestimates the exact probability; it is only an upper bound.
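The gap between the bound (6-56) and the exact Gaussian value (6-57) is easy to
tabulate (a sketch, not part of the notes):

```python
# Chebychev bound vs. exact Gaussian two-sided tail
from math import erf, sqrt

def gauss_two_sided_tail(k):            # P(|X - mu| >= k*sigma) for a Gaussian
    return 1 - erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, 1 / k**2, gauss_two_sided_tail(k))
# k = 3 gives the bound 0.111 versus the exact value 0.0027
```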
Moment Identities:

Suppose X is a discrete random variable that takes only nonnegative integer
values, i.e.,

    P(X = k) = p_k \ge 0, \qquad k = 0, 1, 2, \cdots.

Then

    \sum_{k=0}^{\infty} P(X > k) = \sum_{k=0}^{\infty} \sum_{i=k+1}^{\infty} P(X = i)
      = \sum_{i=1}^{\infty} \sum_{k=0}^{i-1} P(X = i)
      = \sum_{i=1}^{\infty} i\, P(X = i) = E(X);                           (6-58)

similarly

    \sum_{k=0}^{\infty} k\, P(X > k) = \sum_{i=1}^{\infty} \frac{i(i-1)}{2}\, P(X = i)
      = \frac{E\{X(X-1)\}}{2},

which gives

    E(X^2) = \sum_{k=0}^{\infty} (2k+1)\, P(X > k).                        (6-59)

Equations (6-58)-(6-59) are at times quite useful in simplifying calculations.
For example, referring to the Birthday Pairing Problem [Example 2-20, Text], let
X represent the minimum number of people in a group for a birthday pair to
occur. The probability that the first n people selected from that group have
different birthdays is given by [P(B) in page 39, Text]

    p_n = \prod_{k=1}^{n-1} \left( 1 - \frac{k}{N} \right) \approx e^{-n(n-1)/2N}.

But the event "the first n people selected have
different birthdays" is the same as the event {X > n}. Hence

    P(X > n) \approx e^{-n(n-1)/2N}.

Using (6-58), this gives the mean value of X to be

    E(X) = \sum_{n=0}^{\infty} P(X > n) \approx \sum_{n=0}^{\infty} e^{-n(n-1)/2N}
         \approx \int_0^{\infty} e^{-x(x-1)/2N}\, dx
         = e^{1/8N} \int_0^{\infty} e^{-(x-1/2)^2/2N}\, dx
         \approx \sqrt{\pi N/2} + \frac{1}{2} \approx 24.44.               (6-60)

Similarly, using (6-59) we get

    E(X^2) = \sum_{n=0}^{\infty} (2n+1)\, P(X > n)
           \approx \sum_{n=0}^{\infty} (2n+1)\, e^{-n(n-1)/2N}
           \approx \int_0^{\infty} (2x+1)\, e^{-x(x-1)/2N}\, dx
           = 2N + 2\, e^{1/8N} \int_{-1/2}^{\infty} e^{-t^2/2N}\, dt
           \approx 2N + \sqrt{2\pi N} + 1 \approx 779.139.

Thus

    Var(X) = E(X^2) - (E(X))^2 \approx 181.82,
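As a check on (6-60), the identity (6-58) can also be evaluated with the exact
P(X > n) rather than the Gaussian-sum approximation (a sketch, not part of the
notes):

```python
# Birthday-pairing mean via (6-58), exact versus the approximation in (6-60)
from math import pi, sqrt

N = 365
def p_exceed(n):                       # P(X > n): first n people all differ
    prob = 1.0
    for k in range(1, n):
        prob *= 1 - k / N
    return prob

mean = sum(p_exceed(n) for n in range(N + 2))     # terms vanish beyond n = N
print(mean)                                        # approx 24.62 (exact)
print(sqrt(pi * N / 2) + 0.5)                      # approx 24.44 (approximation)
```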
which gives \sigma_X \approx 13.48. Since the standard deviation is quite high
compared to the mean value, the actual number of people required for a birthday
coincidence could be anywhere from 25 to 40.

Identities similar to (6-58)-(6-59) can be derived in the case of continuous
random variables as well. For example, if X is a nonnegative random variable
with density function f_X(x) and distribution function F_X(x), then

    E\{X\} = \int_0^{\infty} x\, f_X(x)\, dx
           = \int_0^{\infty} \left( \int_0^x dy \right) f_X(x)\, dx
           = \int_0^{\infty} \left( \int_y^{\infty} f_X(x)\, dx \right) dy
           = \int_0^{\infty} P(X > y)\, dy
           = \int_0^{\infty} \{1 - F_X(x)\}\, dx = \int_0^{\infty} R(x)\, dx,   (6-61)

where

    R(x) = 1 - F_X(x) \ge 0, \qquad x \ge 0.

Similarly

    E\{X^2\} = \int_0^{\infty} x^2 f_X(x)\, dx
             = \int_0^{\infty} \left( \int_0^x 2y\, dy \right) f_X(x)\, dx
             = 2 \int_0^{\infty} y \left( \int_y^{\infty} f_X(x)\, dx \right) dy
             = 2 \int_0^{\infty} x\, R(x)\, dx.
A Baseball Trivia (Pete Rose and DiMaggio):

In 1978 Pete Rose set a National League record by hitting a string of 44 games
during a 162-game baseball season. How unusual was that event?

As we shall see, that indeed was a rare event. In that context, we will answer
the following question: what is the probability that someone in major league
baseball will repeat that performance and possibly set a new record in the next
50 year period? The answer will put Pete Rose's accomplishment in the proper
perspective.

Solution: As Example 5-32 (Text) shows, r consecutive successes in n trials
correspond to a run of length r in n trials. From (5-133)-(5-134) of the text,
we get the probability of r successive hits in n games to be

    p_n = 1 - \alpha_{n,r} + \beta_{n,r}, \qquad n \ge r,                  (6-62)

where

    \alpha_{n,r} = \sum_{k=0}^{\lfloor n/(r+1) \rfloor} (-1)^k \binom{n-kr}{k} (q p^r)^k,
    \qquad \beta_{n,r} = p^r\, \alpha_{n-r,\,r},                           (6-63)

and p represents the probability of a hit in a game. Pete Rose's batting
average is 0.303, and since on the average a batter comes to bat about four
times per game, we get

    p = P(\text{at least one hit / game}) = 1 - P(\text{no hit / game})
      = 1 - (1 - 0.303)^4 = 0.76399.                                       (6-64)
Substituting this value for p into the expressions (6-62)-(6-63) with r = 44 and
n = 162, we can compute the desired probability p_n. However, since n is quite
large compared to r, the above formula is hopelessly time consuming in its
implementation, and it is preferable to obtain a good approximation for p_n.

Towards this, notice that the corresponding moment generating function \phi(z)
for q_n = 1 - p_n in Eq. (5-130) of the text,

    \phi(z) = \sum_{n} q_n z^n
            = \frac{1 - p^r z^r}{1 - z + q p^r z^{r+1}}
            = \sum_{k=1}^{r} \frac{a_k}{z_k - z},                          (6-65)

is rational and hence it can be expanded in partial fractions as shown, where
only r roots z_k (out of r + 1) are accounted for, since the root z = 1/p is
common to both the numerator and the denominator of \phi(z). Here

    a_k = \lim_{z \to z_k} (z_k - z)\, \frac{1 - p^r z^r}{1 - z + q p^r z^{r+1}}
        = \frac{1 - p^r z_k^r}{1 - (r+1)\, q p^r z_k^r}.
From (6-65), the coefficients are

    a_k = \frac{1 - p^r z_k^r}{1 - (r+1)\, q p^r z_k^r}, \qquad k = 1, 2, \cdots, r.   (6-66)

Expanding each term of (6-65) in a power series,

    \phi(z) = \sum_{k=1}^{r} \frac{a_k}{z_k}\cdot\frac{1}{1 - z/z_k}
            = \sum_{n=0}^{\infty} \left( \sum_{k=1}^{r} A_k\, z_k^{-(n+1)} \right) z^n
            \equiv \sum_{n=0}^{\infty} q_n z^n,                            (6-67)

where A_k = a_k as in (6-66), and hence

    q_n = 1 - p_n = \sum_{k=1}^{r} A_k\, z_k^{-(n+1)}.                     (6-68)

However (fortunately), the roots z_k in (6-65)-(6-67) are not all of the same
importance (in terms of their relative magnitude with respect to unity). Notice
that z_k^{-(n+1)} → 0 for |z_k| > 1 as n → ∞, so for large n only the roots
nearest to unity contribute to (6-68).

To examine the nature of the roots of the denominator in (6-65), consider
(refer to Fig 6.1)

    A(z) = z - 1 - q p^r z^{r+1},

whose zeros coincide with those of the denominator 1 - z + q p^r z^{r+1}. Note that

    A(0) = -1 < 0, \quad A(1) = -q p^r < 0, \quad A(1/p) = 0, \quad A(+\infty) = -\infty,

implying that A(z), z \ge 0, increases from -1 and reaches a positive maximum at z_0
given by

    \frac{dA(z)}{dz} = 1 - (r+1)\, q p^r z_0^r = 0,

which gives

    z_0 = \left( \frac{1}{(r+1)\, q p^r} \right)^{1/r}.                    (6-69)

From there onwards A(z) decreases to -\infty. Thus there are two positive roots
for the equation A(z) = 0, given by z_1 < z_0 and z_2 = 1/p > 1. Since
A(1) = -q p^r is nonzero but negative, by continuity z_1 has the form
z_1 = 1 + \xi, \xi > 0 (see Fig 6.1).

[Fig 6.1: A(z) for r odd — A(z) rises from A(0) = -1 through z_1, peaks at z_0,
and falls back through z_2 = 1/p.]
It is possible to obtain a bound for z_0 in (6-69). When p varies from 0 to 1,
the maximum of q p^r = (1-p) p^r is attained for p = r/(r+1), and it equals
r^r/(r+1)^{r+1}. Thus

    q p^r \le \frac{r^r}{(r+1)^{r+1}},                                     (6-70)

and hence, substituting this into (6-69), we get

    z_0 \ge \frac{r+1}{r} = 1 + \frac{1}{r}.                               (6-71)

Hence it follows that the two positive roots of A(z) satisfy

    1 < z_1 < 1 + \frac{1}{r} < z_2 = \frac{1}{p}.                         (6-72)

Clearly, the remaining roots of A(z) are complex if r is
39
odd , and there is one negative root if r is even (see
Fig 6.2). It is easy to show that the absolute value of every
such complex or negative root is greater than 1/p >1.

To show this when r is even, suppose represents the
negative root. Then

0 ) 1 ( ) (
1
= + =
+ r r
qp A
PILLAI
Fig 6.2 A(z) for r even
) (z A
z
z
1 z
0
z
2

40
so that the function
starts positive, for x > 0 and increases till it reaches once
again maximum at and then decreases to
through the root Since B(1/p) = 2, we
get > 1/p > 1, which proves our claim.
r z / 1 1
0
+

. 1
0
> > = z x
2 ) ( 1 ) (
1
+ = + =
+
x A x qp x x B
r r
PILLAI
(6-73)

Fig 6.3 Negative root

1
) (x B
0 ) ( = B
z
0 1/p
41
Finally, if z = \rho e^{j\theta} (\theta \neq 0) is a complex root of A(z), then

    A(\rho e^{j\theta}) = \rho e^{j\theta} - 1 - q p^r \rho^{r+1} e^{j(r+1)\theta} = 0,   (6-74)

so that

    \rho = |1 + q p^r \rho^{r+1} e^{j(r+1)\theta}| \le 1 + q p^r \rho^{r+1},

i.e., A(\rho) \le 0. Thus from (6-72), \rho belongs to either the interval
(0, z_1) or the interval (z_2, \infty) = (1/p, \infty) in Fig 6.1. Moreover, by
equating the imaginary parts in (6-74) we get

    q p^r \rho^r\, \frac{\sin(r+1)\theta}{\sin\theta} = 1.                 (6-75)

But

    \left| \frac{\sin(r+1)\theta}{\sin\theta} \right| \le r + 1,           (6-76)

equality being excluded since \theta \neq 0. Hence from (6-75)-(6-76) and (6-70),

    \rho^r > \frac{1}{(r+1)\, q p^r} = z_0^r \ge \left( \frac{r+1}{r} \right)^r,

or

    \rho > z_0 \ge 1 + \frac{1}{r}.                                        (6-77)

But z_1 < z_0. As a result \rho lies in the interval (1/p, \infty) only. Thus
\rho > 1/p > 1.
43
To summarize the two real roots of the polynomial
A(z) are given by
and all other roots are (negative or complex) of the form
Hence except for the first root z
1
(which is very close to
unity), for all other roots
As a result, the most dominant term in (6-68) is the first
term, and the contributions from all other terms to q
n
in
(6-68) can be bounded by
, 1
1
; 0 , 1
2 1
> = > + =
p
z z
. 1
1
where > > =
p
e z
j
k


. all for rapidly 0
) 1 (
k z
n
k

+
PILLAI
(6-78)
(6-79)
    \left| \sum_{k=2}^{r} A_k\, z_k^{-(n+1)} \right|
      \le \sum_{k=2}^{r} |A_k|\, |z_k|^{-(n+1)}
      \le (r-1)\, \max_{k \ge 2} |A_k|\; p^{\,n+1} \;\to\; 0,              (6-80)

since every such root satisfies |z_k| > 1/p while the coefficients |A_k| in
(6-66) remain bounded. Thus from (6-68), to an excellent approximation

    q_n \approx \frac{A_1}{z_1^{\,n+1}}.                                   (6-81)

This gives the desired probability to be
    p_n = 1 - q_n \approx 1 - \frac{A_1}{z_1^{\,n+1}}
        = 1 - \frac{1 - (p z_1)^r}{\left( 1 - (r+1)\, q\, (p z_1)^r \right) z_1^{\,n+1}}.   (6-82)

Notice that since the dominant root z_1 is very close to unity, an excellent
closed form approximation for z_1 can be obtained by considering the first order
Taylor series expansion of A(z). In the immediate neighborhood of z = 1 we get

    A(1 + \xi) \approx A(1) + \xi\, A'(1) = -q p^r + \xi\left( 1 - (r+1)\, q p^r \right),

so that A(z_1) = A(1 + \xi) = 0 gives

    \xi = \frac{q p^r}{1 - (r+1)\, q p^r},

or

    z_1 \approx 1 + \frac{q p^r}{1 - (r+1)\, q p^r}.                       (6-83)
Returning back to Pete Rose's case, p = 0.763989 and r = 44 give the smallest
positive root of the denominator polynomial

    A(z) = z - 1 - q p^{44} z^{45}

to be z_1 = 1.00000169360549. (The approximation (6-83) gives
z_1 = 1.00000169360548.) Thus with n = 162 in (6-82) we get

    p_{162} = 0.0002069970                                                 (6-84)

to be the probability of scoring 44 or more consecutive hits in 162 games for a
player of Pete Rose's caliber — a very small probability indeed! In that sense
it is a very rare event.

Assuming that during any baseball season there are on the average about
2 × 25 = 50 (?) such players over all major league baseball teams, we obtain
[use Lecture #2, Eqs. (2-3)-(2-6) for the independence of 50 players]

    P_1 = 1 - (1 - p_{162})^{50} = 0.0102975349

to be the probability that one of those players will hit the desired event. If
we consider a period of 50 years, then the probability of some player hitting 44
or more consecutive games during one of these seasons turns out to be

    1 - (1 - P_1)^{50} = 0.40401874.                                       (6-85)
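The numbers above can be reproduced directly from the closed-form approximations
(6-82)-(6-83). The sketch below is not part of the notes and simply re-evaluates
those expressions:

```python
# Pete Rose's streak: r = 44, n = 162, p = 0.76399 from (6-64)
p, q, r, n = 0.76399, 1 - 0.76399, 44, 162

# dominant root of A(z) = z - 1 - q p^r z^(r+1), Taylor approximation (6-83)
z1 = 1 + q * p**r / (1 - (r + 1) * q * p**r)

# q_n and p_n from (6-81)-(6-82)
A1 = (1 - (p * z1)**r) / (1 - (r + 1) * q * (p * z1)**r)
p_n = 1 - A1 / z1**(n + 1)
print(z1, p_n)                         # z1 ~ 1.0000016936, p_n ~ 0.000207

P1 = 1 - (1 - p_n)**50                 # one of 50 players in a season
print(P1, 1 - (1 - P1)**50)            # ~ 0.0103 and ~ 0.404 over 50 seasons
```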
(We have once again used the independence of the 50 seasons.)

Thus Pete Rose's 44-hit performance has about a 60-40 chance of surviving for
50 years. From (6-85), rare events do indeed occur; in other words, some
unlikely event is likely to happen. However, as (6-84) shows, a particular
unlikely event — such as Pete Rose hitting 44 games in a sequence — is indeed
rare.

Table 6.1 lists p_{162} for various values of r. From there, every reasonable
batter should be able to hit at least 10 to 12 consecutive games during every
season!

Table 6.1 Probability p_n of a run of r hits in n = 162 games, for p = 0.76399.

    r    p_n (n = 162)
    10   0.95257
    15   0.48933
    20   0.14937
    25   0.03928
    44   0.000207

As baseball fans well know, DiMaggio holds the record for the consecutive-game
hitting streak at 56 games (1941). With a lifetime batting average of 0.325 for
DiMaggio, the above calculations yield [use (6-64), (6-82)-(6-83)] the
probability for that event to be

    p_n = 0.0000504532.                                                    (6-86)

Even over a 100 year period, with an average of 50 excellent hitters per season,
the probability is only

    1 - (1 - P_0)^{100} = 0.2229669, \quad \text{where } P_0 = 1 - (1 - p_n)^{50} = 0.00251954,   (6-87)

that someone will repeat or outdo DiMaggio's performance. Remember, 60 years
have already passed by, and no one has done it yet!
7. Two Random Variables

In many experiments, the observations are expressible not as a single quantity,
but as a family of quantities. For example, to record the height and weight of
each person in a community, or the number of people and the total income in a
family, we need two numbers.

Let X and Y denote two random variables (r.v) based on a probability model
(\Omega, F, P). Then

    P(x_1 < X(\xi) \le x_2) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x)\, dx,

and

    P(y_1 < Y(\xi) \le y_2) = F_Y(y_2) - F_Y(y_1) = \int_{y_1}^{y_2} f_Y(y)\, dy.

What about the probability that the pair of r.vs (X, Y) belongs to an arbitrary
region D? In other words, how does one estimate, for example,

    P\left[ (x_1 < X(\xi) \le x_2) \cap (y_1 < Y(\xi) \le y_2) \right] = \;?

Towards this, we define the joint probability distribution function of X and Y
to be

    F_{XY}(x, y) = P\left[ (X(\xi) \le x) \cap (Y(\xi) \le y) \right] \ge 0,   (7-1)

where x and y are arbitrary real numbers.

Properties

(i)

    F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0, \qquad F_{XY}(+\infty, +\infty) = 1.   (7-2)

Since (X(\xi) \le -\infty,\, Y(\xi) \le y) \subset (X(\xi) \le -\infty), we get
3
Similarly
we get
(ii)
To prove (7-3), we note that for
and the mutually exclusive property of the events on the
right side gives
which proves (7-3). Similarly (7-4) follows.
( ) . 0 ) ( ) , ( = X P y F
XY
( ) , ) ( , ) ( = + + Y X
. 1 ) ( ) , ( = = P F
XY
( ) ). , ( ) , ( ) ( , ) (
1 2 2 1
y x F y x F y Y x X x P
XY XY
= <
( ) ). , ( ) , ( ) ( , ) (
1 2 2 1
y x F y x F y Y y x X P
XY XY
= <
(7-3)
(7-4)
,
1 2
x x >
( ) ( ) ( ) y Y x X x y Y x X y Y x X < = ) ( , ) ( ) ( , ) ( ) ( , ) (
2 1 1 2

( ) ( ) ( ) y Y x X x P y Y x X P y Y x X P < + = ) ( , ) ( ) ( , ) ( ) ( , ) (
2 1 1 2

PILLAI
4
(iii)

    P(x_1 < X(\xi) \le x_2,\; y_1 < Y(\xi) \le y_2)
      = F_{XY}(x_2, y_2) - F_{XY}(x_2, y_1) - F_{XY}(x_1, y_2) + F_{XY}(x_1, y_1).   (7-5)

This is the probability that (X, Y) belongs to the rectangle R_0 in Fig. 7.1. To
prove (7-5), we can make use of the following identity involving mutually
exclusive events on the right side:

    (x_1 < X(\xi) \le x_2,\, Y(\xi) \le y_2)
      = (x_1 < X(\xi) \le x_2,\, Y(\xi) \le y_1) \cup (x_1 < X(\xi) \le x_2,\, y_1 < Y(\xi) \le y_2).

[Fig. 7.1: the rectangle R_0 = (x_1, x_2] \times (y_1, y_2] in the X-Y plane.]

This gives

    P(x_1 < X(\xi) \le x_2,\, Y(\xi) \le y_2)
      = P(x_1 < X(\xi) \le x_2,\, Y(\xi) \le y_1) + P(x_1 < X(\xi) \le x_2,\, y_1 < Y(\xi) \le y_2),

and the desired result in (7-5) follows by making use of (7-3) with y = y_2 and
y = y_1 respectively.

Joint probability density function (Joint p.d.f)

By definition, the joint p.d.f of X and Y is given by

    f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y},   (7-6)

and hence we obtain the useful formula

    F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\, du\, dv.   (7-7)

Using (7-2), we also get

    \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{XY}(x, y)\, dx\, dy = 1.   (7-8)
To find the probability that (X, Y) belongs to an arbitrary region D, we can
make use of (7-5) and (7-7). From (7-5) and (7-7),

    P(x < X(\xi) \le x + \Delta x,\; y < Y(\xi) \le y + \Delta y)
      = F_{XY}(x+\Delta x, y+\Delta y) - F_{XY}(x, y+\Delta y)
        - F_{XY}(x+\Delta x, y) + F_{XY}(x, y)
      = \int_x^{x+\Delta x} \int_y^{y+\Delta y} f_{XY}(u, v)\, du\, dv
      \approx f_{XY}(x, y)\, \Delta x\, \Delta y.                          (7-9)

Thus the probability that (X, Y) belongs to a differential rectangle Δx Δy
equals f_{XY}(x, y) Δx Δy, and repeating this procedure over the union of
non-overlapping differential rectangles in D, we get the useful result

    P((X, Y) \in D) = \iint_{(x,y)\in D} f_{XY}(x, y)\, dx\, dy.           (7-10)

[Fig. 7.2: an arbitrary region D in the x-y plane covered by differential rectangles.]

(iv) Marginal Statistics

In the context of several r.vs, the statistics of each individual one are called
marginal statistics. Thus F_X(x) is the marginal probability distribution
function of X, and f_X(x) is the marginal p.d.f of X. It is interesting to note
that all marginals can be obtained from the joint p.d.f. In fact

    F_X(x) = F_{XY}(x, +\infty), \qquad F_Y(y) = F_{XY}(+\infty, y).       (7-11)

Also

    f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\, dy, \qquad
    f_Y(y) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\, dx.                   (7-12)

To prove (7-11), we can make use of the identity

    (X(\xi) \le x) = (X(\xi) \le x) \cap (Y(\xi) \le +\infty),
so that

    F_X(x) = P(X(\xi) \le x) = P(X(\xi) \le x,\, Y(\xi) \le +\infty) = F_{XY}(x, +\infty).

To prove (7-12), we can make use of (7-7) and (7-11), which gives

    F_X(x) = F_{XY}(x, +\infty)
           = \int_{-\infty}^{x} \int_{-\infty}^{+\infty} f_{XY}(u, y)\, dy\, du,   (7-13)

and taking the derivative with respect to x in (7-13), we get

    f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\, dy.                   (7-14)

At this point, it is useful to know the formula for differentiation under
integrals. Let

    H(x) = \int_{a(x)}^{b(x)} h(x, y)\, dy.                                (7-15)

Then its derivative with respect to x is given by

    \frac{dH(x)}{dx} = \frac{db(x)}{dx}\, h(x, b) - \frac{da(x)}{dx}\, h(x, a)
                       + \int_{a(x)}^{b(x)} \frac{\partial h(x, y)}{\partial x}\, dy.   (7-16)

Obvious use of (7-16) in (7-13) gives (7-14).

If X and Y are discrete r.vs, then p_{ij} = P(X = x_i, Y = y_j) represents their
joint p.d.f, and their respective marginal p.d.fs are given by

    P(X = x_i) = \sum_j P(X = x_i, Y = y_j) = \sum_j p_{ij},               (7-17)

and

    P(Y = y_j) = \sum_i P(X = x_i, Y = y_j) = \sum_i p_{ij}.               (7-18)

Assuming that P(X = x_i, Y = y_j) is written out in the form of a rectangular
array, to obtain P(X = x_i) from (7-17) one needs to add up all the entries in
the i-th row.

[Fig. 7.3: the rectangular array (p_{ij}); row sums give P(X = x_i) and column
sums give P(Y = y_j).]

It used to be a practice for insurance companies routinely to scribble out these
sum values in the left and top margins, thus suggesting the name marginal
densities! (Fig. 7.3).
From (7-11) and (7-12), the joint P.D.F and/or the joint p.d.f represent
complete information about the r.vs, and their marginal p.d.fs can be evaluated
from the joint p.d.f. However, given the marginals, it will (most often) not be
possible to compute the joint p.d.f. Consider the following example:

Example 7.1: Given

    f_{XY}(x, y) = \begin{cases} \text{constant}, & 0 < x < y < 1,\\
                                 0, & \text{otherwise}. \end{cases}        (7-19)

Obtain the marginal p.d.fs f_X(x) and f_Y(y).

Solution: It is given that the joint p.d.f f_{XY}(x, y) equals a constant c in
the shaded region 0 < x < y < 1 of Fig. 7.4. We can use (7-8) to determine that
constant. From (7-8),

    \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f_{XY}(x, y)\, dx\, dy
      = \int_{y=0}^{1} \left( \int_{x=0}^{y} c\, dx \right) dy
      = \int_0^1 c\, y\, dy = \left.\frac{c\, y^2}{2}\right|_0^1 = \frac{c}{2} = 1.   (7-20)

[Fig. 7.4: the triangular region 0 < x < y < 1 in the X-Y plane.]
Thus c = 2. Moreover, from (7-14)

    f_X(x) = \int_{y=x}^{1} f_{XY}(x, y)\, dy = \int_x^1 2\, dy = 2(1 - x), \qquad 0 < x < 1,   (7-21)

and similarly

    f_Y(y) = \int_{x=0}^{y} f_{XY}(x, y)\, dx = \int_0^y 2\, dx = 2y, \qquad 0 < y < 1.   (7-22)

Clearly, in this case, given f_X(x) and f_Y(y) as in (7-21)-(7-22), it would not
be possible to recover the original joint p.d.f in (7-19).

Example 7.2: X and Y are said to be jointly normal (Gaussian) distributed if
their joint p.d.f has the following form:

    f_{XY}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}
      \exp\!\left\{ -\frac{1}{2(1-\rho^2)}
        \left( \frac{(x-\mu_X)^2}{\sigma_X^2}
             - \frac{2\rho (x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}
             + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right) \right\},
    \qquad -\infty < x, y < +\infty, \; |\rho| < 1.                        (7-23)
By direct integration, using (7-14) and completing the square in (7-23), it can
be shown that

    f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\, dy
           = \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-(x-\mu_X)^2/2\sigma_X^2}
           \;\sim\; N(\mu_X, \sigma_X^2),                                  (7-24)

and similarly

    f_Y(y) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\, dx
           = \frac{1}{\sqrt{2\pi\sigma_Y^2}}\, e^{-(y-\mu_Y)^2/2\sigma_Y^2}
           \;\sim\; N(\mu_Y, \sigma_Y^2).                                  (7-25)

Following the above notation, we will denote (7-23) as
N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho). Once again, knowing the marginals
in (7-24) and (7-25) alone does not tell us everything about the joint p.d.f in
(7-23).

As we show below, the only situation where the marginal p.d.fs can be used to
recover the joint p.d.f is when the random variables are statistically
independent.
Independence of r.vs

Definition: The random variables X and Y are said to be statistically
independent if the events \{X(\xi) \in A\} and \{Y(\xi) \in B\} are independent
events for any two Borel sets A and B on the x and y axes respectively. Applying
the above definition to the events \{X(\xi) \le x\} and \{Y(\xi) \le y\}, we
conclude that, if the r.vs X and Y are independent, then

    P\left( (X(\xi) \le x) \cap (Y(\xi) \le y) \right)
      = P(X(\xi) \le x)\, P(Y(\xi) \le y),                                 (7-26)

i.e.,

    F_{XY}(x, y) = F_X(x)\, F_Y(y),                                        (7-27)

or equivalently, if X and Y are independent, then we must have

    f_{XY}(x, y) = f_X(x)\, f_Y(y).                                        (7-28)
If X and Y are discrete-type r.vs, then their independence implies

    P(X = x_i, Y = y_j) = P(X = x_i)\, P(Y = y_j) \quad \text{for all } i, j.   (7-29)

Equations (7-26)-(7-29) give us the procedure to test for independence. Given
f_{XY}(x, y), obtain the marginal p.d.fs f_X(x) and f_Y(y) and examine whether
(7-28) or (7-29) is valid. If so, the r.vs are independent; otherwise they are
dependent.

Returning to Example 7.1, from (7-19)-(7-22) we observe by direct verification
that f_{XY}(x, y) \neq f_X(x) f_Y(y). Hence X and Y are dependent r.vs in that
case. It is easy to see that such is the case in Example 7.2 also, unless
\rho = 0. In other words, two jointly Gaussian r.vs as in (7-23) are independent
if and only if the fifth parameter \rho = 0.

Example 7.3: Given

    f_{XY}(x, y) = \begin{cases} x\, y^2\, e^{-y}, & 0 < y < \infty, \; 0 < x < 1,\\
                                 0, & \text{otherwise}. \end{cases}        (7-30)

Determine whether X and Y are independent.

Solution:

    f_X(x) = \int_0^{\infty} x\, y^2\, e^{-y}\, dy
           = x \left( \left. -y^2 e^{-y} \right|_0^{\infty} + 2\int_0^{\infty} y\, e^{-y}\, dy \right)
           = 2x, \qquad 0 < x < 1.                                         (7-31)

Similarly,

    f_Y(y) = \int_0^1 x\, y^2\, e^{-y}\, dx = \frac{y^2}{2}\, e^{-y}, \qquad 0 < y < \infty.   (7-32)

In this case

    f_{XY}(x, y) = f_X(x)\, f_Y(y),

and hence X and Y are independent random variables.
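The marginal-and-product test of Example 7.3 is easy to carry out symbolically.
The sketch below is not part of the notes:

```python
# Example 7.3 with sympy: marginals and the independence test (7-28)
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_xy = x * y**2 * sp.exp(-y)

f_x = sp.integrate(f_xy, (y, 0, sp.oo))          # 2*x             (7-31)
f_y = sp.integrate(f_xy, (x, 0, 1))              # y**2*exp(-y)/2  (7-32)
print(f_x, f_y, sp.simplify(f_x * f_y - f_xy))   # last term is 0: independent
```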
8. One Function of Two Random Variables

Given two random variables X and Y and a function g(x, y), we form a new random
variable Z as

    Z = g(X, Y).                                                           (8-1)

Given the joint p.d.f f_{XY}(x, y), how does one obtain f_Z(z), the p.d.f of Z?
Problems of this type are of interest from a practical standpoint. For example,
a receiver output signal usually consists of the desired signal buried in noise,
and the above formulation in that case reduces to Z = X + Y.

It is important to know the statistics of the incoming signal for proper
receiver design. In this context, we shall analyze problems of the following
type:

    X + Y, \quad X - Y, \quad XY, \quad X/Y, \quad \max(X, Y), \quad \min(X, Y),
    \quad \sqrt{X^2 + Y^2}, \quad \tan^{-1}(X/Y), \quad Z = g(X, Y).       (8-2)

Referring back to (8-1), to start with

    F_Z(z) = P(Z(\xi) \le z) = P(g(X, Y) \le z) = P((X, Y) \in D_z)
           = \iint_{(x,y)\in D_z} f_{XY}(x, y)\, dx\, dy,                  (8-3)

where D_z in the XY plane represents the region such that g(x, y) \le z is
satisfied. Note that D_z need not be simply connected (Fig. 8.1). From (8-3), to
determine F_Z(z) it is enough to find the region D_z for every z, and then
evaluate the integral there.

We shall illustrate this method through various examples.

[Fig. 8.1: a region D_z in the X-Y plane that need not be simply connected.]
Example 8.1: Z = X + Y. Find f_Z(z).

Solution:

    F_Z(z) = P(X + Y \le z)
           = \int_{y=-\infty}^{+\infty} \int_{x=-\infty}^{z-y} f_{XY}(x, y)\, dx\, dy,   (8-4)

since the region D_z of the xy plane where x + y \le z is the shaded area in
Fig. 8.2 to the left of the line x + y = z. Integrating over the horizontal
strip along the x-axis first (inner integral), followed by sliding that strip
along the y-axis from -\infty to +\infty (outer integral), we cover the entire
shaded area.

[Fig. 8.2: the half-plane x \le z - y to the left of the line x + y = z.]

We can find f_Z(z) by differentiating F_Z(z) directly. In this context, it is
useful to recall the differentiation rule in (7-15)-(7-16) due to Leibnitz.
Suppose

    H(z) = \int_{a(z)}^{b(z)} h(x, z)\, dx.                                (8-5)

Then

    \frac{dH(z)}{dz} = \frac{db(z)}{dz}\, h(b(z), z) - \frac{da(z)}{dz}\, h(a(z), z)
                       + \int_{a(z)}^{b(z)} \frac{\partial h(x, z)}{\partial z}\, dx.   (8-6)

Using (8-6) in (8-4), we get

    f_Z(z) = \int_{-\infty}^{+\infty} \left( \frac{\partial}{\partial z}
               \int_{-\infty}^{z-y} f_{XY}(x, y)\, dx \right) dy
           = \int_{-\infty}^{+\infty} \left( 1 \cdot f_{XY}(z-y, y) - 0
               + \int_{-\infty}^{z-y} \frac{\partial f_{XY}(x, y)}{\partial z}\, dx \right) dy
           = \int_{-\infty}^{+\infty} f_{XY}(z-y, y)\, dy.                 (8-7)

Alternatively, the integration in (8-4) can be carried out first along the
y-axis followed by the x-axis, as in Fig. 8.3.

[Fig. 8.3: the same region covered with vertical strips, y \le z - x.]
In that case

    F_Z(z) = \int_{x=-\infty}^{+\infty} \int_{y=-\infty}^{z-x} f_{XY}(x, y)\, dy\, dx,   (8-8)

and differentiation of (8-8) gives

    f_Z(z) = \frac{dF_Z(z)}{dz}
           = \int_{-\infty}^{+\infty} \left( \frac{\partial}{\partial z}
               \int_{-\infty}^{z-x} f_{XY}(x, y)\, dy \right) dx
           = \int_{-\infty}^{+\infty} f_{XY}(x, z-x)\, dx.                 (8-9)

If X and Y are independent, then

    f_{XY}(x, y) = f_X(x)\, f_Y(y),                                        (8-10)

and inserting (8-10) into (8-8) and (8-9), we get

    f_Z(z) = \int_{-\infty}^{+\infty} f_X(z-y)\, f_Y(y)\, dy
           = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx.              (8-11)
The above integral is the standard convolution of the functions f_X(z) and
f_Y(z), expressed in two different ways. We thus reach the following
conclusion: if two r.vs are independent, then the density of their sum equals
the convolution of their density functions.
As a special case, suppose that f_X(x) = 0 for x < 0 and f_Y(y) = 0 for y < 0;
then we can make use of Fig. 8.4 to determine the new limits for D_z.

[Fig. 8.4: for nonnegative X and Y, the region x + y \le z is the triangle with
vertices (0, 0), (z, 0) and (0, z).]
or
On the other hand, by considering vertical strips first in
Fig. 8.4, we get
or
if X and Y are independent random variables.

=

=
=
z
y
y z
x
XY Z
dxdy y x f z F

0

0
) , ( ) (

>
=
|
.
|

\
|

=


=

=
. 0 , 0
, 0 , ) , (
) , ( ) (

0
0

0
z
z dy y y z f
dy dx y x f
z
z f
z
XY
z
y
y z
x
XY Z (8-12)

>
= =

=
=
, 0 , 0
, 0 , ) ( ) (
) , ( ) (

0
0
z
z dx x z f x f
dx x z x f z f
z
y
Y X
z
x
XY Z

=

=
=
z
x
x z
y
XY Z
dydx y x f z F

0

0
) , ( ) (
(8-13)
PILLAI
Example 8.2: Suppose X and Y are independent exponential r.vs with common
parameter λ, and let Z = X + Y. Determine f_Z(z).

Solution: We have

    f_X(x) = \lambda e^{-\lambda x}\, U(x), \qquad f_Y(y) = \lambda e^{-\lambda y}\, U(y),   (8-14)

and we can make use of (8-13) to obtain the p.d.f of Z = X + Y:

    f_Z(z) = \int_0^z \lambda^2 e^{-\lambda x}\, e^{-\lambda(z-x)}\, dx
           = \lambda^2 e^{-\lambda z} \int_0^z dx
           = \lambda^2 z\, e^{-\lambda z}\, U(z).                          (8-15)

As the next example shows, care should be taken in using the convolution formula
for r.vs with finite range.

Example 8.3: X and Y are independent uniform r.vs in the common interval (0, 1).
Determine f_Z(z), where Z = X + Y.

Solution: Clearly, Z = X + Y here satisfies 0 < z < 2, and as Fig. 8.5 shows
there are two cases of z for which the shaded areas are quite different in
shape; they should be considered separately.
[Fig. 8.5: the region x + y \le z intersected with the unit square; (a) for
0 < z < 1 it is a triangle, (b) for 1 < z < 2 the unshaded corner is a triangle.]

For 0 \le z < 1,

    F_Z(z) = \int_{y=0}^{z} \int_{x=0}^{z-y} dx\, dy
           = \int_0^z (z - y)\, dy = \frac{z^2}{2}, \qquad 0 \le z < 1.    (8-16)

For 1 \le z < 2, notice that it is easier to deal with the unshaded region. In
that case

    F_Z(z) = 1 - P(Z > z)
           = 1 - \int_{y=z-1}^{1} \int_{x=z-y}^{1} dx\, dy
           = 1 - \int_{z-1}^{1} (1 - z + y)\, dy
           = 1 - \frac{(2 - z)^2}{2}, \qquad 1 \le z < 2.                  (8-17)

Using (8-16)-(8-17), we obtain

    f_Z(z) = \frac{dF_Z(z)}{dz}
           = \begin{cases} z, & 0 \le z < 1,\\ 2 - z, & 1 \le z < 2. \end{cases}   (8-18)

By direct convolution of f_X(x) and f_Y(y), we obtain the same result as above.
In fact, for 0 \le z < 1 (Fig. 8.6(a))

    f_Z(z) = \int f_X(z-x)\, f_Y(x)\, dx = \int_0^z dx = z,                (8-19)

and for 1 \le z < 2 (Fig. 8.6(b))

    f_Z(z) = \int_{z-1}^{1} dx = 2 - z.                                    (8-20)

Fig. 8.6(c) shows f_Z(z), which agrees with the convolution of two rectangular
waveforms as well.

[Fig. 8.6: (a) f_Y(x) and the shifted f_X(z-x) together with their product for
0 < z < 1; (b) the same for 1 < z < 2; (c) the triangular density f_Z(z) on (0, 2).]
Example 8.3: Let Z = X - Y. Determine its p.d.f f_Z(z).

Solution: From (8-3) and Fig. 8.7,

    F_Z(z) = P(X - Y \le z)
           = \int_{y=-\infty}^{+\infty} \int_{x=-\infty}^{y+z} f_{XY}(x, y)\, dx\, dy,

and hence

    f_Z(z) = \frac{dF_Z(z)}{dz}
           = \int_{-\infty}^{+\infty} \left( \frac{\partial}{\partial z}
               \int_{-\infty}^{y+z} f_{XY}(x, y)\, dx \right) dy
           = \int_{-\infty}^{+\infty} f_{XY}(y+z, y)\, dy.                 (8-21)

If X and Y are independent, then the above formula reduces to

    f_Z(z) = \int_{-\infty}^{+\infty} f_X(z+y)\, f_Y(y)\, dy = f_X(z) \otimes f_Y(-z),   (8-22)

which represents the convolution of f_X(z) with f_Y(-z).

[Fig. 8.7: the region x - y \le z, i.e., x \le y + z, to the left of the line x = y + z.]
As a special case, suppose

    f_X(x) = 0, \; x < 0, \quad \text{and} \quad f_Y(y) = 0, \; y < 0.

In this case, Z can be negative as well as positive, and that gives rise to two
situations that should be analyzed separately, since the regions of integration
for z \ge 0 and z < 0 are quite different. For z \ge 0, from Fig. 8.8(a)

    F_Z(z) = \int_{y=0}^{+\infty} \int_{x=0}^{y+z} f_{XY}(x, y)\, dx\, dy,

and for z < 0, from Fig. 8.8(b)

    F_Z(z) = \int_{y=-z}^{+\infty} \int_{x=0}^{y+z} f_{XY}(x, y)\, dx\, dy.

After differentiation, this gives

    f_Z(z) = \begin{cases} \displaystyle\int_0^{+\infty} f_{XY}(y+z, y)\, dy, & z \ge 0,\\[2mm]
                           \displaystyle\int_{-z}^{+\infty} f_{XY}(y+z, y)\, dy, & z < 0. \end{cases}   (8-23)

[Fig. 8.8: the region 0 \le x \le y + z for (a) z \ge 0 and (b) z < 0.]
Example 8.4: Given $Z = X/Y$, obtain its density function.
Solution: We have
$$F_Z(z) = P\bigl(X/Y \le z\bigr). \tag{8-24}$$
The inequality $X/Y \le z$ can be rewritten as $X \le Yz$ if $Y > 0$, and $X \ge Yz$ if $Y < 0$. Hence the event $\{X/Y \le z\}$ in (8-24) needs to be conditioned by the event $A = \{Y > 0\}$ and its complement $\overline{A}$. Since $A \cup \overline{A} = S$, by the partition theorem we have
$$\{X/Y \le z\} = \{X/Y \le z\}\cap(A\cup\overline{A}) = \bigl(\{X/Y \le z\}\cap A\bigr)\cup\bigl(\{X/Y \le z\}\cap\overline{A}\bigr),$$
and hence, by the mutual exclusiveness of the latter two events,
$$F_Z(z) = P(X/Y \le z,\ Y > 0) + P(X/Y \le z,\ Y < 0) = P(X \le Yz,\ Y > 0) + P(X \ge Yz,\ Y < 0). \tag{8-25}$$
Fig. 8.9(a) shows the area corresponding to the first term, and Fig. 8.9(b) shows that corresponding to the second term in (8-25); both are bounded by the line $x = yz$.
Integrating over these two regions, we get
$$F_Z(z) = \int_{y=0}^{\infty}\int_{x=-\infty}^{yz} f_{XY}(x,y)\,dx\,dy + \int_{y=-\infty}^{0}\int_{x=yz}^{\infty} f_{XY}(x,y)\,dx\,dy. \tag{8-26}$$
Differentiation with respect to z gives
$$f_Z(z) = \int_0^{\infty} y\,f_{XY}(yz,y)\,dy + \int_{-\infty}^{0}(-y)\,f_{XY}(yz,y)\,dy = \int_{-\infty}^{\infty}|y|\,f_{XY}(yz,y)\,dy, \qquad -\infty < z < +\infty. \tag{8-27}$$
Note that if X and Y are nonnegative random variables, then the area of integration reduces to that shown in Fig. 8.10 (the region below the line $x = yz$ in the first quadrant).
This gives
$$F_Z(z) = \int_{y=0}^{\infty}\int_{x=0}^{yz} f_{XY}(x,y)\,dx\,dy,$$
or
$$f_Z(z) = \begin{cases}\displaystyle\int_0^{\infty} y\,f_{XY}(yz,y)\,dy, & z > 0,\\ 0, & \text{otherwise.}\end{cases} \tag{8-28}$$
Example 8.5: X and Y are jointly normal random variables with zero mean, so that
$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\,\exp\!\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\}. \tag{8-29}$$
Show that the ratio $Z = X/Y$ has a Cauchy density function centered at $r\sigma_1/\sigma_2$.
Solution: Inserting (8-29) into (8-27) and using the fact that $f_{XY}(-x,-y) = f_{XY}(x,y)$, we obtain
$$f_Z(z) = \frac{2}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\int_0^{\infty} y\,e^{-y^2/2\sigma_0^2(z)}\,dy = \frac{\sigma_0^2(z)}{\pi\sigma_1\sigma_2\sqrt{1-r^2}},$$
where
$$\frac{1}{\sigma_0^2(z)} = \frac{1}{1-r^2}\left(\frac{z^2}{\sigma_1^2} - \frac{2rz}{\sigma_1\sigma_2} + \frac{1}{\sigma_2^2}\right).$$
Thus
$$f_Z(z) = \frac{(\sigma_1/\sigma_2)\sqrt{1-r^2}\,/\pi}{\bigl(z - r\sigma_1/\sigma_2\bigr)^2 + (1-r^2)\,\sigma_1^2/\sigma_2^2}, \tag{8-30}$$
which represents a Cauchy r.v centered at $r\sigma_1/\sigma_2$. Integrating (8-30) from $-\infty$ to z, we obtain the corresponding distribution function to be
$$F_Z(z) = \frac{1}{2} + \frac{1}{\pi}\arctan\frac{z - r\sigma_1/\sigma_2}{(\sigma_1/\sigma_2)\sqrt{1-r^2}}. \tag{8-31}$$
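The Cauchy form (8-30) is easy to confirm numerically. Below is a hedged sketch (the values $r = 0.5$, $\sigma_1 = 2$, $\sigma_2 = 1$ are arbitrary illustration choices, not from the notes) that simulates the ratio of correlated zero mean Gaussians and compares the empirical density with (8-30).

```python
import numpy as np

rng = np.random.default_rng(1)
r, s1, s2, n = 0.5, 2.0, 1.0, 500_000       # illustrative parameter choices

# jointly Gaussian zero mean pairs with correlation r
cov = [[s1**2, r*s1*s2], [r*s1*s2, s2**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, n).T
z = x / y

def cauchy_pdf(z):
    # equation (8-30), centered at r*s1/s2
    return (s1/s2)*np.sqrt(1 - r**2)/np.pi / ((z - r*s1/s2)**2 + (1 - r**2)*(s1/s2)**2)

edges = np.linspace(-8, 10, 181)
counts, _ = np.histogram(z, bins=edges)
dens = counts / (n * np.diff(edges))        # proper density estimate (heavy tails excluded)
centers = 0.5*(edges[:-1] + edges[1:])
for p in (-2.0, 0.0, 1.0, 4.0):
    i = np.argmin(np.abs(centers - p))
    print(f"z={p:5.1f}  empirical={dens[i]:.4f}  Cauchy (8-30)={cauchy_pdf(p):.4f}")
```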
Example 8.6: Obtain $f_Z(z)$ for $Z = X^2 + Y^2$.
Solution: We have
$$F_Z(z) = P\bigl(X^2 + Y^2 \le z\bigr) = \iint_{x^2+y^2 \le z} f_{XY}(x,y)\,dx\,dy. \tag{8-32}$$
But $\{X^2 + Y^2 \le z\}$ represents the area of a circle with radius $\sqrt{z}$, and hence from Fig. 8.11,
$$F_Z(z) = \int_{y=-\sqrt{z}}^{\sqrt{z}}\int_{x=-\sqrt{z-y^2}}^{\sqrt{z-y^2}} f_{XY}(x,y)\,dx\,dy. \tag{8-33}$$
Differentiating (8-33) with respect to z gives
$$f_Z(z) = \int_{-\sqrt{z}}^{\sqrt{z}}\frac{1}{2\sqrt{z-y^2}}\left[f_{XY}\bigl(\sqrt{z-y^2},\,y\bigr) + f_{XY}\bigl(-\sqrt{z-y^2},\,y\bigr)\right]dy. \tag{8-34}$$
As an illustration, consider the next example.
Fig. 8.11: the disc $x^2 + y^2 \le z$.
Example 8.7: X and Y are independent normal r.vs with zero mean and common variance $\sigma^2$. Determine $f_Z(z)$ for $Z = X^2 + Y^2$.
Solution: Direct substitution of (8-29) with $r = 0$, $\sigma_1 = \sigma_2 = \sigma$ into (8-34) gives
$$f_Z(z) = \int_{-\sqrt{z}}^{\sqrt{z}}\frac{1}{2\sqrt{z-y^2}}\cdot\frac{2}{2\pi\sigma^2}\,e^{-z/2\sigma^2}\,dy = \frac{e^{-z/2\sigma^2}}{2\pi\sigma^2}\int_{-\sqrt{z}}^{\sqrt{z}}\frac{dy}{\sqrt{z-y^2}} = \frac{1}{2\sigma^2}\,e^{-z/2\sigma^2}\,U(z), \tag{8-35}$$
where we have used the substitution $y = \sqrt{z}\sin\theta$. From (8-35) we have the following result: if X and Y are independent zero mean Gaussian r.vs with common variance $\sigma^2$, then $X^2 + Y^2$ is an exponential r.v with parameter $2\sigma^2$.
Example 8.8: Let $Z = \sqrt{X^2 + Y^2}$. Find $f_Z(z)$.
Solution: From Fig. 8.11, the present case corresponds to a circle with radius z, i.e., $\{X^2 + Y^2 \le z^2\}$. Thus
$$F_Z(z) = \int_{y=-z}^{z}\int_{x=-\sqrt{z^2-y^2}}^{\sqrt{z^2-y^2}} f_{XY}(x,y)\,dx\,dy,$$
and by differentiation,
$$f_Z(z) = \int_{-z}^{z}\frac{z}{\sqrt{z^2-y^2}}\left[f_{XY}\bigl(\sqrt{z^2-y^2},\,y\bigr) + f_{XY}\bigl(-\sqrt{z^2-y^2},\,y\bigr)\right]dy. \tag{8-36}$$
Now suppose X and Y are independent Gaussian as in Example 8.7. In that case, (8-36) simplifies to
$$f_Z(z) = 2\int_{-z}^{z}\frac{z}{\sqrt{z^2-y^2}}\cdot\frac{1}{2\pi\sigma^2}\,e^{-z^2/2\sigma^2}\,dy = \frac{z}{\sigma^2}\,e^{-z^2/2\sigma^2}\,U(z), \tag{8-37}$$
which represents a Rayleigh distribution. Thus, if $W = X + iY$, where X and Y are real, independent normal r.vs with zero mean and equal variance, then the r.v $|W| = \sqrt{X^2 + Y^2}$ has a Rayleigh density. W is said to be a complex Gaussian r.v with zero mean, whose real and imaginary parts are independent r.vs. From (8-37), we have seen that its magnitude has a Rayleigh distribution.
What about its phase
$$\theta = \tan^{-1}\!\left(\frac{X}{Y}\right)? \tag{8-38}$$
Clearly, the principal value of $\theta$ lies in the interval $(-\pi/2, \pi/2)$. If we let $U = \tan\theta = X/Y$, then from Example 8.5, U has a Cauchy distribution (see (8-30) with $r = 0$, $\sigma_1 = \sigma_2$):
$$f_U(u) = \frac{1/\pi}{1 + u^2}, \qquad -\infty < u < \infty. \tag{8-39}$$
As a result
$$f_\theta(\theta) = \frac{f_U(\tan\theta)}{|d\theta/du|} = \sec^2\theta\cdot\frac{1/\pi}{1 + \tan^2\theta} = \begin{cases}1/\pi, & -\pi/2 < \theta < \pi/2,\\ 0, & \text{otherwise.}\end{cases} \tag{8-40}$$
To summarize, the magnitude and phase of a zero mean complex Gaussian r.v have Rayleigh and uniform distributions respectively. Interestingly, as we will show later, these two derived r.vs are also independent of each other!
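This magnitude/phase result is easy to reproduce by simulation. The following minimal sketch (Python with NumPy; $\sigma = 1$ and the sample size are arbitrary illustrative choices) checks the Rayleigh quantiles of the magnitude, the uniformity of the phase, and (as a weak proxy for the independence claim) their sample correlation.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n = 1.0, 400_000
x = rng.normal(0, sigma, n)
y = rng.normal(0, sigma, n)

mag = np.sqrt(x**2 + y**2)        # should be Rayleigh, (8-37)
phase = np.arctan2(y, x)          # full-range phase of X + jY, uniform on (-pi, pi]

# Rayleigh quantile function: z_p = sigma * sqrt(-2 ln(1 - p))
for p in (0.25, 0.5, 0.9):
    print(f"magnitude quantile p={p}:  empirical={np.quantile(mag, p):.3f}"
          f"  Rayleigh={sigma*np.sqrt(-2*np.log(1-p)):.3f}")

print("phase variance:", np.var(phase), "  uniform(-pi,pi) variance:", np.pi**2/3)
print("corr(magnitude, phase):", np.corrcoef(mag, phase)[0, 1])   # should be near 0
```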
Let us reconsider Example 8.8 where X and Y have nonzero means $\mu_X$ and $\mu_Y$ respectively. Then $Z = \sqrt{X^2 + Y^2}$ is said to be a Rician r.v. Such a scenario arises in a fading multipath situation where there is a dominant constant component (mean) in addition to a zero mean Gaussian r.v. The constant component may be the line-of-sight signal, and the zero mean Gaussian r.v part could be due to random multipath components adding up incoherently (a constant line-of-sight signal plus multipath/Gaussian noise produces the Rician output). The envelope of such a signal is said to have a Rician p.d.f.
Example 8.9: Redo Example 8.8, where X and Y have nonzero means $\mu_X$ and $\mu_Y$ respectively.
Solution: Since
$$f_{XY}(x,y) = \frac{1}{2\pi\sigma^2}\,e^{-\left[(x-\mu_X)^2 + (y-\mu_Y)^2\right]/2\sigma^2},$$
substituting this into (8-36) and letting
$$y = z\sin\theta, \qquad \mu = \sqrt{\mu_X^2 + \mu_Y^2}, \qquad \mu_X = \mu\cos\phi, \qquad \mu_Y = \mu\sin\phi,$$
we get the Rician probability density function to be
$$f_Z(z) = \frac{z\,e^{-(z^2+\mu^2)/2\sigma^2}}{2\pi\sigma^2}\int_{-\pi/2}^{\pi/2}\left(e^{z\mu\cos(\theta-\phi)/\sigma^2} + e^{z\mu\cos(\theta+\phi)/\sigma^2}\right)d\theta = \frac{z}{\sigma^2}\,e^{-(z^2+\mu^2)/2\sigma^2}\,I_0\!\left(\frac{z\mu}{\sigma^2}\right), \tag{8-41}$$
where
$$I_0(\eta) = \frac{1}{2\pi}\int_0^{2\pi} e^{\eta\cos\theta}\,d\theta = \frac{1}{\pi}\int_0^{\pi} e^{\eta\cos\theta}\,d\theta$$
is the modified Bessel function of the first kind and zeroth order.
Example 8.10: Determine $f_Z(z)$ and $f_W(w)$ for
$$Z = \max(X, Y), \qquad W = \min(X, Y). \tag{8-42}$$
Solution: The functions max and min are nonlinear
operators and represent special cases of the more general order statistics. In general, given any n-tuple $X_1, X_2, \ldots, X_n$, we can arrange them in increasing order of magnitude such that
$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}, \tag{8-43}$$
where $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$, $X_{(2)}$ is the second smallest value among $X_1, X_2, \ldots, X_n$, and finally $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. If $X_1, X_2, \ldots, X_n$ represent r.vs, the function $X_{(k)}$ that takes on the value $x_{(k)}$ in each possible sequence $(x_1, x_2, \ldots, x_n)$ is known as the k-th order statistic, and $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ represent the set of order statistics among the n random variables. In this context
$$R = X_{(n)} - X_{(1)} \tag{8-44}$$
represents the range, and when n = 2 we have the max and min statistics.
Returning back to that problem, since
$$Z = \max(X, Y) = \begin{cases} X, & X > Y,\\ Y, & X \le Y,\end{cases} \tag{8-45}$$
we have (see also (8-25))
$$F_Z(z) = P\bigl(\max(X,Y) \le z\bigr) = P\bigl[(X \le z,\ X > Y)\cup(Y \le z,\ X \le Y)\bigr] = P(X \le z,\ X > Y) + P(Y \le z,\ X \le Y),$$
since $(X > Y)$ and $(X \le Y)$ are mutually exclusive sets that form a partition. Figs. 8.12(a)-(b) show the regions satisfying the corresponding inequalities in each term above, and Fig. 8.12(c) shows their union, the quadrant with corner $(z, z)$.
Fig. 8.12(c) represents the total region, and from there
$$F_Z(z) = P(X \le z,\ Y \le z) = F_{XY}(z, z). \tag{8-46}$$
If X and Y are independent, then
$$F_Z(z) = F_X(z)\,F_Y(z),$$
and hence
$$f_Z(z) = f_X(z)\,F_Y(z) + F_X(z)\,f_Y(z). \tag{8-47}$$
Similarly,
$$W = \min(X, Y) = \begin{cases} Y, & X > Y,\\ X, & X \le Y.\end{cases} \tag{8-48}$$
Thus
$$F_W(w) = P\bigl(\min(X,Y) \le w\bigr) = P\bigl[(Y \le w,\ X > Y)\cup(X \le w,\ X \le Y)\bigr].$$
Once again, the shaded areas in Figs. 8.13(a)-(b) show the regions satisfying the above inequalities, and Fig. 8.13(c) shows the overall region, the complement of the quadrant with corner $(w, w)$.
From Fig. 8.13(c),
$$F_W(w) = 1 - P(W > w) = 1 - P(X > w,\ Y > w) = F_X(w) + F_Y(w) - F_{XY}(w, w), \tag{8-49}$$
where we have made use of (7-5) and (7-12) with $x_2 = y_2 = +\infty$ and $x_1 = y_1 = w$.
Example 8.11: Let X and Y be independent exponential r.vs with common parameter $\lambda$. Define $W = \min(X, Y)$. Find $f_W(w)$.
Solution: From (8-49),
$$F_W(w) = F_X(w) + F_Y(w) - F_X(w)\,F_Y(w),$$
and hence
$$f_W(w) = f_X(w) + f_Y(w) - f_X(w)\,F_Y(w) - F_X(w)\,f_Y(w).$$
But $f_X(w) = f_Y(w) = \lambda e^{-\lambda w}$ and $F_X(w) = F_Y(w) = 1 - e^{-\lambda w}$, so that
$$f_W(w) = 2\lambda e^{-\lambda w} - 2\lambda e^{-\lambda w}\bigl(1 - e^{-\lambda w}\bigr) = 2\lambda\,e^{-2\lambda w}\,U(w). \tag{8-50}$$
Thus $\min(X, Y)$ is also exponential, with parameter $2\lambda$.
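A short numerical check of (8-50) follows; it is a sketch only, with $\lambda = 2$ chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n = 2.0, 300_000
w = np.minimum(rng.exponential(1/lam, n), rng.exponential(1/lam, n))

# (8-50): W = min(X, Y) is exponential with parameter 2*lam
print("E[W] empirical:", w.mean(), "  predicted 1/(2*lam):", 1/(2*lam))
for t in (0.1, 0.3, 0.6):
    print(f"P(W > {t}): empirical={np.mean(w > t):.4f}  predicted={np.exp(-2*lam*t):.4f}")
```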
Example 8.12: Suppose X and Y are as given in the above example. Define $Z = \min(X,Y)/\max(X,Y)$. Determine $f_Z(z)$.
Solution: Although $\min(X,Y)/\max(X,Y)$ represents a complicated function, by partitioning the whole space as before, it is possible to simplify it. In fact
$$Z = \begin{cases} X/Y, & X \le Y,\\ Y/X, & X > Y.\end{cases} \tag{8-51}$$
As before, this gives
$$F_Z(z) = P(Z \le z) = P\bigl(X/Y \le z,\ X \le Y\bigr) + P\bigl(Y/X \le z,\ X > Y\bigr) = P\bigl(X \le Yz,\ X \le Y\bigr) + P\bigl(Y \le Xz,\ X > Y\bigr). \tag{8-52}$$
Since X and Y are both positive random variables in this case, we have $0 < z < 1$. The shaded regions in Figs. 8.14(a)-(b) represent the two terms in the above sum.
From Fig. 8.14,
$$F_Z(z) = \int_{y=0}^{\infty}\int_{x=0}^{yz} f_{XY}(x,y)\,dx\,dy + \int_{x=0}^{\infty}\int_{y=0}^{xz} f_{XY}(x,y)\,dy\,dx. \tag{8-53}$$
Hence
$$f_Z(z) = \int_0^{\infty} y\bigl[f_{XY}(yz,y) + f_{XY}(y,yz)\bigr]dy = \int_0^{\infty} 2\lambda^2 y\,e^{-\lambda(1+z)y}\,dy = \begin{cases}\dfrac{2}{(1+z)^2}, & 0 < z < 1,\\[4pt] 0, & \text{otherwise.}\end{cases} \tag{8-54}$$
Fig. 8.15 shows $f_Z(z)$, decreasing from 2 at $z = 0$ to 1 at $z = 1$.
Example 8.13 (Discrete Case): Let X and Y be independent Poisson random variables with parameters $\lambda_1$ and $\lambda_2$ respectively. Let $Z = X + Y$. Determine the p.m.f of Z.
Solution: Since X and Y both take integer values $\{0, 1, 2, \ldots\}$, the same is true for Z. For any $n = 0, 1, 2, \ldots$, the event $\{X + Y = n\}$ gives only a finite number of options for X and Y. In fact, if X = 0 then Y must be n; if X = 1 then Y must be n - 1, etc. Thus the event $\{X + Y = n\}$ is the union of the (n + 1) mutually exclusive events $A_k$ given by
$$A_k = \{X = k,\ Y = n - k\}, \qquad k = 0, 1, 2, \ldots, n. \tag{8-55}$$
As a result
$$P(Z = n) = P(X + Y = n) = P\left(\bigcup_{k=0}^{n}\{X = k,\ Y = n-k\}\right) = \sum_{k=0}^{n} P(X = k,\ Y = n-k). \tag{8-56}$$
If X and Y are also independent, then
$$P(X = k,\ Y = n-k) = P(X = k)\,P(Y = n-k),$$
and hence
$$P(Z = n) = \sum_{k=0}^{n} P(X = k)\,P(Y = n-k) = \sum_{k=0}^{n} e^{-\lambda_1}\frac{\lambda_1^k}{k!}\,e^{-\lambda_2}\frac{\lambda_2^{n-k}}{(n-k)!} = \frac{e^{-(\lambda_1+\lambda_2)}}{n!}\sum_{k=0}^{n}\frac{n!}{k!\,(n-k)!}\lambda_1^k\lambda_2^{n-k} = e^{-(\lambda_1+\lambda_2)}\,\frac{(\lambda_1+\lambda_2)^n}{n!}, \qquad n = 0, 1, 2, \ldots \tag{8-57}$$
Thus Z represents a Poisson random variable with parameter $\lambda_1 + \lambda_2$, indicating that the sum of independent Poisson random variables is also a Poisson random variable whose parameter is the sum of the parameters of the original random variables.
As the last example illustrates, the above procedure for determining the p.m.f of functions of discrete random variables is somewhat tedious. As we shall see in Lecture 10, the joint characteristic function can be used in this context to solve problems of this type in an easier fashion.
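The convolution sum (8-56) and the closed form (8-57) can be compared directly. This is a small sketch (parameter values and truncation point are arbitrary illustrative choices):

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(lam, n_max):
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(n_max + 1)])

lam1, lam2, n_max = 1.3, 2.1, 20
p1, p2 = poisson_pmf(lam1, n_max), poisson_pmf(lam2, n_max)

p_sum = np.convolve(p1, p2)[:n_max + 1]        # discrete convolution (8-56), truncated
p_direct = poisson_pmf(lam1 + lam2, n_max)     # claim in (8-57)
print("max |convolution - Poisson(lam1+lam2)| =", np.max(np.abs(p_sum - p_direct)))
```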
9. Two Functions of Two Random Variables
In the spirit of the previous section, let us look at an immediate generalization: suppose X and Y are two random variables with joint p.d.f $f_{XY}(x,y)$. Given two functions $g(x,y)$ and $h(x,y)$, define the new random variables
$$Z = g(X, Y), \tag{9-1}$$
$$W = h(X, Y). \tag{9-2}$$
How does one determine their joint p.d.f $f_{ZW}(z,w)$? Obviously with $f_{ZW}(z,w)$ in hand, the marginal p.d.fs $f_Z(z)$ and $f_W(w)$ can be easily determined.
The procedure is the same as that in (8-3). In fact, for given z and w,
$$F_{ZW}(z,w) = P\bigl(Z(\xi) \le z,\ W(\xi) \le w\bigr) = P\bigl(g(X,Y) \le z,\ h(X,Y) \le w\bigr) = P\bigl((X,Y)\in D_{z,w}\bigr) = \iint_{(x,y)\in D_{z,w}} f_{XY}(x,y)\,dx\,dy, \tag{9-3}$$
where $D_{z,w}$ is the region in the xy plane such that the inequalities $g(x,y) \le z$ and $h(x,y) \le w$ are simultaneously satisfied (Fig. 9.1).
We illustrate this technique in the next example.
Example 9.1: Suppose X and Y are independent uniformly distributed random variables in the interval $(0, \theta)$. Define $Z = \min(X, Y)$, $W = \max(X, Y)$. Determine $f_{ZW}(z,w)$.
Solution: Obviously both w and z vary in the interval $(0, \theta)$. Thus
$$F_{ZW}(z,w) = 0 \quad \text{if } z < 0 \text{ or } w < 0, \tag{9-4}$$
and otherwise
$$F_{ZW}(z,w) = P(Z \le z,\ W \le w) = P\bigl(\min(X,Y) \le z,\ \max(X,Y) \le w\bigr). \tag{9-5}$$
We must consider two cases, $w \ge z$ and $w < z$, since they give rise to different regions $D_{z,w}$ (see Figs. 9.2(a)-(b)).
For $w \ge z$, from Fig. 9.2(a), the region $D_{z,w}$ is represented by the doubly shaded area. Thus
$$F_{ZW}(z,w) = F_{XY}(z,w) + F_{XY}(w,z) - F_{XY}(z,z), \qquad w \ge z, \tag{9-6}$$
and for $w < z$, from Fig. 9.2(b), we obtain
$$F_{ZW}(z,w) = F_{XY}(w,w), \qquad w < z. \tag{9-7}$$
With
$$F_{XY}(x,y) = F_X(x)\,F_Y(y) = \frac{xy}{\theta^2}, \tag{9-8}$$
we obtain
$$F_{ZW}(z,w) = \begin{cases} z(2w - z)/\theta^2, & 0 < z < w < \theta,\\ w^2/\theta^2, & 0 < w < z < \theta.\end{cases} \tag{9-9}$$
Thus
$$f_{ZW}(z,w) = \begin{cases} 2/\theta^2, & 0 < z < w < \theta,\\ 0, & \text{otherwise.}\end{cases} \tag{9-10}$$
From (9-10), we also obtain
$$f_Z(z) = \int_z^{\theta} f_{ZW}(z,w)\,dw = \frac{2}{\theta}\left(1 - \frac{z}{\theta}\right), \qquad 0 < z < \theta, \tag{9-11}$$
and
$$f_W(w) = \int_0^{w} f_{ZW}(z,w)\,dz = \frac{2w}{\theta^2}, \qquad 0 < w < \theta. \tag{9-12}$$
If $g(x,y)$ and $h(x,y)$ are continuous and differentiable functions, then, as in the case of one random variable (see (5-30)), it is possible to develop a formula to obtain the joint p.d.f $f_{ZW}(z,w)$ directly. Towards this, consider the equations
$$g(x,y) = z, \qquad h(x,y) = w. \tag{9-13}$$
For a given point (z,w), equation (9-13) can have many solutions. Let us say
$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$
represent these multiple solutions such that (see Fig. 9.3)
$$g(x_i, y_i) = z, \qquad h(x_i, y_i) = w. \tag{9-14}$$
(Fig. 9.3(a) shows the small rectangle with corners (z,w), (z+\Delta z, w), (z, w+\Delta w) in the zw plane; Fig. 9.3(b) shows its preimages $\Delta_i$ around the points $(x_i, y_i)$ in the xy plane.)
Consider the problem of evaluating the probability
$$P\bigl(z < Z \le z + \Delta z,\ w < W \le w + \Delta w\bigr) = P\bigl(z < g(X,Y) \le z + \Delta z,\ w < h(X,Y) \le w + \Delta w\bigr). \tag{9-15}$$
Using (7-9) we can rewrite (9-15) as
$$P\bigl(z < Z \le z + \Delta z,\ w < W \le w + \Delta w\bigr) = f_{ZW}(z,w)\,\Delta z\,\Delta w. \tag{9-16}$$
But to translate this probability in terms of $f_{XY}(x,y)$, we need to evaluate the equivalent region for $\Delta z\,\Delta w$ in the xy plane. Towards this, referring to Fig. 9.4, we observe that the point A with coordinates (z,w) gets mapped onto the point $A'$ with coordinates $(x_i, y_i)$ (as well as to other points, as in Fig. 9.3(b)). As z changes to $z + \Delta z$, moving to point B in Fig. 9.4(a), let $B'$ represent its image in the xy plane. Similarly, as w changes to $w + \Delta w$, moving to C, let $C'$ represent its image in the xy plane.
Finally D goes to $D'$, and $A'B'C'D'$ represents the equivalent parallelogram in the xy plane with area $\Delta_i$. Referring back to Fig. 9.3, the probability in (9-16) can be alternatively expressed as
$$\sum_i P\bigl((X,Y)\in\Delta_i\bigr) = \sum_i f_{XY}(x_i, y_i)\,\Delta_i. \tag{9-17}$$
Equating (9-16) and (9-17) we obtain
$$f_{ZW}(z,w) = \sum_i f_{XY}(x_i, y_i)\,\frac{\Delta_i}{\Delta z\,\Delta w}. \tag{9-18}$$
To simplify (9-18), we need to evaluate the area $\Delta_i$ of the parallelograms in Fig. 9.3(b) in terms of $\Delta z\,\Delta w$. Towards this, let $g_1$ and $h_1$ denote the inverse transformation in (9-14), so that
$$x_i = g_1(z, w), \qquad y_i = h_1(z, w). \tag{9-19}$$
As the point (z,w) goes to $A' = (x_i, y_i)$, the point $(z+\Delta z, w)$ goes to $B'$, the point $(z, w+\Delta w)$ to $C'$, and the point $(z+\Delta z, w+\Delta w)$ to $D'$. Hence the respective x and y coordinates of $B'$ are given by
$$g_1(z+\Delta z, w) = g_1(z,w) + \frac{\partial g_1}{\partial z}\Delta z = x_i + \frac{\partial g_1}{\partial z}\Delta z, \tag{9-20}$$
and
$$h_1(z+\Delta z, w) = h_1(z,w) + \frac{\partial h_1}{\partial z}\Delta z = y_i + \frac{\partial h_1}{\partial z}\Delta z. \tag{9-21}$$
Similarly those of $C'$ are given by
$$x_i + \frac{\partial g_1}{\partial w}\Delta w, \qquad y_i + \frac{\partial h_1}{\partial w}\Delta w. \tag{9-22}$$
The area $\Delta_i$ of the parallelogram $A'B'C'D'$ in Fig. 9.4(b) is given by
$$\Delta_i = \bigl(A'B'\bigr)\bigl(A'C'\bigr)\sin(\theta - \phi) = \bigl(A'B'\cos\phi\bigr)\bigl(A'C'\sin\theta\bigr) - \bigl(A'B'\sin\phi\bigr)\bigl(A'C'\cos\theta\bigr). \tag{9-23}$$
But from Fig. 9.4(b) and (9-20)-(9-22),
$$A'B'\cos\phi = \frac{\partial g_1}{\partial z}\Delta z, \quad A'B'\sin\phi = \frac{\partial h_1}{\partial z}\Delta z, \qquad A'C'\cos\theta = \frac{\partial g_1}{\partial w}\Delta w, \quad A'C'\sin\theta = \frac{\partial h_1}{\partial w}\Delta w, \tag{9-24, 9-25}$$
so that
$$\Delta_i = \left(\frac{\partial g_1}{\partial z}\frac{\partial h_1}{\partial w} - \frac{\partial g_1}{\partial w}\frac{\partial h_1}{\partial z}\right)\Delta z\,\Delta w, \tag{9-26}$$
and
$$\frac{\Delta_i}{\Delta z\,\Delta w} = \frac{\partial g_1}{\partial z}\frac{\partial h_1}{\partial w} - \frac{\partial g_1}{\partial w}\frac{\partial h_1}{\partial z} = \det\begin{pmatrix}\dfrac{\partial g_1}{\partial z} & \dfrac{\partial g_1}{\partial w}\\[6pt] \dfrac{\partial h_1}{\partial z} & \dfrac{\partial h_1}{\partial w}\end{pmatrix}. \tag{9-27}$$
The right side of (9-27) represents the Jacobian $J(z,w)$ of the inverse transformation in (9-19). Thus
$$\bigl|J(z,w)\bigr| = \left|\det\begin{pmatrix}\dfrac{\partial g_1}{\partial z} & \dfrac{\partial g_1}{\partial w}\\[6pt] \dfrac{\partial h_1}{\partial z} & \dfrac{\partial h_1}{\partial w}\end{pmatrix}\right|. \tag{9-28}$$
Substituting (9-27)-(9-28) into (9-18), we get
$$f_{ZW}(z,w) = \sum_i \bigl|J(z,w)\bigr|\,f_{XY}(x_i, y_i) = \sum_i \frac{1}{\bigl|J(x_i, y_i)\bigr|}\,f_{XY}(x_i, y_i), \tag{9-29}$$
since
$$\bigl|J(z,w)\bigr| = \frac{1}{\bigl|J(x_i, y_i)\bigr|}, \tag{9-30}$$
where $J(x_i, y_i)$ represents the Jacobian of the original transformation in (9-13), given by
$$J(x_i, y_i) = \det\begin{pmatrix}\dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y}\\[6pt] \dfrac{\partial h}{\partial x} & \dfrac{\partial h}{\partial y}\end{pmatrix}_{x = x_i,\ y = y_i}. \tag{9-31}$$
Next we shall illustrate the usefulness of the formula in (9-29) through various examples.
Example 9.2: Suppose X and Y are zero mean independent Gaussian r.vs with common variance $\sigma^2$. Define $Z = \sqrt{X^2 + Y^2}$, $W = \tan^{-1}(Y/X)$, where $|W| < \pi/2$. Obtain $f_{ZW}(z,w)$.
Solution: Here
$$f_{XY}(x,y) = \frac{1}{2\pi\sigma^2}\,e^{-(x^2+y^2)/2\sigma^2}. \tag{9-32}$$
Since
$$z = g(x,y) = \sqrt{x^2 + y^2}, \qquad w = h(x,y) = \tan^{-1}(y/x), \quad |w| < \pi/2, \tag{9-33}$$
if $(x_1, y_1)$ is a solution pair, so is $(-x_1, -y_1)$. From (9-33),
$$\frac{y}{x} = \tan w, \quad \text{or} \quad y = x\tan w. \tag{9-34}$$
Substituting this into z, we get
$$z = x\sqrt{1 + \tan^2 w} = x\sec w, \quad \text{or} \quad x = z\cos w, \tag{9-35}$$
and
$$y = x\tan w = z\sin w. \tag{9-36}$$
Thus there are two solution sets
$$x_1 = z\cos w,\ \ y_1 = z\sin w; \qquad x_2 = -z\cos w,\ \ y_2 = -z\sin w. \tag{9-37}$$
We can use (9-35)-(9-37) to obtain $J(z,w)$. From (9-28),
$$J(z,w) = \det\begin{pmatrix}\dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w}\\[4pt] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w}\end{pmatrix} = \det\begin{pmatrix}\cos w & -z\sin w\\ \sin w & z\cos w\end{pmatrix} = z, \tag{9-38}$$
so that
$$\bigl|J(z,w)\bigr| = z. \tag{9-39}$$
We can also compute $J(x,y)$ using (9-31). From (9-33),
$$J(x,y) = \det\begin{pmatrix}\dfrac{x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}}\\[6pt] \dfrac{-y}{x^2+y^2} & \dfrac{x}{x^2+y^2}\end{pmatrix} = \frac{1}{\sqrt{x^2+y^2}} = \frac{1}{z}. \tag{9-40}$$
Notice that $|J(z,w)| = 1/|J(x_i,y_i)|$, agreeing with (9-30). Substituting (9-37) and (9-39) or (9-40) into (9-29), we get
$$f_{ZW}(z,w) = z\bigl(f_{XY}(x_1,y_1) + f_{XY}(x_2,y_2)\bigr) = \frac{z}{\pi\sigma^2}\,e^{-z^2/2\sigma^2}, \qquad 0 < z < \infty,\ |w| < \frac{\pi}{2}. \tag{9-41}$$
Thus
$$f_Z(z) = \int_{-\pi/2}^{\pi/2} f_{ZW}(z,w)\,dw = \frac{z}{\sigma^2}\,e^{-z^2/2\sigma^2}, \qquad 0 < z < \infty, \tag{9-42}$$
which represents a Rayleigh r.v with parameter $\sigma^2$, and
$$f_W(w) = \int_0^{\infty} f_{ZW}(z,w)\,dz = \frac{1}{\pi}, \qquad |w| < \frac{\pi}{2}, \tag{9-43}$$
which represents a uniform r.v in the interval $(-\pi/2, \pi/2)$. Moreover, by direct computation
$$f_{ZW}(z,w) = f_Z(z)\,f_W(w), \tag{9-44}$$
implying that Z and W are independent. We summarize these results in the following statement: if X and Y are zero mean independent Gaussian random variables with common variance, then $\sqrt{X^2+Y^2}$ has a Rayleigh distribution and $\tan^{-1}(Y/X)$ has a uniform distribution. Moreover, these two derived r.vs are statistically independent. Alternatively, with X and Y as independent zero mean r.vs as in (9-32), X + jY represents a complex Gaussian r.v. But
$$X + jY = Z\,e^{jW}, \tag{9-45}$$
where Z and W are as in (9-33), except that for (9-45) to hold good on the entire complex plane we must have $-\pi < W \le \pi$, and hence it follows that the magnitude and phase of
a complex Gaussian r.v are independent, with Rayleigh and uniform $U(-\pi, \pi)$ distributions respectively. The statistical independence of these derived r.vs is an interesting observation.
Example 9.3: Let X and Y be independent exponential random variables with common parameter $\lambda$. Define U = X + Y, V = X - Y. Find the joint and marginal p.d.fs of U and V.
Solution: It is given that
$$f_{XY}(x,y) = \frac{1}{\lambda^2}\,e^{-(x+y)/\lambda}, \qquad x > 0,\ y > 0. \tag{9-46}$$
Now since u = x + y and v = x - y, we always have $|v| < u$, and there is only one solution, given by
$$x = \frac{u+v}{2}, \qquad y = \frac{u-v}{2}. \tag{9-47}$$
Moreover, the Jacobian of the transformation is given by
$$J(x,y) = \det\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix} = -2,$$
and hence
$$f_{UV}(u,v) = \frac{1}{|J(x,y)|}\,f_{XY}(x,y) = \frac{1}{2\lambda^2}\,e^{-u/\lambda}, \qquad 0 < |v| < u < \infty, \tag{9-48}$$
represents the joint p.d.f of U and V. This gives
$$f_U(u) = \int_{-u}^{u} f_{UV}(u,v)\,dv = \frac{1}{2\lambda^2}\,e^{-u/\lambda}\int_{-u}^{u} dv = \frac{u}{\lambda^2}\,e^{-u/\lambda}, \qquad 0 < u < \infty, \tag{9-49}$$
and
$$f_V(v) = \int_{|v|}^{\infty} f_{UV}(u,v)\,du = \frac{1}{2\lambda^2}\int_{|v|}^{\infty} e^{-u/\lambda}\,du = \frac{1}{2\lambda}\,e^{-|v|/\lambda}, \qquad -\infty < v < \infty. \tag{9-50}$$
Notice that in this case the r.vs U and V are not independent.
As we show below, the general transformation formula in (9-29), making use of two functions, can be made useful even when only one function is specified.
Auxiliary Variables:
Suppose
$$Z = g(X, Y), \tag{9-51}$$
where X and Y are two random variables. To determine $f_Z(z)$ by making use of the above formulation in (9-29), we can define an auxiliary variable
$$W = X \quad \text{or} \quad W = Y, \tag{9-52}$$
and the p.d.f of Z can be obtained from $f_{ZW}(z,w)$ by proper integration.
Example 9.4: Suppose Z = X + Y and let W = Y, so that the transformation is one-to-one and the solution is given by $y_1 = w$, $x_1 = z - w$.
The Jacobian of the transformation is given by
$$J(x,y) = \det\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix} = 1,$$
and hence
$$f_{ZW}(z,w) = f_{XY}(x_1, y_1) = f_{XY}(z-w,\ w),$$
or
$$f_Z(z) = \int_{-\infty}^{+\infty} f_{ZW}(z,w)\,dw = \int_{-\infty}^{+\infty} f_{XY}(z-w,\ w)\,dw, \tag{9-53}$$
which agrees with (8-7). Note that (9-53) reduces to the convolution of $f_X(z)$ and $f_Y(z)$ if X and Y are independent random variables. Next, we consider a less trivial example.
Example 9.5: Let $X \sim U(0,1)$ and $Y \sim U(0,1)$ be independent. Define
$$Z = (-2\ln X)^{1/2}\cos(2\pi Y). \tag{9-54}$$
Find the density function of Z.
Solution: We can make use of the auxiliary variable W = Y in this case. For a given (z, w) the solution, when it exists, is
$$x_1 = e^{-z^2\sec^2(2\pi w)/2}, \qquad y_1 = w, \tag{9-55, 9-56}$$
and using (9-28)
$$J(z,w) = \det\begin{pmatrix}\dfrac{\partial x_1}{\partial z} & \dfrac{\partial x_1}{\partial w}\\[4pt] 0 & 1\end{pmatrix} = \frac{\partial x_1}{\partial z} = -z\sec^2(2\pi w)\,e^{-z^2\sec^2(2\pi w)/2}. \tag{9-57}$$
Substituting (9-55)-(9-57) into (9-29) (with $f_{XY}(x_1,y_1) = 1$ on the unit square), we obtain
$$f_{ZW}(z,w) = |z|\sec^2(2\pi w)\,e^{-z^2\sec^2(2\pi w)/2}, \qquad -\infty < z < +\infty,\ 0 < w < 1, \tag{9-58}$$
where for a given z only those w with $\operatorname{sign}\bigl(\cos(2\pi w)\bigr) = \operatorname{sign}(z)$ contribute a solution,
and
$$f_Z(z) = \int_0^1 f_{ZW}(z,w)\,dw = |z|\,e^{-z^2/2}\int\sec^2(2\pi w)\,e^{-z^2\tan^2(2\pi w)/2}\,dw. \tag{9-59}$$
Let $u = \tan(2\pi w)$, so that $du = 2\pi\sec^2(2\pi w)\,dw$. Notice that as w varies over the admissible portion of (0, 1), u varies from $-\infty$ to $+\infty$. Using this in (9-59), we get
$$f_Z(z) = \frac{|z|\,e^{-z^2/2}}{2\pi}\int_{-\infty}^{+\infty} e^{-z^2 u^2/2}\,du = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}, \qquad -\infty < z < \infty, \tag{9-60}$$
which represents a zero mean Gaussian r.v with unit variance. Thus $Z \sim N(0,1)$. Equation (9-54) can be used as a practical procedure to generate Gaussian random variables from two independent uniformly distributed random sequences.
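Equation (9-54) is the first half of what is commonly called the Box-Muller transform. The sketch below (Python with NumPy; seed and sample size are arbitrary) implements it directly and checks that the output behaves like N(0,1).

```python
import numpy as np

def box_muller(n, rng):
    """Generate n approximately N(0,1) samples from pairs of U(0,1) r.vs via (9-54)."""
    x = 1.0 - rng.uniform(0.0, 1.0, n)      # shift to (0,1] to avoid log(0)
    y = rng.uniform(0.0, 1.0, n)
    return np.sqrt(-2.0*np.log(x)) * np.cos(2.0*np.pi*y)

rng = np.random.default_rng(4)
z = box_muller(500_000, rng)
print("mean     :", z.mean())               # should be close to 0
print("variance :", z.var())                # should be close to 1
print("P(|Z| > 1.96):", np.mean(np.abs(z) > 1.96))   # ~0.05 for a standard normal
```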
Example 9.6: Let X and Y be independent identically distributed Geometric random variables with
$$P(X = k) = P(Y = k) = p\,q^k, \qquad k = 0, 1, 2, \ldots$$
(a) Show that min(X, Y) and X - Y are independent random variables.
(b) Show that min(X, Y) and max(X, Y) - min(X, Y) are also independent random variables.
Solution: (a) Let Z = min(X, Y) and W = X - Y. Note that Z takes only nonnegative values $\{0, 1, 2, \ldots\}$, while W takes positive, zero and negative values $\{0, \pm 1, \pm 2, \ldots\}$. We have $P(Z = m,\ W = n) = P\{\min(X,Y) = m,\ X - Y = n\}$. But
$$Z = \min(X, Y) = \begin{cases} Y, & X \ge Y\ \text{(i.e., } W = X - Y \text{ is nonnegative)},\\ X, & X < Y\ \text{(i.e., } W = X - Y \text{ is negative)}.\end{cases} \tag{9-61}$$
Thus
$$P(Z = m,\ W = n) = P\{\min(X,Y) = m,\ X - Y = n,\ X \ge Y\} + P\{\min(X,Y) = m,\ X - Y = n,\ X < Y\}. \tag{9-62}$$
For $n \ge 0$ only the first term contributes ($Y = m$, $X = m + n$), and for $n < 0$ only the second ($X = m$, $Y = m - n$), so that
$$P(Z = m,\ W = n) = \begin{cases} P(X = m+n)\,P(Y = m) = p\,q^{m+n}\cdot p\,q^{m}, & n \ge 0,\\ P(X = m)\,P(Y = m-n) = p\,q^{m}\cdot p\,q^{m-n}, & n < 0,\end{cases} = p^2\,q^{2m+|n|}, \qquad m = 0,1,2,\ldots,\ n = 0, \pm 1, \pm 2, \ldots, \tag{9-63}$$
which represents the joint probability mass function of the random variables Z and W. Also
$$P(Z = m) = \sum_{n} P(Z = m,\ W = n) = p^2 q^{2m}\left(1 + \frac{2q}{1-q}\right) = p^2 q^{2m}\,\frac{1+q}{1-q} = (1 - q^2)\,q^{2m}, \qquad m = 0, 1, 2, \ldots \tag{9-64}$$
Thus Z represents a Geometric random variable, since $p(1+q) = 1 - q^2$.
Similarly
$$P(W = n) = \sum_{m=0}^{\infty} P(Z = m,\ W = n) = p^2 q^{|n|}\sum_{m=0}^{\infty} q^{2m} = \frac{p^2}{1-q^2}\,q^{|n|} = \frac{p}{1+q}\,q^{|n|}, \qquad n = 0, \pm 1, \pm 2, \ldots \tag{9-65}$$
Note that
$$P(Z = m,\ W = n) = P(Z = m)\,P(W = n), \tag{9-66}$$
establishing the independence of the random variables Z and W. The independence of X - Y and min(X, Y) when X and Y are independent Geometric random variables is an interesting observation.
(b) Let
$$Z = \min(X, Y), \qquad R = \max(X, Y) - \min(X, Y). \tag{9-67}$$
In this case both Z and R take nonnegative integer values $\{0, 1, 2, \ldots\}$. Proceeding as in (9-62)-(9-63), we get
$$P(Z = m,\ R = n) = P\{\min(X,Y) = m,\ \max(X,Y) - \min(X,Y) = n\} = \begin{cases} p^2 q^{2m}, & n = 0,\\ 2\,p^2 q^{2m+n}, & n = 1, 2, \ldots,\end{cases} \qquad m = 0, 1, 2, \ldots \tag{9-68}$$
(For $n \ge 1$ the two configurations $X = m,\ Y = m+n$ and $X = m+n,\ Y = m$ both contribute, while for $n = 0$ only $X = Y = m$ does.) Eq. (9-68) represents the joint probability mass function of Z and R in (9-67). From (9-68),
$$P(Z = m) = \sum_{n=0}^{\infty} P(Z = m,\ R = n) = p^2 q^{2m}\left(1 + \frac{2q}{1-q}\right) = p^2 q^{2m}\,\frac{1+q}{1-q} = (1 - q^2)\,q^{2m}, \qquad m = 0, 1, 2, \ldots \tag{9-69}$$
and
$$P(R = n) = \sum_{m=0}^{\infty} P(Z = m,\ R = n) = \begin{cases}\dfrac{p}{1+q}, & n = 0,\\[6pt] \dfrac{2p\,q^{n}}{1+q}, & n = 1, 2, \ldots\end{cases} \tag{9-70}$$
From (9-68)-(9-70), we get
$$P(Z = m,\ R = n) = P(Z = m)\,P(R = n), \tag{9-71}$$
which proves the independence of the random variables Z and R defined in (9-67) as well.
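The independence claims in (9-66) and (9-71) can also be checked empirically. A minimal sketch follows (p = 0.3 and the sample size are arbitrary illustration values; note that NumPy's geometric starts at 1, so one is subtracted to match the pmf $pq^k$ on $\{0,1,2,\ldots\}$):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 0.3, 400_000
x = rng.geometric(p, n) - 1       # geometric on {0,1,2,...} with P(k) = p q^k
y = rng.geometric(p, n) - 1

z = np.minimum(x, y)              # Z = min(X, Y)
w = x - y                         # W = X - Y

# check the factorization (9-66) at a few points
for m, k in [(0, 0), (1, 2), (2, -1)]:
    joint = np.mean((z == m) & (w == k))
    product = np.mean(z == m) * np.mean(w == k)
    print(f"P(Z={m}, W={k}) = {joint:.5f}   P(Z={m})P(W={k}) = {product:.5f}")
```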
10. Joint Moments and Joint Characteristic Functions
Following section 6, in this section we shall introduce various parameters to compactly represent the information contained in the joint p.d.f of two r.vs. Given two r.vs X and Y and a function $g(x,y)$, define the r.v
$$Z = g(X, Y). \tag{10-1}$$
Using (6-2), we can define the mean of Z to be
$$\mu_Z = E(Z) = \int_{-\infty}^{+\infty} z\,f_Z(z)\,dz. \tag{10-2}$$
However, the situation here is similar to that in (6-13), and it is possible to express the mean of $Z = g(X,Y)$ in terms of $f_{XY}(x,y)$ without computing $f_Z(z)$. To see this, recall from (5-26) and (7-10) that
$$P\bigl(z < Z \le z + \Delta z\bigr) = f_Z(z)\,\Delta z = P\bigl(z < g(X,Y) \le z + \Delta z\bigr) = \sum_{(x,y)\in D_{\Delta z}} f_{XY}(x,y)\,\Delta x\,\Delta y, \tag{10-3}$$
where $D_{\Delta z}$ is the region in the xy plane satisfying the above inequality. From (10-3), we get
$$z\,f_Z(z)\,\Delta z = \sum_{(x,y)\in D_{\Delta z}} g(x,y)\,f_{XY}(x,y)\,\Delta x\,\Delta y. \tag{10-4}$$
As $\Delta z$ covers the entire z axis, the corresponding regions $D_{\Delta z}$ are nonoverlapping, and they cover the entire xy plane.
By integrating (10-4), we obtain the useful formula
$$E(Z) = \int_{-\infty}^{+\infty} z\,f_Z(z)\,dz = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x,y)\,f_{XY}(x,y)\,dx\,dy, \tag{10-5}$$
or
$$E[g(X,Y)] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x,y)\,f_{XY}(x,y)\,dx\,dy. \tag{10-6}$$
If X and Y are discrete-type r.vs, then
$$E[g(X,Y)] = \sum_i\sum_j g(x_i, y_j)\,P(X = x_i,\ Y = y_j). \tag{10-7}$$
Since expectation is a linear operator, we also get
$$E\left(\sum_k a_k\,g_k(X,Y)\right) = \sum_k a_k\,E[g_k(X,Y)]. \tag{10-8}$$
If X and Y are independent r.vs, it is easy to see that $Z = g(X)$ and $W = h(Y)$ are always independent of each other. In that case, using (10-7), we get the interesting result
$$E[g(X)\,h(Y)] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x)h(y)\,f_X(x)f_Y(y)\,dx\,dy = \int_{-\infty}^{+\infty} g(x)f_X(x)\,dx\int_{-\infty}^{+\infty} h(y)f_Y(y)\,dy = E[g(X)]\,E[h(Y)]. \tag{10-9}$$
However, (10-9) is in general not true if X and Y are not independent.
In the case of one random variable, we defined the parameters mean and variance to represent its average behavior. How does one parametrically represent similar cross-behavior between two random variables? Towards this, we can generalize the variance definition given in (6-16) as shown below.
Covariance: Given any two r.vs X and Y, define
$$\mathrm{Cov}(X,Y) = E\bigl[(X - \mu_X)(Y - \mu_Y)\bigr]. \tag{10-10}$$
By expanding and simplifying the right side of (10-10), we also get
$$\mathrm{Cov}(X,Y) = E(XY) - \mu_X\mu_Y = E(XY) - E(X)E(Y). \tag{10-11}$$
It is easy to see that
$$\bigl|\mathrm{Cov}(X,Y)\bigr| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}. \tag{10-12}$$
To see (10-12), let $U = aX + Y$, so that
$$\mathrm{Var}(U) = E\bigl[\{a(X - \mu_X) + (Y - \mu_Y)\}^2\bigr] = a^2\,\mathrm{Var}(X) + 2a\,\mathrm{Cov}(X,Y) + \mathrm{Var}(Y) \ge 0. \tag{10-13}$$
The right side of (10-13) represents a quadratic in the variable a that has no distinct real roots (Fig. 10.1). Thus the roots are imaginary (or double), and hence the discriminant
$$\bigl[\mathrm{Cov}(X,Y)\bigr]^2 - \mathrm{Var}(X)\,\mathrm{Var}(Y)$$
must be non-positive, and that gives (10-12). Using (10-12), we may define the normalized parameter
$$\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}, \qquad -1 \le \rho_{XY} \le 1, \tag{10-14}$$
or
$$\mathrm{Cov}(X,Y) = \rho_{XY}\,\sigma_X\,\sigma_Y, \tag{10-15}$$
and it represents the correlation coefficient between X and Y.
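The sample analogues of (10-10) and (10-14) are straightforward to compute. The sketch below (Python with NumPy; the target correlation 0.6 is an arbitrary illustrative choice) generates correlated Gaussian pairs and compares the sample covariance and correlation coefficient with the population value.

```python
import numpy as np

rng = np.random.default_rng(6)
n, rho = 100_000, 0.6
x = rng.normal(0, 1, n)
y = rho*x + np.sqrt(1 - rho**2)*rng.normal(0, 1, n)    # Corr(X, Y) = rho by construction

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))       # sample version of (10-10)
rho_xy = cov_xy / (x.std() * y.std())                   # sample version of (10-14)
print("sample Cov(X,Y):", cov_xy, "  population value:", rho)
print("sample rho_XY  :", rho_xy)
print("np.corrcoef    :", np.corrcoef(x, y)[0, 1])
```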
Uncorrelated r.vs: If $\rho_{XY} = 0$, then X and Y are said to be uncorrelated r.vs. From (10-11), if X and Y are uncorrelated, then
$$E(XY) = E(X)\,E(Y). \tag{10-16}$$
Orthogonality: X and Y are said to be orthogonal if
$$E(XY) = 0. \tag{10-17}$$
From (10-16)-(10-17), if either X or Y has zero mean, then orthogonality implies uncorrelatedness and vice-versa.
Suppose X and Y are independent r.vs. Then from (10-9) with $g(X) = X$, $h(Y) = Y$, we get
$$E(XY) = E(X)\,E(Y),$$
and together with (10-16), we conclude that the random variables are uncorrelated, thus justifying the original definition in (10-10). Thus independence implies uncorrelatedness.
Naturally, if two random variables are statistically independent, then there cannot be any correlation between them ($\rho_{XY} = 0$). However, the converse is in general not true. As the next example shows, random variables can be uncorrelated without being independent.
Example 10.1: Let $X \sim U(0,1)$ and $Y \sim U(0,1)$. Suppose X and Y are independent. Define Z = X + Y, W = X - Y. Show that Z and W are dependent, but uncorrelated r.vs.
Solution: $z = x + y$, $w = x - y$ gives the only solution set to be
$$x = \frac{z+w}{2}, \qquad y = \frac{z-w}{2},$$
with the image region $0 < z < 2$, $|w| < 1$, $|w| < z$, $|w| < 2 - z$, and $|J(z,w)| = 1/2$.
Thus (see the shaded region in Fig. 10.2)
$$f_{ZW}(z,w) = \begin{cases} 1/2, & 0 < z < 2,\ |w| < 1,\ |w| < z,\ |w| < 2 - z,\\ 0, & \text{otherwise,}\end{cases} \tag{10-18}$$
and hence
$$f_Z(z) = \int_{-\infty}^{\infty} f_{ZW}(z,w)\,dw = \begin{cases}\displaystyle\int_{-z}^{z}\tfrac12\,dw = z, & 0 < z < 1,\\[4pt] \displaystyle\int_{-(2-z)}^{2-z}\tfrac12\,dw = 2 - z, & 1 < z < 2,\end{cases}$$
or by direct computation (Z = X + Y),
$$f_Z(z) = f_X(z)\ast f_Y(z) = \begin{cases} z, & 0 < z < 1,\\ 2 - z, & 1 < z < 2,\\ 0, & \text{otherwise.}\end{cases} \tag{10-19}$$
Also
$$f_W(w) = \int_{-\infty}^{\infty} f_{ZW}(z,w)\,dz = \int_{|w|}^{2-|w|}\frac{1}{2}\,dz = 1 - |w|, \qquad |w| < 1, \tag{10-20}$$
and zero otherwise. Clearly $f_{ZW}(z,w) \ne f_Z(z)\,f_W(w)$. Thus Z and W are not independent. However
$$E(ZW) = E\bigl[(X+Y)(X-Y)\bigr] = E(X^2) - E(Y^2) = 0, \tag{10-21}$$
and
$$E(W) = E(X - Y) = 0,$$
and hence
$$\mathrm{Cov}(Z,W) = E(ZW) - E(Z)\,E(W) = 0, \tag{10-22}$$
implying that Z and W are uncorrelated random variables.
Example 10.2: Let $Z = aX + bY$. Determine the variance of Z in terms of $\sigma_X$, $\sigma_Y$ and $\rho_{XY}$.
Solution:
$$\mu_Z = E(Z) = E(aX + bY) = a\,\mu_X + b\,\mu_Y,$$
and using (10-15),
$$\mathrm{Var}(Z) = \sigma_Z^2 = E\bigl[(Z - \mu_Z)^2\bigr] = E\bigl[\bigl(a(X-\mu_X) + b(Y-\mu_Y)\bigr)^2\bigr] = a^2\sigma_X^2 + 2ab\,\rho_{XY}\,\sigma_X\sigma_Y + b^2\sigma_Y^2. \tag{10-23}$$
In particular, if X and Y are independent, then $\rho_{XY} = 0$, and (10-23) reduces to
$$\sigma_Z^2 = a^2\sigma_X^2 + b^2\sigma_Y^2. \tag{10-24}$$
Thus the variance of the sum of independent r.vs is the sum of their variances ($a = b = 1$).
Moments:
$$E[X^k Y^m] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^k y^m\,f_{XY}(x,y)\,dx\,dy \tag{10-25}$$
represents the joint moment of order (k, m) for X and Y.
Following the one random variable case, we can define the joint characteristic function between two random variables, which will turn out to be useful for moment calculations.
Joint characteristic functions:
The joint characteristic function between X and Y is defined as
$$\Phi_{XY}(u,v) = E\bigl(e^{j(Xu + Yv)}\bigr) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{j(xu+yv)}\,f_{XY}(x,y)\,dx\,dy. \tag{10-26}$$
Note that
$$\bigl|\Phi_{XY}(u,v)\bigr| \le \Phi_{XY}(0,0) = 1.$$
It is easy to show that
$$E(XY) = \frac{1}{j^2}\,\frac{\partial^2\Phi_{XY}(u,v)}{\partial u\,\partial v}\bigg|_{u=0,\ v=0}. \tag{10-27}$$
If X and Y are independent r.vs, then from (10-26) we obtain
$$\Phi_{XY}(u,v) = E\bigl(e^{juX}\bigr)\,E\bigl(e^{jvY}\bigr) = \Phi_X(u)\,\Phi_Y(v). \tag{10-28}$$
Also
$$\Phi_X(u) = \Phi_{XY}(u, 0), \qquad \Phi_Y(v) = \Phi_{XY}(0, v). \tag{10-29}$$
More on Gaussian r.vs:
From Lecture 7, X and Y are said to be jointly Gaussian as $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$ if their joint p.d.f has the form in (7-23). In that case, by direct substitution and simplification, we obtain the joint characteristic function of two jointly Gaussian r.vs to be
$$\Phi_{XY}(u,v) = E\bigl(e^{j(Xu+Yv)}\bigr) = e^{\,j(\mu_X u + \mu_Y v) - \frac{1}{2}\bigl(\sigma_X^2 u^2 + 2\rho\sigma_X\sigma_Y uv + \sigma_Y^2 v^2\bigr)}. \tag{10-30}$$
Equation (10-30) can be used to make various conclusions. Letting $v = 0$ in (10-30), we get
$$\Phi_X(u) = \Phi_{XY}(u, 0) = e^{\,j\mu_X u - \frac{1}{2}\sigma_X^2 u^2}, \tag{10-31}$$
and it agrees with (6-47).
From (7-23), by direct computation using (10-11), it is easy to show that for two jointly Gaussian random variables
$$\mathrm{Cov}(X,Y) = \rho\,\sigma_X\,\sigma_Y.$$
Hence from (10-14), $\rho$ in $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$ represents the actual correlation coefficient of the two jointly Gaussian r.vs in (7-23). Notice that $\rho = 0$ implies
$$f_{XY}(x,y) = f_X(x)\,f_Y(y).$$
Thus if X and Y are jointly Gaussian, uncorrelatedness does imply independence between the two random variables. The Gaussian case is the only exception where the two concepts imply each other.
Example 10.3: Let X and Y be jointly Gaussian r.vs with parameters $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$. Define $Z = aX + bY$. Determine $f_Z(z)$.
Solution: In this case we can make use of the characteristic function to solve this problem:
$$\Phi_Z(u) = E\bigl(e^{jZu}\bigr) = E\bigl(e^{j(aX+bY)u}\bigr) = E\bigl(e^{jauX}\,e^{jbuY}\bigr) = \Phi_{XY}(au,\ bu). \tag{10-32}$$
From (10-30) with u and v replaced by au and bu respectively, we get
$$\Phi_Z(u) = e^{\,j(a\mu_X + b\mu_Y)u - \frac{1}{2}\bigl(a^2\sigma_X^2 + 2ab\rho\sigma_X\sigma_Y + b^2\sigma_Y^2\bigr)u^2} = e^{\,j\mu_Z u - \frac{1}{2}\sigma_Z^2 u^2}, \tag{10-33}$$
where
$$\mu_Z = a\mu_X + b\mu_Y, \tag{10-34}$$
$$\sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2. \tag{10-35}$$
Notice that (10-33) has the same form as (10-31), and hence we conclude that $Z = aX + bY$ is also Gaussian, with mean and variance as in (10-34)-(10-35), which also agrees with (10-23).
From the previous example, we conclude that any linear combination of jointly Gaussian r.vs generates a Gaussian r.v.
In other words, linearity preserves Gaussianity. We can use the characteristic function relation to conclude an even more general result.
Example 10.4: Suppose X and Y are jointly Gaussian r.vs as in the previous example. Define two linear combinations
$$Z = aX + bY, \qquad W = cX + dY. \tag{10-36}$$
What can we say about their joint distribution?
Solution: The characteristic function of Z and W is given by
$$\Phi_{ZW}(u,v) = E\bigl(e^{j(Zu+Wv)}\bigr) = E\bigl(e^{j(aX+bY)u + j(cX+dY)v}\bigr) = E\bigl(e^{jX(au+cv)}\,e^{jY(bu+dv)}\bigr) = \Phi_{XY}(au + cv,\ bu + dv). \tag{10-37}$$
As before, substituting (10-30) into (10-37) with u and v replaced by $au + cv$ and $bu + dv$ respectively, we get
$$\Phi_{ZW}(u,v) = e^{\,j(\mu_Z u + \mu_W v) - \frac{1}{2}\bigl(\sigma_Z^2 u^2 + 2\rho_{ZW}\sigma_Z\sigma_W uv + \sigma_W^2 v^2\bigr)}, \tag{10-38}$$
where
$$\mu_Z = a\mu_X + b\mu_Y, \tag{10-39}$$
$$\mu_W = c\mu_X + d\mu_Y, \tag{10-40}$$
$$\sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2, \tag{10-41}$$
$$\sigma_W^2 = c^2\sigma_X^2 + 2cd\,\rho\,\sigma_X\sigma_Y + d^2\sigma_Y^2, \tag{10-42}$$
and
$$\rho_{ZW} = \frac{ac\,\sigma_X^2 + (ad + bc)\,\rho\,\sigma_X\sigma_Y + bd\,\sigma_Y^2}{\sigma_Z\,\sigma_W}. \tag{10-43}$$
From (10-38), we conclude that Z and W are also jointly distributed Gaussian r.vs with means, variances and correlation coefficient as in (10-39)-(10-43).
To summarize, any two linear combinations of jointly Gaussian random variables (independent or dependent) are also jointly Gaussian r.vs (Fig. 10.3: a linear operator maps Gaussian input to Gaussian output).
Of course, we could have reached the same conclusion by deriving the joint p.d.f $f_{ZW}(z,w)$ using the technique developed in section 9 (refer to (7-29)).
Gaussian random variables are also interesting because of the following result:
Central Limit Theorem: Suppose $X_1, X_2, \ldots, X_n$ are a set of zero mean independent, identically distributed (i.i.d) random
variables with some common distribution. Consider their scaled sum
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}. \tag{10-44}$$
Then asymptotically (as $n \to \infty$)
$$Y \to N(0, \sigma^2). \tag{10-45}$$
Proof: Although the theorem is true under even more general conditions, we shall prove it here under the independence assumption. Let $\sigma^2$ represent their common variance. Since
$$E(X_i) = 0, \tag{10-46}$$
we have
$$\mathrm{Var}(X_i) = E(X_i^2) = \sigma^2. \tag{10-47}$$
Consider
$$\Phi_Y(u) = E\bigl(e^{jYu}\bigr) = E\bigl(e^{j(X_1+X_2+\cdots+X_n)u/\sqrt{n}}\bigr) = \prod_{i=1}^{n} E\bigl(e^{jX_i u/\sqrt{n}}\bigr) = \left[\Phi_X\!\left(\frac{u}{\sqrt{n}}\right)\right]^n, \tag{10-48}$$
where we have made use of the independence of the r.vs $X_1, X_2, \ldots, X_n$. But
$$E\bigl(e^{jX_i u/\sqrt{n}}\bigr) = E\left(1 + \frac{jX_i u}{\sqrt{n}} + \frac{j^2 X_i^2 u^2}{2!\,n} + \frac{j^3 X_i^3 u^3}{3!\,n^{3/2}} + \cdots\right) = 1 - \frac{\sigma^2 u^2}{2n} + o\!\left(\frac{1}{n^{3/2}}\right), \tag{10-49}$$
where we have made use of (10-46)-(10-47). Substituting (10-49) into (10-48), we obtain
$$\Phi_Y(u) = \left[1 - \frac{\sigma^2 u^2}{2n} + o\!\left(\frac{1}{n^{3/2}}\right)\right]^n, \tag{10-50}$$
and as $n \to \infty$,
$$\lim_{n\to\infty}\Phi_Y(u) = e^{-\sigma^2 u^2/2}, \tag{10-51}$$
since
$$\lim_{n\to\infty}\left(1 - \frac{x}{n}\right)^n = e^{-x}. \tag{10-52}$$
[Note that the $o(1/n^{3/2})$ terms in (10-50) decay faster than $1/n^{3/2}$.] But (10-51) represents the characteristic function of a zero mean normal r.v with variance $\sigma^2$, and (10-45) follows.
The central limit theorem states that a large sum of independent random variables, each with finite variance, tends to behave like a normal random variable. Thus the individual p.d.fs become unimportant in analyzing the collective sum behavior. If we model the noise phenomenon as the sum of a large number of independent random variables (e.g., electron motion in resistor components), then this theorem allows us to conclude that noise behaves like a Gaussian r.v.
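The scaled-sum behavior in (10-44)-(10-45) is easily visualized numerically. Below is a minimal sketch (Python with NumPy; the choice of uniform summands, n = 50 terms, and the trial count are arbitrary illustration values) comparing the empirical variance and tail probabilities of Y with the $N(0,\sigma^2)$ prediction.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)
n_terms, n_trials = 50, 200_000
# X_i uniform on (-0.5, 0.5): zero mean, variance sigma^2 = 1/12
x = rng.uniform(-0.5, 0.5, size=(n_trials, n_terms))
y = x.sum(axis=1) / np.sqrt(n_terms)        # the scaled sum (10-44)

sigma2 = 1/12
print("Var(Y) empirical:", y.var(), "  predicted sigma^2:", sigma2)
for t in (0.2, 0.4, 0.6):
    gauss_tail = 0.5*(1 - erf(t/sqrt(2*sigma2)))       # P(N(0,sigma^2) > t)
    print(f"P(Y > {t}): empirical={np.mean(y > t):.4f}  Gaussian={gauss_tail:.4f}")
```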
It may be remarked that the finite variance assumption is necessary for the theorem to hold. To see its importance, consider the r.vs to be Cauchy distributed, and let
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}, \tag{10-53}$$
where each $X_i \sim C(\alpha)$. Then since
$$\Phi_{X_i}(u) = e^{-\alpha|u|}, \tag{10-54}$$
substituting this into (10-48), we get
$$\Phi_Y(u) = \left[\Phi_{X_i}\!\left(\frac{u}{\sqrt{n}}\right)\right]^n = e^{-\alpha\sqrt{n}\,|u|} \sim C\bigl(\alpha\sqrt{n}\bigr), \tag{10-55}$$
which shows that Y is still Cauchy, with parameter $\alpha\sqrt{n}$.
In other words, the central limit theorem doesn't hold for a set of Cauchy r.vs, as their variances are undefined.
Joint characteristic functions are useful in determining the p.d.f of linear combinations of r.vs. For example, with X and Y as independent Poisson r.vs with parameters $\lambda_1$ and $\lambda_2$ respectively, let
$$Z = X + Y. \tag{10-56}$$
Then
$$\Phi_Z(u) = \Phi_X(u)\,\Phi_Y(u). \tag{10-57}$$
But from (6-33),
$$\Phi_X(u) = e^{\lambda_1(e^{ju} - 1)}, \qquad \Phi_Y(u) = e^{\lambda_2(e^{ju} - 1)}, \tag{10-58}$$
so that
$$\Phi_Z(u) = e^{(\lambda_1 + \lambda_2)(e^{ju} - 1)} \sim P(\lambda_1 + \lambda_2), \tag{10-59}$$
i.e., the sum of independent Poisson r.vs is also a Poisson random variable.
11. Conditional Density Functions and Conditional Expected Values
As we have seen in section 4, conditional probability density functions are useful to update the information about an event based on the knowledge about some other related event (refer to Example 4.7). In this section, we shall analyze the situation where the related event happens to be a random variable that is dependent on the one of interest.
From (4-11), recall that the distribution function of X given an event B is
$$F_X(x \mid B) = P\bigl(X(\xi) \le x \mid B\bigr) = \frac{P\bigl((X(\xi) \le x)\cap B\bigr)}{P(B)}. \tag{11-1}$$
Suppose we let
$$B = \{y_1 < Y(\xi) \le y_2\}. \tag{11-2}$$
Substituting (11-2) into (11-1), we get
$$F_X(x \mid y_1 < Y \le y_2) = \frac{P\bigl(X(\xi) \le x,\ y_1 < Y(\xi) \le y_2\bigr)}{P\bigl(y_1 < Y(\xi) \le y_2\bigr)} = \frac{F_{XY}(x, y_2) - F_{XY}(x, y_1)}{F_Y(y_2) - F_Y(y_1)}, \tag{11-3}$$
where we have made use of (7-4). But using (3-28) and (7-7), we can rewrite (11-3) as
$$F_X(x \mid y_1 < Y \le y_2) = \frac{\displaystyle\int_{y_1}^{y_2}\int_{-\infty}^{x} f_{XY}(u,v)\,du\,dv}{\displaystyle\int_{y_1}^{y_2} f_Y(v)\,dv}. \tag{11-4}$$
To determine the limiting case $F_X(x \mid Y = y)$, we can let $y_1 = y$ and $y_2 = y + \Delta y$ in (11-4).
This gives
$$F_X(x \mid y < Y \le y + \Delta y) = \frac{\displaystyle\int_{-\infty}^{x} f_{XY}(u, y)\,du\,\Delta y}{f_Y(y)\,\Delta y}, \tag{11-5}$$
and hence in the limit
$$F_X(x \mid Y = y) = \lim_{\Delta y \to 0} F_X(x \mid y < Y \le y + \Delta y) = \frac{\displaystyle\int_{-\infty}^{x} f_{XY}(u, y)\,du}{f_Y(y)}. \tag{11-6}$$
(To remind about the conditional nature on the left hand side, we shall use the subscript X | Y (instead of X) there.) Thus
$$F_{X|Y}(x \mid Y = y) = \frac{\displaystyle\int_{-\infty}^{x} f_{XY}(u, y)\,du}{f_Y(y)}. \tag{11-7}$$
Differentiating (11-7) with respect to x using (8-7), we get
$$f_{X|Y}(x \mid Y = y) = \frac{f_{XY}(x, y)}{f_Y(y)}. \tag{11-8}$$
It is easy to see that the left side of (11-8) represents a valid probability density function. In fact,
$$f_{X|Y}(x \mid Y = y) = \frac{f_{XY}(x,y)}{f_Y(y)} \ge 0, \tag{11-9}$$
and
$$\int_{-\infty}^{+\infty} f_{X|Y}(x \mid Y = y)\,dx = \frac{\displaystyle\int_{-\infty}^{+\infty} f_{XY}(x,y)\,dx}{f_Y(y)} = \frac{f_Y(y)}{f_Y(y)} = 1, \tag{11-10}$$
where we have made use of (7-14). From (11-9)-(11-10), (11-8) indeed represents a valid p.d.f, and we shall refer to it as the conditional p.d.f of the r.v X given Y = y. We may also write
$$f_{X|Y}(x \mid Y = y) = f_{X|Y}(x \mid y). \tag{11-11}$$
From (11-8) and (11-11), we have
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x,y)}{f_Y(y)}, \tag{11-12}$$
and similarly
$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x,y)}{f_X(x)}. \tag{11-13}$$
If the r.vs X and Y are independent, then $f_{XY}(x,y) = f_X(x)\,f_Y(y)$, and (11-12)-(11-13) reduce to
$$f_{X|Y}(x \mid y) = f_X(x), \qquad f_{Y|X}(y \mid x) = f_Y(y), \tag{11-14}$$
implying that the conditional p.d.fs coincide with their unconditional p.d.fs. This makes sense, since if X and Y are independent r.vs, information about Y shouldn't be of any help in updating our knowledge about X.
In the case of discrete-type r.vs, (11-12) reduces to
$$P\bigl(X = x_i \mid Y = y_j\bigr) = \frac{P(X = x_i,\ Y = y_j)}{P(Y = y_j)}. \tag{11-15}$$
Next we shall illustrate the method of obtaining conditional p.d.fs through an example.
Example 11.1: Given
$$f_{XY}(x,y) = \begin{cases} k, & 0 < x < y < 1,\\ 0, & \text{otherwise,}\end{cases} \tag{11-16}$$
determine $f_{X|Y}(x \mid y)$ and $f_{Y|X}(y \mid x)$.
Solution: The joint p.d.f is given to be a constant in the shaded region (Fig. 11.1, the triangle $0 < x < y < 1$). This gives
$$\iint f_{XY}(x,y)\,dx\,dy = \int_0^1\int_0^y k\,dx\,dy = \int_0^1 k\,y\,dy = \frac{k}{2} = 1 \ \Rightarrow\ k = 2.$$
Similarly,
$$f_X(x) = \int f_{XY}(x,y)\,dy = \int_x^1 k\,dy = k(1 - x), \qquad 0 < x < 1, \tag{11-17}$$
and
$$f_Y(y) = \int f_{XY}(x,y)\,dx = \int_0^y k\,dx = k\,y, \qquad 0 < y < 1. \tag{11-18}$$
From (11-16)-(11-18), we get
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{1}{y}, \qquad 0 < x < y < 1, \tag{11-19}$$
and
$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{1}{1 - x}, \qquad 0 < x < y < 1. \tag{11-20}$$
We can use (11-12)-(11-13) to derive an important result. From there, we also have
$$f_{XY}(x,y) = f_{X|Y}(x \mid y)\,f_Y(y) = f_{Y|X}(y \mid x)\,f_X(x), \tag{11-21}$$
or
$$f_{Y|X}(y \mid x) = \frac{f_{X|Y}(x \mid y)\,f_Y(y)}{f_X(x)}. \tag{11-22}$$
But
$$f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x,y)\,dy = \int_{-\infty}^{+\infty} f_{X|Y}(x \mid y)\,f_Y(y)\,dy, \tag{11-23}$$
and using (11-23) in (11-22), we get
$$f_{Y|X}(y \mid x) = \frac{f_{X|Y}(x \mid y)\,f_Y(y)}{\displaystyle\int_{-\infty}^{+\infty} f_{X|Y}(x \mid y)\,f_Y(y)\,dy}. \tag{11-24}$$
Equation (11-24) represents the p.d.f version of Bayes' theorem. To appreciate the full significance of (11-24), one needs to look at communication problems where observations can be used to update our knowledge about unknown parameters. We shall illustrate this using a simple example.
Example 11.2: An unknown random phase $\theta$ is uniformly distributed in the interval $(0, 2\pi)$, and $r = \theta + n$, where $n \sim N(0, \sigma^2)$. Determine $f(\theta \mid r)$.
Solution: Initially almost nothing about the r.v $\theta$ is known, so we assume its a-priori p.d.f to be uniform in the interval $(0, 2\pi)$.
In the equation $r = \theta + n$, we can think of n as the noise contribution and r as the observation. It is reasonable to assume that $\theta$ and n are independent. In that case
$$f(r \mid \theta) \sim N(\theta, \sigma^2), \tag{11-25}$$
since, given that $\theta$ is a constant, $r = \theta + n$ behaves like n. Using (11-24), this gives the a-posteriori p.d.f of $\theta$ given r to be (see Fig. 11.2(b))
$$f(\theta \mid r) = \frac{f(r \mid \theta)\,f_\theta(\theta)}{\displaystyle\int_0^{2\pi} f(r \mid \theta)\,f_\theta(\theta)\,d\theta} = \frac{e^{-(r-\theta)^2/2\sigma^2}}{\displaystyle\int_0^{2\pi} e^{-(r-\theta)^2/2\sigma^2}\,d\theta} = \phi(r)\,e^{-(r-\theta)^2/2\sigma^2}, \qquad 0 \le \theta < 2\pi, \tag{11-26}$$
where
$$\phi(r) = \frac{1}{\displaystyle\int_0^{2\pi} e^{-(r-\theta)^2/2\sigma^2}\,d\theta}.$$
Notice that the knowledge about the observation r is reflected in the a-posteriori p.d.f of $\theta$ in Fig. 11.2(b). It is no longer flat like the a-priori p.d.f in Fig. 11.2(a); it shows higher probabilities in the neighborhood of $\theta = r$.
(Fig. 11.2: (a) the flat a-priori p.d.f $f_\theta(\theta) = 1/2\pi$ on $(0, 2\pi)$; (b) the a-posteriori p.d.f $f(\theta \mid r)$, peaking at $\theta = r$.)
Conditional Mean:
We can use the conditional p.d.fs to define the conditional mean. More generally, applying (6-13) to conditional p.d.fs we get
$$E\bigl(g(X) \mid B\bigr) = \int_{-\infty}^{+\infty} g(x)\,f_X(x \mid B)\,dx, \tag{11-27}$$
and using a limiting argument as in (11-2)-(11-8), we get
$$\mu_{X|Y} = E(X \mid Y = y) = \int_{-\infty}^{+\infty} x\,f_{X|Y}(x \mid y)\,dx \tag{11-28}$$
to be the conditional mean of X given Y = y. Notice that $E(X \mid Y = y)$ will be a function of y. Also
$$\mu_{Y|X} = E(Y \mid X = x) = \int_{-\infty}^{+\infty} y\,f_{Y|X}(y \mid x)\,dy. \tag{11-29}$$
In a similar manner, the conditional variance of X given Y = y is given by
$$\mathrm{Var}(X \mid Y) = \sigma_{X|Y}^2 = E\bigl(X^2 \mid Y = y\bigr) - \bigl[E(X \mid Y = y)\bigr]^2 = E\bigl[(X - \mu_{X|Y})^2 \mid Y = y\bigr]. \tag{11-30}$$
We shall illustrate these calculations through an example.
Example 11.3: Let
$$f_{XY}(x,y) = \begin{cases} 1, & 0 < |y| < x < 1,\\ 0, & \text{otherwise.}\end{cases} \tag{11-31}$$
Determine $E(X \mid Y)$ and $E(Y \mid X)$.
Solution: As Fig. 11.3 shows, $f_{XY}(x,y) = 1$ in the shaded (triangular) area and zero elsewhere. From there
$$f_X(x) = \int_{-x}^{x} f_{XY}(x,y)\,dy = 2x, \qquad 0 < x < 1,$$
and
$$f_Y(y) = \int_{|y|}^{1} dx = 1 - |y|, \qquad |y| < 1.$$
This gives
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{1}{1 - |y|}, \qquad 0 < |y| < x < 1, \tag{11-32}$$
and
$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{1}{2x}, \qquad 0 < |y| < x < 1. \tag{11-33}$$
Hence
$$E(X \mid Y) = \int_{|y|}^{1} x\,f_{X|Y}(x \mid y)\,dx = \frac{1}{1-|y|}\int_{|y|}^{1} x\,dx = \frac{1}{1-|y|}\cdot\frac{x^2}{2}\bigg|_{|y|}^{1} = \frac{1 - |y|^2}{2(1-|y|)} = \frac{1 + |y|}{2}, \qquad |y| < 1, \tag{11-34}$$
and
$$E(Y \mid X) = \int_{-x}^{x} y\,f_{Y|X}(y \mid x)\,dy = \frac{1}{2x}\int_{-x}^{x} y\,dy = 0, \qquad 0 < x < 1. \tag{11-35}$$
It is possible to obtain an interesting generalization of the conditional mean formulas in (11-28)-(11-29). More generally, (11-28) gives
$$E\bigl[g(X) \mid Y = y\bigr] = \int_{-\infty}^{+\infty} g(x)\,f_{X|Y}(x \mid y)\,dx. \tag{11-36}$$
But
$$E[g(X)] = \int_{-\infty}^{+\infty} g(x)\,f_X(x)\,dx = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x)\,f_{XY}(x,y)\,dy\,dx = \int_{-\infty}^{+\infty}\left(\int_{-\infty}^{+\infty} g(x)\,f_{X|Y}(x \mid y)\,dx\right)f_Y(y)\,dy = \int_{-\infty}^{+\infty} E\bigl[g(X) \mid Y = y\bigr]\,f_Y(y)\,dy = E\bigl\{E\bigl[g(X) \mid Y = y\bigr]\bigr\}. \tag{11-37}$$
Obviously, on the right side of (11-37), the inner expectation is with respect to X and the outer expectation is with respect to Y. Letting g(X) = X in (11-37) we get the interesting identity
$$E(X) = E\bigl\{E(X \mid Y = y)\bigr\}, \tag{11-38}$$
where the inner expectation on the right side is with respect to X and the outer one is with respect to Y. Similarly, we have
$$E(Y) = E\bigl\{E(Y \mid X = x)\bigr\}. \tag{11-39}$$
Using (11-37) and (11-30), we also obtain
$$\mathrm{Var}(X) \ge E\bigl[\mathrm{Var}(X \mid Y = y)\bigr]. \tag{11-40}$$
The conditional mean turns out to be an important concept in estimation and prediction theory. For example, given an observation about a r.v X, what can we say about a related r.v Y? In other words, what is the best predicted value of Y given that X = x? It turns out that if "best" is meant in the sense of minimizing the mean square error between Y and its estimate $\hat{Y}$, then the conditional mean of Y given X = x, i.e., $\hat{Y} = E(Y \mid X = x)$, is the best estimate for Y (see Lecture 16 for more on Mean Square Estimation).
We conclude this lecture with yet another application of the conditional density formulation.
Example 11.4: Poisson sum of Bernoulli random variables.
Let $X_i$, $i = 1, 2, 3, \ldots$, represent independent, identically distributed Bernoulli random variables with
$$P(X_i = 1) = p, \qquad P(X_i = 0) = 1 - p = q,$$
and N a Poisson random variable with parameter $\lambda$ that is independent of all $X_i$. Consider the random variables
$$Y = \sum_{i=1}^{N} X_i, \qquad Z = N - Y. \tag{11-41}$$
Show that Y and Z are independent Poisson random variables.
Solution: To determine the joint probability mass function of Y and Z, consider
$$P(Y = m,\ Z = n) = P(Y = m,\ N - Y = n) = P(Y = m,\ N = m + n) = P\left(\sum_{i=1}^{N} X_i = m \,\Big|\, N = m+n\right)P(N = m+n) = P\left(\sum_{i=1}^{m+n} X_i = m\right)P(N = m+n). \tag{11-42}$$
(Note that $\sum_{i=1}^{m+n} X_i \sim B(m+n, p)$, and that the $X_i$'s are independent of N.) Thus
$$P(Y = m,\ Z = n) = \binom{m+n}{m}p^m q^n\cdot e^{-\lambda}\frac{\lambda^{m+n}}{(m+n)!} = \frac{(m+n)!}{m!\,n!}\,p^m q^n\,e^{-\lambda}\frac{\lambda^{m+n}}{(m+n)!} = \left(e^{-\lambda p}\frac{(\lambda p)^m}{m!}\right)\left(e^{-\lambda q}\frac{(\lambda q)^n}{n!}\right) = P(Y = m)\,P(Z = n). \tag{11-43}$$
Thus
$$Y \sim P(\lambda p) \quad \text{and} \quad Z \sim P(\lambda q), \tag{11-44}$$
and Y and Z are independent random variables.
Thus if a bird lays eggs that follow a Poisson random variable with parameter $\lambda$, and if each egg survives with probability p, then the number of chicks that survive also forms a Poisson random variable, with parameter $\lambda p$.
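This "Poisson thinning" result is easy to verify by simulation. Below is a minimal sketch (Python with NumPy; $\lambda = 4$, $p = 0.3$ and the trial count are arbitrary illustration values) that checks the means of Y and Z and the factorization in (11-43) at one point.

```python
import numpy as np

rng = np.random.default_rng(8)
lam, p, trials = 4.0, 0.3, 300_000
N = rng.poisson(lam, trials)
Y = rng.binomial(N, p)          # successes among the N Bernoulli(p) trials
Z = N - Y

print("E[Y]:", Y.mean(), "  predicted lam*p:", lam*p)
print("E[Z]:", Z.mean(), "  predicted lam*q:", lam*(1-p))
print("Var[Y]:", Y.var(), " (a Poisson r.v has variance equal to its mean)")

m, k = 1, 2                     # independence check at one (m, k) pair
joint = np.mean((Y == m) & (Z == k))
product = np.mean(Y == m) * np.mean(Z == k)
print(f"P(Y={m}, Z={k}) = {joint:.5f}   P(Y={m})P(Z={k}) = {product:.5f}")
```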
12. Principles of Parameter Estimation
The purpose of this lecture is to illustrate the usefulness of the various concepts introduced and studied in earlier lectures in practical problems of interest. In this context, consider the problem of estimating an unknown parameter of interest from a few of its noisy observations. For example, determining the daily temperature in a city, or the depth of a river at a particular spot, are problems that fall into this category.
Observations (measurements) are made on data that contain the desired nonrandom parameter and undesired noise. Thus, for example,
$$\text{Observation} = \text{signal (desired part)} + \text{noise}, \tag{12-1}$$
or, the i-th observation can be represented as
$$X_i = \theta + n_i, \qquad i = 1, 2, \ldots, n. \tag{12-2}$$
Here $\theta$ represents the unknown nonrandom desired parameter, and $n_i$, $i = 1, 2, \ldots, n$, represent random variables that may be dependent or independent from observation to observation. Given n observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the estimation problem is to obtain the best estimator for the unknown parameter $\theta$ in terms of these observations.
Let us denote by $\hat\theta(X)$ the estimator for $\theta$. Obviously $\hat\theta(X)$ is a function of only the observations. "Best estimator" in what sense? Various optimization strategies can be used to define the term "best".
The ideal solution would be for the estimate to coincide with the unknown $\theta$. This of course may not be possible, and almost always any estimate will result in an error given by
$$e = \hat\theta(X) - \theta. \tag{12-3}$$
One strategy would be to select the estimator $\hat\theta(X)$ so as to minimize some function of this error, such as the mean square error (MMSE) or the absolute value of the error, etc. A more fundamental approach is that of the principle of Maximum Likelihood (ML).
The underlying assumption in any estimation problem is
that the available data $X_1, X_2, \ldots, X_n$ has something to do with the unknown parameter $\theta$. More precisely, we assume that the joint p.d.f of $X_1, X_2, \ldots, X_n$, given by $f_X(x_1, x_2, \ldots, x_n; \theta)$, depends on $\theta$. The method of maximum likelihood assumes that the given sample data set is representative of the population $f_X(x_1, x_2, \ldots, x_n; \theta)$, and chooses that value for $\theta$ that most likely caused the observed data to occur: once the observations $x_1, x_2, \ldots, x_n$ are given, $f_X(x_1, x_2, \ldots, x_n; \theta)$ is a function of $\theta$ alone, and the value of $\theta$ that maximizes the above p.d.f is the most likely value for $\theta$; it is chosen as the ML estimate $\hat\theta_{ML}(X)$ for $\theta$ (Fig. 12.1).
Given $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the joint p.d.f $f_X(x_1, x_2, \ldots, x_n; \theta)$ represents the likelihood function, and the ML estimate can be determined either from the likelihood equation
$$\sup_{\theta} f_X(x_1, x_2, \ldots, x_n; \theta) \quad \text{at } \theta = \hat\theta_{ML}, \tag{12-4}$$
or using the log-likelihood function (sup in (12-4) represents the supremum operation)
$$L(x_1, x_2, \ldots, x_n; \theta) = \log f_X(x_1, x_2, \ldots, x_n; \theta). \tag{12-5}$$
If $L(x_1, x_2, \ldots, x_n; \theta)$ is differentiable and a supremum $\hat\theta_{ML}$ exists in (12-5), then it must satisfy the equation
$$\frac{\partial\log f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial\theta}\bigg|_{\theta = \hat\theta_{ML}} = 0. \tag{12-6}$$
We will illustrate the above procedure through several examples.
Example 12.1: Let $X_i = \theta + w_i$, $i = 1, \ldots, n$, represent n observations, where $\theta$ is the unknown parameter of interest and $w_i$, $i = 1, \ldots, n$, are zero mean independent normal r.vs with common variance $\sigma^2$. Determine the ML estimate for $\theta$.
Solution: Since the $w_i$ are independent r.vs and $\theta$ is an unknown constant, the $X_i$'s are independent normal random variables. Thus the likelihood function takes the form
$$f_X(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f_{X_i}(x_i; \theta). \tag{12-7}$$
Moreover, each $X_i$ is Gaussian with mean $\theta$ and variance $\sigma^2$ (why?). Thus
$$f_{X_i}(x_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x_i - \theta)^2/2\sigma^2}. \tag{12-8}$$
Substituting (12-8) into (12-7), we get the likelihood function to be
$$f_X(x_1, x_2, \ldots, x_n; \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}}\,e^{-\sum_{i=1}^{n}(x_i - \theta)^2/2\sigma^2}. \tag{12-9}$$
It is easier to work with the log-likelihood function $L(X;\theta)$ in this case. From (12-9),
$$L(X;\theta) = \ln f_X(x_1, x_2, \ldots, x_n; \theta) = -\frac{n}{2}\ln(2\pi\sigma^2) - \sum_{i=1}^{n}\frac{(x_i - \theta)^2}{2\sigma^2}, \tag{12-10}$$
and taking the derivative with respect to $\theta$ as in (12-6), we get
$$\frac{\partial\ln f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial\theta}\bigg|_{\theta = \hat\theta_{ML}} = \sum_{i=1}^{n}\frac{x_i - \hat\theta_{ML}}{\sigma^2} = 0, \tag{12-11}$$
or
$$\hat\theta_{ML}(X) = \frac{1}{n}\sum_{i=1}^{n} X_i. \tag{12-12}$$
Thus (12-12) represents the ML estimate for $\theta$, which happens to be a linear estimator (a linear function of the data) in this case.
Notice that the estimator is a r.v. Taking its expected value, we get
$$E\bigl[\hat\theta_{ML}(X)\bigr] = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \theta, \tag{12-13}$$
i.e., the expected value of the estimator does not differ from the desired parameter, and hence there is no bias between the two. Such estimators are known as unbiased estimators. Thus (12-12) represents an unbiased estimator for $\theta$. Moreover, the variance of the estimator is given by
$$\mathrm{Var}(\hat\theta_{ML}) = E\bigl[(\hat\theta_{ML} - \theta)^2\bigr] = \frac{1}{n^2}E\left[\left(\sum_{i=1}^{n}(X_i - \theta)\right)^2\right] = \frac{1}{n^2}\left(\sum_{i=1}^{n}E\bigl[(X_i - \theta)^2\bigr] + \sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}E\bigl[(X_i - \theta)(X_j - \theta)\bigr]\right).$$
The latter terms are zero since $X_i$ and $X_j$ are independent r.vs.
Then
$$\mathrm{Var}(\hat\theta_{ML}) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}. \tag{12-14}$$
Thus
$$\mathrm{Var}(\hat\theta_{ML}) \to 0 \quad \text{as } n \to \infty, \tag{12-15}$$
another desired property. Estimators that satisfy (12-15) are said to be consistent estimators.
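The unbiasedness (12-13) and variance (12-14) of the sample-mean ML estimate are easy to confirm by repeated simulated experiments. A minimal sketch follows (Python with NumPy; $\theta = 2.5$, $\sigma = 1.2$, $n = 20$ and the number of trials are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)
theta, sigma, n, trials = 2.5, 1.2, 20, 50_000
# each row is one experiment: n noisy observations X_i = theta + w_i
X = theta + rng.normal(0.0, sigma, size=(trials, n))
theta_hat = X.mean(axis=1)                  # the ML estimate (12-12), one per experiment

print("mean of estimates    :", theta_hat.mean(), "  true theta:", theta)        # (12-13)
print("variance of estimates:", theta_hat.var(), "  predicted sigma^2/n:", sigma**2/n)  # (12-14)
```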
The next two examples show that the ML estimator can be highly nonlinear.
Example 12.2: Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed uniform random variables in the interval $(0, \theta)$ with common p.d.f
$$f_{X_i}(x_i; \theta) = \frac{1}{\theta}, \qquad 0 < x_i < \theta, \tag{12-16}$$
where $\theta$ is an unknown parameter. Find the ML estimate for $\theta$.
Solution: The likelihood function in this case is given by
$$f_X(X_1 = x_1, \ldots, X_n = x_n; \theta) = \frac{1}{\theta^n}, \qquad 0 < x_i < \theta,\ i = 1, 2, \ldots, n, \quad \text{i.e., } 0 < \max(x_1, x_2, \ldots, x_n) < \theta. \tag{12-17}$$
From (12-17), the likelihood function is maximized by the minimum admissible value of $\theta$, and since $\theta \ge \max(X_1, X_2, \ldots, X_n)$, we get
$$\hat\theta_{ML}(X) = \max(X_1, X_2, \ldots, X_n) \tag{12-18}$$
to be the ML estimate for $\theta$. Notice that (12-18) represents a nonlinear function of the observations. To determine whether (12-18) represents an unbiased estimate for $\theta$, we need to evaluate its mean. To accomplish that in this case, it is easier to determine its p.d.f and proceed directly. Let
$$Z = \max(X_1, X_2, \ldots, X_n) \tag{12-19}$$
with $f_{X_i}$ as in (12-16). Then
$$F_Z(z) = P\bigl[\max(X_1, X_2, \ldots, X_n) \le z\bigr] = P(X_1 \le z,\ X_2 \le z,\ \ldots,\ X_n \le z) = \prod_{i=1}^{n} P(X_i \le z) = \bigl[F_X(z)\bigr]^n = \left(\frac{z}{\theta}\right)^n, \qquad 0 < z < \theta, \tag{12-20}$$
so that
$$f_Z(z) = \begin{cases}\dfrac{n\,z^{n-1}}{\theta^n}, & 0 < z < \theta,\\[4pt] 0, & \text{otherwise.}\end{cases} \tag{12-21}$$
Using (12-21), we get
$$E\bigl[\hat\theta_{ML}(X)\bigr] = E(Z) = \int_0^{\theta} z\,f_Z(z)\,dz = \frac{n}{\theta^n}\int_0^{\theta} z^n\,dz = \frac{n}{n+1}\,\theta = \frac{\theta}{1 + 1/n}. \tag{12-22}$$
In this case $E\bigl[\hat\theta_{ML}(X)\bigr] \ne \theta$, and hence the ML estimator is not an unbiased estimator for $\theta$. However, from (12-22), as $n \to \infty$,
$$\lim_{n\to\infty} E\bigl[\hat\theta_{ML}(X)\bigr] = \lim_{n\to\infty}\frac{\theta}{1 + 1/n} = \theta, \tag{12-23}$$
i.e., the ML estimator is an asymptotically unbiased estimator. From (12-21), we also get
$$E(Z^2) = \int_0^{\theta} z^2\,f_Z(z)\,dz = \frac{n}{\theta^n}\int_0^{\theta} z^{n+1}\,dz = \frac{n\,\theta^2}{n+2}, \tag{12-24}$$
so that
$$\mathrm{Var}\bigl[\hat\theta_{ML}(X)\bigr] = E(Z^2) - \bigl[E(Z)\bigr]^2 = \frac{n\theta^2}{n+2} - \frac{n^2\theta^2}{(n+1)^2} = \frac{n\,\theta^2}{(n+1)^2(n+2)}. \tag{12-25}$$
Once again $\mathrm{Var}\bigl[\hat\theta_{ML}(X)\bigr] \to 0$ as $n \to \infty$, implying that the estimator in (12-18) is a consistent estimator.
13
Solution: Here and
This gives the log-likelihood function to be
Differentiating L with respect to and we get
Thus from (12-29)
, 0
i
x
.
)) ( (
) , ; , , , (
1
1
2 1
1

=
=
n
i
x
i
n
n
n X
n
i
i
e x x x x f

"
(12-26)
. log ) 1 ( ) ( log log
) , ; , , , ( log ) , ; , , , (
1 1
2 1 2 1

= =
|
.
|

\
|
+ =
=
n
i
i
n
i
i
n X n
x x n n
x x x f x x x L

" "
(12-27)
, 0 log ) (
) (
log

, ,
1
= +

=
=

n
i
i
x
n
n
L
(12-28)
. 0

, ,
1
= =

=
=

n
i
i
x
n L
(12-29)
,
1

) (

=
=
n
i
i
ML
ML
x
n
X

(12-30)
PILLAI
14
and substituting (12-30) into (12-28), it gives
Notice that (12-31) is highly nonlinear in
In general the (log)-likelihood function can have more than
one solution, or no solutions at all. Further, the (log)-
likelihood function may not be even differentiable, or it can
be extremely complicated to solve explicitly
(see example 12.3, equation (12-31)).
Best Unbiased Estimator:
Referring back to example 12.1, we have seen that (12-12)
represents an unbiased estimator for with variance given
by (12-14). It is possible that, for a given n, there may be
other
.
1 1
log
)

(
)

log
1 1

= =
|
.
|

\
|
=

n
i
i
n
i
i
ML
ML
ML
x
n
x
n

ML

(12-31)
PILLAI
15
unbiased estimators to this problem with even lower
variances. If such is indeed the case, those estimators will be
naturally preferable compared to (12-12). In a given scenario,
is it possible to determine the lowest possible value for the
variance of any unbiased estimator? Fortunately, a theorem
by Cramer and Rao (Rao 1945; Cramer 1948) gives a
complete answer to this problem.
Cramer - Rao Bound: Variance of any unbiased estimator
based on observations for must
satisfy the lower bound
This important result states that the right side of (12-32) acts
as a lower bound on the variance of all unbiased estimator for
, provided their joint p.d.f satisfies certain regularity
restrictions. (see (8-79)-(8-81), Text).

n n
x X x X x X = = = , , ,
2 2 1 1
"
.

) ; , , , ( ln
1

) ; , , , ( ln
1
)

(
2
2 1
2
2
2 1
|
|
.
|

\
|


=
|
.
|

\
|

n X
n X
x x x f
E
x x x f
E
Var
"
"
(12-32)
PILLAI
16
Naturally any unbiased estimator whose variance coincides
with that in (12-32), must be the best. There are no better
solutions! Such estimates are known as efficient estimators.
Let us examine whether (12-12) represents an efficient
estimator. Towards this using (12-11)
and
and substituting this into the first form on the right side of
(12-32), we obtain the Cramer - Rao lower bound for this
problem to be
; ) (
1

) ; , , , ( ln
2
1
4
2
2 1
|
.
|

\
|
=
|
.
|

\
|

=
n
i
i
n X
X
x x x f


"
(12-33)
,
1

)] )( [( ] ) [(
1

) ; , , , ( ln
2
1
2
4
1 , 1 1
2
4
2
2 1




n
X X E X E
x x x f
E
n
i
n
i
n
j i j
j i
n
i
i
n X
= =

+ =
|
.
|

\
|


=
= = =
"
(12-34)
PILLAI
)
`

17
.
2
n

(12-35)
PILLAI
But from (12-14) the variance of the ML estimator in (12-12)
is the same as (12-35), implying that (12-12) indeed represents
an efficient estimator in this case, the best of all possibilities!
It is possible that in certain cases there are no unbiased
estimators that are efficient. In that case, the best estimator
will be an unbiased estimator with the lowest possible
variance.
How does one find such an unbiased estimator?
Fortunately Rao-Blackwell theorem (page 335-337, Text)
gives a complete answer to this problem.
Cramer-Rao bound can be extended to multiparameter case
as well (see page 343-345,Text).
18
So far, we discussed nonrandom parameters that are
unknown. What if the parameter of interest is a r.v with
a-priori p.d.f How does one obtain a good estimate
for based on the observations
One technique is to use the observations to compute its
a-posteriori probability density function
Of course, we can use the Bayes theorem in (11.22) to
obtain this a-posteriori p.d.f. This gives
Notice that (12-36) is only a function of , since
represent given observations. Once again, we can look for
? ) (

f
? , , ,
2 2 1 1 n n
x X x X x X = = = "
). , , , | (
2 1 | n X
x x x f "

.
) , , , (
) ( ) | , , , (
) , , , | (
2 1
2 1 |
2 1 |
n X
n X
n X
x x x f
f x x x f
x x x f
"
"
"

= (12-36)
n
x x x , , ,
2 1
"
PILLAI
19
the most probable value of suggested by the above
a-posteriori p.d.f. Naturally, the most likely value for is
that corresponding to the maximum of the a-posteriori p.d.f
(see Fig. 12.2). This estimator - maximum of the a-posteriori
p.d.f is known as the MAP estimator for .
It is possible to use other optimality criteria as well. Of
course, that should be the subject matter of another course!
MAP

) , , , | (
2 1 n
x x x f "

Fig. 12.2
PILLAI
1
13. The Weak Law and the Strong
Law of Large Numbers
James Bernoulli proved the weak law of large numbers (WLLN)
around 1700 which was published posthumously in 1713 in his
treatise Ars Conjectandi. Poisson generalized Bernoullis theorem
around 1800, and in 1866 Tchebychev discovered the method bearing
his name. Later on one of his students, Markov observed that
Tchebychevs reasoning can be used to extend Bernoullis theorem
to dependent random variables as well.
In 1909 the French mathematician Emile Borel proved a
deeper theorem known as the strong law of large numbers that further
generalizes Bernoullis theorem. In 1926 Kolmogorov derived
conditions that were necessary and sufficient for a set of mutually
independent random variables to obey the law of large numbers.
PILLAI
2
Let be independent, identically distributed Bernoulli random
Variables such that
and let represent the number of successes
in n trials. Then the weak law due to Bernoulli states that [see
Theorem 3-1, page 58, Text]
i.e., the ratio total number of successes to the total number of
trials tends to p in probability as n increases.
A stronger version of this result due to Borel and Cantelli
states that the above ratio k/n tends to p not only in probability, but
with probability 1. This is the strong law of large numbers (SLLN).
i
X
, 1 ) 0 ( , ) ( q p X P p X P
i i
= = = =
n
X X X k + + + =
2 1

{ } .
2

n
pq
p P
n
k
>
(13-1)
PILLAI
3
What is the difference between the weak law and the strong
law?
The strong law of large numbers states that if is a
sequence of positive numbers converging to zero, then
From Borel-Cantelli lemma [see (2-69) Text], when (13-2) is
satisfied the events can occur only for a finite
number of indices n in an infinite sequence, or equivalently, the
events occur infinitely often, i.e., the event k/n
converges to p almost-surely.
Proof: To prove (13-2), we proceed as follows. Since
} {
n

{ } .
1
<

= n
n
p P
n
k

(13-2)
{ }
n
p
n
k
<
4 4
4
n np k p
n
k

PILLAI
{ }
=
n n
k
n
A p

4
we have
and hence
where
By direct computation
{ } { } ( ) < + =

=
p P p P n n k p np k
n
k
n
k
n
n
k
) ( ) (
4 4 4 4
0
4
{ }
4 4
0
4
) ( ) (

n
k p np k
p P
n
k
n
n
k


=


(13-3)
k n k
n
i
i n
q p k X P k p

|
.
|

\
|
=
=
)
`

= =
k
n
1
) (
} { } {
1
4 4
1 0
4
) ( ) ( ) ( ) ( ) (

= = =
= =
n
i
i
n
i
i
n
k
n
p X E np X E k p np k
PILLAI
5
since
Substituting (13-4) also (13-3) we obtain
Let so that the above integral reads
and hence
, 3
)] 1 ( 3 [ ) )( 1 ( 3 ) (
) ( ) ( ) 1 ( 3 ) ( ) ( ) 1 ( 4 ) (
) ( } ) {(
2
2 3 3
2
1 1
2
1 1
3
1
4
1 1 1 1
4
1
pq n
pq n n n pq n n pq q p n
Y E Y E n n Y E Y E n n Y E
Y Y Y Y E Y E
j
n
i
n
j
i j
n
i
n
j
i
n
i
i
n
i
n
k
n
j
n
l
l j k i
n
i
i
=
+ + + =
+ + =
= =


= = = = =
= = = = =
(13-4)
can coincide with
j, k or l, and the second variable
takes (n-1) values
n i = 1
1 2 / 1 , 1 3 3 ) (
2 2 3 3 3
< < + = + pq pq q p q p q p
{ }
2 4
3
k
n
pq
P p
n


(13-5)
0
{ }

3/ 2
1/ 8 3/ 2
1
1 1
1 1
3 3 1
3 (1 2) 9 ,
( )
n n
k
n
P p pq pq x dx
n n
pq pq



= =
+
= + = <


1/ 8
1
n
=
PILLAI
6
thus proving the strong law by exhibiting a sequence of positive
numbers that converges to zero and satisfies (13-2).
We return back to the same question: What is the difference
between the weak law and the strong law?.
The weak law states that for every n that is large enough, the
ratio is likely to be near p with certain probability that
tends to 1 as n increases. However, it does not say that k/n is bound
to stay near p if the number of trials is increased. Suppose (13-1) is
satisfied for a given in a certain number of trials If additional
trials are conducted beyond the weak law does not guarantee that
the new k/n is bound to stay near p for such trials. In fact there can
be events for which for in some regular manner.
The probability for such an event is the sum of a large number of
very small probabilities, and the weak law is unable to say anything
specific about the convergence of that sum.
However, the strong law states (through (13-2)) that not only
all such sums converge, but the total number of all such events
1/ 8
1/
n
n =
1
( ) / /
n
i
X n k n
i
=
=
,
0
n
, / + > p n k
0
n n >
.
0
n
PILLAI
7
where is in fact finite! This implies that the probability
of the events as n increases becomes and remains
small, since with probability 1 only finitely many violations to
the above inequality takes place as
Interestingly, if it possible to arrive at the same conclusion
using a powerful bound known as Bernsteins inequality that is
based on the WLLN.
Bernsteins inequality : Note that
and for any this gives
Thus
} { > p
n
k
. n
+ > p n k /
) ( + > > p n k p
n
k
, 0 >
. 1
)) ( (
>
+ p n k
e
( )
(
( )
(
( )

=
+

+ =
+
+ =

= >
n
k
k n k
k
n
p n k
k n k
n
p n k
k
n
p n k
n
p n k
k n k
k
n
q p e
q p e
q p p P
n
k
0
)) ( (
) (
)) ( (
) (


} {

PILLAI
8
Since for any real x,
Substituting (13-7) into (13-6), we get
But is minimum for and hence
Similarly
( )
. ) (
) ( ) ( } {
0
n p q n
k n p k q
n
k
n
qe pe e
qe pe e p P
k
n
n
k

+ =
= >

2
x x
e x e +
.
) ( ) (
2 2 2 2 2
2 2 2 2



e qe pe
e p q e q p qe pe
p q
p q p q
+ =
+ + + +

. } {
2

n n
e p P
n
k

>
n n
2
2 / =
(13-6)
(13-7)
0. , } {
4 /
2
> >


n
e p P
n
k
(13-8)
4 /
2
} {

n
e p P
n
k

<
PILLAI
9
and hence we obtain Bernsteins inequality
Bernsteins inequality is more powerful than Tchebyshevs inequality
as it states that the chances for the relative frequency k /n exceeding
its probability p tends to zero exponentially fast as
Chebyshevs inequality gives the probability of k /n to lie
between and for a specific n. We can use Bernsteins
inequality to estimate the probability for k /n to lie between
and for all large n.
Towards this, let
so that
To compute the probability of the event note that its
complement is given by
2
/ 4
{ } 2 .
n
k
n
P p e


>
(13-9)
. n
p
+ p
+ p
p
} { + < = p p y
n
k
n
2
/ 4
( ) { } 2
c n
n
n
k
P y P p e


= >
,
n
m n
y

c
n
m n
c
n
m n
y y

=

=
= ) (
PILLAI
10
and using Eq. (2-68) Text,
This gives
or,
Thus k /n is bound to stay near p for all large enough n, in probability,
a conclusion already reached by the SLLN.
Discussion: Let Thus if we toss a fair coin 1,000 times,
from the weak law
2
2
/ 4

/ 4
2
( ) {1 ( )} 1 1 as
1
m
n n
n m n m
e
P y P y m
e

= =
=


. as 1 } , { + m m n all for p p P
n
k

{ } .
40
1
01 . 0
2
1

n
k
P
. 1 . 0 =
2
2
2
/ 4
/ 4

/ 4
2
( ) ( ) 2 .
1
m
c c n
n n
n m
n m n m
e
P y P y e
e

=
= =
=

PILLAI
11
Thus on the average 39 out of 40 such events each with 1000 or more
trials will satisfy the inequality or, it is quite possible
that one out of 40 such events may not satisfy it. As a result if we
continue the coin tossing experiment for an additional 1000 more
trials, with k representing the total number of successes up to the
current trial n, for it is quite possible that for few
such n the above inequality may be violated. This is still consistent
with the weak law, but not so often says the strong law. According
to the strong law such violations can occur only a finite number of
times each with a finite probability in an infinite sequence of trials,
and hence almost always the above inequality will be satisfied, i.e.,
the sample space of k /n coincides with that of p as
Next we look at an experiment to confirm the strong law:
Example: 2n red cards and 2n black cards (all distinct) are shuffled
together to form a single deck, and then split into half. What is
the probability that each half will contain n red and n black cards?
} { 1 . 0
2
1

n
k
, 2000 1000 = n
. n
12
Solution: From a deck of 4n cards, 2n cards can be chosen in
different ways. To determine the number of favorable draws of n red
and n black cards in each half, consider the unique draw consisting
of 2n red cards and 2n black cards in each half. Among those 2n red
cards, n of them can be chosen in different ways; similarly for
each such draw there are ways of choosing n black cards.Thus
the total number of favorable draws containing n red and n black
cards in each half are among a total of draws. This gives
the desired probability to be
For large n, using Stinglings formula we get
|
|
.
|

\
|
n
n
2
4
|
|
.
|

\
|
n
n 2
|
|
.
|

\
|
n
n 2
|
|
.
|

\
|
n
n
2
4
|
|
.
|

\
|
n
n 2
|
|
.
|

\
|
n
n 2
n
p
.
) ! ( )! 4 (
) ! 2 (
4
4
2
4
2 2
n n
n
p
n
n
n
n
n
n
n
=
|
.
|

\
|
|
.
|

\
|
|
.
|

\
|

PILLAI
13
For a full deck of 52 cards, we have which gives
and for a partial deck of 20 cards (that contains 10 red and 10 black
cards), we have and
One summer afternoon, 20 cards (containing 10 red and 10 black
cards) were given to a 5 year old child. The child split that partial
deck into two equal halves and the outcome was declared a success
if each half contained exactly 5 red and 5 black cards. With adult
supervision (in terms of shuffling) the experiment was repeated 100
times that very same afternoon. The results are tabulated below
in Table 13.1, and the relative frequency vs the number of trials
plot in Fig 13.1 shows the convergence of k /n to p.
n e n n e n n
e n n
p
n n n n
n n
n

2
] 2 [ ) 4 ( ) 4 ( 2
] ) 2 ( ) 2 ( 2 [
4 4 4
4 2 2


=

, 13 = n
221 . 0
n
p
5 = n
. 3568 . 0
n
p
PILLAI
14
Table 13.1
35 100 29 80 22 60 14 40 8 20
34 99 29 79 22 59 14 39 7 19
34 98 29 78 22 58 13 38 7 18
34 97 28 77 22 57 12 37 6 17
34 96 27 76 22 56 12 36 6 16
34 95 27 75 21 55 11 35 6 15
34 94 26 74 21 54 10 34 5 14
33 93 26 73 20 53 10 33 5 13
33 92 26 72 20 52 10 32 5 12
33 91 26 71 19 51 10 31 5 11
32 90 26 70 18 50 10 30 5 10
32 89 26 69 17 49 10 29 5 9
32 88 25 68 17 48 10 28 4 8
31 87 25 67 17 47 9 27 3 7
31 86 25 66 16 46 8 26 2 6
30 85 25 65 15 45 8 25 2 5
30 84 24 64 14 44 8 24 1 4
30 83 23 63 14 43 8 23 1 3
29 82 23 62 14 42 8 22 0 2
29 81 23 61 14 41 8 21 0 1
Number of
successes
Expt Number of
successes
Expt Number of
successes
Expt Number of
successes
Expt Number of
successes
Expt
PILLAI
15
The figure below shows results of an experiment of
100 trials.
0.3437182
Fig 13.1
n
n
p
1
14. Stochastic Processes
Let denote the random outcome of an experiment. To every such
outcome suppose a waveform
is assigned.
The collection of such
waveforms form a
stochastic process. The
set of and the time
index t can be continuous
or discrete (countably
infinite or finite) as well.
For fixed (the set of
all experimental outcomes), is a specific time function.
For fixed t,
is a random variable. The ensemble of all such realizations
over time represents the stochastic

) , ( t X
} {
k

S
i

) , (
1 1 i
t X X =
) , ( t X
PILLAI/Cha
t
1
t
2
t
) , (
n
t X
) , (
k
t X
) , (
2
t X
) , (
1
t X
.
.
.
Fig. 14.1
) , ( t X
0
) , ( t X
Introduction
2
process X(t). (see Fig 14.1). For example
where is a uniformly distributed random variable in
represents a stochastic process. Stochastic processes are everywhere:
Brownian motion, stock market fluctuations, various queuing systems
all represent stochastic phenomena.
If X(t) is a stochastic process, then for fixed t, X(t) represents
a random variable. Its distribution function is given by
Notice that depends on t, since for a different t, we obtain
a different random variable. Further
represents the first-order probability density function of the
process X(t).
), cos( ) (
0
+ = t a t X

} ) ( { ) , ( x t X P t x F
X
=
) , ( t x F
X
(14-1)
(14-2)
PILLAI/Cha
(0, 2 ),
dx
t x dF
t x f
X
X
) , (
) , ( =

3
For t = t
1
and t = t
2
, X(t) represents two different random variables
X
1
= X(t
1
) and X
2
= X(t
2
) respectively. Their joint distribution is
given by
and
represents the second-order density function of the process X(t).
Similarly represents the n
th
order density
function of the process X(t). Complete specification of the stochastic
process X(t) requires the knowledge of
for all and for all n. (an almost impossible task
in reality).
} ) ( , ) ( { ) , , , (
2 2 1 1 2 1 2 1
x t X x t X P t t x x F
X
= (14-3)
(14-4)
) , , , , , (
2 1 2 1 n n
t t t x x x f
X

) , , , , , (
2 1 2 1 n n
t t t x x x f
X

n i t
i
, , 2 , 1 , =
PILLAI/Cha
2
1 2 1 2
1 2 1 2
1 2
( , , , )
( , , , )

X
X
F x x t t
f x x t t
x x

4
Mean of a Stochastic Process:
represents the mean value of a process X(t). In general, the mean of
a process can depend on the time index t.
Autocorrelation function of a process X(t) is defined as
and it represents the interrelationship between the random variables
X
1
= X(t
1
) and X
2
= X(t
2
) generated from the process X(t).
Properties:
1.
2.
(14-5)
(14-6)
*
1
*
2 1 2
*
2 1
)}] ( ) ( { [ ) , ( ) , ( t X t X E t t R t t R
XX XX
= =
(14-7)
. 0 } | ) ( {| ) , (
2
> = t X E t t R
XX
PILLAI/Cha
(Average instantaneous power)



( ) { ( )} ( , )
X
t E X t x f x t dx
+

= =

* *
1 2 1 2 1 2 1 2 1 2 1 2
( , ) { ( ) ( )} ( , , , )
XX X
R t t E X t X t x x f x x t t dx dx = =

5
3. represents a nonnegative definite function, i.e., for any
set of constants
Eq. (14-8) follows by noticing that
The function
represents the autocovariance function of the process X(t).
Example 14.1
Let
Then
. ) ( for 0 } | {|
1
2

=
=
n
i
i i
t X a Y Y E
) ( ) ( ) , ( ) , (
2
*
1 2 1 2 1
t t t t R t t C
X X XX XX
=
(14-9)
. ) (

=
T
T
dt t X z




=
=
T
T
T
T
T
T
T
T
dt dt t t R
dt dt t X t X E z E
XX




2 1 2 1




2 1 2
*
1
2
) , (
)} ( ) ( { ] | [|
(14-10)
n
i i
a
1
} {
=
) , (
2 1
t t R
XX

= =

n
i
n
j
j i j i
t t R a a
XX
1 1
*
. 0 ) , (
(14-8)
PILLAI/Cha
6
Similarly
, 0 } {sin sin } {cos cos
)} {cos( )} ( { ) (

0 0
0
= =
+ = =


E t a E t a
t aE t X E t
X
). ( cos
2
)} 2 ) ( cos( ) ( {cos
2
)} cos( ) {cos( ) , (
2 1 0
2
2 1 0 2 1 0
2
2 0 1 0
2
2 1
t t
a
t t t t E
a
t t E a t t R
XX
=
+ + + =
+ + =



(14-12)
(14-13)
Example 14.2
). 2 , 0 ( ~ ), cos( ) (
0
U t a t X + = (14-11)
This gives
PILLAI/Cha

= = =



2
0
}. {sin 0 cos } {cos since
2
1
E d E
7
Stationary Stochastic Processes
Stationary processes exhibit statistical properties that are
invariant to shift in the time index. Thus, for example, second-order
stationarity implies that the statistical properties of the pairs
{X(t
1
) , X(t
2
) } and {X(t
1
+c) , X(t
2
+c)} are the same for any c.
Similarly first-order stationarity implies that the statistical properties
of X(t
i
) and X(t
i
+c) are the same for any c.
In strict terms, the statistical properties are governed by the
joint probability density function. Hence a process is n
th
-order
Strict-Sense Stationary (S.S.S) if
for any c, where the left side represents the joint density function of
the random variables and
the right side corresponds to the joint density function of the random
variables
A process X(t) is said to be strict-sense stationary if (14-14) is
true for all
) , , , , , ( ) , , , , , (
2 1 2 1 2 1 2 1
c t c t c t x x x f t t t x x x f
n n n n X X
+ + +
(14-14)
) ( , ), ( ), (
2 2 1 1 n n
t X X t X X t X X = = =
). ( , ), ( ), (
2 2 1 1
c t X X c t X X c t X X
n n
+ =

+ =

+ =


. and , 2 , 1 , , , 2 , 1 , c any n n i t
i
= =
PILLAI/Cha
8
For a first-order strict sense stationary process,
from (14-14) we have
for any c. In particular c = t gives
i.e., the first-order density of X(t) is independent of t. In that case
Similarly, for a second-order strict-sense stationary process
we have from (14-14)
for any c. For c = t
2
we get
) , ( ) , ( c t x f t x f
X X
+
(14-16)
(14-15)
(14-17)
) ( ) , ( x f t x f
X X
=



[ ( )] ( ) , E X t x f x dx a constant.
+

= =

) , , , ( ) , , , (
2 1 2 1 2 1 2 1
c t c t x x f t t x x f
X X
+ +
) , , ( ) , , , (
2 1 2 1 2 1 2 1
t t x x f t t x x f
X X
(14-18)
PILLAI/Cha
9
i.e., the second order density function of a strict sense stationary
process depends only on the difference of the time indices
In that case the autocorrelation function is given by
i.e., the autocorrelation function of a second order strict-sense
stationary process depends only on the difference of the time
indices
Notice that (14-17) and (14-19) are consequences of the stochastic
process being first and second-order strict sense stationary.
On the other hand, the basic conditions for the first and second order
stationarity Eqs. (14-16) and (14-18) are usually difficult to verify.
In that case, we often resort to a looser definition of stationarity,
known as Wide-Sense Stationarity (W.S.S), by making use of
.
2 1
= t t
.
2 1
t t =
(14-19)
PILLAI/Cha
*
1 2 1 2
*
1 2 1 2 1 2 1 2
*
1 2
( , ) { ( ) ( )}
( , , )
( ) ( ) ( ),
XX
X
XX XX XX
R t t E X t X t
x x f x x t t dx dx
R t t R R


=
= =
= = =

10
(14-17) and (14-19) as the necessary conditions. Thus, a process X(t)
is said to be Wide-Sense Stationary if
(i)
and
(ii)
i.e., for wide-sense stationary processes, the mean is a constant and
the autocorrelation function depends only on the difference between
the time indices. Notice that (14-20)-(14-21) does not say anything
about the nature of the probability density functions, and instead deal
with the average behavior of the process. Since (14-20)-(14-21)
follow from (14-16) and (14-18), strict-sense stationarity always
implies wide-sense stationarity. However, the converse is not true in
general, the only exception being the Gaussian process.
This follows, since if X(t) is a Gaussian process, then by definition
are jointly Gaussian random
variables for any whose joint characteristic function
is given by
= )} ( { t X E
(14-21)
(14-20)
), ( )} ( ) ( {
2 1 2
*
1
t t R t X t X E
XX
=
) ( , ), ( ), (
2 2 1 1 n n
t X X t X X t X X = = =
PILLAI/Cha
n
t t t , ,
2 1

11
where is as defined on (14-9). If X(t) is wide-sense
stationary, then using (14-20)-(14-21) in (14-22) we get
and hence if the set of time indices are shifted by a constant c to
generate a new set of jointly Gaussian random variables
then their joint characteristic
function is identical to (14-23). Thus the set of random variables
and have the same joint probability distribution for all n and
all c, establishing the strict sense stationarity of Gaussian processes
from its wide-sense stationarity.
To summarize if X(t) is a Gaussian process, then
wide-sense stationarity (w.s.s) strict-sense stationarity (s.s.s).
Notice that since the joint p.d.f of Gaussian random variables depends
only on their second order statistics, which is also the basis
) , (
k i
t t C
XX
1 ,
( ) ( , ) / 2
1 2
( , , , )
XX
n n
k k i k i k
k l k
X
j t C t t
n
e


=


=
(14-22)
1
2
1 1 1 1
( )
1 2
( , , , )
XX
n n n
k i k i k
k k
X
j C t t
n
e


= = =


=
(14-23)
n
i i
X
1
} {
=
n
i i
X
1
} {
=

PILLAI/Cha
), (
1 1
c t X X + =

) ( , ), (
2 2
c t X X c t X X
n n
+ =

+ =


12
for wide sense stationarity, we obtain strict sense stationarity as well.
From (14-12)-(14-13), (refer to Example 14.2), the process
in (14-11) is wide-sense stationary, but
not strict-sense stationary.
Similarly if X(t) is a zero mean wide
sense stationary process in Example 14.1,
then in (14-10) reduces to
As t
1
, t
2
varies from T to +T, varies
from 2T to + 2T. Moreover is a constant
over the shaded region in Fig 14.2, whose area is given by
and hence the above integral reduces to
), cos( ) (
0
+ = t a t X
PILLAI/Cha
2
z

. ) ( } | {|




2 1 2 1
2 2


= =
T
T
T
T
z
dt dt t t R z E
XX

2 1
t t =
) (
XX
R
) 0 ( >
d T d T T ) 2 ( ) 2 (
2
1
) 2 (
2
1
2 2
=
. ) 1 )( ( |) | 2 )( (
2
2
2
| |
2
1
2
2
2


= =
T
t
T T
T
t
z
d R d T R
XX XX


(14-24)
T
T
T

T 2
2
t
1
t
Fig. 14.2
2 1
t t =
13
Systems with Stochastic Inputs
A deterministic system
1
transforms each input waveform into
an output waveform by operating only on the
time variable t. Thus a set of realizations at the input corresponding
to a process X(t) generates a new set of realizations at the
output associated with a new process Y(t).
) , (
i
t X
)] , ( [ ) , (
i i
t X T t Y =
)} , ( { t Y
Our goal is to study the output process statistics in terms of the input
process statistics and the system function.
1
A stochastic system on the other hand operates on both the variables t and
.
PILLAI/Cha
] [ T

) (t X

) (t Y
t
t
) , (
i
t X
) , (
i
t Y
Fig. 14.3
14
Deterministic Systems
Systems with Memory
Time-Invariant
systems
Linear systems
Linear-Time Invariant
(LTI) systems
Memoryless Systems
)] ( [ ) ( t X g t Y =
)] ( [ ) ( t X L t Y =
PILLAI/Cha
Time-varying
systems
Fig. 14.3
. ) ( ) (
) ( ) ( ) (



+

+

=
=


d t X h
d X t h t Y ( ) h t ( ) X t
LTI system
15
Memoryless Systems:
The output Y(t) in this case depends only on the present value of the
input X(t). i.e.,
(14-25)
PILLAI/Cha
)} ( { ) ( t X g t Y =
Memoryless
system
Memoryless
system
Memoryless
system
Strict-sense
stationary input
Wide-sense
stationary input
X(t) stationary
Gaussian with
) (
XX
R
Strict-sense
stationary output.
Need not be
stationary in
any sense.
Y(t) stationary,but
not Gaussian with
(see (14-26)).
). ( ) (
XX XY
R R =
(see (9-76), Text for a proof.)
Fig. 14.4
16
Theorem: If X(t) is a zero mean stationary Gaussian process, and
Y(t) = g[X(t)], where represents a nonlinear memoryless device,
then
Proof:
where are jointly Gaussian random
variables, and hence
) ( g
)}. ( { ), ( ) ( X g E R R
XX XY

= = (14-26)
2 1 2 1 2 1
) , ( ) (
)}] ( { ) ( [ )} ( ) ( { ) (
2 1
dx dx x x f x g x
t X g t X E t Y t X E R
X X
XY

=
= =
(14-27)
) ( ), (
2 1
= = t X X t X X
PILLAI/Cha
* 1
1 2
/ 2
1 2
1 2 1 2
*
*

1
2 | |
(0) ( )
( ) (0)
( , )
( , ) , ( , )
{ }
XX XX
XX XX
X X
x A x
T T
A
R R
R R
f x x e
X X X x x x
A E X X LL

=
= =
| |
= = =
|
\ .

17
where L is an upper triangular factor matrix with positive diagonal
entries. i.e.,
Consider the transformation
so that
and hence Z
1
, Z
2
are zero mean independent Gaussian random
variables. Also
and hence
The Jacobaian of the transformation is given by
.
0

22
12 11
|
.
|

\
|
=
l
l l
L
I AL L L X X E L Z Z E = = =


1 1
* 1 * * 1
*
} { } {
* * *
1 * 1 2 2
1 2
. x A x z L A Lz z z z z

= = = +
2 22 2 2 12 1 11 1
, z l x z l z l x z L x = + = =
PILLAI/Cha
1 1

1 2 1 2
( , ) , ( , )
T T
Z L X Z Z z L x z z

= = = =

18
Hence substituting these into (14-27), we obtain
where This gives
. | | | | | |
2 / 1 1
= = A L J
2 2
1 2
1/ 2
11 1 12 2 22 2
11 1 22 2 1 2
1 2
12 2 22 2 1 2
1 2

/ 2 / 2
1 1
| |
2 | |


1 2


1 2

( ) ( ) ( )
( ) ( ) ( )
( ) ( ) ( )

XY J
A
z z
z z
z z
R l z l z g l z e e
l z g l z f z f z dz dz
l z g l z f z f z dz dz

+ +


+ +

+ +

= +
=
+
=



2
2
2
12
22
22
11 1 1 22 2 2
1 2
12 2 22 2 2
2
/ 2
2
2

1 2


2
1
2

/ 2
1
2
( ) ( ) ( )
( ) ( )
( ) ,
z z
z
z
u l
l
l
l z f z dz g l z f z dz
l z g l z f z dz
e
ug u e du

+ +

+

+
=

_
0
PILLAI/Cha
22 2
. u l z =
19
2
22
2
22
22
2
2
( )

/ 2
1
12 22

2
( )
( )


( ) ( )
( ) ( ) ( ) ,
u
XY
u
u
XX
f u
u
df u
f u
du
u
l
l
u
l
R l l g u e du
R g u f u du

=
+

_
Hence ). ( gives since
22 12
*

XX
R l l LL A = =
the desired result, where Thus if the input to
a memoryless device is stationary Gaussian, the cross correlation
function between the input and the output is proportional to the
input autocorrelation function.
PILLAI/Cha
), ( )} ( { ) (
} ) ( ) ( | ) ( ) ( ){ ( ) (




XX XX
XX XY
R X g E R
du u f u g u f u g R R
u u
=

=

+ =

+

+

0
)]. ( [ X g E

=
20
Linear Systems: represents a linear system if
Let
represent the output of a linear system.
Time-Invariant System: represents a time-invariant system if
i.e., shift in the input results in the same shift in the output also.
If satisfies both (14-28) and (14-30), then it corresponds to
a linear time-invariant (LTI) system.
LTI systems can be uniquely represented in terms of their output to
a delta function
] [ L
)} ( { ) ( t X L t Y =
)}. ( { )} ( { )} ( ) ( {
2 2 1 1 2 2 1 1
t X L a t X L a t X a t X a L + = + (14-28)
] [ L
) ( )} ( { )} ( { ) (
0 0
t t Y t t X L t X L t Y = =
(14-29)
(14-30)
] [ L
PILLAI/Cha
LTI
) (t ) (t h
Impulse
Impulse
response of
the system
t
) (t h
Impulse
response
Fig. 14.5
21
Eq. (14-31) follows by expressing X(t) as
and applying (14-28) and (14-30) to Thus )}. ( { ) ( t X L t Y =

+

=


) ( ) ( ) ( d t X t X
(14-31)
(14-32)
(14-33)
PILLAI/Cha
. ) ( ) ( ) ( ) (
)} ( { ) (
} ) ( ) ( {
} ) ( ) ( { )} ( { ) (











+

+

+

+

+

= =
=
=
= =




d t X h d t h X
d t L X
d t X L
d t X L t X L t Y
By Linearity
By Time-invariance
then
LTI

+

+

=
=




) ( ) (
) ( ) ( ) (


d t X h
d X t h t Y
arbitrary
input
t
) (t X
t
) (t Y
Fig. 14.6
) (t X ) (t Y
22
Output Statistics: Using (14-33), the mean of the output process
is given by
Similarly the cross-correlation function between the input and output
processes is given by
Finally the output autocorrelation function is given by
). ( ) ( ) ( ) (
} ) ( ) ( { )} ( { ) (




t h t d t h
d t h X E t Y E t
X X
Y
= =
= =


+

+



(14-34)
). ( ) , (
) ( ) , (
) ( )} ( ) ( {
} ) ( ) ( ) ( {
)} ( ) ( { ) , (
2
*
2 1


*
2 1


*
2 1


*
2 1
2
*
1 2 1
t h t t R
d h t t R
d h t X t X E
d h t X t X E
t Y t X E t t R
XX
XX
XY
=
=
=
=
=


+

+

+




*
*
(14-35)
PILLAI/Cha
23
or
), ( ) , (
) ( ) , (
) ( )} ( ) ( {
} ) ( ) ( ) ( {
)} ( ) ( { ) , (
1 2 1


2 1


2 1


2
*
1
2
*
1 2 1
t h t t R
d h t t R
d h t Y t X E
t Y d h t X E
t Y t Y E t t R
XY
XY
YY
=
=
=
=
=


+

+

+




*
). ( ) ( ) , ( ) , (
1 2
*
2 1 2 1
t h t h t t R t t R
XX YY
=
(14-36)
(14-37)
PILLAI/Cha
h(t)
) (t
X
) (t
Y

h*(t
2
) h(t
1
)

) , (
2 1
t t R
XY

) , (
2 1
t t R
YY
) , (
2 1
t t R
XX
(a)
(b)
Fig. 14.7
24
In particular if X(t) is wide-sense stationary, then we have
so that from (14-34)
Also so that (14-35) reduces to
Thus X(t) and Y(t) are jointly w.s.s. Further, from (14-36), the output
autocorrelation simplifies to
From (14-37), we obtain
X X
t = ) (
constant. a c d h t
X X Y
, ) ( ) (


= =

+

(14-38)
) ( ) , (
2 1 2 1
t t R t t R
XX XX
=
(14-39)
). ( ) ( ) (
, ) ( ) ( ) , (
2 1

2 1 2 1


YY XY
XY YY
R h R
t t d h t t R t t R
= =
= =

+

(14-40)
). ( ) ( ) ( ) (
*
h h R R
XX YY
=
(14-41)
PILLAI/Cha
. ), ( ) ( ) (
) ( ) ( ) , (
2 1
*


*
2 1 2 1
t t R h R
d h t t R t t R
XY XX
XX XY
= = =
+ =

25
From (14-38)-(14-40), the output process is also wide-sense stationary.
This gives rise to the following representation
PILLAI/Cha
LTI system
h(t)
Linear system
wide-sense
stationary process
strict-sense
stationary process
Gaussian
process (also
stationary)
wide-sense
stationary process.
strict-sense
stationary process
(see Text for proof )
Gaussian process
(also stationary)
) (t X
) (t Y
LTI system
h(t)
) (t X
) (t X
) (t Y
) (t Y
(a)
(b)
(c)
Fig. 14.8
26
White Noise Process:
W(t) is said to be a white noise process if
i.e., E[W(t
1
) W
*
(t
2
)] = 0 unless t
1
= t
2
.
W(t) is said to be wide-sense stationary (w.s.s) white noise
if E[W(t)] = constant, and
If W(t) is also a Gaussian process (white Gaussian process), then all of
its samples are independent random variables (why?).
For w.s.s. white noise input W(t), we have
), ( ) ( ) , (
2 1 1 2 1
t t t q t t R
WW
= (14-42)
). ( ) ( ) , (
2 1 2 1
q t t q t t R
WW
= = (14-43)
White noise
W(t)
LTI
h(t)
Colored noise
( ) ( ) ( ) N t h t W t =
PILLAI/Cha
Fig. 14.9
27
and
where
Thus the output of a white noise process through an LTI system
represents a (colored) noise process.
Note: White noise need not be Gaussian.
White and Gaussian are two different concepts!
) ( ) ( ) (
) ( ) ( ) ( ) (
*
*


q h qh
h h q R
nn
= =
=
(14-45)
. ) ( ) ( ) ( ) ( ) (


* *

+

+ = = d h h h h
(14-46)
PILLAI/Cha
(14-44)

[ ( )] ( ) ,
W
E N t h d
+

=

a constant
28
Upcrossings and Downcrossings of a stationary Gaussian process:
Consider a zero mean stationary Gaussian process X(t) with
autocorrelation function An upcrossing over the mean value
occurs whenever the realization X(t)
passes through zero with
positive slope. Let
represent the probability
of such an upcrossing in
the interval
We wish to determine
Since X(t) is a stationary Gaussian process, its derivative process
is also zero mean stationary Gaussian with autocorrelation function
(see (9-101)-(9-106), Text). Further X(t) and
are jointly Gaussian stationary processes, and since (see (9-106), Text)
). (
XX
R
t
). , ( t t t +
.
Fig. 14.10
) (t X

) ( ) (
XX X X
R R

=

) (t X

,
) (
) (

d
dR
R
XX
X X
=

PILLAI/Cha
Upcrossings
t
) (t X
Downcrossing
29
we have
which for gives
i.e., the jointly Gaussian zero mean random variables
are uncorrelated and hence independent with variances
respectively. Thus
To determine the probability of upcrossing rate,
0 =
) (
) (
) (
) (
) (

X X
XX XX
X X
R
d
dR
d
dR
R

= =

=
(14-48)
(14-47)
(0) 0 [ ( ) ( )] 0
XX
R E X t X t


= =
) ( and ) (
2 1
t X X t X X

= = (14-49)
,
0 ) 0 ( ) 0 ( and ) 0 (
2
2
2
1
>

= = =
XX X X XX
R R R
(14-50)
2 2
1 1
2 2
1 2
1 2
1 2 1 2
1 2
2 2
1
( , ) ( ) ( ) .
2
X X X X
x x
f x x f x f x e


| |

|
\ .
+
= =
(14-51)
PILLAI/Cha
30
PILLAI/Cha
we argue as follows: In an interval the realization moves
from X(t) = X
1
to
and hence the realization intersects with the zero level somewhere
in that interval if
i.e.,
Hence the probability of upcrossing
in is given by
Differentiating both sides of (14-53) with respect to we get
and letting Eq. (14-54) reduce to
), , ( t t t +
, ) ( ) ( ) (
2 1
t X X t t X t X t t X + =

+ = +
1 2
. X X t >
(14-52)
) , ( t t t +
(14-53)
t
) (t X
) (t X
) ( t t X +
t
t t +
Fig. 14.11
. ) ( ) (
) , (
1

1 2
0
2

0
2 1
0

2 1
2
1 2
2 2 1
2 1
x d x f x d x f
dx x d x x f t
t x
x t x x
X X
X X

= =
=
=
, t
(14-54)
2 1

2 2 2 2
0
( ) ( )
X X
f x x f x t dx

=

, 0 t
1 2 1 2
0, 0, and ( ) 0 X X X t t X X t < > + = + >
31
PILLAI/Cha
[where we have made use of (5-78), Text]. There is an equal
probability for downcrossings, and hence the total probability for
crossing the zero line in an interval equals where
It follows that in a long interval T, there will be approximately
crossings of the mean value. If is large, then the
autocorrelation function decays more rapidly as moves
away from zero, implying a large random variation around the origin
(mean value) for X(t), and the likelihood of zero crossings should
increase with increase in agreeing with (14-56).
) 0 (
) 0 (
2
1
) / 2 (
2
1
) 0 ( 2
1
) (
) 0 ( 2
1
) 0 ( ) (
2

0
2 2 2
0
2 2 2
XX
XX
XX
X
XX
X X
R
R
R
dx x f x
R
dx f x f x

= =
= =

(14-55)
) , ( t t t +
,
0
t
. 0 ) 0 ( / ) 0 (
1
0
>

=
XX XX
R R

(14-56)
T
0

) 0 (
XX
R

) (
XX
R
(0),
XX
R

32
Discrete Time Stochastic Processes:
A discrete time stochastic process X
n
= X(nT) is a sequence of
random variables. The mean, autocorrelation and auto-covariance
functions of a discrete-time process are gives by
and
respectively. As before strict sense stationarity and wide-sense
stationarity definitions apply here also.
For example, X(nT) is wide sense stationary if
and
)} ( ) ( { ) , (
)} ( {
2
*
1 2 1
T n X T n X E n n R
nT X E
n
=
=
*
2 1 2 1
2 1
) , ( ) , (
n n
n n R n n C =
(14-57)
(14-58)
(14-59)
constant a nT X E , )} ( { =
(14-60)
PILLAI/Cha
(14-61)
* *
[ {( ) } {( ) }] ( )
n n
E X k n T X k T R n r r

+ = = =

33
i.e., R(n
1
, n
2
) = R(n
1
n
2
) = R
*
(n
2
n
1
). The positive-definite
property of the autocorrelation sequence in (14-8) can be expressed
in terms of certain Hermitian-Toeplitz matrices as follows:
Theorem: A sequence forms an autocorrelation sequence of
a wide sense stationary stochastic process if and only if every
Hermitian-Toeplitz matrix T
n
given by
is non-negative (positive) definite for
Proof: Let represent an arbitrary constant vector.
Then from (14-62),
since the Toeplitz character gives Using (14-61),
Eq. (14-63) reduces to
+

} {
n
r
0, 1, 2, , . n =
*
0
*
1
*
1
*
1 1 0
*
1
2 1 0




n
n n
n
n
n
T
r r r r
r r r r
r r r r
T =
|
|
|
|
|
.
|

\
|
=

T
n
a a a a ] , , , [
1 0
=
(14-62)
PILLAI/Cha

= =

=
n
i
n
k
i k k i n
r a a a T a
0 0
*
*
(14-63)
. ) (
, i k k i n
r T

=
34
From (14-64), if X(nT) is a wide sense stationary stochastic process
then T
n
is a non-negative definite matrix for every
Similarly the converse also follows from (14-64). (see section 9.4, Text)
If X(nT) represents a wide-sense stationary input to a discrete-time
system {h(nT)}, and Y(nT) the system output, then as before the cross
correlation function satisfies
and the output autocorrelation function is given by
or
Thus wide-sense stationarity from input to output is preserved
for discrete-time systems also.
. , , 2 , 1 , 0 = n
(14-64)
2
*
* * *
0 0 0
{ ( ) ( )} ( ) 0.
n n n
n i k k
i k k
a T a a a E X kT X iT E a X kT
= = =


= =
`

)

PILLAI/Cha
) ( ) ( ) (
*
n h n R n R
XX XY
=
) ( ) ( ) ( n h n R n R
XY YY
=
). ( ) ( ) ( ) (
*
n h n h n R n R
XX YY
=
(14-65)
(14-66)
(14-67)
35
Auto Regressive Moving Average (ARMA) Processes
Consider an input output representation
where X(n) may be considered as the output of a system {h(n)}
driven by the input W(n).
Z transform of
(14-68) gives
or
, ) ( ) ( ) (
0 1

= =
+ =
q
k
k
p
k
k
k n W b k n X a n X
(14-68)
(14-69)
h(n)
W(n) X(n)
0
0 0
( ) ( ) , 1
p q
k k
k k
k k
X z a z W z b z a

= =
=

1 2
0 1 2
1 2
0
1 2
( ) ( )
( ) ( )
( ) ( ) 1
q
q
k
p
k
p
b b z b z b z
X z B z
H z h k z
W z A z a z a z a z


=
+ + + +
= = = =
+ + + +

(14-70)
PILLAI/Cha
Fig.14.12
36
represents the transfer function of the associated system response {h(n)}
in Fig 14.12 so that
Notice that the transfer function H(z) in (14-70) is rational with p poles
and q zeros that determine the model order of the underlying system.
From (14-68), the output undergoes regression over p of its previous
values and at the same time a moving average based on
of the input over (q + 1) values is added to it, thus
generating an Auto Regressive Moving Average (ARMA (p, q))
process X(n). Generally the input {W(n)} represents a sequence of
uncorrelated random variables of zero mean and constant variance
so that
If in addition, {W(n)} is normally distributed then the output {X(n)}
also represents a strict-sense stationary normal process.
If q = 0, then (14-68) represents an AR(p) process (all-pole
process), and if p = 0, then (14-68) represents an MA(q)
PILLAI/Cha
(14-72)
(14-71)
. ) ( ) ( ) (
0

=
=
k
k W k n h n X
), 1 ( ), ( n W n W
2
W

). ( ) (
2
n n R
W WW
=
) ( , q n W
37
process (all-zero process). Next, we shall discuss AR(1) and AR(2)
processes through explicit calculations.
AR(1) process: An AR(1) process has the form (see (14-68))
and from (14-70) the corresponding system transfer
provided | a | < 1. Thus
represents the impulse response of an AR(1) stable system. Using
(14-67) together with (14-72) and (14-75), we get the output
autocorrelation sequence of an AR(1) process to be
PILLAI/Cha
) ( ) 1 ( ) ( n W n aX n X + =
(14-73)
1 | | , ) ( < = a a n h
n
(14-75)
(14-74)

=
0
1
1
1
) (
n
n n
z a
az
z H
2
| |
2
0
| | 2 2
1
} { } { ) ( ) (
a
a
a a a a n n R
n
k
k k n n n
W W W XX

= = =

=
+

(14-76)
38
where we have made use of the discrete version of (14-46). The
normalized (in terms of R
XX
(0)) output autocorrelation sequence is
given by
It is instructive to compare an AR(1) model discussed above by
superimposing a random component to it, which may be an error
term associated with observing a first order AR process X(n). Thus
where X(n) ~ AR(1) as in (14-73), and V(n) is an uncorrelated random
sequence with zero mean and variance that is also uncorrelated
with {W(n)}. From (14-73), (14-78) we obtain the output
autocorrelation of the observed process Y(n) to be
PILLAI/Cha
) ( ) ( ) ( n V n X n Y + =
. 0 | | ,
) 0 (
) (
) (
| |
= = n a
R
n R
n
n
XX
XX
X

(14-78)
(14-77)
2
V

) (
1
) ( ) ( ) ( ) ( ) (
2
2
| |
2
2
n
a
a
n n R n R n R n R
V W
V XX VV XX YY
n


+

=
+ = + =
(14-79)
39
so that its normalized version is given by
where
Eqs. (14-77) and (14-80) demonstrate the effect of superimposing
an error sequence on an AR(1) model. For non-zero lags, the
autocorrelation of the observed sequence {Y(n)}is reduced by a constant
factor compared to the original process {X(n)}.
From (14-78), the superimposed
error sequence V(n) only affects
the corresponding term in Y(n)
(term by term). However,
a particular term in the input sequence
W(n) affects X(n) and Y(n) as well as
all subsequent observations.
PILLAI/Cha
(14-80)
. 1
) 1 (
2 2 2
2
<
+
=
a
c
V W
W


(14-81)
Fig. 14.13
n
k
) ( ) ( k k
Y X
>
1 ) 0 ( ) 0 ( = =
Y X

0

| |
1 0
( )
( )
(0)
1, 2,
YY
Y
YY
n
n
R n
n
R
c a n

= =

=

40
AR(2) Process: An AR(2) process has the form
and from (14-70) the corresponding transfer function is given by
so that
and in term of the poles of the transfer function,
from (14-83) we have
that represents the impulse response of the system.
From (14-84)-(14-85), we also have
From (14-83),
PILLAI/Cha
) ( ) 2 ( ) 1 ( ) (
2 1
n W n X a n X a n X + + = (14-82)
(14-83)
(14-84)
(14-85)
1
2
2
1
1
1
0
2
2
1
1
1 1 1
1
) ( ) (

=

= =

z
b
z
b
z a z a
z n h z H
n
n

2 ), 2 ( ) 1 ( ) ( , ) 1 ( , 1 ) 0 (
2 1 1
+ = = = n n h a n h a n h a h h
0 , ) (
2 2 1 1
+ = n b b n h
n n

. , 1
1 2 2 1 1 2 1
a b b b b = + = +
, ,
2 2 1 1 2 1
a a = = +
(14-86)
2 1
and
41
and H(z) stable implies
Further, using (14-82) the output autocorrelations satisfy the recursion
and hence their normalized version is given by
By direct calculation using (14-67), the output autocorrelations are
given by
PILLAI/Cha
(14-88)
(14-87)
. 1 | | , 1 | |
2 1
< <
) 2 ( ) 1 (
)} ( ) ( {
)} ( )] 2 ( ) 1 ( {[
)} ( ) ( { ) (
2 1
*
*
2 1
*
+ =
+ +
+ + + =
+ =
n R a n R a
m X m n W E
m X m n X a m n X a E
m X m n X E n R
XX XX
XX
0
|
.
|

\
|

=
+ =
= =

=
2
2
*
2
2
2
*
2 1
*
2
*
2 1
2
*
1
*
1 2
*
1
2
1
*
1
2
1
2
0
* 2
* 2 *
| | 1
) ( | |
1
) (
1
) (
| | 1
) ( | |

) ( ) (
) ( ) ( ) ( ) ( ) ( ) (

n n n n
k
b b b b b b
k h k n h
n h n h n h n h n R n R
W
W
W WW XX
(14-89)
1 2
( )
( ) ( 1) ( 2).
(0)
XX
X X X
XX
R n
n a n a n
R
= = +

42
where we have made use of (14-85). From (14-89), the normalized
output autocorrelations may be expressed as
where c
1
and c
2
are appropriate constants.
Damped Exponentials: When the second order system in
(14-83)-(14-85) is real and corresponds to a damped exponential
response, the poles are complex conjugate which gives
in (14-83). Thus
In that case in (14-90) so that the normalized
correlations there reduce to
But from (14-86)
PILLAI/Cha
(14-90)
n n
XX
XX
X
c c
R
n R
n
*
2 2
*
1 1
) 0 (
) (
) ( + = =
2
1 2
4 0 a a + <
*

1 2
j
c c c e

= =
*

1 2 1
, , 1.
j
r e r

= = <
(14-91)
(14-92) ). cos( 2 } Re{ 2 ) (
*
1 1
+ = = n cr c n
n
n
X
, 1 , cos 2
2
2
1 2 1
< = = = + a r a r
(14-93)
43
and hence which gives
Also from (14-88)
so that
where the later form is obtained from (14-92) with n = 1. But
in (14-92) gives
Substituting (14-96) into (14-92) and (14-95) we obtain the normalized
output autocorrelations to be
PILLAI/Cha
2
1 2
2 sin ( 4 ) 0 r a a = + >
1 ) 0 ( =
X

.
) 4 (
tan
1
2
2
1
a
a a +
=
(14-94)
(14-95)
(14-96)
) 1 ( ) 1 ( ) 0 ( ) 1 (
2 1 2 1 X X X X
a a a a + = + =
) cos( 2
1
) 1 (
2
1
+ =

= cr
a
a
X
. cos 2 / 1 or , 1 cos 2 = = c c
44
where satisfies
Thus the normalized autocorrelations of a damped second order
system with real coefficients subject to random uncorrelated
impulses satisfy (14-97).
More on ARMA processes
From (14-70) an ARMA (p, q) system has only p + q + 1 independent
coefficients, and hence its impulse
response sequence {h
k
} also must exhibit a similar dependence among
them. In fact according to P. Dienes (The Taylor series, 1931),

.
1
1 cos
) cos(
2 2
1
a a
a

=
+


(14-98)
1 ,
cos
) cos(
) ( ) (
2
2 /
2
<
+
= a
n
a n
n
X

(14-97)
( , 1 , , 0 ),
k i
a k p b i q = =
PILLAI/Cha
45
an old result due to Kronecker
1
(1881) states that the necessary and
sufficient condition for to represent a rational
system (ARMA) is that
where
i.e., In the case of rational systems for all sufficiently large n, the
Hankel matrices H
n
in (14-100) all have the same rank.
The necessary part easily follows from (14-70) by cross multiplying
and equating coefficients of like powers of
1
Among other things God created the integers and the rest is the work of man. (Leopold Kronecker)
PILLAI/Cha
0
( )
k
k
k
H z h z


=
=

det 0, (for all sufficiently large ),


n
H n N n = (14-99)
(14-100)
, 0, 1, 2, .
k
z k

=
0 1 2
1 2 3 1
1 2 2
.
n
n
n
n n n n
h h h h
h h h h
H
h h h h
+
+ +
| |
|
|
=
|
|
\ .

46
This gives
For systems with
in (14-102) we get
which gives det H
p
= 0. Similarly gives
1, i p q = +
0 0
1 0 1 1
0 1 1
0 1 1 1 1
0 , 1.
q q q m
q i q i q i q i
b h
b h a h
b h a h a h
h a h a h a h i

+ + + +
=
= +
= + + +
= + + + +
.

1, letting , 1, , q p i p q p q = +
0 1 1 1 1
1 1 2 1 1 2
0
0
p p p p
p p p p p p
h a h a h a h
h a h a h a h

+
+ + + + =
+ + + + =

(14-102)
(14-101)
2p q
(14-103)
PILLAI/Cha
47
and that gives det H
p+1
= 0 etc. (Notice that )
(For sufficiency proof, see Dienes.)
It is possible to obtain similar determinantial conditions for ARMA
systems in terms of Hankel matrices generated from its output
autocorrelation sequence.
Referring back to the ARMA (p, q) model in (14-68),
the input white noise process w(n) there is uncorrelated with its own
past sample values as well as the past values of the system output.
This gives
0, 1, 2,
p k
a k
+
= =
0 1 1 1
1 1 2 2
1 1 2 2 2
0
0
0,
p p p
p p p
p p p p p
h a h a h
h a h a h
h a h a h
+ +
+ +
+ + + +
+ + + =
+ + + =
+ + + =

(14-104)
PILLAI/Cha
*
{ ( ) ( )} 0, 1 E w n w n k k =
*
{ ( ) ( )} 0, 1. E w n x n k k =
(14-105)
(14-106)
48
PILLAI/Cha
Together with (14-68), we obtain
and hence in general
and
Notice that (14-109) is the same as (14-102) with {h
k
} replaced
*
* *
1 0
*
1 0
{ ( ) ( )}
{ ( ) ( )} { ( ) ( )}
{ ( ) ( )}
i
p q
k k
k k
p q
k i k k
k k
r E x n x n i
a x n k x n i b w n k w n i
a r b w n k x n i
= =

= =
=
= +
= +


(14-107)
1
0,
p
k i k i
k
a r r i q

=
+

(14-108)
1
0, 1.
p
k i k i
k
a r r i q

=
+ = +

(14-109)
49
by {r
k
} and hence the Kronecker conditions for rational systems can
be expressed in terms of its output autocorrelations as well.
Thus if X(n) ~ ARMA (p, q) represents a wide sense stationary
stochastic process, then its output autocorrelation sequence {r
k
}
satisfies
where
represents the Hankel matrix generated from
It follows that for ARMA (p, q) systems, we have
1
rank rank , 0,
p p k
D D p k
+
= = (14-110)
(14-111)
( 1) ( 1) k k + +
0 1 2
, , , , , .
k k
r r r r
0 1 2
1 2 3 1
1 2 2
k
k
k
k k k k
r r r r
r r r r
D
r r r r
+
+ +
| |
|
|
=
|
|
\ .

PILLAI/Cha
(14-112) det 0, for all sufficiently large .
n
D n =
1
15. Poisson Processes
In Lecture 4, we introduced Poisson arrivals as the limiting behavior
of Binomial random variables. (Refer to Poisson approximation of
Binomial random variables.)
From the discussion there (see (4-6)-(4-8) Lecture 4)
where
" , 2 , 1 , 0 ,
! " duration of interval
an in occur arrivals "
= =
)
`


k
k
e
k
P
k

(15-1)
=

= =
T
T np
(15-2)
Fig. 15.1
PILLAI

0 T

arrivals k
2
0 T

arrivals k
2
PILLAI
It follows that (refer to Fig. 15.1)
since in that case
From (15-1)-(15-4), Poisson arrivals over an interval form a Poisson
random variable whose parameter depends on the duration
of that interval. Moreover because of the Bernoulli nature of the
underlying basic random arrivals, events over nonoverlapping
intervals are independent. We shall use these two key observations
to define a Poisson process formally. (Refer to Example 9-5, Text)
Definition: X(t) = n(0, t) represents a Poisson process if
(i) the number of arrivals n(t
1
, t
2
) in an interval (t
1
, t
2
) of length
t = t
2
t
1
is a Poisson random variable with parameter
Thus
(15-3)
. 2 2
2
1
= =

=
T
T np
(15-4)
. t
2
" arrivals occur in an
(2 )
, 0, 1, 2, ,
interval of duration 2 " !
k
k
P e k
k


= =
`

)
"
3
PILLAI
and
(ii) If the intervals (t
1
, t
2
) and (t
3
, t
4
) are nonoverlapping, then the
random variables n(t
1
, t
2
) and n(t
3
, t
4
) are independent.
Since n(0, t) ~ we have
and
To determine the autocorrelation function let t
2
> t
1
,
then from (ii) above n(0, t
1
) and n(t
1
, t
2
) are independent Poisson
random variables with parameters and respectively.
Thus
1 2 2 1
, , 2 , 1 , 0 ,
!
) (
} ) , ( { t t t k
k
t
e k t t n P
k
t
= = = =

"

(15-5)
), , (
2 1
t t R
XX
), ( t P
t t n E t X E = = )] , 0 ( [ )] ( [
(15-6)
. )] , 0 ( [ )] ( [
2 2 2 2
t t t n E t X E + = =
(15-7)
1
t ) (
1 2
t t
). ( )] , ( [ )] , 0 ( [ )] , ( ) , 0 ( [
1 2 1
2
2 1 1 2 1 1
t t t t t n E t n E t t n t n E = =
(15-8)
4
PILLAI
But
and hence the left side if (15-8) can be rewritten as
Using (15-7) in (15-9) together with (15-8), we obtain
Similarly
Thus
) ( ) ( ) , 0 ( ) , 0 ( ) , (
1 2 1 2 2 1
t X t X t n t n t t n = =
)]. ( [ ) , ( )}] ( ) ( ){ ( [
1
2
2 1 1 2 1
t X E t t R t X t X t X E
XX
=
(15-9)
. ,
)] ( [ ) ( ) , (
1 2 2 1
2
1
1
2
1 2 1
2
2 1
t t t t t
t X E t t t t t R
XX
+ =
+ =

(15-10)
(15-12)
(15-11)
. , ) , (
1 2 2 1
2
2 2 1
t t t t t t t R
XX
< + =
). , min( ) , (
2 1 2 1
2
2 1
t t t t t t R
XX
+ =
5
PILLAI
From (15-12), notice that
the Poisson process X(t)
does not represent a wide
sense stationary process.
Define a binary level process
that represents a telegraph signal (Fig. 15.2). Notice that the
transition instants {t
i
} are random. (see Example 9-6, Text for
the mean and autocorrelation function of a telegraph signal).
Although X(t) does not represent a wide sense stationary process,
) (
) 1 ( ) (
t X
t Y =
(15-13)
Fig. 15.2
0
1
t
i
t
t
) (t X
t
) (t Y
t
1 +
Poisson
arrivals
1
1
t
6
PILLAI
its derivative does represent a wide sense stationary process.
To see this, we can make use of Fig. 14.7 and (14-34)-(14-37).
From there
and
and
) (t X

) (t X
) (t X

dt
d ) (
Fig. 15.3 (Derivative as a LTI system)
2
1 1 2
1 2
1 2
2
2
1 1 2
2
1 1 2

( , )
( )


( )
XX
XX
t t t
R t t
R t , t
t
t t t
t U t t


= =

+ >

= +
constant a
dt
t d
dt
t d
t
X
X
,
) (
) (

= = =

(15-14)
(15-15)
(15-16)
). (

) , (
) (
2 1
2
1
2 1
2 1
t t
t
t t R
, t t R
X X
X X
+ =

=



7
PILLAI
From (15-14) and (15-16) it follows that is a wide sense
stationary process. Thus nonstationary inputs to linear systems can
lead to wide sense stationary outputs, an interesting observation.
Sum of Poisson Processes:
If X
1
(t) and X
2
(t) represent two independent Poisson processes,
then their sum X
1
(t) + X
2
(t) is also a Poisson process with
parameter (Follows from (6-86), Text and the definition
of the Poisson process in (i) and (ii)).
Random selection of Poisson Points:
Let represent random arrival points associated
with a Poisson process X(t) with parameter
and associated with
each arrival point,
define an independent
Bernoulli random
variable N
i
, where
) (t X

. ) (
2 1
t +
" " , , , ,
2 1 i
t t t
, t
. 1 ) 0 ( , ) 1 ( p q N P p N P
i i
= = = = = (15-17)
1
t
i
t
t
2
t
" "
Fig. 15.4
8
PILLAI
Define the processes
we claim that both Y(t) and Z(t) are independent Poisson processes
with parameters and respectively.
Proof:
But given X(t) = n, we have so that
and
Substituting (15-20)-(15-21) into (15-19) we get
pt qt

=
= = = =
k n
n t X P n t X k t Y P t Y )}. ) ( { } ) ( | ) ( { ) (
(15-19)
) ( ) ( ) 1 ( ) ( ; ) (
) (
1
) (
1
t Y t X N t Z N t Y
t X
i
i
t X
i
i
= = =

= =
(15-18)
) , ( ~ ) (
1
p n B N t Y
n
i
i

=
=
( )
{ ( ) | ( ) } , 0 ,
k n k
n
k
P Y t k X t n p q k n

= = = (15-20)
( )
{ ( ) } .
!
n
t
t
P X t n e
n

= =
(15-21)
9
PILLAI
More generally,
( ) ( )
!
( )! ! ! ( )!
(1 )
{ ( ) } ( )
!
( )
( ) , 0, 1, 2,
! !
~ ( ).
n n k
q t
k t
t q t t k n k k
n
n k k n n k
n k n k
e
q t k
k pt
p e
P Y t k e p q t
k
e pt
pt e k
k k
P pt




= =

= = =
= = =

"
(15-22)
( )

( ( ) ) ( (
{ ( ) , ( ) } { ( ) , ( ) ( ) }
{ ( ) , ( ) }
{ ( ) | ( ) } { ( ) }
( ) ( ) ( )

( )! ! !
k m n n
k m t pt qt
k m
k
P Y t k P Z
P Y t k Z t m P Y t k X t Y t m
P Y t k X t k m
P Y t k X t k m P X t k m
t pt qt
p q e e e
k m k m


+

+
=
= = = = =
= = = +
= = = + = +
= =
+

) )
{ ( ) } { ( ) },
t m
P Y t k P Z t m
=
= = =

(15-23)
10
PILLAI
which completes the proof.
Notice that Y(t) and Z(t) are generated as a result of random Bernoulli
selections from the original Poisson process X(t) (Fig. 15.5),
where each arrival gets tossed
over to either Y(t) with
probability p or to Z(t) with
probability q. Each such
sub-arrival stream is also
a Poisson process. Thus
random selection of Poisson
points preserve the Poisson
nature of the resulting
processes. However, as we
shall see deterministic
selection from a Poisson
process destroys the Poisson
property for the resulting processes.
Fig. 15.5
t
t
q
p p p
) ( ~ ) ( t P t X
) ( ~ ) ( qt P t Z
t
q
p
) ( ~ ) ( pt P t Y
11
PILLAI
Inter-arrival Distribution for Poisson Processes
Let denote the time interval (delay)
to the first arrival from any fixed point
t
0
. To determine the probability
distribution of the random variable
we argue as follows: Observe that
the event is the same as n(t
0
, t
0
+t) = 0, or the complement
event is the same as the event n(t
0
, t
0
+t) > 0 .
Hence the distribution function of is given by
(use (15-5)), and hence its derivative gives the probability density
function for to be
i.e., is an exponential random variable with parameter
so that
1

,
1

" "
1
t
1
1
( )
( ) , 0
t
dF t
f t e t
dt



= =
(15-24)
(15-25)
1


. / 1 ) (
1
= E
Fig. 15.6
1

" "
1
t >
1

1
t
n
t
t
2
t
"
1

I
st
2
nd
arrival
n
th
arrival
0
t
1
1 0 0
0 0
( ) { } { ( ) 0} { ( , ) 0}
1 { ( , ) 0} 1
t
F t P t P X t P n t t t
P n t t t e

= = > = + >
= + = =

12
PILLAI
Similarly, let t
n
represent the n
th
random arrival point for a Poisson
process. Then
and hence
which represents a gamma density function. i.e., the waiting time to
the n
th
Poisson arrival instant has a gamma distribution.
Moreover
(15-26)
1
1 1
1 0
1
( )
( ) ( )
( )
( 1)! !
, 0
( 1)!
n
n
k k
n n
t
x x
t
k k
n n
x
dF x
x x
f x e e
dx k k
x
e x
n



= =

= = +


(15-27)

=
=
n
i
i n
t
1

1
0
( ) { } { ( ) }
( )
1 { ( ) } 1
!
n
t n
k
n
t
k
F t P t t P X t n
t
P X t n e
k

=
= =
= < =

13
PILLAI
where is the random inter-arrival duration between the (i 1)
th
and i
th
events. Notice that are independent, identically distributed
random variables. Hence using their characteristic functions,
it follows that all inter-arrival durations of a Poisson process are
independent exponential random variables with common parameter
i.e.,
Alternatively, from (15-24)-(15-25), we have is an exponential
random variable. By repeating that argument after shifting t
0
to the
new point t
1
in Fig. 15.6, we conclude that is an exponential
random variable. Thus the sequence are independent
exponential random variables with common p.d.f as in (15-25).
Thus if we systematically tag every m
th
outcome of a Poisson process
X(t) with parameter to generate a new process e(t), then the
inter-arrival time between any two events of e(t) is a gamma
random variable.
i

.
( ) , 0.
i
t
f t e t



=
(15-28)
s
i

" " , , , ,
2 1 n

t
14
Notice that
The inter-arrival time of e(t) in that case represents an Erlang-m
random variable, and e(t) an Erlang-m process (see (10-90), Text).
In summary, if Poisson arrivals are randomly redirected to form new
queues, then each such queue generates a new Poisson process
(Fig. 15.5). However if the arrivals are systematically redirected
(I
st
arrival to I
st
counter, 2
nd
arrival to 2
nd
counter, m
th
to m
th
,
(m +1)
st
arrival to I
st
counter, then the new subqueues form
Erlang-m processes.
Interestingly, we can also derive the key Poisson properties (15-5)
and (15-25) by starting from a simple axiomatic approach as shown
below:
), "
, "
. / 1 )] ( [ then , if and , / )] ( [ = = = t e E m m t e E
PILLAI
15
PILLAI
Axiomatic Development of Poisson Processes:
The defining properties of a Poisson process are that in any small
interval one event can occur with probability that is proportional
to Further, the probability that two or more events occur in that
interval is proportional to and events
over nonoverlapping intervals are independent of each other. This
gives rise to the following axioms.
Axioms:
(i)
(ii)
(iii)
and
(iv)
Notice that axiom (iii) specifies that the events occur singly, and axiom
(iv) specifies the randomness of the entire series. Axiom(ii) follows
from (i) and (iii) together with the axiom of total probability.
, t
. t
), of powers (higher ), ( t t
) ( } 1 ) , ( { t t t t t n P + = = +
) ( 1 } 0 ) , ( { t t t t t n P + = = +
) ( } 2 ) , ( { t t t t n P = +
) , 0 ( of t independen is ) , ( t n t t t n +
(15-29)

)
16
PILLAI
We shall use these axiom to rederive (15-25) first:
Let t
0
be any fixed point (see Fig. 15.6) and let represent the
time of the first arrival after t
0
. Notice that the random variable
is independent of the occurrences prior to the instant t
0
(Axiom (iv)).
With representing the distribution function of
as in (15-24) define Then for
From axiom (iv), the conditional probability in the above expression
is not affected by the event which refers to {n(t
0
, t
0
+ t) = 0},
i.e., to events before t
0
+ t, and hence the unconditional probability
in axiom (ii) can be used there. Thus
or
1 0
+ t
1

} { ) (
1
1
t P t F =

,
1

0 > t
} {
1
t >
1
1 0 0
1 0 0
0 0 1 1
( ) { }
{ , and no event occurs in ( , )}
{ , ( , ) 0}
{ ( , ) 0 | } { }.
Q t t P t t
P t t t t t t
P t n t t t t t
P n t t t t t t P t


+ = > +
= > + + +
= > + + + =
= + + + = > >
) ( )] ( 1 [ ) ( t Q t t t t Q + = +
0
( ) ( )
lim ( ) ( ) ( ) .
t
t
Q t t Q t
t
Q t Q t Q t ce


= = =
1
1
( ) 1 ( ) { }. Q t F t P t

= = >

17
PILLAI
But so that
which gives
to be the p.d.f of as in (15-25).
Similarly (15-5) can be derived from axioms (i)-(iv) in (15-29) as well.
To see this, let
represent the probability that the total number of arrivals in the
interval (0, t) equals k. Then
1 } 0 { ) 0 (
1
= > = = P Q c
1
( ) 1 ( )
t
Q t F t e


= =
1
( ) 1 , 0
t
F t e t


=
1
1
( )
( ) , 0
t
dF t
f t e t
dt



= =
} { )} ) , 0 ( { ) (
3 2 1
X X X P k t t n P t t p
k
= = + = +
1

(15-30)
or
( ) { (0, ) )}, 0, 1, 2,
k
p t P n t k k = = = "

18
PILLAI
where the events
are mutually exclusive. Thus
But as before
and
). ( ) ( ) ( ) (
3 2 1
X P X P X P t t p
k
+ + = +
t p t
k t n P k t n t t t n P X P
t p t
k t n P t t t n P
k t n P k t n t t t n P X P
k
k
=
= = = + =
=
= = + =
= = = + =
1
2
1
} 1 ) , 0 ( { } 1 ) , 0 ( | 1 ) , ( { ) (
) ( ) 1 (
} ) , 0 ( { } 0 ) , ( {
} ) , 0 ( { } ) , 0 ( | 0 ) , ( { ) (

0 ) (
3
= X P
1
2
3
= " (0, ) , and ( , ) 0"
= " (0, ) 1, and ( , ) 1"
= " (0, ) , and ( , ) 2"
X n t k n t t t
X n t k n t t t
X n t k i n t t t i
= + =
= + =
= + =

19
where once again we have made use of axioms (i)-(iv) in (15-29).
This gives
or with
we get the differential equation
whose solution gives (15-5). Here [Solution to the above
differential equation is worked out in (16-36)-(16-41), Text].
This is completes the axiomatic development for Poisson processes.
PILLAI
. 0 ) (
1

t p
) ( ) ( ) 1 ( ) (
1
t tp t p t t t p
k k k
+ = +
) (
) ( ) (
lim
0
t p
t
t p t t p
k
k k
t

=

+

" , 2 , 1 , 0 ), ( ) ( ) (
1
= =


k t p t p t p
k k k

20
PILLAI
Poisson Departures between Exponential Inter-arrivals
Let and represent two independent
Poisson processes called arrival and departure processes.
) ( ~ ) ( t P t X ) ( ~ ) ( t P t Y
Let Z represent the random interval between any two successive
arrivals of X(t). From (15-28), Z has an exponential distribution with
parameter Let N represent the number of departures of Y(t)
between any two successive arrivals of X(t). Then from the Poisson
nature of the departures we have
Thus
.
( )
{ | } .
!
k
t
t
P N k Z t e
k

= = =
Fig. 15.7
1
t
i
t
t
2
t
1 + i
t
i

Z
) (t X
) (t Y
21
PILLAI
i.e., the random variable N has a geometric distribution. Thus if
customers come in and get out according to two independent
Poisson processes at a counter, then the number of arrivals between
any two departures has a geometric distribution. Similarly the
number of departures between any two arrivals also represents
another geometric distribution.
(15-31)

0

( )
!
0

( )
0

0
!
!
1
!

{ } { | } ( )

( )


Z
k
t t t
k
k t
k
k x
k
k
k
P N k P N k Z t f t dt
e e dt
t e dt
x e dx







+
+
+ +
= = = =
=
=
| |
=
|
\ .
| | | |
=
|
\ . \ .

, 0, 1, 2,
k
k =
|
"
22
PILLAI
Stopping Times, Coupon Collecting, and Birthday Problems
Suppose a cereal manufacturer inserts a sample of one type
of coupon randomly into each cereal box. Suppose there are n such
distinct types of coupons. One interesting question is that how many
boxes of cereal should one buy on the average in order to collect
at least one coupon of each kind?
We shall reformulate the above problem in terms of Poisson
processes. Let represent n independent
identically distributed Poisson processes with common parameter
Let represent the first, second, random arrival instants
of the process They will correspond to the first,
second, appearance of the i
th
type coupon in the above problem.
Let
so that the sum X(t) is also a Poisson process with parameter
) ( , ), ( ), (
2 1
t X t X t X
n
"
. t
. , 2, 1, ), ( n i t X
i
" =
" , ,
2 1 i i
t t
"
(15-32)
. n =
(15-33)
where , t
"
1
( ) ( ),
n
i
i
X t X t
=
=

23
PILLAI
From Fig. 15.8, represents
The average inter-arrival duration
between any two arrivals of
whereas
represents the average inter-arrival
time for the combined sum process
X(t) in (15-32).
, , 2, 1, ), ( n i t X
i
" =
/ 1
/ 1
Fig. 15.8
1 k
t
11
t
t
21
t
1 i
t
k
Y
1
Y
2
Y
3
Y

1
12
t
31
t
"
1 n
t
N
th
arrival I
st
stopping
time T
Define the stopping time T to be that random time instant by which
at least one arrival of has occurred.
Clearly, we have
But from (15-25), are independent exponential
random variables with common parameter This gives
) ( , ), ( ), (
2 1
t X t X t X
n
"
n i t
i
, 2, 1, ,
1
" =
.
). , , , , , ( max
1 1 21 11 n i
t t t t T " " = (15-34)
. )] ( [ } { } { } {
} , , , {
} ) , , , ( {max } { ) (
1 21 11
1 21 11
1 21 11
n
ti n
n
n
t F t t P t t P t t P
t t t t t t P
t t t t P t T P t F
T
= =
=
= =
"
"
"
24
PILLAI
Thus
represents the probability distribution function of the stopping time
random variable T in (15-34). To compute its mean, we can make
use of Eqs. (5-52)-(5-53) Text, that is valid for nonnegative random
variables. From (15-35) we get
so that
Let so that
and
( ) (1 ) , 0
T
t n
F t e t

=
(15-35)
( ) 1 ( ) 1 (1 ) , 0
t n
T
P T t F t e t

> = =
(15-36)

0 0
{ } ( ) {1 (1 ) } .
t n
E T P T t dt e dt



= > =

1 ,
t
e x

=
,
(1 )
, or
t
dx
x
e dt dx dt

= =

= + + + + =
=

n
k
k
n
n
k
x
dx x x x
dx
x
dx
x T E
x
x
n
1
1
0
1
0
1 2
1
0
1
0
1 1
1
1
1 1
) 1 (

1
) 1 ( } {


"
25
PILLAI
where is the Eulers constant
1
.
Let the random variable N denote the total number of all arrivals
up to the stopping time T, and the inter-arrival
random variables associated with the sum process X(t) (see Fig 15.8).
1
Eulers constant: The series converges, since
and so that Thus the series {u
n
} converges
to some number From (1) we obtain also so that
1 1 1 1 1 1 1
{ } 1 1
2 3 2 3
1
(ln )
n
E T
n n
n n

| | | |
= + + + + = + + + +
| |
\ . \ .
+
" "

0.5772157 "
(15-37)
N k Y
k
, 2, 1, , " =
3
1 1 1
{1 ln }
2 n
n + + + + "
2 2
2
1 1
1
0 0
1
x
n
n n
dx dx
n
u =

2
2
1
1 1
.
6
n
n
n n
u


= =
< = <

0. >
{ }
1
1
1 3
1 1 1
lim{1 ln } lim ln 0.5772157 .
2
n
n
k k
n k
k
n n
u u
n
n

+
=
=

+ + + + + = = = =

" "
(1)
1
1 1
( 1) ln
n n
k
k
k k
u n
= =
= +

1 1
1 1
( )
0 0
1
( )
1
ln 0
x
n n x n n x n
dx dx
n
n
u
n
+ +
= = =
+
>

26
Then we obtain the key relation
so that
since {Y
i
} and N are independent random variables, and hence
But so that and substituting this
into (15-37), we obtain
Thus on the average a customer should buy about or slightly
more, boxes to guarantee that at least one coupon of each
} { } { }] | { [ } {
i
Y E N E n N T E E T E = = = (15-40)
), l( Exponentia ~
i
Y / 1 } { =
i
Y E
{ } (ln ). E N n n +
(15-41)
, ln n n

=
=
N
i
i
Y T
1
(15-38)
} { } { } | { } | {
1 1
i
n
i
i
n
i
i
Y nE Y E n N Y E n N T E = = = = =

= =
(15-39)
27
type has been collected.
Next, consider a slight generalization to the above problem: What if
two kinds of coupons (each of n type) are mixed up, and the objective
is to collect one complete set of coupons of either kind?
Let X
i
(t) and Y
i
(t), represent the two kinds of
coupons (independent Poisson processes) that have been mixed up to
form a single Poisson process Z(t) with normalized parameter unity.
i.e.,
As before let represent the first, second, arrivals of
the process X
i
(t), and represent the first, second,
arrivals of the process
n i , 2, , 1 " =
. ) ( ~ )] ( ) ( [ ) (
1

=
+ =
n
i
i i
t P t Y t X t Z
(15-42)
" , ,
2 1 i i
t t
" , ,
2 1 i i

"
"
. , 2, 1, ), ( n i t Y
i
" =
PILLAI
28
The stopping time T
1
in this case represents that random instant at
which either all X type or all Y type have occurred at least
once. Thus
where
and
Notice that the random variables X and Y have the same distribution
as in (15-35) with replaced by 1/2n (since and there are 2n
independent and identical processes in (15-42)), and hence
Using (15-43), we get
} , min{
1
Y X T =
(15-43)
(15-44)
(15-45)
1 =
/ 2
( ) ( ) (1 ) , 0.
X Y
t n n
F t F t e t

= =
(15-46)
11 21 1
max ( , , , )
n
X t t t = "
11 21 1
max ( , , , ).
n
Y = "

PILLAI
29
to be the probability distribution function of the new stopping time T
1
.
Also as in (15-36)
1
1
/ 2 2
( ) ( ) (min{ , } )
1 (min{ , } ) 1 ( , )
1 ( ) ( )
1 (1 ( ))(1 ( ))
1 {1 (1 ) } , 0
T
X Y
t n n
F t P T t P X Y t
P X Y t P X t Y t
P X t P Y t
F t F t
e t

= =
= > = > >
= > >
=
=
(15-47)
1

1 1
0 0

/ 2 2
0
{ } ( ) {1 ( )}
{1 (1 ) } .
T
t n n
E T P T t dt F t dt
e dt


= > =
=

/ 2 / 2
2 1
.
2 1
Let 1 , or ,
t n t n
ndx
n x
e x e dt dx dt

= = =


+ + + + =

|
.
|

\
|
=

1
0
1 2
1
0
1
0
2
1
) 1 )( 1 ( 2
) 1 ( 2
1
) 1 ( 2 } {
1
1
dx x x x x n
dx x n
x
dx
x n T E
n n
n
n
n
x
x
"
PILLAI
30
( ) ( )
{ }
1 1
1
1
0
0 0
1 1 1 1 1 1
1
2 3 1 2 2
{ } 2 ( )
2
2 (ln( / 2) ).
n n
k n k
k k
n
n n n
E T n x x dx
n
n n

+
= =
+ + + + + + +
+ +
=
=
+

" "

(15-48)
PILLAI
Once again the total number of random arrivals N up to T
1
is related
as in (15-38) where Y
i
~ Exponential (1), and hence using (15-40) we
get the average number of total arrivals up to the stopping time to be
We can generalize the stopping times in yet another way:
Poisson Quotas
Let
where X
i
(t) are independent, identically distributed Poisson
(15-49)
1
{ } { } 2 (ln( / 2) ). E N E T n n = +
) ( ~ ) ( ) (
1
t P t X t X
n
i
i

=
=
(15-50)
31
PILLAI
processes with common parameter so that
Suppose integers represent the preassigned number of
arrivals (quotas) required for processes in
the sense that when m
i
arrivals of the process X
i
(t) have occurred,
the process X
i
(t) satisfies its quota requirement.
The stopping time T in this case is that random time
instant at which any r processes have met their quota requirement
where is given. The problem is to determine the probability
density function of the stopping time random variable T, and
determine the mean and variance of the total number of random
arrivals N up to the stopping time T.
Solution: As before let represent the first, second,
arrivals of the i
th
process X
i
(t), and define
Notice that the inter-arrival times Y
ij
and independent, exponential
random variables with parameter and hence
t
i
.
2 1 n
+ + + = "
n
m m m , , ,
2 1
"
) ( , ), ( ), (
2 1
t X t X t X
n
"
" , ,
2 1 i i
t t
"
1 ,
=
j i ij ij
t t Y (15-51)
,
i

=
i
m
m Y t ) , Gamma( ~
n r
=
i
j
i i ij m i
1
,
32
PILLAI
Define T
i
to the stopping time for the i
th
process; i.e., the occurrence
of the m
i
th
arrival equals T
i
. Thus
or
Since the n processes in (15-49) are independent, the associated
stopping times defined in (15-53) are also
independent random variables, a key observation.
Given independent gamma random variables
in (15-52)-(15-53) we form their order statistics: This gives
Note that the two extremes in (15-54) represent
and
n i m t T
i i m i i
i
, 2, 1, ), , Gamma( ~
,
" = = (15-52)
. 0 ,
)! 1 (
) (
1

t
m
t
t f
t m
i
i
m
i i
i
i
T
e

(15-53)
n i T
i
, , 2 , 1 , " =
, , , ,
2 1 n
T T T "
.
) ( ) ( ) 2 ( ) 1 ( n r
T T T T < < < < < " " (15-54)
(15-55)
). , , , ( max
) , , , ( min
2 1 ) (
2 1 ) 1 (
n n
n
T T T T
T T T T
"
"
=
=
(15-56)
33
PILLAI
The desired stopping time T when r processes have satisfied their
quota requirement is given by the r
th
order statistics T
(r)
. Thus
where T
(r)
is as in (15-54). We can use (7-14), Text to compute the
probability density function of T. From there, the probability density
function of the stopping time random variable T in (15-57) is given by
where is the distribution of the i.i.d random variables T
i
and
their density function given in (15-53). Integrating (15-53) by
parts as in (4-37)-(4-38), Text, we obtain
Together with (15-53) and (15-59), Eq. (15-58) completely specifies
the density function of the stopping time random variable T,
) (t F
i
T
) (t f
i
T
) (r
T T =
(15-57)
) ( )] ( 1 )[ (
)! ( )! 1 (
!
) (
1
t f t F t F
r n r
n
t f
i
T
i
T i
T
T
k n k


=
(15-58)
. 0 ,
!
) (
1 ) (
1
0
=

t e
k
t
t F
t
m
k
k
i
i
i
i
T

(15-59)
34
PILLAI
where r types of arrival quota requirements have been satisfied.
If N represents the total number of all random arrivals up to T,
then arguing as in (15-38)-(15-40) we get
where Y
i
are the inter-arrival intervals for the process X(t), and hence
with normalized mean value for the sum process X(t), we get
To relate the higher order moments of N and T we can use their
characteristic functions. From (15-60)

=
=
N
i
i
Y T
1
(15-60)
} { } { } { } {
1
N E Y E N E T E
i
= =
(15-61)
(15-62)
) 1 (=
}. { } { T E N E =
1 1
[ { }]
{ } { } { [ | ]}
[{ { }| } ].
N n
i i
i i
j Y n
i
i
j Y j Y
j T
E e
j Y n
E e E e E E e N n
E E e N n

= =

= = =
= =

(15-63)
35
PILLAI
But Y
i
~ Exponential (1) and independent of N so that
and hence from (15-63)
which gives (expanding both sides)
or
a key identity. From (15-62) and (15-64), we get
1
{ | } { }
1
i i
j Y j Y
E e N n E e
j

= = =

( )
{ }
0
1

1
{ } [ { }] ( ) {(1 ) }
i
N
j Y j T n N
n
j
E e E e P N n E E j

=

= = = =

=
+ + =
0 0
)} 1 ( ) 1 ( { } {
!
) (
!
) (
k
k
k
k
k
k N N N E T E
k
j
k
j
"

)}, 1 ( ) 1 ( { } { + + = k N N N E T E
k
"
}. { } { var } { var T E T N =
(15-64)
(15-65)
36
PILLAI
As an application of the Poisson quota problem, we can reexamine
the birthday pairing problem discussed in Example 2-20, Text.
Birthday Pairing as Poisson Processes:
In the birthday pairing case (refer to Example 2-20, Text), we
may assume that n = 365 possible birthdays in an year correspond to
n independent identically distributed Poisson processes each with
parameter 1/n, and in that context each individual is an arrival
corresponding to his/her particular birth-day process. It follows that
the birthday pairing problem (i.e., two people have the same birth date)
corresponds to the first occurrence of the 2
nd
return for any one of
the 365 processes. Hence
so that from (15-52), for each process
Since and from (15-57) and (15-66) the stopping time
in this case satisfies
1 , 2
2 1
= = = = r m m m
n
"
(15-66)
). 1/ (2, Gamma ~ n T
i
, / 1 / n n
i
= =
(15-67)
). , , , ( min
2 1 n
T T T T " = (15-68)
37
PILLAI
Thus the distribution function for the birthday pairing stopping
time turns out to be
where we have made use of (15-59) with
As before let N represent the number of random arrivals up to the
stopping time T. Notice that in this case N represents the number of
people required in a crowd for at least two people to have the same
birth date. Since T and N are related as in (15-60), using (15-62) we get
To obtain an approximate value for the above integral, we can expand
in Taylor series. This gives
. / 1 and 2 n m
i i
= =
1 2
( ) { } 1 { }
1 {min ( , , , ) }
1 [ { }] 1 [1 ( )]
1 (1 )
T
T
i
n
n n
i
n t
t
n
F t P T t P T t
P T T T t
P T t F t
e

= = >
= >
= > =
= +
"
(15-69)
) 1 ln( ) 1 ln(
n
t
n
t
n
n
+ = +

0

0 0
{ } { } ( )
{1 ( )} (1 ) .
T
n t
t
n
E N E T P T t dt
F t dt e dt



= = >
= = +


(15-70)
38
PILLAI
and hence
so that
and substituting (15-71) into (15-70) we get the mean number of
people in a crowd for a two-person birthday coincidence to be
On comparing (15-72) with the mean value obtained in Lecture 6
(Eq. (6-60)) using entirely different arguments
( )
2 3
2 3
1 ln
2 3
t
n
t t t
n
n n
+ = +
( )
2 3
2 2
3
( )
1
t t
n
n
t
n
t
n
e
+
+ =
( )
3
2 2
2
2 3 2
3
2
1 1
3
t
t t
n n n
t
n
t
t
n
n
e e e e

+ +
| |
=
|
\ .
(15-71)
2 2
2
2
2

/ 2 3 / 2
0 0



0
1
3
2 1
2
2 3
{ }
2
2
3
24.612.
t n t n
x
n
n
n
n
E N e dt t e dt
n xe dx


+
= + = +
=

(15-72)
( { } 24.44), E X
39
we observe that the two results are essentially equal. Notice that the
probability that there will be a coincidence among 24-25 people is
about 0.55. To compute the variance of T and N we can make use of
the expression (see (15-64))
which gives (use (15-65))
The high value for the standard deviations indicate that in reality the
crowd size could vary considerably around the mean value.
Unlike Example 2-20 in Text, the method developed here
can be used to derive the distribution and average value for
2 2

2
0

/ 2 4 / 2
2
0 0 0

2

2
0
2
1
3
3 2
8
3
{ ( 1)} { } 2 ( )
2 2
2 (2 ) 2 2 2
t t n t n
x
n
t
n n
n
E N N E T tP T t dt
t e dt t e dt t e dt
n e dx n n n n


+
+ = = >
| |
= +
|
\ .
= + = +

(15-73)
. 146 . 12 , 12 . 13
N T

(15-74)
PILLAI
40
a variety of birthday coincidence problems.
Three person birthday-coincidence:
For example, if we are interested in the average crowd size where
three people have the same birthday, then arguing as above, we
obtain
so that
and T is as in (15-68), which gives
to be the distribution of the
stopping time in this case. As before, the average crowd size for three-
person birthday coincidence equals
, 1 , 3
2 1
= = = = = r m m m
n
" (15-75)
) / 1 , 3 ( Gamma ~ n T
i
(15-76)
( )
2
2
( ) 1 [1 ( )] 1 1 , 0
2
i
n
n t
T T
t t
F t F t e t
n
n

= = + +
( )
2

2
0 0
{ } { } ( ) 1 .
2
n
t
t t
E N E T P T t dt e dt
n
n


= = > = + +

(15-77)
PILLAI
) / 1 , 3 with 59) - (15 (use n m
i i
= =
41
By Taylor series expansion
so that
Thus for a three people birthday coincidence the average crowd size
should be around 82 (which corresponds to 0.44 probability).
Notice that other generalizations such as two distinct birthdays
to have a pair of coincidence in each case (m
i
=2, r = 2) can be easily
worked in the same manner.
We conclude this discussion with the other extreme case,
where the crowd size needs to be determined so that all days
in the year are birthdays among the persons in a crowd.
3
3
3
2
2
2
2
2
2
2
2
2
6
2
3
1
2
2
1
2 2
1 ln
n
t
n
t
n
t
n
t
n
t
n
t
n
t
n
t
n
t
n
t

|
.
|

\
|
+ +
|
.
|

\
|
+
|
.
|

\
|
+ =
|
.
|

\
|
+ +
( )
3 2
1/ 3 2/ 3

/ 6
0 0
1/ 3 2/ 3
1
1
3
6
3
{ }
6 (4/ 3) 82.85.
t n x
n
E N e dt x e dx
n

= =
=

(15-78)
PILLAI
42
All days are birthdays:
Once again from the above analysis in this case we have
so that the stopping time statistics T satisfies
where T
i
are independent exponential random variables with common
parameter This situation is similar to the coupon collecting
problem discussed in (15-32)-(15-34) and from (15-35), the
distribution function of T in (15-80) is given by
and the mean value for T and N are given by (see (15-37)-(15-41))
Thus for everyday to be a birthday for someone in a crowd,
365 , 1
2 1
= = = = = = n r m m m
n
" (15-79)
), , , , ( max
2 1 n
T T T T " = (15-80)
.
1
n
=
/
( ) (1 ) , 0
T
t n n
F t e t

=
(15-81)
{ } { } (ln ) 2, 364.14. E N E T n n = + =
(15-82)
PILLAI
43
the average crowd size should be 2,364, in which case there is 0.57
probability that the event actually happens.
For a more detailed analysis of this problem using Markov chains,
refer to Examples 15-12 and 15-18, in chapter 15, Text. From there
(see Eq. (15-80), Text) to be quite certain (with 0.98 probability) that
all 365 days are birthdays, the crowd size should be around 3,500.
PILLAI
44
Bulk Arrivals and Compound Poisson Processes
In an ordinary Poisson process X(t), only one event occurs at
any arrival instant (Fig 15.9a). Instead suppose a random number
of events C
i
occur simultaneously as a cluster at every arrival instant
of a Poisson process (Fig 15.9b). If X(t) represents the total number of
all occurrences in the interval (0, t), then X(t) represents a compound
Poisson process, or a bulk arrival process. Inventory orders, arrivals
at an airport queue, tickets purchased for a show, etc. follow this
process (when things happen, they happen in a bulk, or a bunch of
items are involved.)
t
1
t
2
t
n
t
"
t
1
t
2
t
n
t
"
P
3
1

= C
P
2
2

= C
P
4

=
i
C
(a) Poisson Process
(b) Compound Poisson Process
Let
" , 2 , 1 , 0 }, { = = = k k C P p
i k
Fig. 15.9
(15-83)
PILLAI
45
PILLAI
represent the common probability mass function for the occurrence
in any cluster C
i
. Then the compound process X(t) satisfies
where N(t) represents an ordinary Poisson process with parameter Let
represent the moment generating function associated with the cluster
statistics in (15-83). Then the moment generating function of the
compound Poisson process X(t) in (15-84) is given by
( )
1
( ) ,
N t
i
i
X t C
=
=

(15-84)

=
= =
0
} { ) (
k
k
k
C
z p z E z P
i
(15-85)
1
( )
0
( )
0
(1 ( ))
0
( )
!
( ) { ( ) } { }
{ [ | ( ) ]} [ { | ( ) }]
( { }) { ( ) }
( )
X
k
i
i
k
C
i
n X t
n
X t
C k
k
k t t P z
k
t
k
z z P X t n E z
E E z N t k E E z N t k
E z P N t k
P z e e


=
= = =
= = = =
= =
= =

(15-86)
.
46
PILLAI
If we let
where represents the k fold convolution of the sequence {p
n
}
with itself, we obtain
that follows by substituting (15-87) into (15-86). Eq. (15-88)
represents the probability that there are n arrivals in the interval (0, t)
for a compound Poisson process X(t).
Substituting (15-85) into (15-86) we can rewrite also as
where which shows that the compound Poisson process
can be expressed as the sum of integer-secaled independent
Poisson processes Thus
} {
) (k
n
p
( )
0
( )
!
{ ( ) }
k
t k
n
k
t
k
P X t n e p

=
= =

(15-88)
(15-87)
) (z
X

2
1 2
(1 ) (1 ) (1 )
( )
k
k
X
t z t z t z
z e e e


= " "
(15-89)
,
k k
p =
. ), ( ), (
2 1
" t m t m
( )
0 0
( )
k
k k k n
n n
n n
P z p z p z

= =
| |
= =
|
\ .

47
PILLAI
More generally, every linear combination of independent Poisson
processes represents a compound Poisson process. (see Eqs. (10-120)-
(10-124), Text).
Here is an interesting problem involving compound Poisson processes
and coupon collecting: Suppose a cereal manufacturer inserts either
one or two coupons randomly from a set consisting of n types of
coupons into every cereal box. How many boxes should one buy
on the average to collect at least one coupon of each type? We leave
it to the reader to work out the details.
. ) ( ) (
1

=
=
k
k
t m k t X
(15-90)
1
16. Mean Square Estimation
Given some information that is related to an unknown quantity of
interest, the problem is to obtain a good estimate for the unknown in
terms of the observed data.
Suppose represent a sequence of random
variables about whom one set of observations are available, and Y
represents an unknown random variable. The problem is to obtain a
good estimate for Y in terms of the observations
Let
represent such an estimate for Y.
Note that can be a linear or a nonlinear function of the observation
Clearly
represents the error in the above estimate, and the square of
n
X X X , , ,
2 1
"
. , , ,
2 1 n
X X X "
) ( ) , , , (

2 1
X X X X Y
n
= = "
(16-1)
) (
. , , ,
2 1 n
X X X "
) (

) ( X Y Y Y X = =
(16-2)
2
| |
PILLAI
2
the error. Since is a random variable, represents the mean
square error. One strategy to obtain a good estimator would be to
minimize the mean square error by varying over all possible forms
of and this procedure gives rise to the Minimization of the
Mean Square Error (MMSE) criterion for estimation. Thus under
MMSE criterion,the estimator is chosen such that the mean
square error is at its minimum.
Next we show that the conditional mean of Y given X is the
best estimator in the above sense.
Theorem1: Under MMSE criterion, the best estimator for the unknown
Y in terms of is given by the conditional mean of Y
gives X. Thus
Proof : Let represent an estimate of Y in terms of
Then the error and the mean square
error is given by

} | | {
2
E
), (
) (
n
X X X , , ,
2 1
"
}. | { ) (

X Y E X Y = =
(16-3)
) (

X Y =
). , , , (
2 1 n
X X X X " =
,

Y Y =
} | ) ( | { } |

| { } | | {
2 2 2 2
X Y E Y Y E E

= = = (16-4)
PILLAI
} | | {
2
E
3
Since
we can rewrite (16-4) as
where the inner expectation is with respect to Y, and the outer one is
with respect to
Thus
To obtain the best estimator we need to minimize in (16-6)
with respect to In (16-6), since
and the variable appears only in the integrand term, minimization
of the mean square error in (16-6) with respect to is
equivalent to minimization of with respect to
}] | { [ ] [ X z E E z E
z X
=
}] | ) ( | { [ } | ) ( | {
z
2
z
2 2
X X Y E E X Y E
Y X

= =
. X

+

=
=
. ) ( } | ) ( | {
}] | ) ( | { [

2

2 2
dx X f X X Y E
X X Y E E
X

(16-6)
(16-5)
,
2

.
, 0 ) ( X f
X
, 0 } | ) ( | {
2
X X Y E

} | ) ( | {
2
X X Y E
.
PILLAI
4
Since X is fixed at some value, is no longer random,
and hence minimization of is equivalent to
This gives
or
But
since when is a fixed number Using (16-9)
) ( X
} | ) ( | {
2
X X Y E
. 0 } | ) ( | {
2
=

X X Y E

(16-7)
0 } | ) ( {| = X X Y E
(16-8)
), ( } | ) ( { X X X E = (16-9)
) ( , X x X = ). (x
PILLAI
. 0 } | ) ( { } | { = X X E X Y E
5
in (16-8) we get the desired estimator to be
Thus the conditional mean of Y given represents the best
estimator for Y that minimizes the mean square error.
The minimum value of the mean square error is given by
As an example, suppose is the unknown. Then the best
MMSE estimator is given by
Clearly if then indeed is the best estimator for Y
}. , , , | { } | { ) (

2 1 n
X X X Y E X Y E X Y " = = =
n
X X X , , ,
2 1
"
. 0 )} | {var(
] } | ) | ( | { [ } | ) | ( | {
) var(
2 2 2
min
=
= =
X Y E
X X Y E Y E E X Y E Y E
X Y

(16-11)
(16-10)
3
X Y =
,
3
X Y =
3

X Y =
. } | { } | {

3

3
X X X E X Y E Y = = =
(16-12)
PILLAI
6
in terms of X. Thus the best estimator can be nonlinear.
Next, we will consider a less trivial example.
Example : Let
where k > 0 is a suitable normalization constant. To determine the best
estimate for Y in terms of X, we need
Thus
Hence the best MMSE estimator is given by

< < <


=
otherwise, 0
1 0 ,
) , (
,
y x kxy
y x f
Y X
). | (
|
x y f
X Y
1. x 0 ,
2
) 1 (
2

) , ( ) (
2
1
2
1

1

,
< <

= =
= =

x kx kxy
kxydy dy y x f x f
x
x x
Y X X
y
x
1
1
. 1 0 ;
1
2
2 / ) 1 ( ) (
) , (
) | (
2 2
,
< < <

= = y x
x
y
x kx
kxy
x f
y x f
x y f
X
Y X
X Y
PILLAI
(16-13)
7
Once again the best estimator is nonlinear. In general the best
estimator is difficult to evaluate, and hence next we
will examine the special subclass of best linear estimators.
Best Linear Estimator
In this case the estimator is a linear function of the
observations Thus
where are unknown quantities to be determined. The
mean square error is given by
.
1
) 1 (
3
2
1
1
3
2
1 3
2


) | ( } | { ) (

2
2
2
3
1
2
3
1

2
1
2
1

1
2
1

2 2
|
x
x x
x
x
x
y
dy y dy y
dy x y f y X Y E X Y
x
x
x
x
x
y
x
X Y

+ +
=

=
= =
= = =


} | { X Y E
(16-14)
Y

. , , ,
2 1 n
X X X "
n
a a a , , ,
2 1
"

=
= + + + =
n
i
i i n n l
X a X a X a X a Y
1
2 2 1 1
.

"
(16-15)
)

(
l
Y Y =
PILLAI
8
and under the MMSE criterion should be chosen so
that the mean square error is at its minimum possible
value. Let represent that minimum possible value. Then
To minimize (16-16), we can equate
This gives
But
n
a a a , , ,
2 1
"
} | | { } |

{| } | | {
2 2 2

= =
i i l
X a Y E Y Y E E
} | | {
2
E
2
n

=
=
n
i
i i
a a a
n
X a Y E
n
1
2
, , ,
2
}. | {| min
2 1
"

(16-17)
,n. , , k E
a
k
" 2 1 , 0 } | {|
2
= =


(16-18)
. 0 2
| |
} | {|
*
2
2
=
(
(

)
`

=
)
`

k k k
a
E
a
E E
a

(16-19)
PILLAI
(16-16)
9
Substituting (16-19) in to (16-18), we get
or the best linear estimator must satisfy
Notice that in (16-21), represents the estimation error
and represents the data. Thus from
(16-21), the error is orthogonal to the data for the
best linear estimator. This is the orthogonality principle.
In other words, in the linear estimator (16-15), the unknown
constants must be selected such that the error
.
) ( ) (
1 1
k
k
n
i
i i
k k
n
i
i i
k
X
a
X a
a
Y
a
X a Y
a
=


= =

, 0 } { 2
} | {|
*
2
= =

k
k
X E
a
E

. , , 2 , 1 , 0 } {
*
n k X E
k
" = =
(16-20)
(16-21)

n
i
i i
X a Y
1
), (

n k X
k
=1 ,
n k X
k
=1 ,
n
a a a , , ,
2 1
"
PILLAI
10
is orthogonal to every data for the
best linear estimator that minimizes the mean square error.
Interestingly a general form of the orthogonality principle
holds good in the case of nonlinear estimators also.
Nonlinear Orthogonality Rule: Let represent any functional
form of the data and the best estimator for Y given With
we shall show that
implying that
This follows since

=
=
n
i
i i
X a Y

1

n
X X X , , ,
2 1
"
) ( X h
} | { X Y E
. X
} | { X Y E Y e =
). ( } | { X h X Y E Y e =
, 0 )} ( { = X eh E
. 0 )} ( { )} ( {
]} | ) ( [ { )} ( {
} ) ( ] | [ { )} ( {
)} ( ]) | [ {( )} ( {
= =
=
=
=
X Yh E X Yh E
X X Yh E E X Yh E
X h X Y E E X Yh E
X h X Y E Y E X eh E
PILLAI
(16-22)
11
Thus in the nonlinear version of the orthogonality rule the error is
orthogonal to any functional form of the data.
The orthogonality principle in (16-20) can be used to obtain
the unknowns in the linear case.
For example suppose n = 2, and we need to estimate Y in
terms of linearly. Thus
From (16-20), the orthogonality rule gives
Thus
or
n
a a a , , ,
2 1
"
2 1
and X X
2 2 1 1

X a X a Y
l
+ =
0 } ) {( } X {
0 } ) {( } X {
*
2 2 2 1 1
*
2
*
1 2 2 1 1
*
1
= =
= =
X X a X a Y E E
X X a X a Y E E

} { } | {| } {
} { } { } | {|
*
2 2
2
2 1
*
2 1
*
1 2
*
1 2 1
2
1
YX E a X E a X X E
YX E a X X E a X E
= +
= +
PILLAI
12
(16-23) can be solved to obtain in terms of the cross-
correlations.
The minimum value of the mean square error in (16-17) is given by
But using (16-21), the second term in (16-24) is zero, since the error is
orthogonal to the data where are chosen to be
optimum. Thus the minimum value of the mean square error is given
by
2 1
and a a
|
|
.
|

\
|
= |
.
|

\
|
|
|
.
|

\
|
} {
} {

} | {| } {
} { } | {|
*
2
*
1
2
1
2
2
*
2 1
*
1 2
2
1
YX E
YX E
a
a
X E X X E
X X E X E
(16-23)
2
n

n
a a a , , ,
2 1
" ,
i
X
}. { min } { min
} ) ( { min } { min
} | {| min
*
1
, , ,
*
, , ,
1
*
, , ,
*
, , ,
2
, , ,
2
2 1 2 1
2 1 2 1
2 1
l
n
i
i
a a a a a a
n
i
i i
a a a a a a
a a a
n
X E a Y E
X a Y E E
E
n n
n n
n


=
=
=
= =
=
" "
" "
"
(16-24)
PILLAI
13
where are the optimum values from (16-21).
Since the linear estimate in (16-15) is only a special case of
the general estimator in (16-1), the best linear estimator that
satisfies (16-20) cannot be superior to the best nonlinear estimator
Often the best linear estimator will be inferior to the best
estimator in (16-3).
This raises the following question. Are there situations in
which the best estimator in (16-3) also turns out to be linear ? In
those situations it is enough to use (16-21) and obtain the best
linear estimators, since they also represent the best global estimators.
Such is the case if Y and are distributed as jointly Gaussian
We summarize this in the next theorem and prove that result.
Theorem2: If and Y are jointly Gaussian zero
n
a a a , , ,
2 1
"
) ( X
}. | { X Y E
} { } | {|
} ) {( } {
1
* 2
1
* * 2

=
=
=
= =
n
i
i i
n
i
i i n
Y X E a Y E
Y X a Y E Y E
(16-25)
n
X X X , , ,
2 1
"
n
X X X , , ,
2 1
"
PILLAI
14
mean random variables, then the best estimate for Y in terms of
is always linear.
Proof : Let
represent the best (possibly nonlinear) estimate of Y, and
the best linear estimate of Y. Then from (16-21)
is orthogonal to the data Thus
Also from (16-28),
n
X X X , , ,
2 1
"
} | { ) , , , (


2 1
X Y E X X X Y
n
= = "
(16-26)

=
=
n
i
i i l
X a Y
1

(16-27)
. 1 , n k X
k
=
(16-28)
. 1 , 0 } X {
*
k
n k E = =
(16-29)
. 0 } { } { } {
1
= =

=
n
i
i i
X E a Y E E
(16-30)
PILLAI
1

n
l i i
i
Y Y Y a X
=
= =

15
Using (16-29)-(16-30), we get
From (16-31), we obtain that and are zero mean uncorrelated
random variables for But itself represents a Gaussian
random variable, since from (16-28) it represents a linear combination
of a set of jointly Gaussian random variables. Thus and X are
jointly Gaussian and uncorrelated random variables. As a result, and
X are independent random variables. Thus from their independence
But from (16-30), and hence from (16-32)
Substituting (16-28) into (16-33), we get
. 1 n k =

k
X

. 1 , 0 } { } { } {
* *
n k X E E X E
k k
= = =
(16-31)

}. { } | { E X E =
(16-32)
, 0 } { = E
. 0 } | { = X E
(16-33)
0 } | { } | {
1
= =

=
X X a Y E X E
n
i
i i

PILLAI
16
or
From (16-26), represents the best possible estimator,
and from (16-28), represents the best linear estimator.
Thus the best linear estimator is also the best possible overall estimator
in the Gaussian case.
Next we turn our attention to prediction problems using linear
estimators.
Linear Prediction
Suppose are known and is unknown.
Thus and this represents a one-step prediction problem.
If the unknown is then it represents a k-step ahead prediction
problem. Returning back to the one-step predictor, let
represent the best linear predictor. Then
. } | { } | {
1 1
l
n
i
i i
n
i
i i
Y X a X X a E X Y E = = =

= =
(16-34)
) ( } | { x X Y E =

=
n
i
i i
X a
1
n
X X X , , ,
2 1
"
1 + n
X
,
1 +
=
n
X Y
,
k n
X
+
1

+ n
X
PILLAI
17
where the error
is orthogonal to the data, i.e.,
Using (16-36) in (16-37), we get
Suppose represents the sample of a wide sense stationary
i
X
(16-35)
, 1 ,

1
1
1
1 2 2 1 1
1
1 1 1
= =
+ + + + =
+ = =
+
+
=
+
=
+ + +

n
n
i
i i
n n n
n
i
i i n n n n
a X a
X X a X a X a
X a X X X
"

(16-36)
. 1 , 0 } {
*
n k X E
k n
= =
(16-37)

+
=
= = =
1
1
* *
. 1 , 0 } { } {
n
i
k i i k n
n k X X E a X E
(16-38)
PILLAI
1
1

= ,
n
n i i
i
X a X
+
=

18
stochastic process so that
Thus (16-38) becomes
Expanding (16-40) for we get the following set of
linear equations.
Similarly using (16-25), the minimum mean square error is given by
* *
) ( } {
i k k i k i
r r k i R X X E

= = =
) (t X
(16-39)
. 1 , 1 , 0 } {
1
1
1
*
n k a r a X E
n
n
i
k i i k n
= = = =
+
+
=

(16-40)
, , , 2 , 1 n k " =
. 0
2 0
1 0
1 0
*
3 3
*
2 2
*
1 1
1 2 1 3 0 2
*
1 1
1 2 3 1 2 0 1
n k r r a r a r a r a
k r r a r a r a r a
k r r a r a r a r a
n n n n
n n n
n n n
= = + + + + +
= = + + + + +
= = + + + + +

"
#
"
"
(16-41)
PILLAI
19
The n equations in (16-41) together with (16-42) can be represented as
Let
.
} ) {(
} { } { } | {|
0 1
*
2 3
*
1 2
*
1
1
1
*
1
1
1
*
1
*
1
* 2 2
r r a r a r a r a
r a X X a E
X E Y E E
n n n n
n
i
i n i
n
i
n i i
n n n n
+ + + + + =
= =
= = =

+
=
+
+
=
+
+

"

(16-42)
.
0

0
0
0
1







2
n
3
2
1
0
*
1
*
1
*
1 0
*
2
*
1
2 0
*
1
*
2
1 1 0
*
1
2 1 0
|
|
|
|
|
|
|
|
.
|

\
|
=
|
|
|
|
|
|
|
|
.
|

\
|
|
|
|
|
|
|
|
|
.
|

\
|

#
#
"
"
#
"
"
"
n
n n
n n
n
n
n
a
a
a
a
r r r r
r r r r
r r r r
r r r r
r r r r
(16-43)
PILLAI
20
Notice that is Hermitian Toeplitz and positive definite. Using
(16-44), the unknowns in (16-43) can be represented as
Let
n
T
.




0
*
1
*
1
*
1 1 0
*
1
2 1 0
|
|
|
|
|
.
|

\
|
=

r r r r
r r r r
r r r r
T
n n
n
n
n
"
#
"
"
(16-44)
|
|
|
|
|
.
|

\
|
=
|
|
|
|
|
|
|
|
.
|

\
|
=
|
|
|
|
|
|
|
|
.
|

\
|

1
2
2
n
1
3
2
1

of
column
Last
0

0
0
0
1

n
n n
n
T
T
a
a
a
a

#
#
(16-45)
PILLAI
21
Then from (16-45),
Thus
.




1 , 1 2 , 1 1 , 1
1 , 2 22 21
1 , 1 12 11
1
|
|
|
|
|
.
|

\
|
=
+ + + +
+
+

n n
n
n
n
n
n
n
n n n
n
n n n
n
T T T
T T T
T T T
T
"
#
"
"
.

1

1 , 1
1 , 2
1 , 1
2
2
1
|
|
|
|
|
.
|

\
|
=
|
|
|
|
|
|
.
|

\
|
+ +
+
+
n n
n
n
n
n
n
n
n
T
T
T
a
a
a
#
#
, 0
1
1 , 1
2
> =
+ + n n
n
n
T

(16-46)
(16-47)
(16-48)
PILLAI
22
and
Eq. (16-49) represents the best linear predictor coefficients, and they
can be evaluated from the last column of in (16-45). Using these,
The best one-step ahead predictor in (16-35) taken the form
and from (16-48), the minimum mean square error is given by the
(n +1, n +1) entry of
From (16-36), since the one-step linear prediction error
.

1

1 , 1
1 , 2
1 , 1
1 , 1
2
1
|
|
|
|
|
.
|

\
|
=
|
|
|
|
|
.
|

\
|
+ +
+
+
+ +
n n
n
n
n
n
n
n n
n
n
T
T
T
T
a
a
a
#
#
(16-49)
n
T
.
1
n
T
. ) (
1

1
1 ,
1 , 1
1

=
+
+ +
+
|
|
.
|

\
|
=
n
i
i
n i
n
n n
n
n
X T
T
X
(16-50)
,
1 1 1 1 1
X a X a X a X
n n n n n n
+ + + + =
+
" (16-51)
PILLAI
23
we can represent (16-51) formally as follows
Thus, let
them from the above figure, we also have the representation
The filter
represents an AR(n) filter, and this shows that linear prediction leads
to an auto regressive (AR) model.
n
n
n n n
z a z a z a X + + + +

+
1
1
2
1
1
1
"
, 1 ) (
1
2
1
1 n
n n n
z a z a z a z A

+ + + + = "
(16-52)
.
) (
1

1 +

n
n
n
X
z A

n
n n n
z a z a z a z A
z H

+ + + +
= =
1
2
1
1
1
1
) (
1
) (
"
(16-53)
PILLAI
24
The polynomial in (16-52)-(16-53) can be simplified using
(16-43)-(16-44). To see this, we rewrite as
To simplify (16-54), we can make use of the following matrix identity
) (z A
n
) (z A
n
(
(
(
(
(
(

=
(
(
(
(
(
(

=
+ + + + + =


2
n
1 1 ) 1 (
2
1
1 ) 1 (
1 2
1
) 1 (
2 1
0
0
0
] 1 , , , , [
1
] 1 , , , , [
1 ) (

# " # "
"
n
n n
n
n n
n n
n n
n
T z z z
a
a
a
z z z
z a z a z a z a z A
(16-54)
PILLAI
.

0
0




1
(

=
(

B CA D C
A
I
AB I
D C
B A
(16-55)
25
Taking determinants, we get
In particular if we get
Using (16-57) in (16-54), with
.


1
B CA D A
D C
B A

=
(16-56)
, 0 D
.
0


) 1 (

1
C
B A
A
B CA
n

(16-57)
(
(
(

= = =

2
1 ) 1 (

0
, ], 1 , , , , [
n
n
n n
B T A z z z C

# "
PILLAI
26
we get
Referring back to (16-43), using Cramers rule to solve for
we get
.
1





| |

0 1 , , ,

0

0
0

| |
) 1 (
) (
1 ) 1 (
1 0
*
2
*
1
1 1 0
*
1
2 1 0
2
1
2

=
z z z
r r r r
r r r r
r r r r
T
z z
T
T
z A
n n
n n
n
n
n
n
n
n
n
n
n
n
"
"
#
"
"
"
#

(16-58)
), 1 (
1
=
+ n
a
PILLAI
1
| |
| |
| |




1
2 0 1
1 0
2
1
= = =

+
n
n
n
n
n
n
n
n
T
T
T
r r
r r
a

"
#
"
27
or
Thus the polynomial (16-58) reduces to
The polynomial in (16-53) can be alternatively represented as
in (16-60), and in fact represents a stable
. 0
| |
| |
1
2
> =
n
n
n
T
T

. 1
1





| |
1
) (
1
2
1
1
1 ) 1 (
1 0
*
2
*
1
1 1 0
*
1
2 1 0
1
n
n n
n n
n n
n
n
n
n
z a z a z a
z z z
r r r r
r r r r
r r r r
T
z A

+ + + + =
=
"
"
"
#
"
"
(16-59)
(16-60)
PILLAI
) (z A
n
) ( ~
) (
1
) ( n AR
z A
z H
n
=
28
AR filter of order n, whose input error signal is white noise of
constant spectral height equal to and output is
It can be shown that has all its zeros in provided
thus establishing stability.
Linear prediction Error
From (16-59), the mean square error using n samples is given
by
Suppose one more sample from the past is available to evaluate
( i.e., are available). Proceeding as above
the new coefficients and the mean square error can be determined.
From (16-59)-(16-61),
n

| | / | |
1 n n
T T .
1 + n
X
1 + n
X
. 0
| |
| |
1
2
> =
n
n
n
T
T

(16-61)
0 1 1
, , , , X X X X
n n
"

2
1 + n

PILLAI
.
| |
| |
1
2
1
n
n
n
T
T
+
+
=
(16-62)
) (z A
n
1 | | > z
0 | | >
n
T
29
Using another matrix identity it is easy to show that
Since we must have or for every n.
From (16-63), we have
or
since Thus the mean square error decreases as more
and more samples are used from the past in the linear predictor.
In general from (16-64), the mean square errors for the one-step
predictor form a monotonic nonincreasing sequence
). | | 1 (
| |
| |
| |
2
1
1
2
1 +

+
=
n
n
n
n
s
T
T
T (16-63)
, 0 | | >
k
T
0 ) | | 1 (
2
1
>
+ n
s
1 | |
1
<
+ n
s
) | | 1 (
| |
| |
| |
| |
2
1
1
1
2 2
1
+

+
=
+
n
n
n
n
n
s
T
T
T
T
n n


, ) | | 1 (
2 2
1
2 2
1 n n n n
s < =
+ +
(16-64)
PILLAI
. 1 ) | | 1 (
2
1
<
+ n
s
30
whose limiting value
Clearly, corresponds to the irreducible error in linear
prediction using the entire past samples, and it is related to the power
spectrum of the underlying process through the relation
where represents the power spectrum of
For any finite power process, we have
and since Thus
2 2 2
1
2

+
> " "
k n n
(16-65)
. 0
2

) (nT X
). (nT X
( ) 0
XX
S

2

1
exp ln ( ) 0.
2
XX
S d


(
=
(

(16-66)


( ) ,
XX
S d

<

PILLAI
0
2



ln ( ) ( ) .
XX XX
S d S d



+ +

<

( ( ) 0), ln ( ) ( ).
XX XX XX
S S S
(16-67)
31
Moreover, if the power spectrum is strictly positive at every
Frequency, i.e.,
then from (16-66)
and hence
i.e., For processes that satisfy the strict positivity condition in
(16-68) almost everywhere in the interval the final
minimum mean square error is strictly positive (see (16-70)).
i.e., Such processes are not completely predictable even using
their entire set of past samples, or they are inherently stochastic,
( ) 0, in - ,
XX
S > < <


ln ( ) .
XX
S d

>


2

1
exp ln ( ) 0
2
XX
S d e


(
= > =
(

(16-68)
(16-69)
(16-70)
), , (
PILLAI
32
since the next output contains information that is not contained in
the past samples. Such processes are known as regular stochastic
processes, and their power spectrum is strictly positive.
) (
XX
S


Power Spectrum of a regular stochastic Process
PILLAI
Conversely, if a process has the following power spectrum,
such that in then from (16-70), ( ) 0
XX
S =
2 1
< <
. 0
2
=


) (
XX
S
33
Such processes are completely predictable from their past data
samples. In particular
is completely predictable from its past samples, since consists
of line spectrum.
in (16-71) is a shape deterministic stochastic process.

+ =
k
k k k
t a nT X ) cos( ) (
(16-71)
( )
XX
S
) (nT X
"
1


) (
XX
S
PILLAI
1
17. Long Term Trends and Hurst Phenomena
From ancient times the Nile river region has been known for
its peculiar long-term behavior: long periods of dryness followed by
long periods of yearly floods. It seems historical records that go back
as far as 622 AD also seem to support this trend. There were long
periods where the high levels tended to stay high and other periods
where low levels remained low
1
.
An interesting question for hydrologists in this context is how
to devise methods to regularize the flow of a river through reservoir
so that the outflow is uniform, there is no overflow at any time, and
in particular the capacity of the reservoir is ideally as full at time
as at t. Let denote the annual inflows, and
1
A reference in the Bible says seven years of great abundance are coming throughout the land of
Egypt, but seven years of famine will follow them (Genesis).
0
t t +
} {
i
y
n i n
y y y s + + + = "
2
(17-1)
PILLAI
2
their cumulative inflow up to time n so that
represents the overall average over a period N. Note that may
as well represent the internet traffic at some specific local area
network and the average system load in some suitable time frame.
To study the long term behavior in such systems, define the
extermal parameters
as well as the sample variance
In this case
} {
i
y
N
y
}, { max
1
N n
N n
N
y n s u =

. ) (
1
2
1
N
N
n
n N
y y
N
D =

=
}, { min
1
N n
N n
N
y n s v =

(17-3)
(17-4)
(17-5)
N N N
v u R = (17-6)
N
s
y
N
y
N
N
i
i N
= =

=1
1
(17-2)
PILLAI
3
defines the adjusted range statistic over the period N, and the dimen-
sionless quantity
that represents the readjusted range statistic has been used extensively
by hydrologists to investigate a variety of natural phenomena.
To understand the long term behavior of where
are independent identically distributed random
variables with common mean and variance note that for large N
by the strong law of large numbers
and
N
N N
N
N
D
v u
D
R
=
(17-7)
N N
D R /
N i y
i
" , 2 , 1 , =

,
2

), , (
2
n n N s
d
n

) / , (
2
N N y
d
N
2

d
N
D
(17-8)
(17-9)
(17-10)
PILLAI
4
with probability 1. Further with where 0 < t < 1, we have
where is the standard Brownian process with auto-correlation
function given by min To make further progress note that
so that
Hence by the functional central limit theorem, using (17-3) and
(17-4) we get
, Nt n =


) ( lim lim t B
N
Nt s
N
n s
d
Nt
N
n
N

(17-11)
) (t B
). , (
2 1
t t
) ( ) (
) (


N s
N
n
n s
y n n s y n s
N n
N n N n
=
=
(17-12)
( ) (1), 0 t 1.
d
n N n N
s ny s n s N n
B t tB
N
N N N



= < <
(17-13)
PILLAI
5
where Q is a strictly positive random variable with finite variance.
Together with (17-10) this gives
a result due to Feller. Thus in the case of i.i.d. random variables the
rescaled range statistic is of the order of It
follows that the plot of versus log N should be linear
with slope H = 0.5 for independent and identically distributed
observations.
0 1 0 1
max{ ( ) (1)} min{ ( ) (1)} ,
d
N N
t t
u v
B t tB B t tB Q
N
< < < <


, Q N
v u
D
R
d
N N
N
N

N N
D R /
). (
2 / 1
N O
) / log(
N N
D R
(17-14)
(17-15)
) / log(
N
N
D R
N log

Slope=0.5
PILLAI
6
The hydrologist Harold Erwin Hurst (1951) generated
tremendous interest when he published results based on water level
data that he analyzed for regions of the Nile river which showed that
Plots of versus log N are linear with slope
According to Fellers analysis this must be an anomaly if the flows are
i.i.d. with finite second moment.
The basic problem raised by Hurst was to identify circumstances
under which one may obtain an exponent for N in (17-15).
The first positive result in this context was obtained by Mandelbrot
and Van Ness (1968) who obtained under a strongly
dependent stationary Gaussian model. The Hurst effect appears for
independent and non-stationary flows with finite second moment also.
In particular, when an appropriate slow-trend is superimposed on
a sequence of i.i.d. random variables the Hurst phenomenon reappears.
To see this, we define the Hurst exponent fora data set to be H if
) / log(
N N
D R
. 75 . 0 H
2 / 1 > H
2 / 1 > H
PILLAI
7
where Q is a nonzero real valued random variable.
IID with slow Trend
Let be a sequence of i.i.d. random variables with common mean
and variance and be an arbitrary real valued function
on the set of positive integers setting a deterministic trend, so that
represents the actual observations. Then the partial sum in (17-1)
becomes
where represents the running mean of the slow trend.
From (17-5) and (17-17), we obtain
, , N Q
N D
R
d
H
N
N
(17-16)
} {
n
X

,
2

n
g
n n n
g x y + =
(17-17)
) (
1
2 1 2 1
n n
n
i
i n n n
g x n
g x x x y y y s
+ =
+ + + + = + + + =

=
" "
(17-18)

=
=
n
i
i n
g n g
1
/ 1
PILLAI
8
Since are i.i.d. random variables, from (17-10) we get
Further suppose that the deterministic sequence
converges to a finite limit c. Then their Caesaro means
also converges to c. Since
applying the above argument to the sequence and
we get (17-20) converges to zero. Similarly, since
). )( (
2
) (
1

) )( (
2
) (
1
) (
1
) (
1
1
2
1
2
1 1 1
2 2
2
1
N n N n
N
n
N
N
n
n X
N n
N
n
N n
N
n
N
n
N n N n
N
N
n
n N
g g x x
N
g g
N
g g x x
N
g g
N
x x
N
y y
N
D
+ + =
+ + =
=

= =
= = =
=

(17-19)
} {
n
x
.
2 2

d
X
} {
n
g
N
N
n
n N
g g =

=1
1
, ) ( ) (
1
) (
1
2
1
2 2
1
c g c g
N
g g
N
N
N
n
n N n
N
n
=

= =
(17-20)
2
) ( c g
n

2
) ( c g
N

PILLAI
9
by Schwarz inequality, the first term becomes
But and the Caesaro means
Hence the first term (17-21) tends to zero as and so does the
second term there. Using these results in (17-19), we get
To make further progress, observe that

= =
=
N
n
N N n n N n N n
N
n
c g x c g x
N
g g x x
N
1 1
), )( ( ) )( (
1
) )( (
1

. ) (
1
) (
1
) )( (
1
1
2 2
1
2
1

= = =

N
n
n
N
n
n
N
n
n n
c g
N
x
N
c g x
N

(17-21)
(17-22)
2
1
2
1
) (

=
N
n
n N
x . 0 ) (
1
2
1

=
N
n
n N
c g
, N
.
2

d
N n
D c g
(17-23)
(17-24)
)} ( { max )} ( { max
)} ( ) ( { max
} { max
0 0
N n
N n
N n
N n
N n N n
N n N
g g n x x n
g g n x x n
g n s u
+
+ =
=
< < < <
PILLAI
10
and
Consequently, if we let
for the i.i.d. random variables, then from (17-6),(17-24) and (17-25)
(17-26), we obtain
where
From (17-24) (17-25), we also obtain
)}. ( { min )} ( { min
)} ( ) ( { min
} { min
0 0
N n
N n
N n
N n
N n N n
N n N
g g n x x n
g g n x x n
g n s v
+
+ =
=
< < < <
)} ( { min )} ( { max
0 0
N n
N n
N n
N n
N
x x n x x n r =
< < < <
(17-25)
(17-26)
N N N N N
G r v u R + =
(17-27)
)} ( { min )} ( { max
0 0
N n
N n
N n
N n
N
g g n g g n G =
< < < <
(17-28)
PILLAI
11
From (17-27) and (17-31) we get the useful estimates
and
Since are i.i.d. random variables, using (17-15) in (17-26) we get
a positive random variable, so that
)}, ( { min )} ( { max
0 0
N n
N n
N n
N n
N
g g n x x n v +
< < < <
)}, ( { max )} ( { min
0 0
N n
N n
N n
N n
N
g g n x x n u +
< < < <
.
N N N
r G R
(17-29)
(17-30)
(17-31)
,
N N N
r G R
.
N N N
G r R
(17-32)
(17-33)
} {
n
x
y probabilit in Q
N
r
N
r
N
X
N
,

2


(17-34)
y. probabilit in Q N
r
N

(17-35)
hence and )] ( max ) ( min } ) min {( max )} {( max [use
i
i
i
i
i i
i i
i i
i
y x y x y x + = + +
PILLAI
12
Consequently for the sequence in (17-17) using (17-23)
in (17-32)-(17-34) we get
if H > 1/2. To summarize, if the slow trend converges to a finite
limit, then for the observed sequence for every H > 1/2
in probability as
In particular it follows from (17-16) and (17-36)-(17-37) that
the Hurst exponent H > 1/2 holds for a sequence if and only
if the slow trend sequence satisfies
} {
n
y
0
/

2 / 1

H H
N
H
N
N N
N
Q
N
r
N D
G R

(17-36)
} {
n
g
}, {
n
y
0


H
N
H
N
N
H
N
N
H
N
N
N
G
N D
R
N D
G
N D
R

(17-37)
. N
} {
n
y
} {
n
g
. 2 / 1 , 0 lim
0
> > =

H c
N
G
H
N
N
(17-38)
PILLAI
13
In that case from (17-37), for that H > 1/2 we obtain
where is a positive number.
Thus if the slow trend satisfies (17-38) for some H > 1/2,
then from (17-39)
Example: Consider the observations
where are i.i.d. random variables. Here and the
sequence converges to a for so that the above result applies. Let
, /
0
N as y probabilit in c
N D
R
H
N
N

(17-39)
0
c
} {
n
g
. as , log log + N c N H
D
R
N
N
(17-40)
1 , + + = n bn a x y
n n

(17-41)
n
x ,

bn a g
n
+ =
, 0 <
. ) (
1 1
|
.
|

\
|
= =

= =
N
k
n
k
N n n
k
N
n
k b g g n M

(17-42)
PILLAI
14
To obtain its max and min, notice that
if and negative otherwise. Thus max is achieved
at
and the minimum of is attained at N=0. Hence from (17-28)
and (17-42)-(17-43)
Now using the Reimann sum approximation, we may write
N
M
, ) (
/ 1
1
1

=
<
N
k
N
k n

/ 1
1
0
1
|
.
|

\
|
=

=
N
k
k
N
n
(17-44)
0
1

1
1
>
|
.
|

\
|
=

=

N
k
n n
k
N
n b M M

(17-43)
0 =
N
M
.
1
0
1
0
|
.
|

\
|
=

= =
N
k
n
k
N
k
N
n
k b G

PILLAI
15
so that
and using (17-45)-(17-46) repeatedly in (17-44) we obtain

<
=
> +

1 ,
1 ,
log

1 , ) 1 (
/ 1
1
/ 1
0

N
k
N
N
N
n
k
(17-46)

<
=
> +
=

=
1 ,
1 ,
log

1 , ) 1 (
1 1
1
1

0
1


N
k
N
N
N
dx x
N
k
N
k
N
N
k
(17-45)
PILLAI
16
where are positive constants independent of N. From (17-47),
notice that if then
where and hence (17-38) is satisfied. In that case
3 2 1
, , c c c
, 0 2 / 1 < <

<
|
.
|

\
|
|
.
|

\
|

=
|
|
.
|

\
|

>
+

|
|
.
|

\
|
=

=
+
= =
1 , 1
1 , log log
1
log
1
1 , ) (
1

1 1
3
1
0
2 0
0
0
1
1 0
0
1 1
0
0
0



c k
N
n
b
N c N
N
n
n
bn
N c N n
bn
k
N
k
n
bn G
k
N
k
n
k
n
(17-47)
, ~
1
H
n
N c G
1 2 / 1 < < H
PILLAI
17
and the Hurst exponent
Next consider In that case from the entries in
(17-47) we get and diving both sides of (17-33)
with
so that
where the last step follows from (17-15) that is valid for i.i.d.
observations. Hence using a limiting argument the Hurst exponent
.
1
) 1 (

+
N as y probabilit in c
N D
R
N
N

(17-48)
), (
2 / 1
N o G
N
=
. 2 / 1 <
,
2 / 1
N D
N
y probabilit in
N
N o
N D
r R
N
N N
0

) (
~
2 / 1
2 / 1
2 / 1

Q
N
r
N D
R
N
N
N

2 / 1 2 / 1

~

(17-49)
. 2 / 1 1 > + = H
PILLAI
18
H = 1/2 if Notice that gives rise to i.i.d.
observations, and the Hurst exponent in that case is 1/2. Finally for
the slow trend sequence does not converge and
(17-36)-(17-40) does not apply. However direct calculation shows
that in (17-19) is dominated by the second term which for
large N can be approximated as so that
From (17-32)
where the last step follows from (17-34)-(17-35). Hence for
from (17-47) and (17-50)
. 2 / 1 0 =
} {
n
g
N
D
2
0
2
1
N x
N
N

N as N c D
N

4

(17-50)
0
1
4

N c
Q N
N D
r
N D
G R
N
N
N
N N
0 >
, 0 >
PILLAI
19
as Hence the Hurst exponent is 1 if In summary,
4
1
1
4
1
1
c
c
N c
N c
N D
G
N D
R
N
N
N
N

+
+

. N . 0 >

<
> > +
=
>
=
-1/2 2 / 1
-1/2 0 1
0 2 / 1
0 1
) (

H
(17-51)
(17-52)
Fig.1 Hurst exponent for a process with superimposed slow trend
1/2
1
-1/2
0

) ( H
Hurst
phenomenon
PILLAI
1
For a deterministic signal x(t), the spectrum is well defined: If
represents its Fourier transform, i.e., if
then represents its energy spectrum. This follows from
Parsevals theorem since the signal energy is given by
Thus represents the signal energy in the band
(see Fig 18.1).
( ) X


( ) ( ) ,
j t
X x t e dt

2
| ( ) | X

2 2

1
2
( ) | ( ) | . x t dt X d E


+ +

= =

(18-1)
(18-2)
2
| ( ) | X
( , ) +
Fig 18.1
18. Power Spectrum
t
0
( ) X t
PILLAI

0
2
| ( )| X
Energy in
( , ) +
+
2
However for stochastic processes, a direct application of (18-1)
generates a sequence of random variables for every Moreover,
for a stochastic process, E{| X(t) |
2
} represents the ensemble average
power (instantaneous energy) at the instant t.
To obtain the spectral distribution of power versus frequency for
stochastic processes, it is best to avoid infinite intervals to begin with,
and start with a finite interval ( T, T ) in (18-1). Formally, partial
Fourier transform of a process X(t) based on ( T, T ) is given by
so that
represents the power distribution associated with that realization based
on ( T, T ). Notice that (18-4) represents a random variable for every
and its ensemble average gives, the average power distribution
based on ( T, T ). Thus
.


( ) ( )
T
j t
T
T
X X t e dt

2
2

| ( ) | 1
( )
2 2
T
j t
T
T
X
X t e dt
T T

=

(18-3)
(18-4)
,
PILLAI
3
represents the power distribution of X(t) based on ( T, T ). For wide
sense stationary (w.s.s) processes, it is possible to further simplify
(18-5). Thus if X(t) is assumed to be w.s.s, then
and (18-5) simplifies to
Let and proceeding as in (14-24), we get
to be the power distribution of the w.s.s. process X(t) based on
( T, T ). Finally letting in (18-6), we obtain T
1 2

( )
1 2 1 2

1
( ) ( ) .
2
T XX
T T
j t t
T T
P R t t e dt dt
T



=

1 2
1 2
2
( ) *
1 2 1 2
( )
1 2 1 2
| ( ) | 1
( ) { ( ) ( )}
2 2
1
( , )
2
T
XX
T T
j t t
T
T T
T T
j t t
T T
X
P E E X t X t e dt dt
T T
R t t e dt dt
T






= =
`
)
=


2
2
2
| |
2
2
1
( ) ( ) (2 | |)
2
( ) (1 ) 0
T XX
XX
T
j
T
T
j
T
T
P R e T d
T
R e d

=
=

1 2 1 2
( , ) ( )
XX XX
R t t R t t =
1 2
t t =
(18-5)
(18-6)
PILLAI
4
to be the power spectral density of the w.s.s process X(t). Notice that
i.e., the autocorrelation function and the power spectrum of a w.s.s
Process form a Fourier transform pair, a relation known as the
Wiener-Khinchin Theorem. From (18-8), the inverse formula gives
and in particular for we get
From (18-10), the area under represents the total power of the
process X(t), and hence truly represents the power
spectrum. (Fig 18.2).
( ) lim ( ) ( ) 0
XX T XX
j
T
S P R e d

= =

F T
( ) ( ) 0.
XX XX
R S


1
2
( ) ( )
XX XX
j
R S e d

=

2
1
2
( ) (0) {| ( ) | } ,
XX XX
S d R E X t P the total power.

= = =

0, =
( )
XX
S
( )
XX
S
(18-9)
(18-8)
(18-7)
(18-10)
PILLAI
5
Fig 18.2
The nonnegative-definiteness property of the autocorrelation function
in (14-8) translates into the nonnegative property for its Fourier
transform (power spectrum), since from (14-8) and (18-9)
From (18-11), it follows that
( ) ( ) 0.
XX XX
R nonnegative - definite S
( )
* *
1 1 1 1
2
1
1
2
1
2
( ) ( )
( ) 0.
i j
XX XX
i
XX
n n n n
j t t
i j i j i j
i j i j
n
j t
i
i
a a R t t a a S e d
S a e d

= = = =
+
=

=
=

(18-11)
(18-12)
PILLAI
+
0

represents the power
in the band
( , ) +
( )
XX
S
( )
XX
S

6
If X(t) is a real w.s.s process, then so that
so that the power spectrum is an even function, (in addition to being
real and nonnegative).
( ) = ( )
XX XX
R R




0
( ) ( )
( ) cos
2 ( ) cos ( ) 0
XX XX
XX
XX XX
j
S R e d
R d
R d S

=
=
= =

(18-13)
PILLAI
7
Power Spectra and Linear Systems
If a w.s.s process X(t) with autocorrelation
function is
applied to a linear system with impulse
response h(t), then the cross correlation
function and the output autocorrelation function are
given by (14-40)-(14-41). From there
But if
Then
since
( )
XY
R
( )
YY
R
( ) ( ) 0
XX XX
R S
h(t)
X(t) Y(t)
Fig 18.3
* *
( ) ( ) ( ), ( ) ( ) ( ) ( ).
XY XX YY XX
R R h R R h h = =
(18-14)
( ) ( ), ( ) ( ) f t F g t G
(18-16)
(18-15)
( ) ( ) ( ) ( ) f t g t F G


{ ( ) ( )} ( ) ( )
j t
f t g t f t g t e dt

F
PILLAI
8
(18-20)
(18-19)
(18-18)
Using (18-15)-(18-17) in (18-14) we get
since
where
represents the transfer function of the system, and
{ }



( )

{ ( ) ( )}= ( ) ( )
= ( ) ( ) ( )
= ( ) ( ).
j t
j j t
f t g t f g t d e dt
f e d g t e d t
F G





+ +


+ +






F
(18-17)
* *
( ) { ( ) ( )} ( ) ( )
XY XX XX
S R h S H = = F
( )
*

* *

( ) ( ) ( ),
j j t
h e d h t e dt H


+ +


= =



( ) ( )
j t
H h t e dt

2
( ) { ( )} ( ) ( )
( ) | ( ) | .
YY YY XY
XX
S R S H
S H


= =
=
F
PILLAI
9
From (18-18), the cross spectrum need not be real or nonnegative;
However the output power spectrum is real and nonnegative and is
related to the input spectrum and the system transfer function as in
(18-20). Eq. (18-20) can be used for system identification as well.
W.S.S White Noise Process: If W(t) is a w.s.s white noise process,
then from (14-43)
Thus the spectrum of a white noise process is flat, thus justifying its
name. Notice that a white noise process is unrealizable since its total
power is indeterminate.
From (18-20), if the input to an unknown system in Fig 18.3 is
a white noise process, then the output spectrum is given by
Notice that the output spectrum captures the system transfer function
characteristics entirely, and for rational systems Eq (18-22) may be
used to determine the pole/zero locations of the underlying system.
( ) ( ) ( ) .
WW WW
R q S q = =
(18-22)
(18-21)
2
( ) | ( ) |
YY
S q H =
PILLAI
10
Example 18.1: A w.s.s white noise process W(t) is passed
through a low pass filter (LPF) with bandwidth B/2. Find the
autocorrelation function of the output process.
Solution: Let X(t) represent the output of the LPF. Then from (18-22)
Inverse transform of gives the output autocorrelation function
to be
2
, | | / 2
( ) | ( ) | .
0, | | / 2
XX
q B
S q H
B

= =

>

( )
XX
S
/ 2 / 2
/ 2 / 2

( ) ( )
sin( / 2)
sinc( / 2)
( / 2)
XX XX
B B
j j
B B
R S e d q e d
B
qB qB B
B


= =

= =
(18-23)
(18-24)
PILLAI
Fig. 18.4
(a) LPF

2
| ( )| H
/ 2 B / 2 B
1

qB
( )
XX
R
(b)
11
Eq (18-23) represents colored noise spectrum and (18-24) its
autocorrelation function (see Fig 18.4).
Example 18.2: Let
represent a smoothing operation using a moving window on the input
process X(t). Find the spectrum of the output Y(t) in term of that of X(t).
Solution: If we define an LTI system
with impulse response h(t) as in Fig 18.5,
then in term of h(t), Eq (18-25) reduces to
so that
Here


1
2
( ) ( )
t T
t T
T
Y t X d
+

=

(18-25)


( ) ( ) ( ) ( ) ( ) Y t h t X d h t X t
+

= =

2
( ) ( ) | ( ) | .
YY XX
S S H =

1
2
( ) sinc( )
T
j t
T
T
H e dt T

= =

(18-28)
(18-27)
(18-26)
PILLAI
Fig 18.5
T T
t
( ) h t
1 / 2T
12
so that
2
( ) ( ) sinc ( ).
YY XX
S S T =
(18-29)
Notice that the effect of the smoothing operation in (18-25) is to
suppress the high frequency components in the input
and the equivalent linear system acts as a low-pass filter (continuous-
time moving average) with bandwidth in this case.
PILLAI
(beyond / ), T
Fig 18.6

( )
XX
S
2
sinc ( ) T
T


( )
YY
S
2 / T
13
Discrete Time Processes
For discrete-time w.s.s stochastic processes X(nT) with
autocorrelation sequence (proceeding as above) or formally
defining a continuous time process we get
the corresponding autocorrelation function to be
Its Fourier transform is given by
and it defines the power spectrum of the discrete-time process X(nT).
From (18-30),
so that is a periodic function with period
{ } ,
k
r
+

( ) ( ) ( ),
n
X t X nT t nT =

( ) ( ).
XX k
k
R r kT
+
=
=

(18-30)
( ) 0,
XX
j T
k
k
S r e

=
=

( )
XX
S
( ) ( 2 / )
XX XX
S S T = +
2
2 . B
T

=
(18-31)
(18-32)
PILLAI
14
This gives the inverse relation
and
represents the total power of the discrete-time process X(nT). The
input-output relations for discrete-time system h(nT) in (14-65)-(14-67)
translate into
and
where
represents the discrete-time system transfer function.


1
( )
2
XX
B
jk T
k
B
r S e d
B

=

(18-33)

2
0

1
{| ( ) | } ( )
2
XX
B
B
r E X nT S d
B

= =

(18-34)
*
( ) ( ) ( )
XY XX
j
S S H e

=
2
( ) ( ) | ( ) |
YY XX
j
S S H e

=
( ) ( )
j j nT
n
H e h nT e

+

=
=

(18-35)
(18-37)
(18-36)
PILLAI
15
Matched Filter
Let r(t) represent a deterministic signal s(t) corrupted by noise. Thus
where r(t) represents the observed data,
and it is passed through a receiver
with impulse response h(t). The
output y(t) is given by
where
and it can be used to make a decision about the presence of absence
of s(t) in r(t). Towards this, one approach is to require that the
receiver output signal to noise ratio (SNR)
0
at time instant t
0
be
maximized. Notice that
h(t)
r(t)
y(t)
Fig 18.7 Matched Filter
0
t t =
0
( ) ( ) ( ), 0 r t s t w t t t = + < <
(18-38)
(18-39)
( ) ( ) ( ), ( ) ( ) ( ),
s
y t s t h t n t w t h t = = (18-40)
PILLAI
( ) ( ) ( )
s
y t y t n t = +

16
represents the output SNR, where we have made use of (18-20) to
determine the average output noise power, and the problem is to
maximize (SNR)
0
by optimally choosing the receiver filter
Optimum Receiver for White Noise Input: The simplest input
noise model assumes w(t) to be white noise in (18-38) with spectral
density N
0
, so that (18-41) simplifies to
and a direct application of Cauchy-Schwarz inequality
in (18-42) gives
(18-41)
( ). H
0
2

0

2
0

( ) ( )
( )
2 | ( ) |
j t
S H e d
SNR
N H d

(18-42)
PILLAI
0
2
0 0
0
2
2
2

0

2

1
2
1 1
2 2
Output signal power at | ( ) |
( )
Average output noise power {| ( ) | }
( ) ( )
| ( ) |
( ) ( ) | ( ) |
nn WW
s
j t
s
t t y t
SNR
E n t
S H e d
y t
S d S H d

+ +

=
= =
= =

17
and equality in (18-43) is guaranteed if and only if
or
From (18-45), the optimum receiver that maximizes the output SNR
at t = t
0
is given by (18-44)-(18-45). Notice that (18-45) need not be
causal, and the corresponding SNR is given by (18-43).
0

2

2
0
0

0 0
1
2
( )
( ) | ( ) |
s
N
s t dt
E
SNR S d
N N


+
+

= =

(18-43)
0
*
( ) ( )
j t
H S e



=
0
( ) ( ). h t s t t =
(18-44)
(18-45)
PILLAI
Fig 18.8
(a) (b) t
0
=T/2 (c) t
0
=T
( ) h t
( ) s t
( ) h t
T
t
t
t
T
/ 2 T
t
0
Fig 18-8 shows the optimum h(t) for two different values of t
0
. In Fig
18.8 (b), the receiver is noncausal, whereas in Fig 18-8 (c) the
receiver represents a causal waveform.
18
If the receiver is not causal, the optimum causal receiver can be
shown to be
and the corresponding maximum (SNR)
0
in that case is given by
Optimum Transmit Signal: In practice, the signal s(t) in (18-38) may
be the output of a target that has been illuminated by a transmit signal
f (t) of finite duration T. In that case
where q(t) represents the target impulse response. One interesting
question in this context is to determine the optimum transmit
(18-48)
0
( ) ( ) ( ) ( ) ( ) ,
T
s t f t q t f q t d = =

(18-47)
(18-46)
0
( ) ( ) ( )
opt
h t s t t u t =
0
0

2
0
0
1
( ) ( )
t
N
SNR s t dt =

( ) f t
T
t
q(t)
( ) f t
( ) s t
Fig 18.9
PILLAI
19
signal f (t) with normalized energy that maximizes the receiver output
SNR at t = t
0
in Fig 18.7. Notice that for a given s(t), Eq (18-45)
represents the optimum receiver, and (18-43) gives the corresponding
maximum (SNR)
0
. To maximize (SNR)
0
in (18-43), we may substitute
(18-48) into (18-43). This gives
where is given by
and is the largest eigenvalue of the integral equation
0
1 2
0

2
0 1 1 1
0 0

*
1 2 2 2 1 1
0 0 0
( , )

1 2 2 2 1 1 max 0
0 0
1
1
( ) |{ ( ) ( ) }|
( ) ( ) ( ) ( )
{ ( , ) ( ) } ( ) /
T
T T
T T
N
N
SNR q t f d dt
q t q t dt f d f d
f d f d N



=
=
=



_
max

1 2
( , )

*
1 2 1 2
0
( , ) ( ) ( ) q t q t dt

(18-49)
(18-50)

1 2 2 2 max 1 1
0
( , ) ( ) ( ), 0 .
T
f d f T = < <

(18-51)
PILLAI
20
PILLAI
If the causal solution in (18-46)-(18-47) is chosen, in that case the
kernel in (18-50) simplifies to
and the optimum transmit signal is given by (18-51). Notice
that in the causal case, information beyond t = t
0
is not used.
and
Observe that the kernal in (18-50) captures the target
characteristics so as to maximize the output SNR at the observation
instant, and the optimum transmit signal is the solution of the integral
equation in (18-51) subject to the energy constraint in (18-52).
Fig 18.10 show the optimum transmit signal and the companion receiver
pair for a specific target with impulse response q(t) as shown there .

2
0
( ) 1.
T
f t dt =

(18-52)
1 2
( , )
0
*
1 2 1 2
0
( , ) ( ) ( ) .
t
q t q t dt =

(18-53)
t
(a)
( ) q t
(b)
t
T
( ) f t
0
t
( ) h t
t
(c)
Fig 18.10
21
What if the additive noise in (18-38) is not white?
Let represent a (non-flat) power spectral density. In that case,
what is the optimum matched filter?
If the noise is not white, one approach is to whiten the input
noise first by passing it through a whitening filter, and then proceed
with the whitened output as before (Fig 18.7).
Notice that the signal part of the whitened output s
g
(t) equals
where g(t) represents the whitening filter, and the output noise n(t) is
white with unit spectral density. This interesting idea due to
( )
WW
S
( ) ( ) ( )
g
s t s t g t =
(18-54)
PILLAI
Whitening Filter
g(t)
( ) ( ) ( ) r t s t w t = +
( ) ( )
g
s t n t +
Fig 18.11
colored noise
white noise
22
Whitening Filter: What is a whitening filter? From the discussion
above, the output spectral density of the whitened noise process
equals unity, since it represents the normalized white noise by design.
But from (18-20)
which gives
i.e., the whitening filter transfer function satisfies the magnitude
relationship in (18-55). To be useful in practice, it is desirable to have
the whitening filter to be stable and causal as well. Moreover, at times
its inverse transfer function also needs to be implementable so that it
needs to be stable as well. How does one obtain such a filter (if any)?
[See section 11.1 page 499-502, (and also page 423-424), Text
for a discussion on obtaining the whitening filters.].
( )
nn
S
2
1 ( ) ( ) | ( ) | ,
WW nn
S S G = =
2
1
| ( ) | .
( )
WW
G
S

=
(18-55)
PILLAI
( ) G
Wiener has been exploited in several other problems including
prediction, filtering etc.
23
From there, any spectral density that satisfies the finite power constraint
and the Paley-Wiener constraint (see Eq. (11-4), Text)
can be factorized as
where H(s) together with its inverse function 1/H(s) represent two
filters that are both analytic in Re s > 0. Thus H(s) and its inverse 1/ H(s)
can be chosen to be stable and causal in (18-58). Such a filter is known
as the Wiener factor, and since it has all its poles and zeros in the left
half plane, it represents a minimum phase factor. In the rational case,
if X(t) represents a real process, then is even and hence (18-58)
reads


( )
XX
S d
+

<

(18-56)
(18-57)
2
| log ( ) |
1
XX
S
d

<
+

2
( ) | ( ) | ( ) ( ) |
XX s j
S H j H s H s


=
= =
(18-58)
( )
XX
S
PILLAI
24
Example 18.3: Consider the spectrum
which translates into
The poles ( ) and zeros ( ) of this
function are shown in Fig 18.12.
From there to maintain the symmetry
condition in (18-59), we may group
together the left half factors as
2 2
0 ( ) ( ) | ( ) ( ) | .
XX XX s j s j
S S s H s H s

= =
= =

(18-59)
2 2 2
4
( 1)( 2)
( )
( 1)
XX
S

+
=
+
2 2 2
2
4
(1 )(2 )
( ) .
1
XX
s s
S s
s
+
=
+

Fig 18.12
2 s j =
2 s j =
1 s=+ 1 s =
1
2
j
s
+
=
1
2
j
s
+
=
1
2
j
s

=
1
2
j
s

=
2
2
1 1
2 2
( 1)( 2 )( 2 ) ( 1)( 2)
( )
2 1
j j
s s
s s j s j s s
H s
s s
| | | |
| |
\ . \ .
+
+ +
+ + + +
= =
+ +

PILLAI
25
and it represents the Wiener factor for the spectrum
Observe that the poles and zeros (if any) on the appear in
even multiples in and hence half of them may be paired with
H(s) (and the other half with H( s)) to preserve the factorization
condition in (18-58). Notice that H(s) is stable, and so is its inverse.
More generally, if H(s) is minimum phase, then ln H(s) is analytic on
the right half plane so that
gives
Thus
and since are Hilbert transform pairs, it follows that
the phase function in (18-60) is given by the Hilbert
axis j
( )
XX
S
PILLAI
( )
( ) ( )
j
H A e



=
(18-60)

0
ln ( ) ln ( ) ( ) ( ) .
j t
H A j b t e dt


+

= =


0

0
ln ( ) ( ) cos
( ) ( )sin
t
t
A b t t dt
b t t dt


=
=

cos and sin t t


( )
( ) above.
XX
S

26
transform of Thus
Eq. (18-60) may be used to generate the unknown phase function of
a minimum phase factor from its magnitude.
For discrete-time processes, the factorization conditions take the form
(see (9-203)-(9-205), Text)
and
In that case
where the discrete-time system


( ) <
XX
S d

(18-63)
(18-62)


ln ( ) > .
XX
S d

2
( ) | ( ) |
XX
j
S H e

=
0
( ) ( )
k
k
H z h k z

=
=

PILLAI
ln ( ). A
( ) {ln ( )}. A = H
(18-61)
27
is analytic together with its inverse in |z| >1. This unique minimum
phase function represents the Wiener factor in the discrete-case.
Matched Filter in Colored Noise:
Returning back to the matched filter problem in colored noise, the
design can be completed as shown in Fig 18.13.
(Here represents the whitening filter associated with the noise
spectral density as in (18-55)-(18-58). Notice that G(s) is the
inverse of the Wiener factor L(s) corresponding to the spectrum
i.e.,
The whitened output s
g
(t) + n(t) in Fig 18.13 is similar
h
0
(t)=s
g
(t
0
t)
1
( ) ( ) G j L j

=
0
t t =
( ) ( )
g
s t n t +
( ) ( ) ( ) r t s t w t = +
Whitening Filter
Matched Filter
Fig 18.13
( ) G j
( )
WW
S
( ).
WW
S
2
( ) ( ) | | ( ) | ( ).
WW s j
L s L s L j S


=
= =
(18-64)
PILLAI
28
to (18-38), and from (18-45) the optimum receiver is given by
where
If we insist on obtaining the receiver transfer function for the
original colored noise problem, we can deduce it easily from Fig 18.14
Notice that Fig 18.14 (a) and (b) are equivalent, and Fig 18.14 (b) is
equivalent to Fig 18.13. Hence (see Fig 18.14 (b))
or
0 0
( ) ( )
g
h t s t t =
1
( ) ( ) ( ) ( ) ( ) ( ).
g g
s t S G j S L j S

= =
( ) H
( ) H
0
t t =
L
-1
(s) L(s)
( ) H
0
t t =
( ) r t ( ) r t
0
( )

H
_
(a)
(b)
Fig 18.14
0
( ) ( ) ( ) H L j H =
PILLAI
29
turns out to be the overall matched filter for the original problem.
Once again, transmit signal design can be carried out in this case also.
AM/FM Noise Analysis:
Consider the noisy AM signal
and the noisy FM signal
where
0
0
1 1 *
0
1 1 *
( ) ( ) ( ) ( ) ( )
( ){ ( ) ( )}
j t
g
j t
H L j H L S e
L L S e





= =
=
(18-65)
0
( ) ( ) cos( ) ( ), X t m t t n t = + + (18-66)
(18-67)
0
( ) cos( ( ) ) ( ), X t A t t n t = + + +

0
( ) FM
( )
( ) PM.
t
c m d
t
cm t


(18-68)
PILLAI
30
Here m(t) represents the message signal and a random phase jitter
in the received signal. In the case of FM, so that
the instantaneous frequency is proportional to the message signal. We
will assume that both the message process m(t) and the noise process
n(t) are w.s.s with power spectra and respectively.
We wish to determine whether the AM and FM signals are w.s.s,
and if so their respective power spectral densities.
Solution: AM signal: In this case from (18-66), if we assume
then
so that (see Fig 18.15)
( ) ( ) ( ) t t c m t

= =

( )
mm
S ( )
nn
S
~ (0, 2 ), U
0
1
( ) ( ) cos ( )
2
XX mm nn
R R R = +
(18-69)
0 0
( ) ( )
( ) ( ).
2
XX XX
XX nn
S S
S S


+ +
= +
(18-70)
( )
mm
S

0

0

0
( )
XX
S
0
( )
mm
S
(a) (b)
Fig 18.15
PILLAI
31
Thus AM represents a stationary process under the above conditions.
What about FM?
FM signal: In this case (suppressing the additive noise component in
(18-67)) we obtain
since
2
0
0
2
0
0
2
0
( / 2, / 2) {cos( ( / 2) ( / 2) )
cos( ( / 2) ( / 2) )}
{cos[ ( / 2) ( / 2)]
2
cos[2 ( / 2) ( / 2) 2 ]}
[ {cos( ( / 2) ( / 2))}cos
2
{sin( ( / 2) ( / 2))
XX
R t t A E t t
t t
A
E t t
t t t
A
E t t
E t t






+ = + + + +
+ +
= + +
+ + + + +
= +
+
0
}sin ]
(18-71)
PILLAI
0
0
0
{cos(2 ( / 2) ( / 2) 2 )}
{cos(2 ( / 2) ( / 2))} {cos 2 }
{sin(2 ( / 2) ( / 2))} {sin 2 } 0.
E t t t
E t t t E
E t t t E



+ + + +
= + + +
+ + + =
0
0
32
Eq (18-71) can be rewritten as
where
and
In general and depend on both t and so that noisy FM
is not w.s.s in general, even if the message process m(t) is w.s.s.
In the special case when m(t) is a stationary Gaussian process, from
(18-68), is also a stationary Gaussian process with autocorrelation
function
for the FM case. In that case the random variable

( , ) a t ( , ) b t
2
0 0
( / 2, / 2) [ ( , ) cos ( , )sin ]
2
XX
A
R t t a t b t + = (18-72)
(18-74)
(18-73)
( ) t
2
2
2
( )
( ) ( )
mm
d R
R c R
d

= = (18-75)
PILLAI
( , ) {cos( ( / 2) ( / 2))} a t E t t = +

( , ) {sin( ( / 2) ( / 2))} b t E t t = +

33
where
Hence its characteristic function is given by
which for gives
where we have made use of (18-76) and (18-73)-(18-74). On comparing
(18-79) with (18-78) we get
and
so that the FM autocorrelation function in (18-72) simplifies into
(18-76)
2
2( (0) ( )).
Y
R R

=
(18-77)
2
2 2
( (0) ( ))
/ 2
{ }
Y
R R
j Y
E e e e



= =
(18-78)
1 =
{ } {cos } {sin } ( , ) ( , ),
jY
E e E Y jE Y a t jb t = + = +
(18-79)
( (0) ( ))
( , )
R R
a t e


=
(18-80)
( , ) 0 b t
(18-81)
PILLAI
2
( / 2) ( / 2) ~ (0, )
Y
Y t t N = +

34
Notice that for stationary Gaussian message input m(t) (or ), the
nonlinear output X(t) is indeed strict sense stationary with
autocorrelation function as in (18-82).
Narrowband FM: If then (18-82) may be approximated
as
which is similar to the AM case in (18-69). Hence narrowband FM
and ordinary AM have equivalent performance in terms of noise
suppression.
Wideband FM: This case corresponds to In that case
a Taylor series expansion or gives
( ) t
2
( (0) ( ))
0
( ) cos .
2
XX
R R
A
R e




=
(18-82)
(0) 1, R

<
( 1 , | | 1)
x
e x x

<< =
2
0
( ) {(1 (0)) ( )}cos
2
XX
A
R R R

= + (18-83)
(0) 1. R

>
( ) R


PILLAI
35
and substituting this into (18-82) we get
so that the power spectrum of FM in this case is given by
where
Notice that always occupies infinite bandwidth irrespective
of the actual message bandwidth (Fig 18.16)and this capacity to spread
the message signal across the entire spectral band helps to reduce the
noise effect in any band.
( )
XX
S
{ }
2
2
2
2
(0)
0
( ) cos
2
c
mm
XX
R
A
R e


+
=

(18-85)
(18-86)
PILLAI
{ }
0 0
1
2
( ) ( ) ( )
XX
S S S = + +
2
2 2
1
( ) (0) (0) (0) (0)
2 2
mm
c
R R R R R

= + + = + (18-84)
(18-87)
0


( )
XX
S
Fig 18.16
2 2
2
/ 2 (0)
( ) .
2
mm
c R
A
S e

~
36
Spectrum Estimation / Extension Problem
Given a finite set of autocorrelations one interesting
problem is to extend the given sequence of autocorrelations such that
the spectrum corresponding to the overall sequence is nonnegative for
all frequencies. i.e., given we need to determine
such that
Notice that from (14-64), the given sequence satisfies T
n
> 0, and at
every step of the extension, this nonnegativity condition must be
satisfied. Thus we must have
Let Then
0 1
, , , ,
n
r r r
0 1
, , , ,
n
r r r
1 2
, ,
n n
r r
+ +

( ) 0.
jk
k
k
S r e

=
=

(18-88)
0, 1, 2, .
n k
T k
+
> =
1
.
n
r x
+
=
(18-89)
PILLAI
37
so that after some algebra
or
where
2 2 2
1
1 1
1
| |
det 0
n n n
n n
n
x
T

+ +


= = >

(18-90)
2
2
1
1
| | ,
n
n n
n
r
+

| |
<
|

\ .
(18-91)
PILLAI
1
1
* * *
1 0


=

, ,
n
n n
n
x
r
T T
r
x r r r
+
.

Fig 18.17
1 n
r
+

(18-92)
1
1 1 2 1 1
, [ , , , ] , [ , , , ] .
T T T
n n n n n
a T b a r r r b r r r


= = =

38
Eq. (18-91) represents the interior of a circle with center and radius
as in Fig 18.17, and geometrically it represents the admissible
set of values for r
n+1
. Repeating this procedure for it
follows that the class of extensions that satisfy (18-85) are infinite.
It is possible to parameterically represent the class of all
admissible spectra. Known as the trigonometric moment problem,
extensive literature is available on this topic.
[See section 12.4 Youlas Parameterization, pages 562-574, Text for
a reasonably complete description and further insight into this topic.].
2 3
, , ,
n n
r r
+ +

n

1
/
n n

PILLAI
1
19. Series Representation of Stochastic Processes
Given information about a stochastic process X(t) in
can this continuous information be represented in terms of a countable
set of random variables whose relative importance decrease under
some arrangement?
To appreciate this question it is best to start with the notion of
a Mean-Square periodic process. A stochastic process X(t) is said to
be mean square (M.S) periodic, if for some T > 0
i.e
Suppose X(t) is a W.S.S process. Then
Proof: suppose X(t) is M.S. periodic. Then
, 0 T t
(19-1) . all for 0 ] ) ( ) ( [
2
t t X T t X E = +
( ) ( ) with 1 for all . X t X t T probability t = +
) (
PILLAI
( ) is mean-square perodic ( ) is periodic in the
ordinary sense, where

X t R
*
( ) [ ( ) ( )] R E X t X t T = +
2
But from Schwarz inequality
Thus the left side equals
or
i.e.,
i.e., X(t) is mean square periodic.
. period with periodic is ) ( T R
2
2 2
*
1 2 2 1 2 2
0
[ ( ){ ( ) ( )} ] [ ( ) ] [ ( ) ( ) ] E X t X t T X t E X t E X t T X t + +
_
* *
1 2 1 2 2 1 2 1
[ ( ) ( )] [ ( ) ( )] ( ) ( ) E X t X t T E X t X t R t t T R t t + = + =
Then periodic. is ) ( Suppose ) ( R
0 ) ( ) ( ) 0 ( 2 ] | ) ( ) ( [|
* 2
= = + R R R t X t X E
( ) ( ) for any R T R + =
. 0 ] ) ( ) ( [
2
= + t X T t X E (19-2)
PILLAI
(19-3)
*
1 2 2
[ ( ){ ( ) ( )} ] 0 E X t X t T X t + =
3
Thus if X(t) is mean square periodic, then is periodic and let
represent its Fourier series expansion. Here
In a similar manner define
Notice that are random variables, and
0
0
1
( ) .
T
jn
n
R e d
T

=

0
0
1
( )
T
jk t
k
c X t e dt
T

=

+ = k c
k

,
(19-5)
PILLAI
(19-6)
) ( R
0
0
2
( ) ,
jn
n
R e
T



+

= =

(19-4)
0 1 0 2
0 1 0 2
0 2 1 0 1

* *
1 1 2 2
2
0 0

2 1 1 2
2
0 0

( ) ( )
2 1 2 1 1
0 0
1
[ ] [ ( ) ( ) ]
1
( )
1 1
[ ( ) ( )]
m
T T
jk t jm t
k m
T T
jk t jm t
T T
jm t t j m k t
E c c E X t e dt X t e dt
T
R t t e e dt dt
T
R t t e d t t e dt
T T


=
=
=




_ _
_
4
i.e., form a sequence of uncorrelated random variables,
and, further, consider the partial sum
We shall show that in the mean square sense as
i.e.,
Proof:
But
0 1
,

1 ( ) *
1
0
0,
[ ] { }
0 .
m k
T
m
j m k t
k m m
T
k m
E c c e dt
k m


> =

= =

_
(19-7)
+ =
=
n
n n
c } {
0
( ) .
N
jk t
N k
K N
X t c e

=
=

) ( ) (
~
t X t X
N
=
2
2
*
2
[ ( ) ( ) ] [ ( ) ] 2Re[ ( ( ) ( )]
[ ( ) ].
N N
N
E X t X t E X t E X t X t
E X t
=
+

(19-8)
. N
. as 0 ] ) (
~
) ( [
2
N t X t X E
N
(19-9)
(19-10)
PILLAI
5
0
0
0
2
* *

( ) *
0

( )
0
[ ( ) ] (0) ,
[ ( ) ( )] [ ( )]
1
[ ( ) ( ) ]
1
[ ( ) ( )] .
k
k
k
N
jk t
N k
k N
N
T
jk t
k N
N N
T
jk t
k
k N k N
E X t R
E X t X t E c e X t
E X e X t d
T
R t e d t
T



+
=

=

=

= =
= =
=
=
= =

_
PILLAI
(19-12)
Similarly
i.e.,
0 0
2
( ) ( ) * *
2
[ ( ) ] [ [ ] .
[ ( ) ( ) ] 2( ) 0 as
N
j k m t j k m t
N k m k m k
k m k m k N
N
N k k
k k N
E X t E c c e E c c e
E X t X t N



=
+
= =
= = =
=

(19-13)
0
( ) , .
jk t
k
k
X t c e t

=
< < +

=
(19-14)
and
6
Thus mean square periodic processes can be represented in the form
of a series as in (19-14). The stochastic information is contained in the
random variables Further these random variables
are uncorrelated and their variances
This follows by noticing that from (19-14)
Thus if the power P of the stochastic process is finite, then the positive
sequence converges, and hence This
implies that the random variables in (19-14) are of relatively less
importance as and a finite approximation of the series in
(19-14) is indeed meaningful.
The following natural question then arises: What about a general
stochastic process, that is not mean square periodic? Can it be
represented in a similar series fashion as in (19-14), if not in the whole
interval say in a finite support
Suppose that it is indeed possible to do so for any arbitrary process
X(t) in terms of a certain sequence of orthonormal functions.
*
,
( { } )
k m k k m
E c c =
0 as .
k
k
. , + = k c
k
2
(0) [ ( ) ] .
k
k
R E X t P
+
=
= = = <

k
k

+
=

0 as .
k
k
, k
, < < t
0 ? t T
PILLAI
7
i.e.,
where
and in the mean square sense
Further, as before, we would like the c
k
s to be uncorrelated random
variables. If that should be the case, then we must have
Now

=
=
1
) ( ) (
~
n
k k
t c t X
(19-15)
(19-16)
(19-17)
( ) ( ) in 0 . X t X t t T

=
*
,
[ ] .
k m m k m
E c c =
(19-18)

* * *
1 1 1 2 2 2
0 0

* *
1 1 2 2 2 1
0 0

*
1 1 2 2 2 1
0 0
[ ] [ ( ) ( ) ( ) ( ) ]
( ) { ( ) ( )} ( )
( ){ ( , ) ( ) }
T T
k m k m
T T
k m
T T
k XX m
E c c E X t t dt X t t dt
t E X t X t t dt dt
t R t t t dt dt



=
=
=



(19-19)
PILLAI


*
0

*
,
0
( ) ( )
( ) ( ) ,
T
k k
T
k n k n
c X t t dt
t t dt


=
=

8
and
Substituting (19-19) and (19-20) into (19-18), we get
Since (19-21) should be true for every we must have
or
i.e., the desired uncorrelated condition in (19-18) gets translated into the
integral equation in (19-22) and it is known as the Karhunen-Loeve or
K-L. integral equation. The functions are not arbitrary
and they must be obtained by solving the integral equation in (19-22).
They are known as the eigenvectors of the autocorrelation

*
1 1 2 2 2 1 1
0 0
( ){ ( , ) ( ) ( )} 0.
XX
T T
k m m m
t R t t t dt t dt =


1 2 2 2 1
0
( , ) ( ) ( ) 0,
XX
T
m m m
R t t t dt t

( ), 1 ,
k
t k =

*
, 1 1 1
0
( ) ( ) .
T
m k m m k m
t t dt =

(19-20)
(19-21)

1 2 2 2 1 1
0
( , ) ( ) ( ), 0 , 1 .
XX
T
m m m
R t t t dt t t T m = < < =

=1
)} ( {
k k
t
(19-22)
PILLAI
9
function of Similarly the set represent the eigenvalues
of the autocorrelation function. From (19-18), the eigenvalues
represent the variances of the uncorrelated random variables
This also follows from Mercers theorem which allows the
representation
where
Here and are known as the eigenfunctions
and eigenvalues of A direct substitution and
simplification of (19-23) into (19-22) shows that
Returning back to (19-15), once again the partial sum
1 2
( , ).
XX
R t t
*
1 2 1 2 1 2
1
( , ) ( ) ( ), 0 , ,
XX k k k
k
R t t t t t t T

=
= < <

(19-23)

*
,
0
( ) ( ) .
T
k m k m
t t dt =

1 2
( , ) respectively.
XX
R t t
) (t
k
,
k

k
( ) ( ), , 1 .
k k k
t t k = = =
1
( ) ( ) ( ), 0
N
N
k k
N
k
X t c t X t t T

=
=

PILLAI
(19-24)
(19-25)
1
{ }
k k


=
k

,
k
c
1 . k =
1 k =
10
in the mean square sense. To see this, consider
We have
Also
Similarly
* * *
1

* *
0
1

*
0
1
*
1
[ ( ) ( )] ( ) ( )
[ ( ) ( )] ( ) ( )
( ( , ) ( ) ) ( )
( ) ( )= | (
N
N k k
k
N
T
k k
k
N
T
k k
k
N
k k k k k
k
E X t X t X t c t
E X t X t d
R t d t
t t




=
=
=
=
=
=
=
=

2
1
) | .
N
k
t
=

(19-26)

=
=
N
k
k k N
t t X t X E
1
2 *
| ) ( | )] (
~
) ( [
PILLAI
2 2 *
* 2
[| ( ) ( ) | ] [| ( ) | ] [ ( ) ( )]
[ ( ) ( )] [| ( ) | ].
N N
N N
E X t X t E X t E X t X t
E X t X t E X t
=
+


2
[| ( ) | ] ( , ). E X t R t t =
(19-27)
(19-28)
(19-29)
11
and
Hence (19-26) simplifies into
i.e.,
where the random variables are uncorrelated and faithfully
represent the random process X(t) in provided
satisfy the K-L. integral equation.
Example 19.1: If X(t) is a w.s.s white noise process, determine the
sets in (19-22).
Solution: Here
2 * * 2
1
[| ( ) | ] [ ] ( ) ( ) | ( ) | .
N
N k m k m k k
k m k
E X t E c c t t t
=
= =

. as 0 | ) ( | ) , ( ] | ) (
~
) ( [|
1
2 2
=

=
N
k
k k N
t t t R t X t X E
1
( ) ( ), 0 ,
k k
k
X t c t t T

=1
} {
k k
c
0 , t T
( ),
k
t
(19-30)
) ( ) , (
2 1 2 1
t t q t t R
XX
=
PILLAI
(19-31)
(19-32)
1
{ , }
k k k


=
(19-33)
1 , k =
12
and
can be arbitrary so long as they are orthonormal as in (19-17)
and Then the power of the process
and in that sense white noise processes are unrealizable. However, if
the received waveform is given by
and n(t) is a w.s.s white noise process, then since any set of
orthonormal functions is sufficient for the white noise process
representation, they can be chosen solely by considering the other
signal s(t). Thus, in (19-35)
2
1 1
[| ( ) | ] (0)
k
k k
P E X t R q

= =
= = = = =

) ( ) ( ) (
2 1 2 1 2 1
t t q t t R t t R
ss rr
+ =
. 1 , = = k q
k

( ) ( ) ( ), 0 r t s t n t t T = + < <
) (t
k

PILLAI
(19-34)
(19-35)
(19-36)


1 2 2 1 1 2 2 1
0 0

1 1
( , ) ( ) ( ) ( )
( ) ( )
XX
T T
k k
k k k
R t t t dt q t t t dt
q t t


=
= =

13
and if
Then it follows that
Notice that the eigenvalues of get incremented by q.
Example19.2: X(t) is a Wiener process with
In that case Eq. (19-22) simplifies to
and using (19-39) this simplifies to
) (
2 1
t t R
ss

*
1 2 1 2
1
( ) ( ) ( ) ( ).
rr k k k
k
R t t q t t

=
= +

2 1 2
1 2 1 2
1 1 2

( , ) min( , ) , 0

XX
t t t
R t t t t
t t t

>

= = >

(19-39)
1
1

1 2 2 2 1 2 2 2
0 0

1 2 2 2 1
( , ) ( ) ( , ) ( )
( , ) ( ) ( ),
T t
XX k XX k
T
XX k k k
t
R t t t dt R t t t dt
R t t t dt t


=
+ =

PILLAI

=
=
1
2
*
1 2 1
) ( ) ( ) (
k
k k k ss
t t t t R
(19-37)
(19-38)
0
1
t
T
2
dt

1
1


2 2 2 1 2 2 1
0
( ) ( ) ( ).
t T
k k k k
t
t t dt t t dt t + =

(19-40)
14
Derivative with respect to t
1
gives [see Eqs. (8-5)-(8-6), Lecture 8]
or
Once again, taking derivative with respect to t
1
, we obtain
or
and its solution is given by
But from (19-40)
1


1 1 1 1 2 2 1
( ) ( 1) ( ) ( ) ( )
T
k k k k k
t
t t t t t dt t + + =

`
( ) cos sin .
k k
k k k
t A t B t


= +
. ) ( ) (

1 2 2
1

=
T
t
k k k
t dt t `
, 0 ) 0 ( =
k

PILLAI
(19-41)
(19-42)
1 1
( 1) ( ) ( )
k k k
t t = ``
2
1
1
2
1
( )
( ) 0,
k
k
k
d t
t
dt

+ =
(19-43)
15
(19-45)
(19-47)

(0) 0, 1 ,
( ) cos ,
k k
k k
k k
A k
t B t

= = =
= `
PILLAI
and from (19-41)
This gives
and using (19-44) we obtain
Also
( ) 0.
k
T = ` (19-44)
( )

2
2 2
1
2
( ) cos 0
2 1
2
, 1 .
( )
k k
k
k k
k
T B T
T k
T
k
k

= =
=
= =

`
(19-46)
16
PILLAI
Further, orthonormalization gives
Hence
with as in (19-47) and as in (19-16),
( ) sin , 0 .
k
k k
t B t t T

= (19-48)
( )
2
1 cos 2
2 2 2
2
0 0 0
sin 2
sin(2 1) 0 2 2 2
2 4
0
2
( ) sin
1
1
2 2 2
2/ .
k
k
k
k k
t T T T
k k k
T
t
k
k k k
k
T
t dt B t dt B dt
T T
B B B
B T



(
| |
= =
|
(
\ .

| |
| |
= = = =
|
|
\ .
\ .
=

(19-49)
( )
( )
2 2
,
1
2
( ) sin sin
k
k T T
t
T
t t k


= =
k
k
c
17
is the desired series representation.
Example 19.3: Given
find the orthonormal functions for the series representation of the
underlying stochastic process X(t) in 0 < t < T.
Solution: We need to solve the equation
Notice that (19-51) can be rewritten as,
Differentiating (19-52) once with respect to t
1
, we obtain
PILLAI

=
=
1
) ( ) (
k
k k
t c t X
| |
( ) , 0,
XX
R e

= >
(19-50)
0
1
t
T 2
dt

2
dt

1 2
| |
2 2 1
0
( ) ( ).
T
t t
n n n
e t dt t

(19-51)
0 0
1
1 2 2 1
1
t
( ) ( )
2 2 2 2 1
0 t
( ) ( ) ( )
T
t t t t
n n n n
e t dt e t dt t




+ =


(19-52)
18
Differentiating (19-53) again with respect to t
1
, we get
or
1
1 2 2 1
1
1
1 2 2 1
1

( ) ( )
1 2 2 1 2 2
0
1
1

( ) ( )
1
2 2 2 2
0
1
( ) ( ) ( ) ( ) ( )
( )

( )
( ) ( )
t T
t t t t
n n n n
t
n
n
t T
t t t t
n n
n n
t
t e t dt t e t dt
d t
dt
d t
e t dt e t dt
dt




+ +
=
+ =


(19-53)
1
1 2
2 1
1

( )
1 2 2
0
2

( )
1
1 2 2
2
1
( ) ( ) ( )
( )
( ) ( )
t
t t
n n
T
t t
n n
n n
t
t e t dt
d t
t e t dt
dt




+ =

1
1 2 2 1
1
1

( ) ( )
1 2 2 2 2
0
( ) {use (19-52)}
2
1
2
1
2 ( ) ( ) ( )
( )

n n
t T
t t t t
n n n
t
t
n n
t e t dt e t dt
d t
dt




(
+ +

=

_
PILLAI
19
or
or
Eq.(19-54) represents a second order differential equation. The solution
for depends on the value of the constant on the
right side. We shall show that solutions exist in this case only if
or
In that case
Let
and (19-54) simplifies to
( )
n
t
2
1
1
2
1
( )
( 2) ( )
n n
n n
d t
t
dt

=
2
1
1
2
1
( ) ( 2)
( ).
n n
n
n
d t
t
dt

| |
=
|
\ .
(19-54)
(19-55)
PILLAI
( 2) /
n n

2,
n
<
( 2) / 0.
n n
<
2
0 .
n

< <
(19-56)
2
2
1

1
2
1
( )
( ).
n
n n
d t
t
dt

= (19-57)
2
(2 )
0,
n
n
n

= >

20
PILLAI
General solution of (19-57) is given by
From (19-52)
and
Similarly from (19-53)
and
Using (19-58) in (19-61) gives
1 1 1
( ) cos sin .
n n n n n
t A t B t = +
(19-58)
2
2 2
0
1
(0) ( )
T
t
n n
n
e t dt

=

(19-59)
2
2 2
0
1
( ) ( ) .
T
t
n n
n
T e T t dt

(19-60)
2
1

1
2 2
0
1
0
( )
(0) ( ) (0)
T
t
n
n n n
n
t
d t
e t dt
dt



=
= = =

`
(19-61)
2
2 2
0
( ) ( ) ( ).
T
t
n n n
n
T e T t dt T


= =

`
(19-62)
21
PILLAI
or
and using (19-58) in (19-62), we have
or
Thus are obtained as the solution of the transcendental equation
,
n n
n
A
B

=

n n n
B A =
(19-63)
(19-64)
sin cos ( cos sin ),
( ) cos ( )sin
n n n n n n n n n n
n n n n n n n n
A T B T A T B T
A B T A B T


+ = +
+ =
2
2 2 / 2 / 2( / )
tan .
( ) 1
1 1
n n n n n n
n
n
n n n n
n n n
n n
A A
B B
A A B A B
T
A B


= = = =



2
2( / )
tan ,
( / ) 1
n
n
n
T

s
n

22
which simplifies to
In terms of from (19-56) we get
Thus the eigenvalues are obtained as the solution of the transcendental
equation (19-65). (see Fig 19.1). For each such the
corresponding eigenvector is given by (19-58). Thus
since from (19-65)
and c
n
is a suitable normalization constant.
2
(or ),
n n

2
( ) cos sin
sin( ) sin ( ), 0
n n n n n
n n n n n
T
t A t B t
c t c t t T


= +
= = < <
(19-66)
2 2
2
0.
n
n


= >
+
1 1
tan tan / 2,
n n
n n
n
A
T
B


| |
| |
= = =
|
|
\ .
\ .
(19-68)
(19-67)
s
n

PILLAI
tan( / 2) .
n
n
T

=
(19-65)
23
PILLAI
Fig 19.1 Solution for Eq.(19-65).
2

0
tan( / 2) T
/

24
PILLAI
Karhunen Loeve Expansion for Rational Spectra
[The following exposition is based on Youlas classic paper The
solution of a Homogeneous Wiener-Hopf Integral Equation occurring
in the expansion of Second-order Stationary Random Functions, IRE
Trans. on Information Theory, vol. 9, 1957, pp 187-193. Youla is
tough. Here is a friendlier version. Even this may be skipped on a first
reading. (Youla can be made only so much friendly.)]
Let X(t) represent a w.s.s zero mean real stochastic process with
autocorrelation function so that its power spectrum
is nonnegative and an even function. If is rational, then the
process X(t) is said to be rational as well. rational and even
implies
( ) ( )
XX XX
R R =
( )
XX
S
( )
XX
S

0
( ) ( ) 2 ( ) cos
XX XX XX
j t
S R e dt R d

= =

(19-69)
(19-70)
2
2
( )
( ) 0.
( )
XX
N
S
D

=
25
PILLAI
The total power of the process is given by
and for P to be finite, we must have
(i) The degree of the denominator polynomial
must exceed the degree of the numerator polynomial
by at least two,
and
(ii) must not have any zeros on the real-frequency
axis.
The s-plane extension of is given by
Thus
and the Laplace inverse transform of is given by
2
2

( )
( )

1 1
2 2
( )
XX
N
D
P S d d



+ +

= =

(19-71)
( ) 2 D n =
2
( ) D
2
( ) N
( ) 2 N m =
2
( ) D
( ) s j =
( )
XX
S
(19-72)
2 2
( )
k
s

2 2 2
( ) ( )
i
k
k
k
D s s =

(19-73)
( ) s j = +
2
2
2
( )
( ) | ( ) .
( )
XX s j
N s
S S s
D s

=

= =

26
PILLAI
Let represent the roots of D( s
2
) . Then
Let D
+
(s) and D

(s) represent the left half plane (LHP) and the right half
plane (RHP) products of these roots respectively. Thus
where
This gives
Notice that has poles only on the LHP and its inverse (for all t > 0)
converges only if the strip of convergence is to the right
1 2
, , ,
n

1 2
0 Re Re Re
n
<
(19-74)
(19-75)
| |

2 2 1
1
1 ( 1) ( 2)! | |

( 1)! ( 1)!( )! ( ) (2 )
k k j
k
k k j
j
k j
e
k j k j s



+
=
+



2
( ) ( ) ( ), D s D s D s
+
=
(19-76)
1
( )
( )
C s
D s
+
*
0
( ) ( )( ) ( ).
n
k
k k k
k k
D s s s d s D s
+
=
= + + = =

(19-77)
2
2
1 2
2
( ) ( ) ( )
( )
( ) ( ) ( )
C s C s N s
S s
D s D s D s
+

= = +

(19-78)
27
PILLAI
of all its poles. Similarly
C
2
(s) /D

(s) has poles only on


the RHP and its inverse will
converge only if the strip is
to the left of all those poles. In
that case, the inverse exists for
t < 0. In the case of from
(19-78) its transform N(s
2
) /D(s
2
)
is defined only for (see Fig 19.2). In particular,
for from the above discussion it follows that is given
by the inverse transform of C
1
(s) /D
+
(s). We need the solution to the
integral equation
that is valid only for 0 < t < T. (Notice that in (19-79) is the
reciprocal of the eigenvalues in (19-22)). On the other hand, the right
side (19-79) can be defined for every t. Thus, let
( ),
XX
R
1 1
Re Re Re s < <
0, >


0
( ) ( ) ( ) , 0
XX
T
t R t d t T = < <

(19-79)
(19-80)
( )
XX
R
Fig 19.2

Re
n

1
Re
Re
n

1
Re
strip of convergence
for ( )
XX
R
s j =


0
( ) ( ) ( ) ,
XX
T
g t R t d t = < < +

28
PILLAI
and to confirm with the same limits, define
This gives
and let
Clearly
and for t > T
since R
XX
(t) is a sum of exponentials Hence it
follows that for t > T, the function f (t) must be a sum of exponentials
Similarly for t < 0
, for 0.
k
t
k
k
a e t

>

.
k
t
k
k
a e

( ) 0
( ) .
0 otherwise
t t T
t

< <

(19-81)
(19-82)
+

( ) ( ) ( )
XX
g t R t d

+

( ) ( ) ( ) ( ) ( ) ( ) .
XX
f t t g t t R t d

= =

(19-83)
( ) 0, 0 f t t T = < <
(19-84)
( ) ( )
( ) 0
+

( ) { ( )} ( ) 0,
k
XX
D
d d
dt dt
D f t D R t d


+
=

+ +

= =


(19-85)
29
PILLAI
and hence f (t) must be a sum of exponentials
Thus the overall Laplace transform of f (t) has the form
where P(s) and Q(s) are polynomials of degree n 1 at most. Also from
(19-83), the bilateral Laplace transform of f (t) is given by
Equating (19-86) and (19-87) and simplifying, Youla obtains the key
identity
Youla argues as follows: The function is an entire
function of s, and hence it is free of poles on the entire
( ) ( )
( ) 0
+

( ) { ( )} ( ) 0,
k
XX
D
d d
dt dt
D f t D R t d

= =


, for 0.
k
t
k
k
b e t

<

contributes to 0
contributions
in < 0
contributions in
( ) ( )
( )
( ) ( )
sT
t
t
t T
P s Q s
F s e
D s D s

+
>
>
=
_
_
(19-86)
2
2
1 1
( )
( )
( ) ( ) 1 , Re Re Re
N s
D s
F s s s

(
= < <

(19-87)
(19-88)
2 2
( ) ( ) ( ) ( )
( ) .
( ) ( )
sT
P s D s e Q s D s
s
D s N s
+

=


0
( ) ( )
T
st
s t e dt

=

30
PILLAI
finite s-plane However, the denominator on the right
side of (19-88) is a polynomial and its roots contribute to poles of
Hence all such poles must be cancelled by the numerator. As a result
the numerator of in (19-88) must possess exactly the same set of
zeros as its denominator to the respective order at least.
Let be the (distinct) zeros of the
denominator polynomial Here we assume that
is an eigenvalue for which all are distinct. We have
These also represent the zeros of the numerator polynomial
Hence
and
which simplifies into
From (19-90) and (19-92) we get
). Re ( + < < s
) (s
1 2
( ), ( ), , ( )
n

2 2
( ) ( ). D s N s

'
k
s
'
k
s
( ) ( ) ( ) ( ).
sT
P s D s e Q s D s
+

1 2
0 Re ( ) Re ( ) Re ( ) .
n
< < < < < (19-89)
( ) ( ) ( ) ( )
k
T
k k k k
D P e D Q


+
= (19-90)
(19-91) ( ) ( ) ( ) ( )
k
T
k k k k
D P e D Q


+
=
( ) ( ) ( ) ( ).
k
T
k k k k
D P e D Q


+
=
(19-92)
( ). s
31
PILLAI
i.e., the polynomial
which is at most of degree n 1 in s
2
vanishes at
(for n distinct values of s
2
). Hence
or
Using the linear relationship among the coefficients of P(s) and Q(s)
in (19-90)-(19-91) it follows that
are the only solutions that are consistent with each of those equations,
and together we obtain
( ) ( ) ( ) ( ), 1, 2, ,
k k k k
P P Q Q k n = =
(19-93)
( ) ( ) ( ) ( ) ( ) L s P s P s Q s Q s =
(19-94)
2 2 2
1 2
, , ,
n

2
( ) 0 L s
(19-95)
( ) ( ) ( ) ( ). P s P s Q s Q s =
(19-96)
( ) ( ) or ( ) ( ) P s Q s P s Q s = =
(19-97)
32
PILLAI
as the only solution satisfying both (19-90) and (19-91). Let
In that case (19-90)-(19-91) simplify to (use (19-98))
where
For a nontrivial solution to exist for in (19-100), we
must have
( ) ( ) P s Q s =
(19-98)

1
0
( ) .
n
i
i
i
P s p s

=
=

(19-99)
0 1 1
, , ,
n
p p p

1

0
( ) ( ) ( ) ( )
{1 ( 1) } 0, 1, 2, ,
k
T
k k k k
n
i i
k k i
i
P D e D P
a p k n

= = =



(19-100)
( ) ( )
.
( ) ( )
k k
T T
k k
k
k k
D D
a e e
D D



+

+ +

= = (19-101)
33
PILLAI
The two determinant conditions in (19-102) must be solved together to
obtain the eigenvalues that are implicitly contained in the
and (Easily said than done!).
To further simplify (19-102), one may express a
k
in (19-101) as
so that
'
i
s
'
i
a s
'
i
s
2
, 1, 2, ,
k
k
a e k n

= =
1 1
1 1 1 1 1
1 1
2 2 2 2 2
1,2
1 1
(1 ) (1 ) (1 ( 1) )
(1 ) (1 ) (1 ( 1) )
0.

(1 ) (1 ) (1 ( 1) )
n n
n n
n n
n n n n n
a a a
a a a
a a a








= =



. . . .

(19-102)
(19-104)
(19-103)

/ 2 / 2
/ 2 / 2
1 ( ) ( )
tan
1 ( ) ( )
( ) ( )
( ) ( )
k k k
k k k
k k
k k
T
k k k
k
T
k k k
T T
k k
T T
k k
a D e D e e
a e e D e D
e D e D
e D e D







+ +
+ +
+ +
+ +

= = =
+ + +

=
+
h
34
PILLAI
Let
and substituting these known coefficients into (19-104) and simplifying
we get
and in terms of in (19-102) simplifies to
if n is even (if n is odd the last column in (19-107) is simply
Similarly in (19-102) can be obtained by
replacing with in (19-107).
0 1
( )
n
n
D s d d s d s
+
= + + +
(19-105)

2
tan ,
k
h
2 3

0 2 1 3

2 3

0 2 1 3
( ) tan ( / 2) ( )
tan
( ) ( ) tan ( / 2)
k k k k
k
k k k k
d d T d d
d d d d T


+ + + + +
=
+ + + + +


(19-106)
1 1 1
1 2
[ , , , ] ).
n n n
n
T

cot
k
h tan
k
h
2 3 1

1 1 1 1 1 1 1
2 3 1

2 2 2 2 2 2 2

1 tan tan tan
1 tan tan tan

1 tan
n
n
n



. . . . .
2 3 1

0
tan tan
n
n n n n n n

(19-107)
h
h
h
h
h
h
h
h
h
h
h
h
35
PILLAI
To summarize determine the roots with that satisfy
in terms of and for every such determine using (19-106).
Finally using these and in (19-107) and its companion
equation , the eigenvalues are determined. Once are
obtained, can be solved using (19-100), and using that can
be obtained from (19-88).
Thus
and
Since is an entire function in (19-110), the inverse Laplace
transform in (19-109) can be performed through any strip of
convergence in the s-plane, and in particular if we use the strip
,
,
k

'
k
s
2 2
( ) ( ) 0, 1, 2, ,
k k
D N k n = = (19-108)

k
s tanh
k
s

k
s
k
s

k
p s ( )
i
s
( )
i
s
2 2
( ) ( , ) ( ) ( , )
( )
( ) ( )
sT
i i
i
i
D s P s e D s Q s
s
D s N s

=

(19-109)
1
( ) { ( )}.
i i
t L s

= (19-110)
1

Re( ) 0
i
>
36
PILLAI
then the two inverses
obtained from (19-109) will be causal. As a result
will be nonzero only for t > T and using this in (19-109)-(19-110) we
conclude that for 0 < t < T has contributions only from the first
term in (19-111). Together with (19-81), finally we obtain the desired
eigenfunctions to be
that are orthogonal by design. Notice that in general (19-112)
corresponds to a sum of modulated exponentials.
Re Re( ) (to the right of all Re( )),
n i
s >
1 1
2 2 2 2
( ) ( ) ( ) ( )
,
( ) ( ) ( ) ( )
D s P s D s Q s
L L
D s N s D s N s
+


` `

) )
(19-111)
(19-112)
{ }
2 2
1
( ) ( )
( ) ( )
sT
D s Q s
e
D s N s
L


( )
i
t
1
2 2
( ) ( , )
( ) , 0 ,
( ) ( )
Re Re 0, 1, 2, ,
k
k
k
n
D s P s
t L t T
D s N s
s k n


= < <
`

)
> > =
37
PILLAI
Next, we shall illustrate this procedure through some examples. First,
we shall re-do Example 19.3 using the method described above.
Example 19.4: Given we have
This gives and P(s), Q(s) are constants
here. Moreover since n = 1, (19-102) reduces to
and from (19-101), satisfies
or is the solution of the s-plane equation
But |e
sT
| >1 on the RHP, whereas on the RHP. Similarly
|e
sT
| <1 on the LHP, whereas on the LHP.
| |
( ) ,
XX
R e


=
( ) , ( ) D s s D s s
+
= + =
1 1
1 0, or 1 a a = =
1

sT
s
e
s


=
+
2
2 2 2
2 ( )
( ) .
( )
XX
N
S
D


= =
+
(19-113)
(19-114)
1
1 1
1 1
( )
( )
T
D
e
D

+

= =
+
1
s
s

+
<
1
s
s

+
>
38
PILLAI
Thus in (19-114) the solution s must be purely imaginary, and hence
in (19-113) is purely imaginary. Thus with in (19-114)
we get
or
which agrees with the transcendental equation (19-65). Further from
(19-108), the satisfy
or
Notice that the in (19-66) is the inverse of (19-116) because as
noted earlier in (19-79) is the inverse of that in (19-22).
1

1
s j =

2 2
0.
2
n
n

+
= >
(19-116)
(19-115)
2 2 2 2
( ) ( ) 2 0
n
n n n
s j
D s N s


=
= + =
1
1
tan( / 2) T

=
1
1
1
j T
j
e
j

=
+
s
39
PILLAI
Finally from (19-112)
which agrees with the solution obtained in (19-67). We conclude this
section with a less trivial example.
Example 19.5
In this case
This gives With n = 2,
(19-107) and its companion determinant reduce to
1
2 2
( ) cos sin , 0
n n n n n
n
s
t L A t B t t T
s

+
= = + < <
`
+
)
(19-117)
| | | |
( ) .
XX
R e e


= + (19-118)
2
2 2 2 2 2 2 2 2
2 2 2( )( )
( ) .
( )( )
XX
S


+ +
= + =
+ + + +
(19-119)
2
( ) ( )( ) ( ) . D s s s s s
+
= + + = + + +

2 2 1 1

2 2 1 1
tan tan
cot cot


=
=
h h
h h
40
PILLAI
or
From (19-106)
Finally can be parametrically expressed in terms of
using (19-108) and it simplifies to
This gives
and
(19-120)
(19-121)
2 2
1 2
and
2
2
1
( ) ( ) 4 ( )
2
b b c

+
=

1 2
tan tan . = h h
2


2

( ) tan ( / 2) ( )
tan , 1, 2
( ) ( ) tan ( / 2)
i i i
i
i i i
T
i
T


+ + +
= =
+ + +
h
h
h

2 2 4 2 2 2
2 2
4 2
( ) ( ) ( 2 ( ))
2 ( )
0.
D s N s s s
s bs c


= + +
+ +
= + =

41
PILLAI
and
and substituting these into (19-120)-(19-121) the corresponding
transcendental equation for can be obtained. Similarly the
eigenfunctions can be obtained from (19-112).
2
2 2 2
2 1
( ) ( ) 4 ( )
( ) 4 ( )
2
b b c
b c



= =
i
s
1
20. Extinction Probability for Queues
and Martingales
(Refer to section 15.6 in text (Branching processes) for
discussion on the extinction probability).
20.1 Extinction Probability for Queues:
A customer arrives at an empty server and immediately goes
for service initiating a busy period. During that service period,
other customers may arrive and if so they wait for service.
The server continues to be busy till the last waiting customer
completes service which indicates the end of a busy
period. An interesting question is whether the busy periods
are bound to terminate at some point ? Are they ?
PILLAI
2
Do busy periods continue forever? Or do such
queues come to an end sooner or later? If so, how ?
Slow Traffic ( )
Steady state solutions exist and the probability of extinction
equals 1. (Busy periods are bound to terminate with
probability 1. Follows from sec 15.6, theorem 15-9.)
Heavy Traffic ( )
Steady state solutions do not exist, and such queues can be
characterized by their probability of extinction.
Steady state solutions exist if the traffic rate Thus
What if too many customers rush in, and/or the service
rate is slow ( ) ? How to characterize such queues ?
. 1 <
{ }
lim ( ) 1. exists if
k
n
p P X nT k

= = <
1
1
1 >
PILLAI
3
Extinction Probability for Population Models
) (
0

3
26 X =
1
3 X =
2
9 X =
( ) 2
1
Y

( ) 2
2
Y
( ) 2
3
Y
( ) 3
1
Y
( ) 3
2
Y
( ) 3
3
Y
( ) 3
5
Y
( ) 3
4
Y
( ) 3
6
Y
( ) 3
8
Y
( ) 3
7
Y
( ) 3
9
Y
0
1 X =
0
1 X =
Fig 20.1
PILLAI
4
Offspring moment generating function:

=
=
0
) (
k
k
k
z a z P
Queues and Population Models
Population models
: Size of the n
th
generation
: Number of offspring for the i
th
member of
the n
th
generation. From Eq.(15-287), Text
n
X
( ) n
i
Y
(
1
1
n
X
n
n k
k
X Y
+
=
=

)
Let
z
) (z P
0
a
1
1
PILLAI
(20-1)
Fig 20.2
( )
= { }
n
k i
a P Y k =

5
{ }
1
1 1
1 1
0
( ) { } { }
( { })
{[ ( )] } { ( )} { } ( ( ))
n
j
i
n i
X k
n n
k
Y
X
n n
j
j
n n
j
P z P X k z E z
E E z X j E E z X j
E P z P z P X j P P z
+
+ =

+ +
=
= = =
| |
= = = =
|
\ .
= = = =

)) ( ( )) ( ( ) (
1
z P P z P P z P
n n n
= =
+
Extinction probability satisfies the equation
which can be solved iteratively as follows:
z z P = ) (
0

2, 1, , ) (
1
= =

k z P z
k k
{ } ?
n
P X k = =
lim { 0} ? Extinction probability
n o
n
P X

= = = =
and
PILLAI
(20-2)
(20-3)
(20-4)
(20-5)

0 0
(0) z P a = =

6
Let
0 0
( ) (1) { } 0
i i k
k k
E Y P k P Y k ka

= =

= = = = = >

0
0
0 0
1 1
1 ( ) ,
1
is the unique solution of P z z
a

> =
`

< <
)
Left to themselves, in the long run, populations either
die out completely with probability , or explode with
probability 1- (Both unpleasant conclusions).

0
.
0

Review Theorem 15-9 (Text)


PILLAI
(20-6)
(20-7)
0
a
0

1
1
( ) z P
1
1 >
0
a
(a)
(b)
1
( ) z P
Fig 20.3
7
arrivals between two
( ) 0.
successive departures
k i
k
a P Y k P

= = =
`
)
Note that the statistics depends both on the arrival as well
as the service phenomena.
{ }
k
a
Queues :
2
s
2
X
1 Y
t
0
1
X
0
X
3
X
1
s
3
s
4
s
2 Y 3 Y
k
s
1
1
n
X
n i
i
X Y
+
=
=

Customers that arrive


during the first service time
Service time of
first customer
Service time of
second customer
First
customer
Busy period
PILLAI
Fig 20.4
8
: Inter-departure statistics generated by arrivals
) (
0

=
=
k
k
k
z a z P
: Traffic Intensity Steady state
Heavy traffic

=
=

=
1
) 1 (
k
k
ka P
1
Termination of busy periods corresponds to extinction of
queues. From the analogy with population models the
extinction probability is the unique root of the equation
0

z z P = ) (
Slow Traffic :
Heavy Traffic :
i.e., unstable queues either terminate their busy periods
with probability , or they will continue to be busy with
probability 1- . Interestingly, there is a finite probability of
busy period termination even for unstable queues.
: Measure of stability for unstable queues.
1 0 1
0
< < >
1 1
0
=
1
0
<
0

1 >
}
( 1) >
PILLAI
9
Example 20.1 : M/M/ 1 queue
2, 1, 0, ,
1 1
1
=
|
.
|

\
|
+ +
=
|
.
|

\
|
+ +
= k a
k k
k

1
From (15-221), text,we have
t
) ( P
) ( P
Number of arrivals between any two departures follows
a geometric random variable.

>

=
= = + + =
=
+
=
1 ,
1
1 , 1
0 1) - z ( ) 1 ( 1 ) 1 ( ) (
,
)] 1 ( 1 [
1
) (
0
2

z z z z z P
z
z P
PILLAI
(20-8)
(20-10)
(20-9)
Fig 20.5
10
Example 20.2 : Bulk Arrivals M
[x]
/M/ 1 queue
Compound Poisson Arrivals : Departures are exponential
random variables as in a Poisson process with parameter
Similarly arrivals are Poisson with parameter However each
arrival can contain multiple jobs.
2, 1, , 0 k} {
,
= = = k
c
A P
k
i
.
: Number of items arriving at instant
i
i
t
A

=
= =
0
} { ) (
k
k
k
A
z c z E z C
i
Let
and represents the bulk arrival statistics.
PILLAI
.
A
1
A
2
) ( P
) ( P
t
1
t 2
t
Fig 20.6
11
Inter-departure Statistics of Arrivals
1
( ) [1 {1 ( )}] , Let (1 ) , 0, 1, 2,
1 1
( ) , ( )
1 1 (1 )
k
k
P z C z c k
z
C z P z
z z


= + = =

= =
+ +

)) 1 ( /( 1 0 1] - )z (1 [ 1) - (z ) (
1
1
) 1 (
0
Rate Traffic

+ = = + =
>

=
z z P
P
0
( ) { k} { }
k Y
k
P z P Y z E z

=
= = =

( )
1 2
( )
0
0

0
0

0
0
{ (0, )} { (0, )} (t)
( )
[ { }] ,
!
[ ( )] 1
! 1 {1 ( )}
arrivals in arrivals in
n
i
A A A
s
n
n
A n t t
n
n
t
n
E z n t P n t f dt
t
E z e e dt
n
t C z
e
n C z




+ + +
=


+
=
=
= =
= =
+

PILLAI
(20-11)
(20-12)
12
Bulk Arrivals (contd)
Compound Poisson arrivals with geometric rate
0
0
1 1
, .
(1 )
2
For , we obatain
3
3 1
,
2 (1 ) 2

= >
+
=
= >
+
Doubly Poisson arrivals gives
(1 )
( )
z
C z e

=
PILLAI
1
( )
1 [1 ( )]
P z z
C z
= =
+
(20-13)
(20-14)
(20-15)
0

2
1

1
1
0

M/M/1
(a)
(b)
Fig 20.7
1
13
Example 20.3 : M/E
n
/ 1 queue (n-phase exponential service)
From (16-213)
1 1 = n / M / M
2 1
2
= n / E / M
5 . 0
0
=
38 . 0
0
=
2 =
Example 20.4 : M/D/1 queue
Letting in (16.213),text, we obtain
so that
1 ,
2
8 1 1
0
2
2
> |
.
|

\
| + +
= =

n
m
. 1 ,
) 1 (
0
>




e
e
0
(1 / ) , 1
n
n n

+ >> (20-17)
n
z
n
z P

|
.
|

\
|
+ = ) 1 ( 1 ) (

n
z x
n
x
n
x
n
x z z P
1
, 0
1
) ( = =
|
.
|

\
|
+ +

+ =

(20-16)
, ) (
) 1 ( z
e z P

=

PILLAI
(20-18)
14
20.2 Martingales
Martingales refer to a specific class of stochastic processes
that maintain a form of stability in an overall sense. Let
refer to a discrete time stochastic process. If n refers
to the present instant, then in any realization the random
variables are known, and the future values
are unknown. The process is stable in the sense
that conditioned on the available information (past and
present), no change is expected on the average for the future
values, and hence the conditional expectation of the
immediate future value is the same as that of the present
value. Thus, if
{ , 0}
i
X i
0 1
, , ,
n
X X X
1 2
, ,
n n
X X
+ +

(20-19)
1 1 1 0
{ | , , , , }
n n n n
E X X X X X X
+
=
PILLAI
15
for all n, then the sequence {X
n
} represents a Martingale.
Historically martingales refer to the doubling the stake
strategy in gambling where the gambler doubles the bet on
every loss till the almost sure win occurs eventually at which
point the entire loss is recovered by the wager together with
a modest profit. Problems 15-6 and 15-7, chapter 15, Text
refer to examples of martingales. [Also refer to section 15-5,
Text].
If {X
n
} refers to a Markov chain, then as we have
seen, with
Eq. (20-19) reduces to the simpler expression [Eq. (15-224),
Text]
1
{ | },
ij n n
p P X j X i
+
= = =
.
ij
j
j p i =

(20-20)
PILLAI
16
PILLAI
For finite chains of size N, interestingly, Eq. (20-20) reads
implying that x
2
is a right-eigenvector of the transition
probability matrix associated with the eigenvalue 1.
However, the all one vector is always
an eigenvector for any P corresponding to the unit eigenvalue
[see Eq. (15-179), Text], and from Perrons theorem and the
discussion there [Theorem 15-8, Text] it follows that, for
finite Markov chains that are also martingales, P cannot be
a primitive matrix, and the corresponding chains are in fact
not irreducible. Hence every finite state martingale has
at least two closed sets embedded in it. (The closed sets in the
two martingales in Example 15-13, Text correspond to two
absorbing states. Same is true for the Branching Processes
discussed in the next example also. Refer to remarks
following Eq. (20-7)).

2 2 2
, [1, 2, 3, , ]
T
P x x x N = = (20-21)
N N
1
[1, 1, 1, , 1]
T
x =
( )
ij
P p =
17
PILLAI
Example 20.5: As another example, let {X
n
} represent the
branching process discussed in section 15-6, Eq. (15-287),
Text. Then Z
n
given by
is a martingale, where Y
i
s are independent, identically
distributed random variables, and refers to the extinction
probability for that process [see Theorem 15.9, Text].
To see this, note that
where we have used the Markov property of the chain,
1
0
1
,
n
n
X
X
n n i
i
Z X Y

=
= =

(20-22)
0

1
0
1 0 0 0
0 0 0 0
1
since { } is
a Markov chain
{ | , , } { | , , }
{ | } [ { }] [ ( )] ,
n
k
X k Y
n i
i i n n
n
X
n n n
Y X X
n n
i
X
E Z Z Z E X X
E X k E P Z


+
=
=
+

=
=
= = = = = =

(20-23)
since Y
i
s are
independent of X
n
use (15-2)
18
PILLAI
the common moment generating function P(z) of Y
i
s, and
Theorem 15-9, Text.
Example 20.6 (DeMoivres Martingale): The gamblers ruin
problem (see Example 3-15, Text) also gives rise to various
martingales. (see problem 15-7 for an example).
From there, if S
n
refers to player As cumulative capital
at stage n, (note that S
0
= $ a ), then as DeMoivre has observed
generates a martingale. This follows since
where the instantaneous gain or loss given by Z
n+1
obeys
and hence
( )
n
S
n
q
p
Y =
(20-24)
1 1 n n n
S S Z
+ +
= + (20-25)
1 1
{ 1} , { 1} ,
n n
P Z p P Z q
+ +
= = = = (20-26)
( )
( )
1
1
1 1 0 1 0
{ | , , , } { | , , , }
{ | },
n
n n
S
n n n n n
S Z
n
q
p
q
p
E Y Y Y Y E S S S
E S
+
+
+
+
=
=

19
( ) ( )
( )
( )
1
1 1 0
{ | , , , }
n n
S S
n n n n
q q q q
p p p p
E Y Y Y Y p q Y

+
= + = =
PILLAI
since {S
n
} generates a Markov chain.
Thus
i.e., Y
n
in (20-24) defines a martingale!
Martingales have excellent convergence properties
in the long run. To start with, from (20-19) for any given
n, taking expectations on both sides we get
Observe that, as stated, (20-28) is true only when n is known
or n is a given number.
As the following result shows, martingales do not fluctuate
wildly. There is in fact only a small probability that a large
deviation for a martingale from its initial value will occur.
(20-27)
1 0
{ } { } { }.
n n
E X E X E X
+
= = (20-28)
20
PILLAI
Hoeffdings inequality: Let {X
n
} represent a martingale and
be a sequence of real numbers such that the random
variables
Then
Proof: Eqs. (20-29)-(20-30) state that so long as the
martingale increments remain bounded almost surely, then
there is only a very small chance that a large deviation occurs
between X
n
and X
0
. We shall prove (20-30) in three steps.
(i) For any convex function f (x), and we have
(Fig 20.8)
1 2
, , ,
2 2
1
( 2 )
0
/
{| | } 2
i
n
i
x
n
P X X x e

=


(20-30)
(20-29)
0 1, < <
1 2 1 2
( ) (1 ) ( ) ( (1 ) ), f x f x f x x + + (20-31)
1
1 .
i i
i
i
X X
Y with probability one

21
PILLAI
which for
and
gives
Replacing a in (20-32) with any zero mean random variable
Y that is bounded by unity almost everywhere, and taking
expected values on both sides we get
Note that the right side is independent of Y in (20-33).
On the other hand, from (20-29)
and since Y
i
s are bounded by unity, from (20-32) we get
(as in (20-33))
1 1
, 1 ,
2 2
a a

+
= =
1 2
| | 1, 1, 1 a x x < = =
( ) , 0
x
f x e

= >
1 1
(1 ) (1 ) , | | 1.
2 2
a
a e a e e a

+ + <
(20-32)
2
/ 2
1
2
{ } ( )
Y
E e e e e

+
(20-33)
1 0 1 1 1 1
{ | , , , } ( | ) 0
i i i i i i i
E Y X X X E X X X X X

= = = (20-34)
Fig 20.8
1
x
2
x
1
( ) f x
2
( ) f x
( ) f x
x
1 2
(1 ) x x +
1 2
( ) (1 ) ( ) f x f x +
1 2
( (1 ) ) f x x +
i
i
22
(ii) To make use of (20-35), referring back to the Markov
inequality in (5-89), Text, it can be rewritten as
and with
But
2
/ 2
1 1 0
{ | , , , }
i
Y
i
E e X X X e

(20-35)
{ } { }, 0
X
P X e E e

>
0
( )
0
{ } { }
n
X X x
n
P X X x e E e


(20-36)
(20-37)
0
, and we get
n
X X X x = =
(20-38)
PILLAI
2 2
2
0 1 1 0
1 0
1 0
/ 2
2 2 2
1 1 0
( ) ( ) ( )
( )
1 1 0
( )
1 1 0
using (20-35)
( ) / 2 / 2
{ } { }
[ { | , , , }]
[ { | , , , }]
{ } .
n
i
n n n n
n n n
n n n
n
i n n
X X X X X X
X X Y
n
X X Y
n
e
X X
E e E e
E E e e X X X
E e E e X X X
E e e e





=
+


=
=
=

use (20-29)
23
Substituting (20-38) into (20-37) we get
(iii) Observe that the exponent on the right side of (20-39) is
minimized for and hence it reduces to
The same result holds when X
n
X
0
is replaced by X
0
X
n
,
and adding the two bounds we get (20-30), the Hoeffdings
inequality.
From (20-28), for any fixed n, the mean value
E{X
n
} equals E{X
0
}. Under what conditions is this result
true if we replace n by a random time T ? i.e., if T is a
random variable, then when is
PILLAI
2 2
1
( / 2)
0
{ }
i
n
i
x
n
P X X x e

=



(20-39)
2
1
/
n
i
i
x
=
=

(20-40)
2 2
1
/ 2
0
{ } , 0.
i
n
i
x
n
P X X x e x


>
24
The answer turns out to be that T has to be a stopping time.
What is a stopping time?
A stochastic process may be known to assume a
particular value, but the time at which it happens is in general
unpredictable or random. In other words, the nature of the
outcome is fixed but the timing is random. When that outcome
actually occurs, the time instant corresponds to a stopping
time. Consider a gambler starting with $a and let T refer to the
time instant at which his capital becomes $1. The random
variable T represents a stopping time. When the capital
becomes zero, it corresponds to the gamblers ruin and that
instant represents another stopping time (Time to go home for
the gambler!)
PILLAI
(20-41)
0
{ } { }.
T
E X E X =
?
25
Recall that in a Poisson process the occurrences of the first,
second, arrivals correspond to stopping times
Stopping times refer to those random instants at which there
is sufficient information to decide whether or not a specific
condition is satisfied.
Stopping Time: The random variable T is a stopping time
for the process X(t), if for all is a
function of the values of the process up to
t, i.e., it should be possible to decide whether T has occurred
or not by the time t, knowing only the value of the process
X(t) up to that time t. Thus the Poisson arrival times T
1
and T
2
referred above are stopping times; however T
2
T
1
is not
a stopping time.
A key result in martingales states that so long as
PILLAI

1 2
, , . T T
0, { } the event t T t
{ ( ) | 0, } X t >
26
T is a stopping time (under some additional mild restrictions)
Notice that (20-42) generalizes (20-28) to certain random
time instants (stopping times) as well.
Eq. (20-42) is an extremely useful tool in analyzing
martingales. We shall illustrate its usefulness by rederiving the
gamblers ruin probability in Example 3-15, Eq. (3-47), Text.
From Example 20.6, Y
n
in (20-24) refer to a martingale in the
gamblers ruin problem. Let T refer to the random instant at
which the game ends; i.e., the instant at which either player A
loses all his wealth and P
a
is the associated probability of ruin
for player A, or player A gains all wealth $(a + b) with
probability (1 P
a
). In that case, T is a stopping time
and hence from (20-42), we get
PILLAI
0
{ } { }.
T
E X E X = (20-42)
27
since player A starts with $a in Example 3.15. But
Equating (20-43)-(20-44) and simplifying we get
that agrees with (3-47), Text. Eq. (20-45) can be used to derive
other useful probabilities and advantageous plays as well. [see
Examples 3-16 and 3-17, Text].
Whatever the advantage, it is worth quoting the master
Gerolamo Cardano (1501-1576) on this: The greatest
advantage in gambling comes from not playing at all.
PILLAI
( )
( )
1
1
b
a
a b
p
q
p
q
P
+

(20-45)
( )
0
{ } { }
a
T
q
p
E Y E Y = =
(20-43)
( ) ( )
( )
0
{ } (1 )
(1 ).
a b
T a a
a b
a a
q q
p p
q
p
E Y P P
P P
+
+
= +
= +
(20-44)

Das könnte Ihnen auch gefallen