2016-CME620 Stochastic

What We Will Study? - 1
CME620 Stochastic Processes

Department of Telecommunications Engineering
CME620
2016
Week
1
Lecture Guide
Prof. Okechukwu C. Ugweje

Prof. Okey Ugweje
Nigerian Turkish Nile University, Abuja
Topics
1. Course Introduction
2. Set Theory and Venn Diagrams
   - Unions, Intersections, Complements, etc.
3. Probability Theory
   - Probability Space and Probability Measure
   - Axioms of Probability
   - Conditional Probability
   - Independence of Events (mutually exclusive events)
   - Partition - Law of Total Probability
   - Bayes' Rule
1. Definition and Characterization of One Random Variable
   - Probability Distribution Function (cdf) and its properties
   - Probability Density Function (pdf) and its properties
   - Probability Mass Function (pmf) and its properties
2. Conditional distributions and densities
3. Important Random Variables (Discrete and Continuous)
   - Discrete: Binomial, Bernoulli, Poisson, Hypergeometric, ...
   - Continuous: Uniform, Exponential, Gaussian, Rayleigh, Nakagami, ...
(c) Prof. Okey Ugweje, Federal University of Technology, Minna
What We Will Study? - 2

1. Statistical Properties of one Random Variable

Expected value (mean value)
Mean Square Value
Time Average
Statistical Average versus Time Average
Variance
2. Transformation of a Random Variable (cdf and pdf)
3. Calculating probabilities through cdf and pdf
END OF COMBINED COURSES
Set Theory and Venn Diagrams
"Probability is too important to be left to the mathematician"
- Unknown Engineer
Set Theory - 1
Definition:
A set is a collection of distinct objects called elements
Usually written as a list of elements enclosed in braces { }
Since elements must be distinct, no two elements in a set can be the same
Example 1:
{1,2,3} is a valid set whereas {1,1,3} is not
A set can be made up of elements which are themselves sets
A set can be finite or infinite
Example 2:
The set of all non-negative integers {0,1,2,3,...} is countably infinite, whereas the set of all real numbers in [0,1] is uncountably infinite
All sets are subsets of the sample space
Set Theory - 2
Definition:
The union of two sets A and B (denoted as $A \cup B$) is a set that contains all elements in either A or B
$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$
For more than two sets
$\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n$
Example 3:
If A = {1,2,4}, and B = {1,3,5}, then $A \cup B$ = {1,2,3,4,5}
Definition:
The intersection of two sets A and B (denoted as $A \cap B$) is a set that contains only the elements that appear in both sets
$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$
For more than two sets
$\bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n$
Example 4:
If A = {1, 2, 4}, and B = {1, 3, 5}, then $A \cap B$ = {1}
Set Theory - 3
Definition:
A set A is a subset of a set B (denoted as $A \subset B$) if all the elements of the set A are also in the set B.
Only one occurrence of an element in a set is allowed
Example 5:
Set A = {1, 2} is a subset of set B = {1, 2, 3, 5}
In general, if S contains n elements, then there are $2^n$ subsets
Set Theory - 4
Sometimes it is easier to describe a set by describing what is not in the set. This leads to the concept of complement.
Definition: The complement of a set A (denoted $A^c$) is the set of all elements in the universe that are not in A.
$A^c = \{x \mid x \notin A\}$
Example 6:
If $\Omega$ = {1, 2, 3, 4, 5}, the complement of the set B = {1, 2, 3} is the set $B^c$ = {4, 5}
Set Theory - 5
Notice that $\Omega^c = \emptyset$ and $\emptyset^c = \Omega$
With the above definitions, we can describe complex collections of objects
Some relationships between sets are important enough to have special names
Definition: The sets A and B are said to be mutually exclusive (or disjoint) if they have no elements in common; i.e., $A \cap B = \emptyset$
Definition: The sets A and B are said to be mutually exhaustive if they contain all the elements of the universe; i.e., $A \cup B = \Omega$
Set Theory - 6
Set Operators:
$\Omega$ = universal set
$\emptyset$ = null set
$\cup$ = union
$\cap$ = intersection
$\subset, \subseteq$ = subsets
$\in$ = element of
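Venn Diagrams - 1
A Venn diagram is a geometric representation of sets
[Figure: Venn diagram of sets A and B inside the sample space S]
Notation: $\subset$, $=$ denote subset and equality; $\not\subset$ = not a subset; $\in$ = is an element of; $\notin$ = not an element of; $\Omega = S$
Venn Diagrams - 2
Union
All elements of both A and B
At least one of A or B occurs
$A \cup B = A + B$
Parallel systems
Mathematical expression: $A \cup B = \{x : x \in A \text{ or } x \in B\}$
In a situation where one or more of the events $A_k$ occurs, we have
$\bigcup_{k=1}^{n} A_k = A_1 \cup A_2 \cup \cdots \cup A_n$
Also for infinite union of sets, we have
$\bigcup_{k=1}^{\infty} A_k = A_1 \cup A_2 \cup \cdots \cup A_k \cup \cdots$
Many more union relationships can be developed, especially when restrictions are placed on some sets
Some useful union relationships:
$A \cup B = B \cup A$
$A \cup \emptyset = A$
$A \cup A = A$
$A \cup S = S$
$A \cup A^c = S$
$A \cup (B \cup C) = (A \cup B) \cup C$
$A \cup B = A$ if $B \subset A$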
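Venn Diagrams - 3
Intersection (Product)
Elements common to all sets
Elements contained in all sets
Events occur in all experiments
Series systems
[Figure: Venn diagrams of A, B and C inside S showing the regions AB and ABC]
Mathematical expression: $A \cap B = AB = \{x : x \in A \text{ and } x \in B\}$
In a situation where events occur in all experiments, we have
$\bigcap_{k=1}^{n} A_k = A_1 \cap A_2 \cap \cdots \cap A_n$
Also for infinite intersection of sets, we have
$\bigcap_{k=1}^{\infty} A_k = A_1 \cap A_2 \cap \cdots \cap A_k \cap \cdots$
Venn Diagrams - 4
If $A \cap B = \emptyset$ then A and B are said to be mutually exclusive
Some useful intersection relationships:
$A \cap B = B \cap A$
$A \cap \emptyset = \emptyset$
$A \cap A = A$
$A \cap S = A$
$A \cap A^c = \emptyset$
$A \cap (B \cap C) = (A \cap B) \cap C$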
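Venn Diagrams - 5
Complements (Inversion, Opposite)
[Figure: Venn diagram of A and its complement $A^c$ inside S]
Mathematical expression: $A^c = \{x : x \in S \text{ and } x \notin A\}$
Some useful relationships:
$A \cup A^c = S$
$A \cap A^c = \emptyset$
$(A^c)^c = A$
$S^c = \emptyset$, $\emptyset^c = S$
$(A \cup B)^c = A^c \cap B^c$
$(A \cap B)^c = A^c \cup B^c$
Partition: A partition of $\Omega$ is a collection of mutually exclusive subsets of $\Omega$ such that their union is $\Omega$.
$A_i \cap A_j = \emptyset$ for $i \neq j$, and $\bigcup_{i=1}^{n} A_i = \Omega$
[Figure: sample space partitioned into $A_1, A_2, \ldots, A_i, A_j, \ldots, A_n$, with an event B overlapping several parts]
Difference
Consists of elements of set A not in set B
$A - B = A \cap B^c = A - (A \cap B)$
$B - A = A^c \cap B$
[Figure: Venn diagram of A and B showing the region A-B]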
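Venn Diagrams - 6
Mutually exclusive
[Figure: Venn diagram of disjoint sets A and B inside S, with the region B-A marked]
Subsets
[Figure: Venn diagram of nested sets A and B inside S]
Venn Diagrams - 7
De Morgan's Law
$(A \cup B)^c = A^c \cap B^c$; $(A \cap B)^c = A^c \cup B^c$
$\left(\bigcup_{i=1}^{n} A_i\right)^c = \bigcap_{i=1}^{n} A_i^c$; $\left(\bigcap_{k=1}^{n} B_k\right)^c = \bigcup_{k=1}^{n} B_k^c$
Example 7 Venn Diagram
[Figure: Venn diagrams of regions formed from the sets A, B, C and E, F, G (for example $E^c$, EF, AB, ABC)]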
Example 8
Before launching a new academic program at the Federal

University of Technology (FUT) Minna, the office of the Vice
Chancellor conducted a survey of 130 engineering students to
determine the suitability of one of the following names:
A: Communications Engineering;
B: Communication Systems Engineering; and
C: Communications Technology
The findings of the survey are summarized as follows: 51 liked
name A; 25 liked name A and B; 63 liked name B; 18 liked name
A and C; 47 liked name C; 23 liked name B and C; 10 liked name
A and B and C.
a) Draw a Venn Diagram representing the above survey indicating
all the necessary numbers on the diagram.
b) If a participating FUT Minna student is selected at random, what
is the probability that he or she disliked all 3 program names?
Solution
a) The number in the sample space is 130 (i.e., N(S) = 130) and the Venn diagram is shown below
[Figure: Venn diagram of A, B and C for the survey]
b) Let Z = "people that like none of the names". From the Venn diagram in (a), we have N(Z) = 25.
$P[Z] = \frac{N(Z)}{N(S)} = \frac{25}{130} \approx 0.192$
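The count N(Z) = 25 in Example 8 can be cross-checked without the diagram by inclusion-exclusion on the three survey sets. The sketch below is only an illustration; the helper name `count_in_none` is hypothetical and not part of the lecture.

```python
from fractions import Fraction


def count_in_none(total, singles, pairs, triple):
    """Number of respondents in none of three sets, by inclusion-exclusion."""
    in_union = sum(singles) - sum(pairs) + triple
    return total - in_union


# Survey figures from Example 8: N(A), N(B), N(C); N(AB), N(AC), N(BC); N(ABC)
n_none = count_in_none(total=130, singles=(51, 63, 47), pairs=(25, 18, 23), triple=10)
p_none = Fraction(n_none, 130)

print(n_none)                    # number who disliked all three names
print(round(float(p_none), 3))   # probability of selecting such a student
```

The union holds 161 - 66 + 10 = 105 students, which leaves 130 - 105 = 25 who liked none of the names.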
Probability Theory - 1
Probability theory is concerned with the solution of

problems that involve uncertainty and randomness
It is important in the solution of many engineering
problems
Many of today's practical systems work in a chaotic environment, and in order to design efficient, reliable and cost-effective systems, probabilistic models must be used
Through Random Variables and Random Processes,
we can talk about quantities and signals which are
unknown in advance
Review of Probability Theory

Probability Theory - 2
For example
Data sent through a communication system is random since the outcome at the receiver is not certain
Noise, interference and fading introduced by the channel are random processes and can only be modeled as such
The measure of performance (e.g., Bit Error Rate) is probabilistic since it is an estimate of the received signal compared to the transmitted signal
Some Applications - 1
Random Input Signals
[Figure: block diagram: Input Signal (Forcing Function) -> System -> Output Signal]
Inputs to many physical systems involve a certain degree of uncertainty/unpredictability that justifies random treatment, e.g.,
Speech/music signal input of a communication system
Digits applied to a computer
Random signals applied to an aircraft flight control system
Random inputs to process control systems
Steering wheel movements in an automobile power-steering system
Random Input Disturbances
[Figure: block diagram: Input s(t) + n(t) -> System -> Output Signal]
Noise n(t) is almost always random in nature and calls for the
use of probabilistic methods even if the signal s(t) is not, e.g.,
Thermal noise
Thermal motion of the conduction electrons in the amplifier
input circuit
Random variations in the number of electrons (or holes)
passing through a transistor
Since there are millions of electrons, one cannot calculate
the value of this kind of noise at every instant of time, but
can calculate:
Quality Control
An important method of improving system reliability is to improve the quality of the individual elements.
This is often done by an inspection process. Since it would be too costly to inspect every element, it is necessary to develop rules for inspecting elements selected at random. These rules are based on probabilistic models.
Information Theory (IT)
Information theory deals with the information content of message signals such as printed pages, speech, graphical data, velocity, radiation intensity, etc.
Since such messages and observations are unknown in advance and random in nature, they can only be described with probability/random processes
The communication channels are subject to random disturbances that limit their ability to convey information. To analyze them, probabilistic models are indispensable
It is clear by now that almost any engineering

endeavor involves some degree of uncertainty and
randomness that makes the use of probability and
stochastic concepts a fundamental requirement.
In communication systems, randomness is a CERTAINTY!!
Probability Concepts
"We see that the theory of probability is at heart only common sense reduced to calculations ..."
- Pierre-Simon Laplace
Probability Spaces
Probability theory deals with the study of random phenomena
Experiments that do not yield the same outcome in repeated trials or observations under the same conditions
Averages of phenomena occurring sequentially or simultaneously
The observed averages approach a constant as the number of experiments increases
When an experiment is performed, certain elementary events $A_i$ occur in different but completely uncertain ways
The triple (S, A, P) is called the probability space, where
S = sample space
A = event space
P = probability measure (a mapping that assigns a probability to each event)
Probability Spaces
Sample Space (S)
Set of all possible outcomes of an experiment, trial or observation
Individual outcomes are called elements or points in the sample space, S = {s1, s2, s3, ...}
The number of points in a sample space may be
a) finite (or bounded)
b) countably infinite (discrete; can be enumerated but does not end)
c) uncountably infinite (continuous)
Sometimes, S can include outcomes that are impossible
Example 9 Sample Spaces
Simple examples of sample spaces
Consider tossing a coin:
S = {head, tail} = {H, T} = {1, 0}
Consider tossing two coins:
S = {TT, TH, HT, HH} = {00, 01, 10, 11}
Consider tossing three coins:
S = {(000), (001), (010), ..., (111)}
Consider throwing a pair of dice:
S = {(1,1), (2,1), ..., (6,1), ..., (5,6), (6,6)}
Consider drawing two cards from a deck:
S = {(1,2), (2,1), ..., (51,52)}
  = $\{(x,y) : 1 \le x \le 52,\ 1 \le y \le 52,\ x \neq y\}$
Example 10 Sample Spaces
Tossing of 2 Dice
a) Dice are distinguishable
S1 = {(1,1), (1,2), ..., (1,6); (2,1), (2,2), ..., (2,6);
      (3,1), (3,2), ..., (3,6); (4,1), (4,2), ..., (4,6);
      (5,1), (5,2), ..., (5,6); (6,1), (6,2), ..., (6,6)}
   = 6 + 6 + 6 + 6 + 6 + 6 = 36 elements (or $6^2$)
b) Dice are indistinguishable
S2 = {(1,1), (1,2), ..., (1,6); (2,2), (2,3), ..., (2,6);
      (3,3), (3,4), ..., (3,6); (4,4), (4,5), (4,6);
      (5,5), (5,6); (6,6)}
   = 6 + 5 + 4 + 3 + 2 + 1 = 21 elements
c) May also use the tabular method
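The two counts in Example 10 can be reproduced by enumeration. This is a minimal sketch using Python's standard `itertools`; the variable names `s1` and `s2` simply mirror S1 and S2 and are otherwise arbitrary.

```python
from itertools import combinations_with_replacement, product

faces = range(1, 7)

# Distinguishable dice: ordered pairs, so (1, 2) and (2, 1) are different points
s1 = list(product(faces, repeat=2))

# Indistinguishable dice: unordered pairs, so (1, 2) and (2, 1) are one point
s2 = list(combinations_with_replacement(faces, 2))

print(len(s1), len(s2))
```

Ordered pairs give the 36-point space and unordered pairs give the 21-point space.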
Event - 1
In most experiments, we are interested in a specific outcome that satisfies a given condition
The outcome of interest defines a subset of the sample space
Definition:
An event, A, is a set of outcomes; a subset of the sample space
An event is any collection of possible outcomes of an experiment. It is the simplest random phenomenon
Events are usually known as the information space
Each event has an associated quantity which characterizes the objective likelihood of occurrence of that event
That quantity is the probability of the event
Example 11 - Events
In a toss of 3 coins, we are interested in the occurrence of the following events:
A = {more heads than tails}
  = {(111), (011), (101), (110)}
B = {same outcome}
  = {(111), (000)}
C = {at least 2 heads}
  = {[2 heads] or [3 heads]}
In throwing a pair of dice, we are interested in the sum of dots that show up being even
S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
D = {sum is even} = {2, 4, 6, 8, 10, 12}
Event - 2
Special events
There are two special events of interest:
1) Universal Set ($\Omega$ or S)
Set containing all elements
The totality of all elementary events $\xi_i$, known a priori, $\Omega = \{\xi_1, \xi_2, \ldots, \xi_k, \ldots\}$
Also known as the Certain Event
2) Impossible (or null) Event ($\emptyset$)
- never occurs or contains no outcome
- arises when none of the outcomes satisfy the given condition
Definition of Probability
Axiomatic Probability: the widely accepted definition
Probability based on a set of axioms or rules
Based on the concept of a probability space: sample space, elements of a sample space and set theory
Axiomatic probability assigns a number to an event
Axioms of Probability
"The theory of probability as a mathematical discipline can and should be developed from axioms in exactly the same way as geometry and algebra."
- Andrey Kolmogorov
Axioms of Probability - 1
From the axiomatic definition, the probability P(A) of an event A is a number assigned to the event satisfying the following axioms:
i: $P(A) \geq 0$ (Probability is a nonnegative number)
ii: $P(\Omega) = 1$ (Probability of the whole set is unity)
iii: If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
Note:
(iii) states that if A and B are mutually exclusive (M.E.) events, the probability of their union is the sum of their probabilities, i.e.,
$P(A \cup B) = P(A) + P(B)$, if A and B cannot occur simultaneously
This is the minimum number of axioms required to establish the remaining concepts of probability.
These axioms allow us to view events as objects with properties.
iv: If $A_1, A_2, \ldots$ is a sequence of events such that $A_i \cap A_j = \emptyset$ for all $i \neq j$, then
$P\left[\bigcup_{k=1}^{\infty} A_k\right] = \sum_{k=1}^{\infty} P[A_k]$
The following conclusions follow from these axioms:
a) Since $A \cup A^c = \Omega$, we have using (ii) that
$P(A \cup A^c) = P(\Omega) = 1$.
But since $A \cap A^c = \emptyset$, and using (iii)
$P(A \cup A^c) = P(A) + P(A^c) = 1$
or
$P(A^c) = 1 - P(A)$.
b) Similarly, for any A, $A \cap \emptyset = \emptyset$.
Hence it follows that $P(A \cup \emptyset) = P(A) + P(\emptyset)$.
But $A \cup \emptyset = A$, and thus $P(\emptyset) = 0$.
c) Suppose A and B are not mutually exclusive (M.E.)?
How does one compute $P(A \cup B)$?
To compute the above probability, we should re-express $A \cup B$ in terms of M.E. sets so that we can make use of the probability axioms.
[Figure: Venn diagram of A and B showing the regions A and $A^c B$]
From the figure, we have
$A \cup B = A \cup A^c B$,
where A and $A^c B$ are clearly M.E. events.
Thus using axiom (iii)
$P(A \cup B) = P(A \cup A^c B) = P(A) + P(A^c B)$.
To compute $P(A^c B)$, we can express B as
$B = B \cap \Omega = B \cap (A \cup A^c) = (B \cap A) \cup (B \cap A^c) = BA \cup BA^c$
Thus
$P(B) = P(BA) + P(BA^c)$,
since $BA = AB$ and $BA^c = A^c B$ are M.E. events.
Hence $P(A^c B) = P(B) - P(AB)$, and using the other relations
$P(A \cup B) = P(A) + P(B) - P(AB)$.
These axioms provide us with consistent rules that any valid probability assignment must satisfy.
Additional useful properties (or rules) of probability theory, or direct consequences of the axioms, can be developed as "Corollaries".
Here are some useful Corollaries:
Corollary 1:
$P[A^c] = 1 - P[A]$
$P[S] = P[A \cup A^c] = P[A] + P[A^c] = 1$
Corollary 2:
$P[A] \leq 1$ (from Corollary 1 and axiom i, since $P[A] = 1 - P[A^c]$ and $P[A^c] \geq 0$)
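Corollary 3:
$P[\emptyset] = 0$, or $P[\emptyset] = 1 - P[S] = 0$
Corollary 4:
If $A_1, A_2, \ldots$ are pairwise mutually exclusive, then
$P\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{k=1}^{n} P[A_k]$, $n \geq 2$
Corollary 5:
$P[A \cup B] = P[A] + P[B] - P[A \cap B]$
[Figure: Venn diagram of A and B showing the regions AB, $A^c B$ and A+B]
$A \cup B = A \cup (A^c \cap B)$
$P[A \cup B] = P[A] + P[A^c \cap B]$
$P[A] = P[A \cup B] - P[A^c \cap B]$
$B = (A \cap B) \cup (A^c \cap B)$
$P[B] = P[A \cap B] + P[A^c \cap B]$
$P[A^c \cap B] = P[B] - P[A \cap B]$
Substituting will yield
$P[A \cup B] = P[A] + P[B] - P[A \cap B]$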
Corollary 5A: $P[A \cup B \cup C] = ?$
$P[A \cup B \cup C] = P[(A \cup B) \cup C]$
$= P[A \cup B] + P[C] - P[(A \cup B) \cap C]$
$= P[A] + P[B] - P[A \cap B] + P[C] - \left(P[A \cap C] + P[B \cap C] - P[A \cap B \cap C]\right)$
$= P[A] + P[B] + P[C] - P[A \cap B] - P[A \cap C] - P[B \cap C] + P[A \cap B \cap C]$
The general expression for the probability of a union of events involves
1) adding the probabilities of the single events
2) subtracting the probabilities of the intersections of pairs of events,
3) adding the probabilities of the intersections of triples of events
4) Etc.
As the number of events increases, the probability of a union of events becomes very cumbersome to compute
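Corollary 6:
In general, for n events,
$P\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} P[A_j] - \sum_{j<k} P[A_j \cap A_k] + \cdots + (-1)^{n+1} P[A_1 \cap \cdots \cap A_n]$
Corollary 7:
If $A \subset B$, then $P[A] \leq P[B]$
$B = A \cup (A^c \cap B)$
$P[B] = P[A] + P[A^c \cap B] \geq P[A]$, since $P[A^c \cap B] \geq 0$
[Figure: Venn diagram of A inside B inside S, with the region $A^c B$ marked]
These axioms and corollaries provide us with the rules (or laws) for computing the probability of events
Example 12
Determine the probability of obtaining at least one 1 in 2 tosses of a six-sided die
$S = \{11, 12, \ldots, 16, 21, \ldots, 26, \ldots, 66\}$, with $6^2 = 36$ elements
Let $1_k$ denote the event that toss k shows a 1
$P[1_1 \cup 1_2] = P[1_1] + P[1_2] - P[1_1 \cap 1_2] = 6/36 + 6/36 - 1/36 = 11/36$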
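The inclusion-exclusion result of Example 12 can be verified by brute-force enumeration of the sample space. The function below is a sketch, not part of the lecture; `p_at_least_one` is a hypothetical helper name, and it assumes a fair die so that all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product


def p_at_least_one(face, n_tosses, sides=6):
    """P[at least one toss shows `face`] for a fair die, by enumeration."""
    outcomes = list(product(range(1, sides + 1), repeat=n_tosses))
    favourable = sum(1 for outcome in outcomes if face in outcome)
    return Fraction(favourable, len(outcomes))


print(p_at_least_one(1, 2))  # two tosses
print(p_at_least_one(1, 3))  # three tosses
```

The same numbers follow from the complement rule of Corollary 1: $1 - (5/6)^2 = 11/36$ and $1 - (5/6)^3 = 91/216$.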
Example 13
Determine the probability of obtaining at least one 1 in 3 tosses of a 6-sided die
S = {111, 112, ..., 121, ..., 211, ..., 666}, with $6^3 = 216$ elements
Let $1_k$ denote the event that toss k shows a 1
$P[1_1 \cup 1_2 \cup 1_3] = P[1_1] + P[1_2] + P[1_3] - P[1_1 \cap 1_2] - P[1_1 \cap 1_3] - P[1_2 \cap 1_3] + P[1_1 \cap 1_2 \cap 1_3]$
$= 1/6 + 1/6 + 1/6 - 1/36 - 1/36 - 1/36 + 1/216$
$= 91/216$
Probability Problems - 1
Probability problems are classified into Discrete or Continuous
Discrete Sample Space:
finite and countably infinite sample spaces
defined on {S, F}
S = {a1, a2, a3, ..., an}; F = all subsets of S
all elementary events are distinct
all elementary events are mutually exclusive
The probability of an event B = {a1, ..., am} is
$P[B] = P[a_1] + P[a_2] + \cdots + P[a_m] = \sum_{k=1}^{m} P[a_k]$
$P[a_k] = p_k$ is the weight attached to outcome $a_k$
The probability on a discrete sample space is determined by the probabilities of the elementary events, and is called the probability mass function
If the events are equiprobable, then
$P[a_1] = P[a_2] = \cdots = P[a_n] = \frac{1}{n}$ and $P[B] = \frac{m}{n}$
Continuous Sample Space
Sample space, S, is uncountable
Sample space, S, is a domain on a line, plane or volume and events are points within the domain
In other words, it is defined on a measurable region, R
F is a real valued function defined on a region R (such an F gives rise to the probability density function)
Events of interest consist of experiments on
An interval of a real line
A 2-D region covered by a regular polygon and the complements, unions, intersections of these events
A voluminous region (3-D)
[Figure: example regions in the x-y plane]
The probability of an event A is the ratio of the length, area or volume of the domain A to the length, area or volume of the entire domain
$P[A] = \frac{L(A)}{L(S)}$ (length), $\frac{\text{Area}(A)}{\text{Area}(S)}$ (area), or $\frac{V(A)}{V(S)}$ (volume)
But a better understanding of the continuous sample space is through the use of probability distribution and density functions
Example 14
(a) Find the probability that the sum is 8 in the toss of 2 dice
(b) Find the probability of getting a 5, 7, or 8 in the toss of 2 dice.
Solution
Let S = {all possible occurrences}, N(S) = 36
E = {sum is 8}
F = {sum is 5}
T = {sum is 7}
[Figure: 6 x 6 grid of the outcomes of two dice, with cells marked F (sum is 5), T (sum is 7) and E (sum is 8)]
(a) $P[E] = \frac{5}{36}$; $P[F] = \frac{4}{36}$; $P[T] = \frac{6}{36}$
(b) $P[F, T \text{ or } E] = P[F \cup T \cup E] = P[F] + P[T] + P[E] = \frac{4}{36} + \frac{6}{36} + \frac{5}{36} = \frac{15}{36}$
Example 16
Two dice are thrown
a) What is the probability that both show even numbers?
b) What is the probability that the sum is odd?
Solution
Let S = {all possible occurrences}, N(S) = 36
E = {both show even numbers}
O = {sum is odd}
      1  2  3  4  5  6
  1      O     O     O
  2   O  E  O  E  O  E
  3      O     O     O
  4   O  E  O  E  O  E
  5      O     O     O
  6   O  E  O  E  O  E
(a) $P[E] = \frac{9}{36}$ (distinguishable dice); $P[E] = \frac{6}{21}$ (indistinguishable dice)
(b) $P[O] = \frac{18}{36}$
Example 15
A fair coin is tossed 3 times. What is the probability of the following:
A = {1st toss is head}
B = {2nd toss is head}
C = {exactly 2 heads are tossed in a row}
Solution
Let 1 = Head; 0 = Tail, and let X, Y, Z denote the 1st, 2nd and 3rd tosses
Outcome  X  Y  Z  Events
1        0  0  0
2        0  0  1
3        0  1  0  B
4        0  1  1  B, C
5        1  0  0  A
6        1  0  1  A
7        1  1  0  A, B, C
8        1  1  1  A, B
$P[A] = \frac{4}{8}$; $P[B] = \frac{4}{8}$; $P[C] = \frac{2}{8}$.
$P[A \cap B] = \frac{2}{8}$; $P[B \cap C] = \frac{2}{8}$.
$P[A \cap B \cap C] = \frac{1}{8}$
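The probabilities in the three-coin example can be reproduced by listing the eight outcomes and counting. The sketch below assumes a fair coin (equally likely outcomes) and reads "exactly 2 heads in a row" as exactly two heads that are adjacent; the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes (X, Y, Z) with 1 = head, 0 = tail
outcomes = list(product((0, 1), repeat=3))


def prob(event):
    """Probability of an event given as a predicate on one outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def in_a(o):
    return o[0] == 1  # 1st toss is head


def in_b(o):
    return o[1] == 1  # 2nd toss is head


def in_c(o):
    # exactly two heads, and the two heads are adjacent (011 or 110)
    return sum(o) == 2 and o[1] == 1


print(prob(in_a), prob(in_b), prob(in_c))
print(prob(lambda o: in_a(o) and in_b(o)), prob(lambda o: in_b(o) and in_c(o)))
print(prob(lambda o: in_a(o) and in_b(o) and in_c(o)))
```

`Fraction` prints results in lowest terms, so 4/8 appears as 1/2 and 2/8 as 1/4.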
Conditional Probability Theory
"The most important questions of life are, for the most part, really only problems of probability."
- Pierre-Simon Laplace
Conditional Probability - 1
In many cases, we have only partial knowledge of the outcome of events
Conditional probability is the situation whereby the probability of one event is influenced by that of another event
We denote this conditional probability by
P[A|B] = Probability of event A given that B has occurred.
We define
$P[A|B] = \frac{P[AB]}{P[B]}$, provided $P(B) \neq 0$.
[Figure: Venn diagrams illustrating a case where P[A|B] is large and a case where P[A|B] is small]
Note: The above definition satisfies all probability axioms discussed earlier. That is,
(i) $P[A|B] = \frac{P[AB]}{P[B]} \geq 0$, since $P[AB] \geq 0$ and $P[B] > 0$
(ii) $P[\Omega|B] = \frac{P[\Omega B]}{P[B]} = \frac{P[B]}{P[B]} = 1$, since $\Omega B = B$.
(iii) Suppose $A \cap C = \emptyset$, then
$P(A \cup C | B) = \frac{P((A \cup C) \cap B)}{P(B)} = \frac{P(AB \cup CB)}{P(B)}$.
But $AB \cap CB = \emptyset$, hence $P(AB \cup CB) = P(AB) + P(CB)$.
$P(A \cup C | B) = \frac{P(AB)}{P(B)} + \frac{P(CB)}{P(B)} = P(A|B) + P(C|B)$,
satisfying all probability axioms.
Thus the definition of conditional probability is a legitimate probability measure
P(A) is sometimes called the a priori probability
P(A|B) is sometimes called the a posteriori probability
The idea of conditional probability can often be drawn out in the form of a tree diagram (probability tree)
[Figure: probability tree with first-level branches P[C] and P[D], second-level branches P[A|C], P[B|C], P[A|D], P[B|D], and leaves AC, BC, AD, BD]
Properties:
1. If $B \subset A$, $AB = B$, then
$P(A|B) = \frac{P(AB)}{P(B)} = \frac{P(B)}{P(B)} = 1$,
since if $B \subset A$ then occurrence of B implies automatic occurrence of the event A. As an example, A = {outcome is even}, B = {outcome is 2} in a dice tossing experiment.
2. If $A \subset B$, $AB = A$, and
$P(A|B) = \frac{P(AB)}{P(B)} = \frac{P(A)}{P(B)} \geq P(A)$,
(In a dice experiment, A = {outcome is 2}, B = {outcome is even}, so that $A \subset B$. The statement that B has occurred (outcome is even) makes the odds for "outcome is 2" greater than without that information.)
3. We can use conditional probability to express the probability of a complicated event in terms of simpler related events
Let $A_1, A_2, \ldots, A_n$ be pairwise disjoint and their union is $\Omega$. Thus
$A_i A_j = \emptyset$ and $\bigcup_{i=1}^{n} A_i = \Omega$.
Thus $B = B(A_1 \cup A_2 \cup \cdots \cup A_n) = BA_1 \cup BA_2 \cup \cdots \cup BA_n$.
But $A_i \cap A_j = \emptyset \Rightarrow BA_i \cap BA_j = \emptyset$, so that we have
$P(B) = \sum_{i=1}^{n} P(BA_i) = \sum_{i=1}^{n} P(B|A_i)P(A_i)$
For 3 events, the conditional probability equation can also be written as follows
$P(A \cap B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P[A|(B \cap C)]\, P(B|C)\, P(C)$
If in an experiment the events A and B can both occur, then
$P[A \cap B] = P[A]\,P[B|A]$
Since events $A \cap B$ and $B \cap A$ are equivalent, it follows that
$P[A \cap B] = P[B \cap A] = P[B]\,P[A|B]$
Example 17
Let A and B be events with P[A] = 1/2, P[B] = 1/3 and $P[A \cap B]$ = 1/4. Find
a) P[A|B],
b) P[B|A],
c) $P[A \cup B]$,
d) $P[A^c|B^c]$,
e) $P[B^c|A^c]$
Solution
a) $P[A|B] = \frac{P[A \cap B]}{P[B]} = \frac{1/4}{1/3} = \frac{3}{4}$
b) $P[B|A] = \frac{P[B \cap A]}{P[A]} = \frac{1/4}{1/2} = \frac{1}{2}$
c) $P[A \cup B] = P[A] + P[B] - P[A \cap B] = \frac{1}{2} + \frac{1}{3} - \frac{1}{4} = \frac{7}{12}$
d) $P[A^c|B^c] = \frac{P[A^c \cap B^c]}{P[B^c]}$
But $P[B^c] = 1 - P[B] = 1 - \frac{1}{3} = \frac{2}{3}$
and $(A \cup B)^c = A^c \cap B^c$, so $P[A^c \cap B^c] = P[(A \cup B)^c] = 1 - P[A \cup B] = 1 - \frac{7}{12} = \frac{5}{12}$
Hence $P[A^c|B^c] = \frac{5/12}{2/3} = \frac{5}{8}$
e) $P[B^c|A^c] = \frac{P[B^c \cap A^c]}{P[A^c]} = \frac{5/12}{1/2} = \frac{5}{6}$
Example 18
A test for cancer is 90% effective. That is, 90% of those with the disease react positively. Also, 5% of those without the disease react positively. If 1% of the patients have cancer, what is the probability that a patient who reacts positively has cancer?
Solution
Let
C+ = {has cancer};
C- = {no cancer};
R = {positive reaction}
Therefore,
P[R|C+] = 0.9; P[R|C-] = 0.05;
P[C+] = 0.01; P[C-] = 0.99
Since C+ and C- partition S, $R = (R \cap C^+) \cup (R \cap C^-)$, and
$P[C^+|R] = \frac{P[R|C^+]\,P[C^+]}{P[R]} = \frac{P[R|C^+]\,P[C^+]}{P[R|C^+]\,P[C^+] + P[R|C^-]\,P[C^-]}$
$= \frac{(0.9)(0.01)}{(0.9)(0.01) + (0.05)(0.99)} \approx 0.154$
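The Bayes computation of Example 18 is short enough to check numerically. The function below is an illustrative sketch; the name `posterior` and its parameter names are assumptions made here, not notation from the lecture.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """P[disease | positive test] for a two-way partition (disease / no disease)."""
    # Total probability of a positive reaction (denominator of Bayes' rule)
    p_positive = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_positive


# Figures from Example 18: P[C+] = 0.01, P[R|C+] = 0.9, P[R|C-] = 0.05
p_cancer_given_positive = posterior(0.01, 0.9, 0.05)
print(round(p_cancer_given_positive, 3))
```

Although the test detects 90% of cases, the low prior of 1% keeps the posterior near 15%, because false positives from the much larger healthy group dominate the denominator.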
Independence
"If there is a 50-50 chance that something can go wrong, then 9 times out of 10 it will."
- Paul Harvey
Independence - 1
If the occurrence of an event B does not alter the probability of occurrence of event A, then A and B are said to be independent
Definition: A and B are said to be independent if
$P[AB] = P[A]\,P[B]$
It is easy to show that if A, B are independent, then
$(A^c, B)$; $(A, B^c)$; $(A^c, B^c)$
are all independent pairs.
If A and B are independent, so are A and $B^c$.
(A, B independent $\Rightarrow$ A, $B^c$ independent)
Independence - 2
Suppose A and B are independent, then
$P[A|B] = \frac{P[A \cap B]}{P[B]} = \frac{P[A]\,P[B]}{P[B]} = P[A]$
Thus if A and B are independent, the event that B has occurred does not shed any more light on the event A.
It makes no difference to A whether B has occurred or not
Three events A, B and C are said to be independent iff
$P[A \cap B \cap C] = P[A]\,P[B]\,P[C]$, and
$P[A \cap B] = P[A]\,P[B]$, and
$P[A \cap C] = P[A]\,P[C]$, and
$P[B \cap C] = P[B]\,P[C]$
All the pairwise intersections must be checked
Independence - 3
We can express conditional probabilities as follows:
$P[A \cap B] = P[A|B]\,P[B]$
$P[A \cap B] = P[B|A]\,P[A]$
$P[B|A] = \frac{P[B \cap A]}{P[A]} = \frac{P[A \cap B]}{P[A]}$
$P[A|B] = \frac{P[B|A]\,P[A]}{P[B]}$ (Bayes' Theorem)
Example 19
In an experiment, one card is selected from an ordinary deck of cards. Define event A as "select a king", B as "select a jack or queen", and C as "select a heart". Are A, B and C independent?
[Figure: drawing cards from a deck of 52 cards; four suits (Diamond, Spade, Heart, Club), each with ranks Ace, 2, ..., 10, Jack (11), Queen (12), King (13)]
Example
For each suit the sample space consists of ace, two, ..., ten, jack, queen, king and it is indicated as {1, 2, ..., 13}
Let A = {king is drawn}, B = {club is drawn}
Describe the events
a) $A \cup B$ = {either king or club (or both, i.e., king of clubs)}
b) $A \cap B$ = {both king and club (king of clubs)}
c) Since B = {clubs}, $B^c$ = {not club} = {hearts, diamonds, spades}. Hence $A \cup B^c$ = {king or hearts or diamonds or spades}
d) $A^c \cup B^c$ = {not king or not club}
e) $A - B$ = {king but not club}. This is the same as $A \cap B^c$ = {king and not club}
f) $A^c - B^c = A^c \cap B$ = {not king and club} = {any club except king}
g) $(A \cap B) \cup (A \cap B^c)$ = {king and club} or {king and not club} = {king}
This can be seen by expanding: $(A \cap B) \cup (A \cap B^c) = A \cap (B \cup B^c) = A$
Example 19
Solution:
$P[A] = \frac{4}{52}$; $P[B] = \frac{8}{52}$; $P[C] = \frac{13}{52}$
It is not possible to simultaneously select a King and a Jack or Queen
This implies that
$P[A \cap B] = 0$
Also
$P[A \cap C] = \frac{1}{52}$; $P[B \cap C] = \frac{2}{52}$
Therefore
$P[A \cap B] = 0 \neq P[A]\,P[B] = \frac{32}{52 \cdot 52}$
$P[A \cap C] = \frac{1}{52} = P[A]\,P[C]$
$P[B \cap C] = \frac{2}{52} = P[B]\,P[C]$
This implies that
A and C are independent as a pair
B and C are independent as a pair
But A and B are NOT independent
Thus, A, B and C are NOT independent
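The pairwise checks in Example 19 can be confirmed by enumerating a 52-card deck. This is a minimal sketch assuming every card is equally likely; the rank and suit labels and the helper `pair_independent` are illustrative choices, not lecture notation.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards


def prob(event):
    """Probability of an event given as a predicate on one card."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_king(card):           # event A
    return card[0] == "K"


def is_jack_or_queen(card):  # event B
    return card[0] in ("J", "Q")


def is_heart(card):          # event C
    return card[1] == "hearts"


def pair_independent(e1, e2):
    """True when P[E1 and E2] equals P[E1] * P[E2] exactly."""
    joint = prob(lambda card: e1(card) and e2(card))
    return joint == prob(e1) * prob(e2)


print(pair_independent(is_king, is_heart))           # A and C
print(pair_independent(is_jack_or_queen, is_heart))  # B and C
print(pair_independent(is_king, is_jack_or_queen))   # A and B
```

Exact `Fraction` arithmetic avoids floating-point equality problems when comparing the joint probability with the product.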
Counting Techniques & Markov Chains
"The 50-50-90 rule: Anytime you have a 50-50 chance of getting something right, there's a 90% probability you'll get it wrong."
- Andy Rooney
What you should learn in this Lecture
Partition Law
Bayes Rule
Laws of Total Probability
Introduction to Markov Chains
Counting Techniques
Sampling of Different Kinds
1. Sampling with replacement and with ordering
2. Sampling without replacement and with ordering
3. Sampling without replacement and without ordering
4. Sampling with replacement and without ordering
Binomial Coefficient and Theorem
Partition (Law of Total Probability)
"The true logic of this world is the calculus of probabilities."
- James Clerk Maxwell
Partition - 1
If a region is divided into non-overlapping (mutually exclusive) parts, the parts are said to partition the region
A partition of a set B is a set {B1, B2, ..., Bn} having the following properties:
i) $B_j \subset B$, j = 1, 2, ..., n
ii) $B_j \cap B_k = \emptyset$, k, j = 1, 2, ..., n, $k \neq j$
iii) $B = B_1 \cup B_2 \cup \cdots \cup B_n$
A partition of a set B is a set of subsets of B [property i] that are disjoint [property ii] and mutually exhaustive [property iii]
Partition - 2
Every element of B is a member of one and only one of the subsets in the partition
In the diagram below, the set $\{A \cap B_i\}$ partitions A, i.e.,
$A = A \cap S$
$= A \cap (B_1 \cup B_2 \cup \cdots \cup B_n)$
$= (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n)$
[Figure: sample space partitioned into $B_1, B_2, B_3, \ldots, B_{n-1}, B_n$, with the event A overlapping the parts]
and from property (ii) we may write the probability as follows
$P[A] = P[A \cap B_1] + P[A \cap B_2] + \cdots + P[A \cap B_n] = \sum_{k=1}^{n} P[A \cap B_k]$
Partition - 3
The expression above says that the total probability of an event can be obtained by summing over the set of mutually exclusive and exhaustive ways of the event occurring.
But since
$P[A \cap B] = P[A|B]\,P[B] = P[B|A]\,P[A]$,
we may write, equivalently,
$P[A] = P[A|B_1]\,P[B_1] + P[A|B_2]\,P[B_2] + \cdots + P[A|B_n]\,P[B_n] = \sum_{k=1}^{n} P[A|B_k]\,P[B_k]$
Partition - 4
Hence
$P[A] = \sum_{k=1}^{n} P[A|B_k]\,P[B_k]$
This is known as the Partition Law or Law of Total Probability
If the events B1, B2, ..., Bn constitute a partition of the sample space S such that $P[B_k] \neq 0$, k = 1, 2, ..., n, then for any event A of S,
$P[A] = \sum_{k=1}^{n} P[A \cap B_k] = \sum_{k=1}^{n} P[A|B_k]\,P[B_k]$
Example 20
There are 30% Freshmen, 25% Sophomores, 25% Juniors and 20% Seniors in the IEEE student organization. 50%, 30%, 10%, and 2% of the Freshmen, Sophomores, Juniors and Seniors, respectively, are enrolled in Random Signals. If a member of IEEE is selected at random, what is the probability that the member is enrolled in Random Signals?
Let E = selected member is enrolled in Random Signals
E1 = selected member is a freshman
E2 = selected member is a sophomore
E3 = selected member is a junior
E4 = selected member is a senior
There are 4 partitions, E1 to E4
$P[E] = P[E|E_1]\,P[E_1] + P[E|E_2]\,P[E_2] + P[E|E_3]\,P[E_3] + P[E|E_4]\,P[E_4]$
$= (0.50)(0.30) + (0.30)(0.25) + (0.10)(0.25) + (0.02)(0.20)$
$= 0.254$
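The law of total probability in Example 20 is a weighted sum over the partition, which the sketch below spells out. The dictionary names are illustrative; the numbers are those given in the example.

```python
# P[E_k]: class proportions in the organization (a partition of the members)
class_share = {"freshman": 0.30, "sophomore": 0.25, "junior": 0.25, "senior": 0.20}

# P[E | E_k]: fraction of each class enrolled in Random Signals
enrolled_given_class = {"freshman": 0.50, "sophomore": 0.30, "junior": 0.10, "senior": 0.02}

# The partition must be exhaustive: the class shares sum to 1
assert abs(sum(class_share.values()) - 1.0) < 1e-12

# Law of total probability: P[E] = sum over k of P[E | E_k] * P[E_k]
p_enrolled = sum(enrolled_given_class[k] * class_share[k] for k in class_share)
print(round(p_enrolled, 3))
```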
Bayes Rule
"Everything should be made as simple as possible, but not one bit simpler."
- Albert Einstein
Bayes Rule - 1
Bayes Rule:
If the events B1, B2, ..., Bn constitute a partition of the sample space S such that $P[B_k] \neq 0$, k = 1, 2, ..., n, then for any event A in S such that $P[A] \neq 0$,
$P[B_k|A] = \frac{P[A \cap B_k]}{P[A]} = \frac{P[A|B_k]\,P[B_k]}{\sum_{j=1}^{n} P[A|B_j]\,P[B_j]}$
Bayes Rule - 2
Proof:
By definition of conditional probability
$P[B_k|A] = \frac{P[A \cap B_k]}{P[A]}$
and then using the partition law or total probability law for the denominator, we obtain
$P[B_k|A] = \frac{P[A \cap B_k]}{\sum_{j=1}^{n} P[A \cap B_j]}$
Now, apply conditional probability theory to both numerator and denominator
$P[B_k|A] = \frac{P[A|B_k]\,P[B_k]}{\sum_{j=1}^{n} P[A|B_j]\,P[B_j]}$
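As an illustration of Bayes' rule, the sketch below reuses the figures of Example 20 to ask a question the lecture does not pose: given that a randomly selected member is enrolled in Random Signals, how likely is it that the member belongs to each class? The function name `bayes_posteriors` is hypothetical.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P[B_k | A] for every part B_k of a partition.

    priors[k] is P[B_k]; likelihoods[k] is P[A | B_k].
    """
    # Denominator: law of total probability, P[A] = sum of P[A | B_j] P[B_j]
    p_a = sum(likelihoods[k] * priors[k] for k in priors)
    return {k: likelihoods[k] * priors[k] / p_a for k in priors}


priors = {"freshman": 0.30, "sophomore": 0.25, "junior": 0.25, "senior": 0.20}
likelihoods = {"freshman": 0.50, "sophomore": 0.30, "junior": 0.10, "senior": 0.02}

post = bayes_posteriors(priors, likelihoods)
for name, value in post.items():
    print(name, round(value, 3))
```

The posteriors sum to 1 because the classes partition the membership; the freshman posterior is 0.15 / 0.254.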
Introduction to Markov Chains - 1
CME621 Stochastic Processes

Markov Chains
Our brains are just not wired to do probability
problems very well.
Persi Diaconis
95
Markov chains deal with the sequence of dependent

experiments
The outcome of a given experiment determines which
experiment is performed next
Consider a sequence of experiments X1, X2, , Xn
We interpret Xn as being the state of the experiment at time n,
and we can say that the system is in state x at time n if Xn = xn
Hence we seek the conditional probability
P P X n1 xn 1 | X n xn , X n 1 xn 1,, X 1 x1, X 0 x0 ,
If the structure of the process {Xn, n = 0, 1, 2, ...} is such that
the conditional probability distribution of Xn+1 depends on the
value of Xn and is independent of all previous values, we say
that the process is a Markov Chain
Hence Pij P Xn 1 j| Xn i , i, j 0,1, 2,
96
The sequence of random experiments is said to form a Markov Chain if each time the system is in state i there is some fixed probability, say Pij, that it will next move to state j
The values
$$P_{ij} = P[X_{n+1} = j \mid X_n = i], \quad i, j = 0, 1, 2, \ldots$$
are called transition probabilities
Since Pij are conditional probabilities, they satisfy probability requirements
$$P_{ij} \ge 0 \ \text{for all } i, j \quad \text{and} \quad \sum_{j=0}^{\infty} P_{ij} = 1, \quad i = 0, 1, 2, \ldots$$
It is convenient to arrange the transition probabilities in matrix form giving rise to the transition matrix (shown for states 0, 1, ..., M)
$$P = \begin{bmatrix} P_{00} & P_{01} & \cdots & P_{0M} \\ P_{10} & P_{11} & \cdots & P_{1M} \\ \vdots & \vdots & & \vdots \\ P_{M0} & P_{M1} & \cdots & P_{MM} \end{bmatrix}$$
Knowledge of transition probabilities and the distribution of X0 enables us to compute all probabilities of interest.
For instance, the joint probability of X0, X1, ..., Xn is
$$\begin{aligned} P[X_n = j_n, \ldots, X_0 = j_0] &= P[X_n = j_n \mid X_{n-1} = j_{n-1}, \ldots, X_0 = j_0]\; P[X_{n-1} = j_{n-1}, \ldots, X_0 = j_0] \\ &= P_{j_{n-1} j_n}\; P[X_{n-1} = j_{n-1}, \ldots, X_0 = j_0] \\ &= P_{j_{n-1} j_n}\, P_{j_{n-2} j_{n-1}} \cdots P_{j_0 j_1}\, P[X_0 = j_0] \end{aligned}$$
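The product formula for the joint probability of a path can be sketched in a few lines of Python. This is only an illustration: the matrix entries are read off the box contents described in Example 21 (Box 0 holds two 0-balls and one 1-ball, Box 1 holds one 0-ball and five 1-balls), and the names `P`, `initial` and `path_probability` are hypothetical helpers, not part of the lecture.

```python
from fractions import Fraction

# P[i][j] = P[X_{n+1} = j | X_n = i], assumed from the box contents of Example 21.
P = [
    [Fraction(2, 3), Fraction(1, 3)],  # Box 0: two balls marked 0, one marked 1
    [Fraction(1, 6), Fraction(5, 6)],  # Box 1: one ball marked 0, five marked 1
]
# Distribution of X0: the first box is chosen by a fair coin.
initial = [Fraction(1, 2), Fraction(1, 2)]


def path_probability(path, initial, P):
    """Return P[X0 = path[0], X1 = path[1], ...] using the Markov product rule."""
    prob = initial[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]  # one transition probability per step
    return prob


p_0011 = path_probability([0, 0, 1, 1], initial, P)
print(p_0011)  # 5/54
```

Exact fractions are used so that each row of the matrix can be checked to sum to 1 without rounding error.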
Example 21
A sequential experiment involves repeatedly drawing a ball from one of two boxes, noting the number on the ball, and replacing the ball in its box. Box 0 contains a ball with the number 1 and two balls with the number 0, and Box 1 contains five balls with the number 1 and one ball with the number 0. The box from which the first draw is made is selected at random by flipping a fair coin. Box 0 is used if the outcome is heads and Box 1 if the outcome is tails. Thereafter the box used in a subexperiment corresponds to the number on the ball selected in the previous subexperiment.
Solution
The sample space of this experiment consists of sequences of 0s and 1s.
Each possible sequence corresponds to a path through the "trellis" diagram shown. The nodes in the diagram denote the box used in the nth subexperiment, and the labels on the branches denote the outcome of a subexperiment. Thus the path 0011 corresponds to the sequence:
The coin toss was heads so the first draw was from box 0; the outcome of the first draw was 0, so the second draw was from box 0; the outcome of the second draw was 1, so the third draw was from box 1; and the outcome of the third draw was 1, so the fourth draw is from box 1.
Find P[0011]?
$$P[0011] = P[1 \mid 1]\,P[1 \mid 0]\,P[0 \mid 0]\,P[0] = \frac{5}{6} \cdot \frac{1}{3} \cdot \frac{2}{3} \cdot \frac{1}{2} = \frac{5}{54}$$
Here P[1 | 0] = 1/3 because Box 0 holds one ball numbered 1 among its three balls.

Counting Techniques
But to us, probability is the very guide of life.
Bishop J. Butler
Counting Techniques - 1
Since the probability of an event (with equally likely outcomes) is the number of outcomes in that event divided by the total number of outcomes, the calculation of probability sometimes reduces to counting the number of outcomes in an event.
Hence, a technique to count the number of outcomes in an event and in the sample space for large experiments is necessary.
Suppose there are n objects in all and we are going to make k selections; the question is:
How many different ways can we make the selection?
To answer this question, we need to know the rules of the selection
Are the objects distinguishable or not?
Can objects be chosen more than once, i.e., do we choose with or without replacement?
Are we concerned with ordering?
Answers to these questions lead to different counting techniques
We will phrase this random selection (sampling) process in terms of how:
a) balls can be allocated or drawn from a container
b) cards can be drawn from a deck of cards
Counting problems are classified into:
1. Sampling with replacement and with ordering
2. Sampling without replacement and with ordering
3. Sampling without replacement and without ordering
4. Sampling with replacement and without ordering
1. Sampling with Replacement and with Ordering
We will use N(S) to denote the total number of elements in the sample space S
Make k selections from a set A containing n distinct objects
Each of the k selections from the n objects is independent (i.e., n possible outcomes for each selection)
Since ordering is important, the experiment produces an ordered k-tuple $(x_1, x_2, \ldots, x_k)$ where $x_i \in A$
Let $N_k(S)$ = total number of distinct elements in S = $n^k$
Hence the probability of any one particular k-tuple is $P_k = 1/n^k$
2. Sampling without Replacement and with Ordering
Since no object is chosen more than once, the choice cannot be made if k > n
This type of sampling is popularly known as PERMUTATION
Permutation: the arrangement of a set of elements into a particular order, e.g.,
{1, 2, 3} → {123, 132, 213, 231, 321, 312}
For a large set, it may not be possible to enumerate the ordered set
Suppose there are
n1 independent ways of doing the 1st operation
n2 independent ways of doing the 2nd operation
...
nk independent ways of doing the k-th operation
Then the total number of ways (distinct ordered k-tuples) of performing this operation is N(S) = n1 n2 ... nk
N(S) can also be interpreted as follows (for k sets of elements):
the 1st set contains n1 elements, $\{a_{11}, a_{12}, \ldots, a_{1n_1}\}$
the 2nd set contains n2 elements, $\{a_{21}, a_{22}, \ldots, a_{2n_2}\}$
...
the k-th set contains nk elements, $\{a_{k1}, a_{k2}, \ldots, a_{kn_k}\}$
If we arrange the elements such that each arrangement contains only one element from each set, then an arrangement of this nature will be obtained
Total number of arrangements = N(S) = n1 n2 ... nk
In general, the number of arrangements of n distinct elements taking n elements at a time is called a permutation and is denoted by
$$P(n, n) = n_1 n_2 \cdots n_n$$
This is equivalent to choosing n different elements to fill n different positions,
$n_1 = n$ choices for the 1st position
$n_2 = n - 1$ choices for the 2nd position
...
$n_n = n - (n - 1) = 1$ choice for the n-th position
Hence
$$P(n, n) = n(n-1)(n-2)\cdots(n-n+1) = n!$$
In permutation, we count the selection of ball i followed by ball j as being different from the selection of ball j followed by ball i; i.e., (i, j) ≠ (j, i)
Often we are interested in a limited number of the total elements; i.e., permutation of n objects taking k elements at a time
$$P(n, k) = n(n-1)(n-2)\cdots(n-k+1)$$
where n is the number of elements available in the 1st experiment and (n - k + 1) is the number available in the last experiment. Multiplying and dividing by (n - k)!,
$$P(n, k) = \frac{[\,n(n-1)(n-2)\cdots(n-k+1)\,](n-k)!}{(n-k)!} = \frac{n!}{(n-k)!}$$
Also written as: $_nP_k = P(n, k)$
In a permutation each distinct element appears only once in each arrangement
The number of permutations of n distinct objects arranged in a circle is (n - 1)!
Example: Suppose that a club consists of 25 members and that a President and Secretary are to be chosen from the membership. How many ways can the positions be filled?
$$P(25, 2) = \frac{25!}{(25-2)!} = 25 \cdot 24 = 600$$
Stirling's Formula: For large n
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} \quad \text{or} \quad n! \sim \sqrt{2\pi}\; n^{\,n+1/2}\, e^{-n}$$
in the sense that
$$\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi n}\,(n/e)^{n}} = 1$$
By definition, 0! = 1
So far we had assumed distinct elements.
When the elements in a set are not distinct, the number of permutations is affected
In this case, the number of permutations of n elements taking n at a time, when k1 are of one kind, k2 are of another kind, ..., km are of yet another kind, is given by the Multinomial Coefficient
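The permutation count and Stirling's approximation can be checked numerically. The sketch below is illustrative only; the function names `permutations_count` and `stirling` are hypothetical, and n = 20 is an arbitrary test value.

```python
import math


def permutations_count(n, k):
    """P(n, k) = n! / (n - k)!: ordered selections of k out of n distinct objects."""
    return math.factorial(n) // math.factorial(n - k)


def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)**n to n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


officers = permutations_count(25, 2)        # President and Secretary from 25 members
ratio = math.factorial(20) / stirling(20)   # should be close to 1 for large n

print(officers)  # 600
print(ratio)
```

The ratio n!/Stirling(n) approaches 1 from above as n grows, which is the sense of the limit stated with the formula.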
Multinomial Coefficient
Suppose n distinct elements are divided into k different groups (k ≥ 2) such that, for j = 1, ..., k, the j-th group contains exactly nj elements, where n1 + n2 + ... + nk = n
We want to determine the number of ways in which the n elements can be divided into the k groups, i.e.,
How many ways can n distinguishable balls be distributed into k different boxes so that there are ni balls in box i?
Choosing the groups one after another gives
$$N(S) = \binom{n}{n_1}\binom{n-n_1}{n_2}\binom{n-n_1-n_2}{n_3}\cdots\binom{n-n_1-\cdots-n_{k-1}}{n_k}$$
Hence
$$P_{n_1, n_2, \ldots, n_k} = \binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\,n_2!\cdots n_k!}$$
This is the arrangement of elements of two or more distinct types
The number of distinct permutations of n things of which n1 are of one kind, n2 of a second kind, ..., nk of the k-th kind is
$$\frac{n!}{n_1!\,n_2!\cdots n_k!}$$
The number of ways of partitioning a set of n objects into k cells with n1 elements in the first cell, n2 in the second cell, ..., nk elements in the k-th cell is
$$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\,n_2!\cdots n_k!}$$
That is, if the same element appears more than once in the same permutation, then interchange of the like elements will not produce a different permutation
For 2, 3, ... like elements, divide the total number of permutations by 2!, 3!, ...
The multinomial coefficient appears in the multinomial theorem, which can be stated as follows:
Definition: For any numbers x1, x2, ..., xm and any positive integer n,
$$(x_1 + x_2 + \cdots + x_m)^{n} = \sum_{k_1 + k_2 + \cdots + k_m = n} \frac{n!}{k_1!\,k_2!\cdots k_m!}\; x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}$$
where
$$\binom{n}{k_1, k_2, \ldots, k_m} = \binom{n}{k_1}\binom{n-k_1}{k_2}\binom{n-k_1-k_2}{k_3}\cdots\binom{k_m}{k_m} = \frac{n!}{k_1!\,k_2!\cdots k_m!}$$
This is equivalent to partitioning the set of n distinct elements into m subsets B1, B2, ..., Bm, such that Bi is assigned ki elements satisfying the condition k1 + k2 + ... + km = n
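As a quick check of the multinomial coefficient, the sketch below computes it in two ways (factorial formula and product of binomial coefficients) and compares both with a brute-force count of distinct arrangements. The word "AABBBC" and the function names are hypothetical choices made for illustration.

```python
import math
from itertools import permutations


def multinomial(*counts):
    """n! / (n1! n2! ... nk!) with n = n1 + ... + nk."""
    result = math.factorial(sum(counts))
    for c in counts:
        result //= math.factorial(c)
    return result


def multinomial_by_binomials(*counts):
    """Same value, built as C(n, n1) * C(n - n1, n2) * ... one group at a time."""
    remaining = sum(counts)
    result = 1
    for c in counts:
        result *= math.comb(remaining, c)
        remaining -= c
    return result


word = "AABBBC"  # 2 of one kind, 3 of a second kind, 1 of a third kind
distinct_arrangements = len(set(permutations(word)))

print(multinomial(2, 3, 1), multinomial_by_binomials(2, 3, 1), distinct_arrangements)
```

All three numbers agree, which illustrates why interchanging like elements does not create a new permutation.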
3. Sampling without Replacement & without Ordering
Same as sampling without replacement and with ordering, except that the actual order of events is not important
Selection of ball i followed by ball j is the same as selection of ball j followed by ball i ({i, j} = {j, i})
Choosing k objects out of n objects, order not important, without replacement, amounts to dividing n objects into two categories: those that are selected and those that are not selected
To obtain the combinations, we basically divide P(n, k) by the number of possible arrangements of k objects, k!
This technique is commonly known as combination, which is defined as follows:
$$_nC_k = C(n, k) = \binom{n}{k} = \frac{P(n, k)}{k!} = \frac{n!}{k!\,(n-k)!}$$
4. Sampling with Replacement and without Ordering
Suppose we choose k objects from n distinct objects
Each time we choose an object, we record that the object was selected and then replace it
We want to determine how many times each object has been selected
The experiment involves counting how many ways to put stars and bars in order
[Figure: urns 1, 2, ..., n separated by bars, each urn holding one mark (x) per time its object was selected]
# of bars = n - 1 (outer bars not counted)
$$N(S) = \frac{(n-1+k)!}{(n-1)!\,k!} = \binom{n-1+k}{k} = \binom{n-1+k}{n-1}$$
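The four counting rules can be compared side by side by brute-force enumeration. The sketch below uses Python's itertools for the enumeration and the closed-form expressions from the slides for the formulas; the values n = 4 and k = 2 are arbitrary illustrative choices.

```python
import math
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

n, k = 4, 2          # hypothetical small case: 4 distinct objects, 2 selections
objects = range(n)

# Count by explicit enumeration of every possible selection.
enumerated = {
    "replacement, ordered": len(list(product(objects, repeat=k))),
    "no replacement, ordered": len(list(permutations(objects, k))),
    "no replacement, unordered": len(list(combinations(objects, k))),
    "replacement, unordered": len(list(combinations_with_replacement(objects, k))),
}

# Count by formula: n^k, n!/(n-k)!, C(n, k), C(n-1+k, k).
formulas = {
    "replacement, ordered": n ** k,
    "no replacement, ordered": math.factorial(n) // math.factorial(n - k),
    "no replacement, unordered": math.comb(n, k),
    "replacement, unordered": math.comb(n - 1 + k, k),
}

for rule in formulas:
    print(rule, enumerated[rule], formulas[rule])
```

For n = 4 and k = 2 the counts are 16, 12, 6 and 10 respectively.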
Random Variable
The degree of understanding a phenomenon is inversely proportional to the number of variables used for its description
- Unknown Physicist

Definition of Random Variable - 1
Random Variables (RVs) are functions defined on the Sample Space (S or Ω) of a probability space
Consider the experiment of flipping a coin twice!
The sample space of the experiment is S = {HH, HT, TH, TT}
From the sample space, we can identify 16 events as follows:
{HH}, {HT}, {TH}, {TT}
{HH, HT}, {HH, TH}, {HH, TT}, {HT, TH}, {HT, TT}, {TH, TT}
{HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT}
{HH, HT, TH, TT} and {}
We would like to perform several analyses on these events and their probabilities
However, working with symbols such as H (Head) and T (Tail) is not convenient
Thus, we can associate real numbers with these events
These quantities of interest (real-valued functions defined on the sample space) are known as random variables
When these random outcomes are mapped (or transformed) into numerical values (real numbers), a random variable is obtained
Often, we are interested in an outcome such as the sum of two dice but not in the separate values on the dice
E.g., we may want to know that the sum is 7 but we are not interested in the actual outcomes such as (1,6), (2,5), (3,4)

Random Variables (RVs) map the outcome of a random experiment to points on the real line, R
[Figure: a mapping X of the outcomes in S = {s1, s2, ..., sk} into the real line R; a set A ⊂ S maps to an interval I ⊂ R, so that P[X ∈ I] = P[A]]
[Figure: a mapping of S = {HH, HT, TH, TT} into the real line R]

Definition:
Suppose that (S, F, P) is a probability space in which S is not necessarily countable. A Random Variable, X, defined on this space is a function from S into the real line such that the set {ω | X(ω) ≤ x} ∈ F for every real x
A Random Variable, X, defined on the probability space is a function that assigns a real number X(ω) to every random outcome ω ∈ S
Translated, a Random Variable is a real-valued function that associates a real number with each element in the sample space

Example 25
Note:
The function that assigns a value to each outcome is fixed and deterministic, e.g., number of heads in three tosses of a coin
However, the outcome of the experiment is not known in advance
No matter how carefully a process is run, an experiment is performed, or a measurement is taken, there will be variability when the action is repeated
If the outcome is already a numerical value, then we can make the assignment X(ω) = ω

Examples of random variables are:
population of a city or country
time of failure of a machine
stress level in a structure
current or voltage level in an electric circuit
gas pressure in a pipeline, etc.
Example 26 (21)
A) Toss a coin 3 times; define X = number of heads
S = Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
X(ω) = number of heads in the outcome ω, e.g., X(HHT) = 2
Thus, X has a range SX = {0, 1, 2, 3}

B) Throw a pair of dice. Let Z = sum, M = product
ω = (1, 6) is one possible outcome
Thus, Z(ω) = 7 and M(ω) = 6

Example 27 (22)
Toss a coin 10 times. Let X = number of heads
SX = {0, 1, 2, ..., 10} is the range of X
ω = (H, T, T, T, H, H, H, T, H, T) is one possible outcome
N(S) = 2^10 = 1024 and X(ω) = 5
Y = (number of heads)/10, so Y(ω) = 1/2
Z = X^2, so Z(ω) = 25
G = sin X, so G(ω) = sin 5
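The idea that a random variable is a fixed, deterministic function on the sample space can be made concrete in code. The sketch below enumerates the sample space of Example 26 (A) and tabulates the range and pmf of X, assuming a fair coin so that all eight outcomes are equally likely; the names `S`, `X`, `range_X` and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three coin tosses: HHH, HHT, ..., TTT.
S = ["".join(tosses) for tosses in product("HT", repeat=3)]


def X(outcome):
    """Random variable: number of heads in the outcome (a deterministic map)."""
    return outcome.count("H")


range_X = sorted({X(w) for w in S})

# With equally likely outcomes, P[X = x] = (# outcomes mapped to x) / N(S).
counts = Counter(X(w) for w in S)
pmf = {x: Fraction(counts[x], len(S)) for x in range_X}

print(range_X)  # [0, 1, 2, 3]
print(pmf)
```

The randomness lies entirely in which outcome occurs; the map X itself never changes.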
Cumulative Distribution Function
If there is a 50-50 chance that something can go wrong, then 9 times out of 10 it will.
Paul Harvey

Cumulative Distribution Function - 1
The cumulative distribution function (CDF), or simply distribution function, of a random variable X is defined as
$$F_X(x) = \begin{cases} P[X \le x], & \text{Continuous} \\ \sum_{k} P[X = x_k]\,u(x - x_k), & \text{Discrete} \end{cases}$$
The CDF is the probability that the RV X takes on a value in the set (-∞, x]. The event of interest is a semi-infinite interval on the real line, R
CDF is a probability and satisfies all the axioms and corollaries of probability!
[Figure: F_X(x) for a continuous RV and for a discrete RV with S_X = {0, 1, 2}; vertical-axis levels 1/6, 1/2, 1]
Both continuous and discrete RVs shown above have similar shapes in that they start from zero and build up to 1, from left to right, never decreasing

1. Discrete Probability Distribution
A RV is discrete if its set of possible outcomes is countable
A discrete RV assumes each of its values with a certain probability
In discrete probability, the statement "the probability that the random variable X is equal to x", written as P[X = x], is given a numerical value by the probability function P
P[X = x] is the P value assigned to the event {ω | X(ω) = x}
2. Continuous Probability Distribution
3. Mixed Probability Distribution
A RV is mixed if its set of possible outcomes is partly countable and partly uncountable
A mixed RV assumes some of its values with nonzero probability (as a discrete RV does), while its remaining values are spread over a continuum (as for a continuous RV)

Properties of CDF - 1
1) $0 \le F_X(x) \le 1$, $-\infty < x < \infty$ (from Axiom I and Corollary 2)
2) $\lim_{x \to \infty} F_X(x) = 1$ or $F_X(\infty) = 1$ (from Axiom II)
3) $\lim_{x \to -\infty} F_X(x) = 0$ or $F_X(-\infty) = 0$ (from Corollary 3)
Since all real numbers are > -∞, the event {X ≤ -∞} is empty
4) $F_X(x)$ is a non-decreasing function: $F_X(a) \le F_X(b)$, if a < b
5) $F_X(x)$ is continuous from the right, i.e., for any b, and for h > 0,
$$F_X(b) = \lim_{h \to 0} F_X(b + h) = F_X(b^{+})$$
[Figure: staircase CDF F_X(x) with levels 1/4, 3/4 and 1]
Properties of CDF - 2
6) $P[a < X \le b] = F_X(b) - F_X(a)$, if a < b
All probability questions about X can be answered in terms of the CDF
7) $$P[X = b] = \begin{cases} F_X(b) - F_X(b^{-}) \\ 0, & \text{if } F_X(x) \text{ is continuous at } b \end{cases}$$
If the CDF is continuous at the end points x = a and x = b, then
$$P[a < X < b] = P[a \le X < b] = P[a < X \le b] = P[a \le X \le b]$$
8) $P[X > x] = 1 - F_X(x)$

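Summary Properties of CDF
$$F_X(x) = \begin{cases} P[X \le x], & \text{Continuous} \\ \sum_{k} P[X = x_k]\,u(x - x_k), & \text{Discrete} \end{cases}$$
1) $0 \le F_X(x) \le 1$
2) $F_X(\infty) = 1$
3) $F_X(-\infty) = 0$
4) $F_X(x)$ is a non-decreasing function: $F_X(a) \le F_X(b)$, if a < b
5) $F_X(x)$ is continuous from the right, i.e., for h > 0, $F_X(b) = \lim_{h \to 0} F_X(b + h) = F_X(b^{+})$
6) $P[a < X \le b] = F_X(b) - F_X(a)$
7) $P[X = b] = F_X(b) - F_X(b^{-})$
8) $P[X > b] = 1 - F_X(b)$
Properties 2, 3, 4 and 5 are used to show that a given function is a valid CDF

Example 28 (23)
Compute FX(x) if X = # of heads in 2 tosses of a coin
S = {HH, HT, TH, TT}
$$F_X(x) = \begin{cases} 0, & x < 0 & (\text{\# of heads} < 0) \\ \tfrac{1}{4}, & 0 \le x < 1 & (\text{\# of heads} = 0) \\ \tfrac{3}{4}, & 1 \le x < 2 & (\text{\# of heads} \le 1) \\ 1, & x \ge 2 & (\text{\# of heads} \le 2) \end{cases}$$

Example 29 (24)
Given that
$$F_X(x) = \begin{cases} 1 - e^{-x/2}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Determine if the function FX(x) is a valid CDF
Solution
(2) $F_X(\infty) = 1 - e^{-\infty} = 1$
(3) $F_X(-\infty) = 0$
(4) $F_X(x_1) \le F_X(x_2)$, $x_1 < x_2$
(5) $F_X(x^{+}) = P[X \le x^{+}] = P[X \le x] = F_X(x)$, since FX(x) has no jumps
YES, FX(x) is a valid CDF

Example 30 (25)
The CDF of a RV X is given by
$$F_X(x) = \begin{cases} 0, & x < 0 \\ \dfrac{x^{4}}{16}, & 0 \le x \le 2 \\ 1, & 2 < x \end{cases}$$
Compute P[1/2 < X ≤ 3/2]
$$F_X\!\left(\tfrac{3}{2}\right) = P\!\left[X \le \tfrac{3}{2}\right]$$
$$P\!\left[\tfrac{1}{2} < X \le \tfrac{3}{2}\right] = F_X\!\left(\tfrac{3}{2}\right) - F_X\!\left(\tfrac{1}{2}\right) = \frac{1}{16}\left(\frac{3}{2}\right)^{4} - \frac{1}{16}\left(\frac{1}{2}\right)^{4} = \frac{81 - 1}{256} = \frac{5}{16}$$
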
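Computing Probabilities using CDF
1) $P[a < X \le b] = F_X(b) - F_X(a)$
2) $P[a \le X \le b] = F_X(b) - F_X(a) + P[X = a]$
3) $P[a \le X < b] = F_X(b) - F_X(a) + P[X = a] - P[X = b]$
4) $P[a < X < b] = F_X(b) - F_X(a) - P[X = b]$
5) $P[a < X < \infty] = 1 - F_X(a)$
6) $P[X > a] = 1 - P[X \le a] = 1 - F_X(a)$
Note that $F_X(x) = P[X \le x] = 1 - P[X > x] = 1 - (1 - F_X(x))$

Example 31 (26)
A particular Random Variable has CDF given by
$$F_X(x) = \begin{cases} 1 - e^{-x}, & 0 \le x < \infty \\ 0, & \text{else} \end{cases}$$
(a) Find the probability that X > 0.5
$$P[X > 0.5] = 1 - F_X(0.5) = e^{-1/2} = 0.6065$$
(b) Find the probability that X ≤ 0.25
$$P\!\left[X \le \tfrac{1}{4}\right] = F_X\!\left(\tfrac{1}{4}\right) = 1 - e^{-1/4} = 0.2212$$
(c) Find the probability that 0.3 < X ≤ 0.7
$$P[0.3 < X \le 0.7] = F_X(0.7) - F_X(0.3) = 0.2442$$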
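The three answers of Example 31 follow mechanically from the CDF rules, so they are easy to verify in code. The sketch below assumes the CDF of that example is F_X(x) = 1 - e^{-x} for x ≥ 0; the function name `F` and the variable names are illustrative.

```python
import math


def F(x):
    """Assumed CDF of Example 31: 1 - exp(-x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


p_a = 1.0 - F(0.5)       # P[X > 0.5]       = 1 - F(0.5)
p_b = F(0.25)            # P[X <= 0.25]     = F(0.25)
p_c = F(0.7) - F(0.3)    # P[0.3 < X <= 0.7] = F(0.7) - F(0.3)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))  # 0.6065 0.2212 0.2442
```

Because this CDF is continuous, the results are the same whether the interval end points are included or not.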

Probability Density Function (PDF)
Everything should be made as simple as possible, but not one bit simpler.
- Albert Einstein

Probability Density Function - 1
Probability density function (PDF) of a random variable X denoted by fX(x) is defined as
$$f_X(x) = \begin{cases} \dfrac{dF_X(x)}{dx}, & \text{continuous} \\[2mm] \dfrac{d}{dx}\sum_{k} P[X = x_k]\,u(x - x_k) = \sum_{k} p(x_k)\,\delta(x - x_k), & \text{discrete} \end{cases}$$
PDF, fX(x), measures how likely a random variable is to lie near a particular value, or how fast the CDF is increasing
fX(x) represents the density of probability at some point x
If the derivative of FX(x) exists then fX(x) exists
The derivative of FX(x) does not exist at points where FX(x) is not continuous
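Properties of PDF
Continuous PDF:
1) $0 \le f_X(x)$
2) $\int_{-\infty}^{\infty} f_X(x)\,dx = F_X(\infty) = 1$
3) $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$
4) $P[a < X \le b] = F_X(b) - F_X(a) = \int_{-\infty}^{b} f_X(x)\,dx - \int_{-\infty}^{a} f_X(x)\,dx = \int_{a}^{b} f_X(x)\,dx$
Discrete PDF:
1) $0 \le P[X = x_k]$
2) $\sum_{k} P[X = x_k] = 1$
3) $F(x) = \sum_{x_k \le x} P[X = x_k]$
4) $P[a < X \le b] = \sum_{a < x_k \le b} P[X = x_k]$
Properties 1 & 2 are sufficient to determine if a given function is a valid PDF
Notice that integration in the continuous case is simply replaced by summation in the discrete case
If we let a = b, we obtain $P[X = a] = \int_{a}^{a} f_X(x)\,dx = 0$
That is, the probability that a continuous RV will assume any fixed value is zero
Hence for a continuous RV,
$$P[X \le a] = P[X < a] = F(a) = \int_{-\infty}^{a} f(x)\,dx$$
Computing Probabilities using PDF
For a continuous RV:
1) $P[a < X \le b] = \int_{a}^{b} f(x)\,dx$
2) $P[a \le X \le b] = \int_{a}^{b} f(x)\,dx$
3) $P[a \le X < b] = \int_{a}^{b} f(x)\,dx$
4) $P[a < X < b] = \int_{a}^{b} f(x)\,dx$
5) $P[a < X < \infty] = \int_{a}^{\infty} f(x)\,dx$
Note
for any real number a, a- < a < a+, with a-, a+ arbitrarily close to a

Example 32 (27)
Determine if the pdf function fX(x) is valid
$$f_X(x) = \begin{cases} |x|, & |x| \le 1 \\ 0, & \text{else} \end{cases}$$
Solution
$$\int_{-1}^{1} |x|\,dx = \int_{-1}^{0} (-x)\,dx + \int_{0}^{1} x\,dx = \left.-\frac{x^{2}}{2}\right|_{-1}^{0} + \left.\frac{x^{2}}{2}\right|_{0}^{1} = 1$$
Yes, fX(x) is a valid PDF, since it is also non-negative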
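Conditional Distribution
Conditional CDF
From the definition of conditional probability, we obtain the definition for conditional CDF
$$F_X(x \mid B) = P[X \le x \mid B] = \frac{P[\{X \le x\} \cap B]}{P[B]} = \frac{P[A \cap B]}{P[B]} = P[A \mid B]$$
where A is the event {X ≤ x}
Properties:
1) $0 \le F(x \mid B) \le 1$
2) $F(\infty \mid B) = 1$
3) $F(-\infty \mid B) = 0$
4) $F(x \mid B)$ is non-decreasing: $F(a \mid B) \le F(b \mid B)$, if a < b
5) $F(x \mid B)$ is continuous from the right, i.e., $F(x^{+} \mid B) = F(x \mid B)$
6) $P[x_1 < X \le x_2 \mid B] = F(x_2 \mid B) - F(x_1 \mid B)$, if $x_1 < x_2$

Example 33 (28)
For the given pdf below, find P[|X| < v]
$$f_X(x) = c\,e^{-|x|}, \quad -\infty < x < \infty$$
Solution
First find c
$$1 = \int_{-\infty}^{\infty} c\,e^{-|x|}\,dx = 2\int_{0}^{\infty} c\,e^{-x}\,dx = 2c \quad \Rightarrow \quad c = \frac{1}{2}$$
$$P[|X| < v] = \int_{-v}^{v} \frac{1}{2}\,e^{-|x|}\,dx = 2\int_{0}^{v} \frac{1}{2}\,e^{-x}\,dx = 1 - e^{-v}$$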
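The normalisation constant and the probability in Example 33 can be checked by numerical integration. The sketch below takes the pdf to be c·e^{-|x|} (an assumption about the garbled slide), uses a simple midpoint rule, and truncates the infinite range at ±40 where the tail is negligible; `integrate`, `pdf` and v = 1.5 are illustrative choices.

```python
import math


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


c = 0.5  # candidate normalisation constant


def pdf(x):
    """Assumed pdf of Example 33: c * exp(-|x|)."""
    return c * math.exp(-abs(x))


total = integrate(pdf, -40.0, 40.0)   # should be ~1 if c = 1/2 is correct
v = 1.5                               # arbitrary test value
p_numeric = integrate(pdf, -v, v)     # P[|X| < v] by integration
p_closed = 1.0 - math.exp(-v)         # closed form 1 - e^{-v}

print(total, p_numeric, p_closed)
```

The numerical and closed-form values agree to within the integration error, supporting c = 1/2 and P[|X| < v] = 1 - e^{-v}.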
Conditional Density
Conditional PDF
From the definition of conditional probability, we obtain the definition for conditional CDF & PDF.
$$f_X(x \mid B) = \frac{dF_X(x \mid B)}{dx}$$
Properties
1) $0 \le f(x \mid B)$, for all x
2) $\int_{-\infty}^{\infty} f(x \mid B)\,dx = F_X(\infty \mid B) = 1$
3) $F(x \mid B) = \int_{-\infty}^{x} f(y \mid B)\,dy$
4) $P[x_1 < X \le x_2 \mid B] = F(x_2 \mid B) - F(x_1 \mid B) = \int_{x_1}^{x_2} f(y \mid B)\,dy$

Discrete Random Variable
Discrete Random Variables
Bernoulli RV
Binomial RV
Negative Binomial RV
Poisson RV
Hypergeometric RV
Zeta RV
Discrete RVs are specified by their probability mass function (pmf)
Repeated Trials - 1
Given n experiments Ω1, Ω2, ..., Ωn, and their associated Fi and Pi, i = 1, ..., n, let
$$\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$$
represent their Cartesian product whose elementary events are the ordered n-tuples (ω1, ω2, ..., ωn), where ωi ∈ Ωi.
Events in this combined space are of the form
$$A_1 \times A_2 \times \cdots \times A_n$$
where Ai ∈ Fi, and their unions and intersections.
If all these n experiments are independent, and Pi(Ai) is the probability of the event Ai in Fi then as before
$$P(A_1 \times A_2 \times \cdots \times A_n) = P_1(A_1)\,P_2(A_2)\cdots P_n(A_n).$$
We will discuss techniques to analyze such problems with an example.

Bernoulli Trial - 2
Bernoulli trial: consists of repeated independent and identical experiments each of which has only two outcomes A or A^c with P(A) = p and P(A^c) = 1 - p = q
The probability of exactly k occurrences of A in n such trials is given by
$$P_n(k) = \binom{n}{k} p^{k} q^{\,n-k}, \quad k = 0, 1, 2, \ldots, n. \qquad (***)$$
Let
X_k = "exactly k occurrences in n trials".
Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, ..., n, either X0 or X1 or X2 or ... or Xn must occur in such an experiment. Thus
$$P(X_0 \cup X_1 \cup \cdots \cup X_n) = 1.$$
But Xi, Xj are mutually exclusive. Thus

Bernoulli Trial - 3
$$P(X_0 \cup X_1 \cup \cdots \cup X_n) = \sum_{k=0}^{n} P(X_k) = \sum_{k=0}^{n} \binom{n}{k} p^{k} q^{\,n-k}. \qquad (**)$$
Suppose for a given n & p we want to find the most likely value of k?
From the Fig. below, the most probable value of k is that number which maximizes Pn(k).
[Figure: Pn(k) versus k for n = 12, p = 1/2]
To obtain this value, consider the ratio
$$\frac{P_n(k-1)}{P_n(k)} = \frac{n!\,p^{k-1}q^{\,n-k+1}}{(n-k+1)!\,(k-1)!} \cdot \frac{(n-k)!\,k!}{n!\,p^{k}q^{\,n-k}} = \frac{k}{n-k+1} \cdot \frac{q}{p}.$$
Thus $P_n(k) \ge P_n(k-1)$ if $k(1-p) \le (n-k+1)p$, or $k \le (n+1)p$.
Thus Pn(k) as a function of k increases until
$$k = (n+1)p \qquad (****)$$
if it is an integer, or the largest integer kmax less than (n+1)p.
The equation **** represents the most likely number of successes (or heads) in n trials.

Bernoulli's Theorem - 1
Let A denote an event whose probability of occurrence in a single trial is p. If k denotes the # of occurrences of A in n independent trials, then
$$P\left\{\left|\frac{k}{n} - p\right| > \epsilon\right\} < \frac{pq}{n\epsilon^{2}}.$$
The inequality above states that the frequency definition of probability of an event (k/n) and its axiomatic definition (p) can be made compatible to any degree of accuracy.
Proof:
To prove Bernoulli's theorem, we need two identities. Note that with $P_n(k) = \binom{n}{k}p^{k}q^{\,n-k}$, direct computation gives
$$\sum_{k=0}^{n} k\,P_n(k) = \sum_{k=1}^{n} k\,\frac{n!}{(n-k)!\,k!}\,p^{k}q^{\,n-k} = \sum_{k=1}^{n} \frac{n!}{(n-k)!\,(k-1)!}\,p^{k}q^{\,n-k}$$
$$= \sum_{i=0}^{n-1} \frac{n!}{(n-i-1)!\,i!}\,p^{\,i+1}q^{\,n-i-1} = np\sum_{i=0}^{n-1} \frac{(n-1)!}{(n-1-i)!\,i!}\,p^{\,i}q^{\,n-1-i} = np\,(p+q)^{n-1} = np.$$
Proceeding in a similar manner, it can be shown that
$$\sum_{k=0}^{n} k^{2}P_n(k) = \sum_{k=1}^{n} k\,\frac{n!}{(n-k)!\,(k-1)!}\,p^{k}q^{\,n-k} = \sum_{k=2}^{n} \frac{n!}{(n-k)!\,(k-2)!}\,p^{k}q^{\,n-k} + \sum_{k=1}^{n} \frac{n!}{(n-k)!\,(k-1)!}\,p^{k}q^{\,n-k} = n^{2}p^{2} + npq.$$
Note that $\left|\frac{k}{n} - p\right| > \epsilon$ is equivalent to $(k - np)^{2} > n^{2}\epsilon^{2}$, which in turn leads us to compare
$$\sum_{k=0}^{n} (k-np)^{2}P_n(k) \quad \text{with} \quad \sum_{k=0}^{n} n^{2}\epsilon^{2}P_n(k) = n^{2}\epsilon^{2}. \qquad (\#*)$$
We can rewrite the left side of #* as follows
$$\sum_{k=0}^{n} (k-np)^{2}P_n(k) = \sum_{k=0}^{n} k^{2}P_n(k) - 2np\sum_{k=0}^{n} k\,P_n(k) + n^{2}p^{2} = n^{2}p^{2} + npq - 2np \cdot np + n^{2}p^{2} = npq. \qquad (\#**)$$
Alternatively, the left side of (#*) can be expressed as
$$\sum_{k=0}^{n} (k-np)^{2}P_n(k) = \sum_{|k-np| \le n\epsilon} (k-np)^{2}P_n(k) + \sum_{|k-np| > n\epsilon} (k-np)^{2}P_n(k)$$
$$\ge \sum_{|k-np| > n\epsilon} (k-np)^{2}P_n(k) > n^{2}\epsilon^{2}\sum_{|k-np| > n\epsilon} P_n(k) = n^{2}\epsilon^{2}\,P\{|k-np| > n\epsilon\}.$$
Using #* and #**, we get the desired result
$$P\left\{\left|\frac{k}{n} - p\right| > \epsilon\right\} < \frac{pq}{n\epsilon^{2}}.$$
Note that for a given ε > 0, pq/(nε²) can be made arbitrarily small by letting n become large.
Thus for very large n, we can make the fractional occurrence (relative frequency) k/n of the event A as close as we like to the actual probability p of the event A in a single trial.
Thus the theorem states that the probability of event A from the axiomatic framework can be computed from the relative frequency definition quite accurately, provided the number of experiments is large enough.
Since kmax is the most likely value of k in n trials, from the above discussion, as n → ∞, the plots of Pn(k) tend to concentrate more and more around kmax.

Some Useful Binomial Identities
Note
That the expression
$$(x + y)^{n} = \sum_{k=0}^{n} \binom{n}{k} x^{k} y^{\,n-k}$$
is known as the Binomial Theorem, and $\binom{n}{k}$ as the Binomial Coefficient
Symmetry
$$\binom{n}{k} = \binom{n}{n-k}$$
Factorial
$$\binom{n}{k} = \frac{n}{k}\binom{n-1}{k-1}$$
Addition
$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$$
Product
$$\binom{n}{k}\binom{k}{j} = \binom{n}{j}\binom{n-j}{k-j}$$
Computational Methods
$$\binom{n}{k} = \prod_{j=1}^{k} \frac{n-j+1}{j}, \qquad \binom{n}{k} = \sum_{j=k-1}^{n-1} \binom{j}{k-1}$$
Pascal's Triangle
$$\binom{n}{0} = \binom{n}{n} = 1, \qquad \binom{n}{r} = \binom{n}{n-r}, \qquad \binom{n}{r} + \binom{n}{r-1} = \binom{n+1}{r}, \quad 1 \le r \le n$$
$$\begin{array}{c} \binom{0}{0} \\ \binom{1}{0} \quad \binom{1}{1} \\ \binom{2}{0} \quad \binom{2}{1} \quad \binom{2}{2} \\ \binom{3}{0} \quad \binom{3}{1} \quad \binom{3}{2} \quad \binom{3}{3} \end{array}$$
Each row begins and ends with a 1
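The quantities used in Bernoulli's theorem can be checked numerically for a specific case. The sketch below evaluates the binomial probabilities Pn(k) directly and compares the mean, the second central moment, the most likely k, and the tail probability with the expressions np, npq, (n+1)p and pq/(nε²); the values n = 200, p = 0.3, ε = 0.1 are arbitrary illustrative choices.

```python
import math


def binom_pmf(n, k, p):
    """P_n(k) = C(n, k) p^k q^(n-k): probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p, eps = 200, 0.3, 0.1   # hypothetical test values
q = 1 - p
pmf = [binom_pmf(n, k, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))                     # expect n*p
spread = sum((k - n * p) ** 2 * pk for k, pk in enumerate(pmf))    # expect n*p*q
k_max = max(range(n + 1), key=lambda k: pmf[k])                    # most likely k

# Tail probability P{|k/n - p| > eps}, written as |k - np| > n*eps.
tail = sum(pk for k, pk in enumerate(pmf) if abs(k - n * p) > n * eps)
bound = p * q / (n * eps ** 2)

print(mean, spread, k_max, tail, bound)
```

In this case the actual tail probability is far smaller than the bound, which is typical: the inequality guarantees convergence but is usually not tight.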
Discrete Random Variables - 1
Some modifications of Bernoulli trial sequences result in other forms of well known distributions:
Binomial,
Geometric,
Pascal, and
Negative binomial
These RVs are based on sequences of independent Bernoulli trials
Binomial RV: = number of successes in n trials
Geometric RV: = number of failures before the first success
Negative binomial RV: = number of failures before the k-th success
Pascal RV: integer version of the negative binomial
For example, roll a die until a 6 appears. Let X = number of rolls. Find the probability mass function of X
$$p_k(x) = P[X = k] = \left(\frac{5}{6}\right)^{k-1}\frac{1}{6}, \quad k = 1, 2, \ldots$$

Bernoulli Random Variable
A Bernoulli trial is a probabilistic experiment that can have one of two outcomes, classified as either success or failure, and in which the probability of success is p
We refer to p as the Bernoulli probability parameter
The Bernoulli RV, B(1, p), is sometimes referred to as an indicator function: it takes the value 1 on a success and 0 on a failure
$$S_X = \{0, 1\}$$
$$P_X(1) = P[X = 1] = p, \qquad P_X(0) = P[X = 0] = 1 - p$$
The Bernoulli RV corresponds to selecting one item (k = 1) with probability p of success
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases} \qquad f_X(x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \end{cases}$$
Binomial Random Variable
Consider n experiments, each of which results in success with probability p or failure with probability 1 - p
Let X = number of successes
For a sample consisting of n independent selections, with replacement, the binomial RV, B(n, p), is the number of successes, with pmf
$$p_k(x) = P[X = k] = \binom{n}{k} p^{k}(1-p)^{\,n-k}, \quad k = 0, 1, 2, \ldots, n$$
where $\binom{n}{k}$ = # of different sequences of the n outcomes leading to k successes and n - k failures
$$B_{n,p} = B_{1,p} + B_{1,p} + \cdots + B_{1,p} \quad (n \text{ independent Bernoulli RVs})$$
[Figure: chain of n Bernoulli trials, each branching with probabilities p and 1 - p, forming B(n, p)]

Geometric Random Variable, (G1, p)
Perform an experiment until one success occurs (G1, p)
Given a sequence of independent Bernoulli trials, the geometric RV is the number of failures before the first success
If X = the number of trials (failures plus the final success), then the geometric distribution is given by
$$p_k(x) = P[X = k] = p(1-p)^{\,k-1}, \quad k = 1, 2, \ldots$$
[Figure: sequence of Bernoulli trials, each failing with probability 1 - p until the first success with probability p, forming G(1, p)]
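The binomial and geometric pmfs above are short enough to evaluate exactly. The sketch below uses exact fractions and the die-rolling setting (success = rolling a 6, p = 1/6) as an illustration; the function names and the choice n = 5 are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """P[X = k] for the number of successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def geometric_pmf(k, p):
    """P[X = k] where X is the trial on which the first success occurs, k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)


p = Fraction(1, 6)  # probability of rolling a 6 with a fair die

first_six_on_third_roll = geometric_pmf(3, p)                        # (5/6)^2 (1/6)
binomial_total = sum(binomial_pmf(k, 5, p) for k in range(6))         # must equal 1
geometric_partial = sum(geometric_pmf(k, p) for k in range(1, 51))    # 1 - (5/6)^50

print(first_six_on_third_roll, binomial_total, float(geometric_partial))
```

The geometric partial sum equals 1 - (1 - p)^50, the probability that the first success occurs within 50 trials, and tends to 1 as more terms are added.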
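Negative Binomial Random Variable, (Gn, p)
Perform an experiment until a total of r successes occur (Gn, p)
It computes the number of failures before the r-th success
If X = the number of trials required, then
$$p(x) = P[X = k] = \begin{cases} \dbinom{k-1}{r-1} p^{r}(1-p)^{\,k-r}, & k = r, r+1, \ldots \\ 0, & k = 0, 1, \ldots, r-1 \end{cases}$$
The factor $\binom{k-1}{r-1} p^{r}(1-p)^{k-r}$ arises because, for the r-th success to occur in k trials, there must be r - 1 successes in the first k - 1 trials
$$G_{n,p} = G^{1}_{1,p} + G^{2}_{1,p} + \cdots + G^{n}_{1,p} \quad (n \text{ independent geometric RVs})$$
[Figure: sequence of Bernoulli trials with branch probabilities p and 1 - p, forming G(n, p)]

Poisson Random Variable
Is used to determine the number of occurrences of an event in a certain time interval, e.g., rate of growth or decay
A Poisson RV X with parameter λ taking on one of the values 0, 1, 2, ... is given by
$$p_X(k) = P[X = k] = e^{-\lambda}\frac{\lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots$$
$$f_X(x) = e^{-\lambda}\sum_{k=0}^{\infty} \frac{\lambda^{k}}{k!}\,\delta(x - k), \qquad F_X(x) = e^{-\lambda}\sum_{k=0}^{\infty} \frac{\lambda^{k}}{k!}\,u(x - k)$$
It is assumed that
Items are uniformly scattered
Occurrences of items are independent
Never have two items at the same time
λ = average number per unit of time
A Poisson RV is a limiting case of the Binomial RV (n large, p small, λ = np)
Proof:
$$P[X = k] = \binom{n}{k} p^{k}(1-p)^{\,n-k} = \frac{n!}{(n-k)!\,k!}\left(\frac{\lambda}{n}\right)^{k}\left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{n^{k}} \cdot \frac{\lambda^{k}}{k!} \cdot \left(1 - \frac{\lambda}{n}\right)^{n}\left(1 - \frac{\lambda}{n}\right)^{-k}$$
For large n and moderate λ
$$\left(1 - \frac{\lambda}{n}\right)^{n} \approx e^{-\lambda}, \qquad \left(1 - \frac{\lambda}{n}\right)^{k} \approx 1, \qquad \frac{n(n-1)\cdots(n-k+1)}{n^{k}} \approx 1$$
Hence
$$P[X = k] \approx e^{-\lambda}\frac{\lambda^{k}}{k!}$$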
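The limiting relationship between the binomial and Poisson pmfs can be observed numerically. In the sketch below λ is held fixed at 2 while n grows and p = λ/n shrinks; the dictionary `errors` records the largest pointwise difference between the two pmfs over k = 0, ..., 10. All names and the chosen values are illustrative.

```python
import math


def binomial_pmf(k, n, p):
    """Binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability exp(-lam) lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 2.0  # fixed average number of occurrences, lam = n * p

errors = {}
for n in (10, 100, 1000):
    p = lam / n
    errors[n] = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
                    for k in range(11))

for n, err in errors.items():
    print(n, err)
```

The maximum difference shrinks roughly in proportion to 1/n, consistent with the approximation steps in the proof.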
Examples of RVs that obey the Poisson probability distribution
The number of wrong telephone numbers dialed in a day
The number of customers entering a post office on a given day
The number of radioactive particles discharged in a fixed interval of time

Zeta Random Variable (Zipf)
$$p_k(x) = P[X = k] = \frac{C}{k^{\alpha+1}}, \quad k = 1, 2, \ldots$$
where
$$C = \left[\sum_{k=1}^{\infty} \left(\frac{1}{k}\right)^{\alpha+1}\right]^{-1}$$
Can be used to describe the distribution of family income in a given country
Hypergeometric Random Variable
Take a random sample of size n from a population of a + b elements with a successes and b failures
The number of successes in such a sample is a Hypergeometric RV
Let X = number of successes
A) Sampling with replacement will give $X \sim b\!\left(x;\, n, \dfrac{a}{a+b}\right)$
B) Sampling without replacement will give
$$X \sim h(x;\, n, a, b) = \frac{\dbinom{a}{x}\dbinom{b}{n-x}}{\dbinom{a+b}{n}}$$
Hence the distribution function can be written as
$$p(x) = P[X = x] = \frac{\dbinom{a}{x}\dbinom{b}{n-x}}{\dbinom{a+b}{n}}, \quad x = 0, 1, \ldots, a$$

Summary of Discrete Distribution - 1
1. Bernoulli: X takes the values (0, 1), and
$$P(X = 0) = q, \quad P(X = 1) = p.$$
2. Binomial: X ~ B(n, p)
$$P(X = k) = \binom{n}{k} p^{k} q^{\,n-k}, \quad k = 0, 1, \ldots, n.$$
3. Poisson: X ~ P(λ)
$$P(X = k) = e^{-\lambda}\frac{\lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots, \infty.$$

Summary of Discrete Distribution - 2
4. Hypergeometric:
$$P(X = k) = \frac{\dbinom{m}{k}\dbinom{N-m}{n-k}}{\dbinom{N}{n}}, \quad \max(0, m+n-N) \le k \le \min(m, n)$$
5. Geometric: X ~ g(p)
$$P(X = k) = p\,q^{k}, \quad k = 0, 1, 2, \ldots, \infty, \quad q = 1 - p.$$
6. Negative Binomial: X ~ NB(r, p)
$$P(X = k) = \binom{k-1}{r-1} p^{r} q^{\,k-r}, \quad k = r, r+1, \ldots$$
7. Discrete-Uniform:
$$P(X = k) = \frac{1}{N}, \quad k = 1, 2, \ldots, N.$$

Continuous Random Variable
Statistics, likelihoods, and probabilities mean everything to men, nothing to God.
Richelle E. Goodrich
Some Commonly used Random Variables
Continuous Random Variables:
Uniform RV
Gaussian (Normal) RV
Cauchy RV
Rayleigh RV
Nakagami RV
Beta RV
Chi-squared RV
Pareto RV
Exponential RV
Gamma RV
Laplacian RV
Rician RV
Weibull RV
Log-normal RV
Erlang RV
Student t and Fisher F distributions, etc.
Continuous RVs are specified by their probability density function (pdf)

Uniform Random Variable X ~ U(a, b), a < b
A uniform RV is given by
$$f_X(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases} \qquad F_X(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x < b \\ 1, & b \le x \end{cases}$$
[Figure: f_X(x), a rectangle of height 1/(b - a) between a and b; F_X(x), a ramp rising from 0 at a to 1 at b]

Exponential Random Variable - 1
An exponential RV with parameter λ is given by
$$f_X(x) = \begin{cases} \lambda e^{-\lambda(x-a)}, & x \ge a \\ 0, & x < a \end{cases} \qquad F_X(x) = \begin{cases} 1 - e^{-\lambda(x-a)}, & x \ge a \\ 0, & x < a \end{cases}$$
This function often arises in practice and is used to describe the amount of time until some specific event occurs, e.g.,
Amount of time until a phone call is received
Amount of time until an earthquake occurs
Models the reliability of electronic components
Exponential Random Variable - 2
The exponential RV is the only continuous distribution characterized by lack of memory (memoryless)
This means that (taking a = 0)
$$P[X > s + t \mid X > t] = P[X > s]$$
Proof:
From the conditional probability definition, one obtains
$$P[X > s + t \mid X > t] = \frac{P[X > s + t,\, X > t]}{P[X > t]} = \frac{P[X > s + t]}{P[X > t]} = \frac{P[X > s]\,P[X > t]}{P[X > t]} = P[X > s]$$
since $P[X > s + t] = e^{-\lambda(s+t)} = e^{-\lambda s}e^{-\lambda t}$

Rayleigh Random Variable
A Rayleigh RV X with parameter σ > 0 is described by
$$f_X(x) = \begin{cases} \dfrac{(x-a)}{\sigma^{2}}\,e^{-\frac{(x-a)^{2}}{2\sigma^{2}}}, & x \ge a \\ 0, & x < a \end{cases} \qquad F_X(x) = \begin{cases} 1 - e^{-\frac{(x-a)^{2}}{2\sigma^{2}}}, & x \ge a \\ 0, & x < a \end{cases}$$
[Figure: Rayleigh pdf f_X(x) for a = 0]
The square of a Rayleigh RV with parameter σ = 1 (and a = 0) corresponds to the Chi-squared RV with 2 degrees of freedom
The square of a Rayleigh RV with parameter σ corresponds to the exponential RV with parameter 1/(2σ²)
The Rayleigh PDF and CDF are commonly used in communication

Rice (or Rician) Random Variable
A Rice RV X with parameters α, σ² > 0 is described by the pdf
$$f_X(x) = \begin{cases} \dfrac{x}{\sigma^{2}}\exp\!\left[-\dfrac{x^{2} + \alpha^{2}}{2\sigma^{2}}\right] I_0\!\left(\dfrac{\alpha x}{\sigma^{2}}\right), & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where I0(x) = zeroth order modified Bessel function of the 1st kind
The Rice PDF was developed in the 1940s in the study of noise in communication channels
Its CDF is given by
$$F_X(x) = 1 - Q\!\left(\frac{\alpha}{\sigma}, \frac{x}{\sigma}\right)$$
where
$$Q(\alpha, x) = \int_{x}^{\infty} \xi \exp\!\left[-\frac{\alpha^{2} + \xi^{2}}{2}\right] I_0(\alpha\xi)\,d\xi$$
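Nakagami Random Variable
A RV X is said to be Nakagami if its pdf is described by
$$f_X(x) = \begin{cases} \dfrac{2}{\Gamma(m)}\left(\dfrac{m}{\Omega}\right)^{m} x^{2m-1}\exp\!\left[-\dfrac{m}{\Omega}x^{2}\right], & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where m = Nakagami fading parameter
$$\Omega = E[x^{2}], \qquad m = \frac{\Omega^{2}}{\operatorname{var}(x^{2})} \ge \frac{1}{2}$$
m = 1: Rayleigh PDF
m = 0.5: One sided Gaussian
Also known as m-distribution

Cauchy Random Variable
A Cauchy RV with parameters α and μ is described by
$$f_X(x) = \frac{\alpha/\pi}{(x-\mu)^{2} + \alpha^{2}}, \quad -\infty < x < \infty$$
$$F_X(x) = P[X \le x] = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}\!\left(\frac{x-\mu}{\alpha}\right)$$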
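Gamma Random Variable, X ~ G(α, λ)
The continuous RV X has Gamma distribution, with parameters α and λ, if its density function is given by
$$f_X(x) = \begin{cases} \dfrac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\lambda x}, & x \ge 0,\ \alpha > 0,\ \lambda > 0 \\ 0, & x < 0 \end{cases}$$
where
$$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1}e^{-x}\,dx$$
Its CDF, FX(x), is expressed through the Incomplete Gamma Function
Note that
$$\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}, \qquad \Gamma(m+1) = m!,\ m = 1, 2, \ldots, \qquad \Gamma(m+1) = m\,\Gamma(m)$$

Erlang Random Variable
Is a special case of the Gamma RV with parameter α = n (n is a positive integer)
$$f_X(x) = \begin{cases} \dfrac{\lambda^{n}}{(n-1)!}\,x^{n-1}e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
$$F_X(x) = 1 - e^{-\lambda x}\sum_{k=0}^{n-1} \frac{(\lambda x)^{k}}{k!}, \quad x \ge 0$$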
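Laplace Random Variable
A Laplace RV X with parameter α is described by
$$f_X(x) = \frac{\alpha}{2}\,e^{-\alpha|x-a|}, \quad -\infty < x < \infty$$
$$F_X(x) = \begin{cases} \tfrac{1}{2}\,e^{\alpha(x-a)}, & x < a \\ 1 - \tfrac{1}{2}\,e^{-\alpha(x-a)}, & a \le x \end{cases}$$

Weibull Random Variable
The continuous RV X has Weibull distribution, with parameters α and β, if its density function is given by
$$f_X(x) = \begin{cases} \dfrac{\beta}{\alpha}\left(\dfrac{x-a}{\alpha}\right)^{\beta-1}\exp\!\left[-\left(\dfrac{x-a}{\alpha}\right)^{\beta}\right], & x \ge a,\ \alpha > 0,\ \beta > 0 \\ 0, & x < a \end{cases}$$
$$F_X(x) = \begin{cases} 1 - \exp\!\left[-\left(\dfrac{x-a}{\alpha}\right)^{\beta}\right], & x \ge a \\ 0, & x < a \end{cases}$$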
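Beta Random Variable, X ~ B(α, β)
A Beta RV X with parameters α, β is described by
$$f_X(x) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$
$$F_X(x) = \begin{cases} 0, & x < 0 \\ I_x(\alpha, \beta), & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}$$
where $I_x(\alpha, \beta)$ is the (regularized) incomplete beta function

Chi-squared Random Variable, X ~ χ²(n)
The continuous RV X has a chi-squared distribution, with parameter n, if its density function is given by
$$f_X(x) = \frac{1}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}\,x^{\frac{n}{2}-1}\exp\!\left(-\frac{x}{2}\right), \quad x \ge 0$$
$$F_X(x) = G\!\left(\frac{n}{2}, \frac{x}{2}\right)$$
where n is a +ve integer and G(a, b) is the incomplete gamma function
Note that
$$\chi^{2}(n) = G\!\left(\frac{n}{2}, 2\right)$$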
Fisher F-Random Variable
Pareto Variable
The F distributed RV X with is described by
Pareto Variable
f X ( x)
21 m n mm / 2 n n / 2 x m / 2 1
e 21 j e 21 j
m n
mx n
mn
2
f X ( x)
This distribution arises in problems of testing

hypothesis in which 2 or more normal distributions are
compared
189
R| 1
S|0, x
T
1 ,
x
otherwise
190
t (student) Random Variable

The student distribution has PDF given by

f X ( x)
F
e2 j H
2
1
x
1 2 1
1
I
K
Gaussian (Normal) Random

Variable
This pdf is commonly used in statistical inference
Based on the law of probability

Everything is possible because
The sheer existence of possibility
Confirms the existence
Of impossibility.
Dejan Stojanovic
fT ( t )
191
192
Gaussian (Normal) RV - 1
Standard Normal Distribution: μ = 0 and σ = 1
Normal distribution - If a continuous random variable has a distribution that is symmetric and bell-shaped, we call it a normal distribution
Curve is bell shaped and symmetric
68% of data are within 1 standard deviation of the mean
95% of data are within 2 standard deviations of the mean
99.7% of data are within 3 standard deviations of the mean
[Figure: normal curve versus score from -3 to +3 standard deviations; areas of 34%, 13.5%, 2.4% and 0.1% on each side of the mean]
Once μ and σ are specified, the Gaussian curve is uniquely determined
The Gaussian PDF is symmetric about x = μ
A RV X with probability density function (PDF)
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty, \ \sigma > 0$$
is said to be a Gaussian or Normal RV
It is commonly denoted as X ~ N(μ, σ²), where
μ = mean (average) value,
σ = standard deviation, and
σ² = variance
[Figure: Gaussian curves f_X(x) versus x for two means μ1, μ2 and two standard deviations σ1, σ2]
It is the most important of all densities and models more different random
occurrences than any other PDF
The most widely used model of noise in communication systems
196
Gaussian (Normal) RV 6
Characteristics of the Normal Curve

The curve is bell-shaped and symmetrical.
The mean, median, and mode are all equal.
The highest frequency is in the middle of the curve.
The frequency gradually tapers off as the scores approach
the ends of the curve.
The curve approaches, but never meets, the abscissa at
both high and low ends.
It is so important that it is the only density in the world

to earn a place in a banknote (a German Banknote)
Importance of Gaussian PDF stems from the central limit theorem, which states that the sum of n RVs (or the average of the sum) of almost any type approaches a Gaussian density as n → ∞
Gaussian density is encountered in all areas of engineering and
science
197
198
From the definition, the Gaussian distribution (CDF) is given by
$$F_X(x) = P[X \le x] = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(t-\mu)^2}{2\sigma^2}\right] dt$$
This integral cannot be evaluated in closed form
However, because of its importance, F_X(x) for the Gaussian RV has been tabulated by means of numerical integration and approximation techniques
The tabulated function is the normalized (standardized) Gaussian RV denoted by N(0,1)
That is, a standard Normal RV has zero mean (μ = 0) and unit variance (σ² = 1)
[Figure: Standard Normal Distribution, centered at 0, with μ = 0 and σ = 1]
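Hence, for the standard Normal RV X ~ N(0,1), the CDF is
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\!\left(-\frac{t^2}{2}\right) dt$$
Φ(x) is the CDF of the standard Normal RV, and
$$\Phi(-x) = 1 - \Phi(x)$$
Q function
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\!\left(-\frac{y^2}{2}\right) dy$$
$$Q(x) = 1 - \Phi(x), \qquad Q(-x) = 1 - Q(x), \qquad Q(0) = \frac{1}{2}$$
Q(x) is often referred to as the upper tail of the Gaussian density function
In many cases, the probability of error in a communication system is given directly in terms of Q(x)
For a general Gaussian RV X ~ N(μ, σ²),
$$F_X(a) = P[X \le a] = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(t-\mu)^2}{2\sigma^2}\right] dt$$
Let $y = \frac{t-\mu}{\sigma}$ (standardization), so that $dy = \frac{1}{\sigma}\,dt$ and $dt = \sigma\,dy$
Hence
$$F_X(a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(a-\mu)/\sigma} \exp\!\left(-\frac{y^2}{2}\right) dy = \Phi\!\left(\frac{a-\mu}{\sigma}\right) = 1 - Q\!\left(\frac{a-\mu}{\sigma}\right)$$
Also
$$F_X(a) = P[X \le a] = P\!\left[\frac{X-\mu}{\sigma} \le \frac{a-\mu}{\sigma}\right]$$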
If the value of Q(k) is given, the value of k can be determined from the Q-function table directly, e.g.,
Q(k) = 0.2005 ⇒ k = 0.84
Sometimes, linear interpolation may be necessary
E.g., if Q(k) = 0.02, then the value of k lies between 2.05 and 2.06.
Q(2.05) = 0.02018, Q(2.06) = 0.01970
Hence, by interpolation, we obtain
$$k = 2.05 + \left(\frac{0.02018 - 0.02}{0.02018 - 0.01970}\right)(2.06 - 2.05) \approx 2.054$$
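The table lookup and interpolation above can be checked numerically. The sketch below is only an illustration: it assumes the identity Q(x) = 0.5 erfc(x/√2), and the helper names `q_function` and `interpolate_k` are hypothetical, not part of the lecture.

```python
import math

def q_function(x: float) -> float:
    """Upper tail of N(0,1), using Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def interpolate_k(q_target, k_lo, k_hi, q_lo, q_hi):
    """Linear interpolation between two Q-table entries (k_lo, q_lo) and (k_hi, q_hi)."""
    return k_lo + (q_lo - q_target) / (q_lo - q_hi) * (k_hi - k_lo)

# Table values quoted in the slide: Q(2.05) = 0.02018, Q(2.06) = 0.01970
k_table = interpolate_k(0.02, 2.05, 2.06, 0.02018, 0.01970)
print("interpolated k:", round(k_table, 3))
print("Q(0.84):", q_function(0.84))
print("Q(interpolated k):", q_function(k_table))
```

Evaluating `q_function` at the interpolated k gives a quick sanity check that the result is close to the target value 0.02.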
Calculating probability with the Q-function (area under the curve)
1) $P[X > a] = Q\!\left(\frac{a-\mu}{\sigma}\right)$
2) $P[X \le a] = 1 - P[X > a] = 1 - Q\!\left(\frac{a-\mu}{\sigma}\right)$
3) $P[a < X \le b] = F_X(b) - F_X(a) = P[X > a] - P[X > b]$
or
$P[a < X \le b] = Q\!\left(\frac{a-\mu}{\sigma}\right) - Q\!\left(\frac{b-\mu}{\sigma}\right)$
4) $P[X < a] = 1 - P[X \ge a] = 1 - Q\!\left(\frac{a-\mu}{\sigma}\right)$
Some important Properties:
The sum of n independent normal RVs, each N(μ, σ²), is a normal RV with mean nμ and variance nσ²
$$\sum_{i=1}^{n} N(\mu, \sigma^2) \sim N(n\mu, n\sigma^2)$$
Any fixed linear transformation of a Gaussian RV is also a Gaussian RV
$$a + bN(\mu, \sigma^2) \sim N(a + b\mu, b^2\sigma^2)$$
The sum of squares of ν independent unit Gaussian RVs, N(0,1), is a chi-squared RV (central type) with ν degrees of freedom
$$\sum_{i=1}^{\nu} N^2(0,1) \sim \chi^2_{\nu}$$
The ratio of two independent unit Gaussian RVs, N(0,1), is the standard Cauchy RV
The sample mean of n independent and identically distributed RVs, each with mean m and variance σ², tends to be Gaussian distributed with mean m and variance σ²/n, as n → ∞
Lognormal Random Variable
A RV X has a lognormal distribution if the RV Y = ln(X) has a normal distribution with mean a and standard deviation σ.
The resulting density function of X is given by
$$f(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma\,(x-a)} \exp\!\left[-\dfrac{\left(\ln(x-a) - b\right)^2}{2\sigma^2}\right], & x \ge a \\ 0, & x < a \end{cases}$$
$$F(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{[\ln(x-a)-b]/\sigma} \exp\!\left(-\dfrac{z^2}{2}\right) dz, & x \ge a \\ 0, & x < a \end{cases}$$
Example 35 (29a)
If X is a normal RV with parameters μ = 3 and σ² = 9, find
a) P[2 < X < 5],
b) P[X > 0],
c) P[|X-3| > 6]
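Example 35 (29a), continued
Recall that Q(x) = 1 - Φ(x) and Φ(-x) = 1 - Φ(x)
(a)
$$P[2 < X < 5] = P\!\left[\frac{2-3}{3} < \frac{X-3}{3} < \frac{5-3}{3}\right] = P\!\left[-\frac{1}{3} < z < \frac{2}{3}\right]$$
$$= \Phi\!\left(\frac{2}{3}\right) - \Phi\!\left(-\frac{1}{3}\right) = \Phi\!\left(\frac{2}{3}\right) - \left[1 - \Phi\!\left(\frac{1}{3}\right)\right]$$
From Table, we obtain
$$P[2 < X < 5] = \Phi\!\left(\frac{2}{3}\right) + \Phi\!\left(\frac{1}{3}\right) - 1 = 0.7486 + 0.6293 - 1 = 0.3779$$
b)
$$P[X > 0] = P\!\left[\frac{X-3}{3} > \frac{0-3}{3}\right] = P[z > -1] = 1 - \Phi(-1) = 1 - [1 - \Phi(1)] = \Phi(1) = 0.8413$$
c)
$$P[|X-3| > 6] = P[X-3 > 6] + P[X-3 < -6] = P[X > 9] + P[X < -3]$$
$$= P\!\left[\frac{X-3}{3} > \frac{9-3}{3}\right] + P\!\left[\frac{X-3}{3} < \frac{-3-3}{3}\right] = P[z > 2] + P[z < -2]$$
$$= [1 - \Phi(2)] + [1 - \Phi(2)] = 2[1 - \Phi(2)] = 2(1 - 0.9772) = 0.0456$$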


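Example 36 (29b)
The velocity V of the wind at a certain location is a normal RV with μ = 2 and σ = 5.
Determine P[-3 ≤ V ≤ 8].
$$P[-3 \le V \le 8] = P\!\left[\frac{-3-2}{5} \le \frac{V-2}{5} \le \frac{8-2}{5}\right] = P\!\left[-1 \le z \le \frac{6}{5}\right]$$
$$= Q(-1) - Q\!\left(\frac{6}{5}\right) = 1 - Q(1) - Q(1.2)$$
$$= 1 - 0.1587 - 0.1151 = 0.7262$$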
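As a cross-check of Examples 35 and 36, the standardization step can be sketched in a few lines of Python. This is an illustrative sketch only: `phi` and `normal_interval` are hypothetical helper names, and Φ is computed from the error function rather than read from a table, so the results differ slightly from the table-based answers.

```python
import math

def phi(x: float) -> float:
    """CDF of the standard Normal RV, Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_interval(a: float, b: float, mu: float, sigma: float) -> float:
    """P[a < X <= b] for X ~ N(mu, sigma^2), via standardization."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# Example 35: mu = 3, sigma^2 = 9 (so sigma = 3)
p35a = normal_interval(2, 5, 3, 3)          # P[2 < X < 5]
p35b = 1.0 - phi((0 - 3) / 3)               # P[X > 0]
p35c = 1.0 - normal_interval(-3, 9, 3, 3)   # P[|X - 3| > 6]

# Example 36: mu = 2, sigma = 5
p36 = normal_interval(-3, 8, 2, 5)          # P[-3 <= V <= 8]

print(p35a, p35b, p35c, p36)
```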
Statistical Properties of
Random Variables
Rowe's Rule: the odds are six to five that the light at the end of the tunnel is the headlight of an oncoming train.
Paul Dickson
Statistical Properties of RV
Expectation of a Random Variable - 1
Some statistical characteristics or parameters that are

used to describe the behavior of random variables
These properties convey the information about the
shape of the function, the symmetric point (or center
point), the variation from this point, etc.
Knowing some of the properties, the behavior of a RV
can uniquely be determined
Some of these properties include the Mean, Variance,
Characteristic Function, etc.
For example, the mean and variance are universally
used to represent the overall properties of the RV and
its PDF
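The Expected Value of a RV, X, is defined as
$$\mu_x = m_x = \overline{X} = E[X] = \begin{cases} \displaystyle\int_{-\infty}^{\infty} x f_X(x)\,dx, & \text{continuous} \\ \displaystyle\sum_{k=1}^{n} x_k\, p(x_k), & \text{discrete} \end{cases}$$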
E[X] is also known as the

Mean,
Average Value,
First Moment
E[X] is probably the most important concept in probability
theory and Random Processes - a must know concept
The concept of expectation is analogous to the physical
concept of the center of gravity of a distribution
Expectation of a RV - 2
214
To explain this concept, consider the 2 figures shown below:
[Figure: (a) weights f(x1), ..., f(x5) attached at points x1, ..., x5 of a rod balanced at E[X]; (b) a rod with continuously varying mass balanced at E[X]]
For figure (a), the x-axis may be considered as a long weightless rod to which weights are attached
If weights equal to f(xj) are attached to this rod at each point xj, then the rod will be balanced iff it is supported at point E[X]
For figure (b), the x-axis may be regarded as a long rod over which the mass varies continuously
If the density of the rod at each point is equal to f(x), then the center of gravity will be located at point E[X] and the object will be balanced if supported at that point
E[X] will exist iff
$$\int_{-\infty}^{\infty} |x f_X(x)|\,dx = \int_{-\infty}^{\infty} |x|\, f_X(x)\,dx < \infty$$
i.e., only when the integral converges absolutely
In general, if fX(x) has one peak at X = x1 and is symmetric about x1, then E[X] = x1; otherwise the mean value does not necessarily lie at X = x1
Note that E[X] is not a function of X
Expectation of a Function
Given a function of a RV X, Y = g(X), we want to compute the mean E[Y]
$$E[Y] = E[g(X)] = \,?$$
First find the PDF of Y and then use the definition to find E[Y], or
Calculate the expectation directly using the definition as in
$$\overline{g(X)} = E[g(X)] = \begin{cases} \displaystyle\int_{-\infty}^{\infty} g(x) f_X(x)\,dx, & \text{continuous} \\ \displaystyle\sum_{k} g(x_k)\, P[X = x_k], & \text{discrete} \end{cases}$$
Properties of Expectation (Must Know)
1) $E[c] = c$, c is a constant
2) $E[cX] = cE[X]$
3) $E[X + c] = E[X] + c$
4) $E[X + Y] = E[X] + E[Y]$
5) $E[X] \ge E[Y]$, if $P[X \ge Y] = 1$
6) $|E[X]| \le E[|X|]$
7) $E[X_1 + X_2 + \cdots + X_N] = E[N]\,E[X]$, if Xi, i = 1, ..., N are independent and identically distributed (iid)
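Example 37 (30a)
A RV X is uniformly distributed in the interval [a,b], what is the expected value E[X]?
$$E[X] = \mu_x = \int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{1}{b-a}\int_{a}^{b} x\,dx = \frac{1}{b-a}\left[\frac{x^2}{2}\right]_a^b$$
$$= \frac{1}{2}\,\frac{1}{b-a}\left(b^2 - a^2\right) = \frac{1}{2}\,\frac{1}{b-a}(b-a)(b+a) = \frac{b+a}{2}$$
Also
$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \frac{1}{b-a}\int_{a}^{b} x^2\,dx = \frac{1}{b-a}\left[\frac{x^3}{3}\right]_a^b = \frac{1}{3}\,\frac{1}{b-a}\left(b^3 - a^3\right)$$
Example 38 (30b)
b) A RV X is uniformly distributed in the interval [0, 10], what is the expected value E[X]?
$$f_X(x) = \begin{cases} \frac{1}{10}, & 0 \le x \le 10 \\ 0, & \text{else} \end{cases}$$
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{0}^{10} x\,\frac{1}{10}\,dx = \left[\frac{x^2}{20}\right]_0^{10} = 5$$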
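Example 39 (31)
Find the mean of a RV with PDF given by
$$f_X(x) = \begin{cases} K e^{-bx}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
First we compute the value of K
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{0}^{\infty} K e^{-bx}\,dx = 1 \;\Rightarrow\; \left[-\frac{K}{b} e^{-bx}\right]_0^{\infty} = \frac{K}{b} = 1 \;\Rightarrow\; K = b$$
Then we compute the mean of the random variable X
$$E[X] = \int_{0}^{\infty} x\, b e^{-bx}\,dx = \left[-x e^{-bx}\right]_0^{\infty} + \int_{0}^{\infty} e^{-bx}\,dx = \left[-\frac{1}{b} e^{-bx}\right]_0^{\infty} = \frac{1}{b}$$
Example 40 (32)
A discrete RV X has xk = k², k = 1, 2, ..., 5, which occur with probability 0.4, 0.25, 0.15, 0.1, and 0.1, respectively. Find E[X].
Solution
$$E[X] = \mu_x = \sum_{k=1}^{5} x_k\, p(x_k) = (1)^2(0.4) + (2)^2(0.25) + (3)^2(0.15) + (4)^2(0.1) + (5)^2(0.1) = 6.85$$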
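The discrete form of the expectation in Example 40 is a weighted sum, which the following sketch reproduces. The function name `expectation` and its optional argument `g` are hypothetical; `g` is included only to illustrate E[g(X)] computed directly from the pmf.

```python
def expectation(values, probs, g=lambda x: x):
    """E[g(X)] for a discrete RV: sum of g(x_k) * p(x_k)."""
    assert abs(sum(probs) - 1.0) < 1e-12, "probabilities must sum to 1"
    return sum(g(x) * p for x, p in zip(values, probs))

# Example 40: x_k = k^2, k = 1..5
values = [k ** 2 for k in range(1, 6)]
probs = [0.4, 0.25, 0.15, 0.1, 0.1]

mean_x = expectation(values, probs)                       # E[X]
mean_sq = expectation(values, probs, g=lambda x: x ** 2)  # E[X^2]
print(mean_x, mean_sq)
```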
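Mean Squared Value
The 2nd moment or mean square value is defined as
$$\overline{X^2} = E[X^2] = \begin{cases} \displaystyle\int_{-\infty}^{\infty} x^2 f_X(x)\,dx, & \text{continuous} \\ \displaystyle\sum_{k=1}^{n} x_k^2\, p(x_k), & \text{discrete} \end{cases}$$
This is analogous to the power of a signal
The RMS value is the square root of the mean square value
$$X_{RMS} = \sqrt{E[X^2]}$$
N-th Moment / N-th Central Moment
The nth moment is given by
$$E[X^n] = \overline{X^n} = \int_{-\infty}^{\infty} x^n f(x)\,dx$$
The n-th central moment is given by
$$E[(X - \mu_x)^n] = \overline{(X - \mu_x)^n} = \int_{-\infty}^{\infty} (x - \mu_x)^n f_X(x)\,dx$$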
Example 41 (33)
Example 41 (33)
a) A RV X is uniformly distributed in the interval [a, b],

what is the mean square value E[X2]?
f X ( x ) e x , x 0
We can also use the formula

x m e ax dx
Solution:
we evaluate the integral shown below
2
E X 2
x f X ( x)dx
E X
3 b
1 b 2
1 x
a x dx
ba
ba 3
1 1
b3 a 3
3ba
x m e ax m m 1 ax
x e dx
a
a
x 2e x
2 x
E X
0 xe dx
0
2
2 x
0 x e dx
2 x 2 e x
1 x
e dx
0 0
225
Variance of a Random Variable - 1
Recall that the n-th central moment is given by
$$E[(X - \mu_x)^n] = \overline{(X - \mu_x)^n} = \int_{-\infty}^{\infty} (x - \mu_x)^n f_X(x)\,dx$$
Special Cases:
when n = 1, the first central moment is zero
when n = 2, the 2nd central moment is called the variance, i.e., the variance is the second central moment
The variance of a random variable X is given as
$$\sigma_x^2 = \operatorname{var}[X] = E[(X - \mu_x)^2] = \begin{cases} \displaystyle\int_{-\infty}^{\infty} (x - \mu_x)^2 f_X(x)\,dx, & \text{continuous} \\ \displaystyle\sum_{k} (x_k - \mu_x)^2\, P[X = x_k], & \text{discrete} \end{cases}$$
The variance provides a measure of the spread or dispersion of the density around the mean
Variance of a Random Variable - 2
A small value of the variance indicates that the probability density is tightly concentrated around the mean and vice versa
The variance is the moment of inertia about the center of mass
Note that
$$\operatorname{var}[X] = \sigma_X^2 = E[(X - \mu_x)^2] = E[X^2 - X\mu_x - X\mu_x + \mu_x\mu_x]$$
$$= E[X^2] - \mu_x E[X] - \mu_x E[X] + E[\mu_x \mu_x] = E[X^2] - \mu_x^2 = E[X^2] - (E[X])^2$$
Standard deviation:
$$\sigma_x = \sqrt{\operatorname{var}[X]}$$
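Properties of Variance - 1
Let a and b be constants
1) $\operatorname{Var}[a] = 0$
2) $\operatorname{Var}[aX + b] = a^2 \operatorname{Var}[X]$
If $E[X] = \mu$, then $E[aX + b] = a\mu + b$
$$\operatorname{Var}[aX + b] = E[(aX + b - a\mu - b)^2] = E[a^2 (X - \mu)^2] = a^2 E[(X - \mu)^2] = a^2 \operatorname{Var}[X]$$
It follows that Var[aX] = a²var[X]
3) $\operatorname{Var}[X + Y] = \operatorname{Var}[X] + \operatorname{Var}[Y] + 2E[(X - \mu_x)(Y - \mu_y)]$
Properties of Variance - 2
Proof:
Suppose that n = 2,
E[X1] = μ1, E[X2] = μ2, then E[X1+X2] = μ1 + μ2
$$\operatorname{var}[X_1 + X_2] = E[(X_1 + X_2 - \mu_1 - \mu_2)^2]$$
$$= E[(X_1 - \mu_1)^2 + (X_2 - \mu_2)^2 + 2(X_1 - \mu_1)(X_2 - \mu_2)]$$
$$= E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + 2E[(X_1 - \mu_1)(X_2 - \mu_2)]$$
$$= \operatorname{var}[X_1] + \operatorname{var}[X_2] + 2E[(X_1 - \mu_1)(X_2 - \mu_2)]$$
If X1 and X2 are independent, then
$$E[(X_1 - \mu_1)(X_2 - \mu_2)] = E[X_1 - \mu_1]\,E[X_2 - \mu_2] = (\mu_1 - \mu_1)(\mu_2 - \mu_2) = 0$$
Hence
$$\operatorname{var}[X_1 + X_2] = \operatorname{var}[X_1] + \operatorname{var}[X_2]$$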
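Example 42 (34)
A RV X is uniformly distributed in the interval [a,b], what is the variance of the RV?
$$E[X] = \mu_x = \frac{b+a}{2}, \qquad E[X^2] = \frac{1}{3}\,\frac{1}{b-a}\left(b^3 - a^3\right)$$
The variance can be computed as
$$\sigma^2 = \operatorname{Var}[X] = E[X^2] - (E[X])^2 = \frac{1}{3}\,\frac{b^3 - a^3}{b-a} - \left(\frac{b+a}{2}\right)^2 = \frac{(b-a)^2}{12}$$
You can also use the brute force method shown below
$$\sigma^2 = \operatorname{Var}[X] = E[(X - \mu_x)^2] = \frac{1}{b-a}\int_{a}^{b}\left(x - \frac{a+b}{2}\right)^2 dx = \frac{1}{b-a}\left[\frac{1}{3}\left(x - \frac{a+b}{2}\right)^3\right]_a^b$$
$$= \frac{1}{3}\,\frac{1}{b-a}\left[\frac{(b-a)^3}{8} - \frac{(a-b)^3}{8}\right] = \frac{1}{3}\,\frac{1}{b-a}\,\frac{(b-a)^3}{4} = \frac{(b-a)^2}{12}$$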
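Both routes to the variance of a uniform RV can be compared numerically. The sketch below uses a simple midpoint-rule integrator (`integrate` is a hypothetical helper, and the interval [2, 10] is an arbitrary choice for illustration) to evaluate the shortcut E[X²] - (E[X])² and the direct definition E[(X - μ)²].

```python
def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 10.0                      # assumed interval, for illustration only
pdf = lambda x: 1.0 / (b - a)         # uniform density on [a, b]

mean = integrate(lambda x: x * pdf(x), a, b)
mean_sq = integrate(lambda x: x ** 2 * pdf(x), a, b)

var_shortcut = mean_sq - mean ** 2                               # E[X^2] - (E[X])^2
var_direct = integrate(lambda x: (x - mean) ** 2 * pdf(x), a, b)  # E[(X - mu)^2]

print(mean, var_shortcut, var_direct, (b - a) ** 2 / 12)
```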
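Example 43 (35)
Find the variance of a RV with PDF given by
$$f_X(x) = \begin{cases} K e^{-bx}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Example 43... (35)
b) Then we compute the variance of the random variable X, using K = b and μx = 1/b
$$\sigma_X^2 = E[(X - \mu_x)^2] = \int_{0}^{\infty} (x - \mu_x)^2 f_X(x)\,dx = \int_{0}^{\infty} \left(x - \frac{1}{b}\right)^2 b e^{-bx}\,dx$$
$$= b\int_{0}^{\infty} x^2 e^{-bx}\,dx - 2\int_{0}^{\infty} x e^{-bx}\,dx + \frac{1}{b}\int_{0}^{\infty} e^{-bx}\,dx$$
$$= b\left[\frac{2}{b^3} - \frac{2}{b^3} + \frac{1}{b^3}\right] = \frac{1}{b^2}$$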
Functions that Give Moments

Functions that Give Moments

Medicine is a science of uncertainty and an
art of probability.
William Osler
235
Because of the importance of the n-moments (n-th order

expected value), several other techniques can be used to
evaluate them
These techniques are widely used in determining the moments
of important distributions for large value of n
These alternative procedures exist for determining the
moments of random variables especially when n > 2
These procedures or functions are:
Characteristic Function
Moment Generating Function
Probability Generating Function
Laplace Transform
These transforms are handy when computing the statistical
behavior of sums of large random variables
236
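Characteristic Function - 1
Characteristic Function (CF) of a random variable X is given by E[e^{jωX}] and is denoted by Φ_X(ω), such that
$$\Phi_X(\omega) = E[e^{j\omega X}] = \begin{cases} \displaystyle\int_{-\infty}^{\infty} e^{j\omega x} f_X(x)\,dx, & \text{continuous} \\ \displaystyle\sum_{k} e^{j\omega x_k}\, p_X(x_k), & \text{discrete} \end{cases}$$
The characteristic function will exist only if the integral or the sum specified above converges
Φ_X(ω) can be interpreted as the expectation of a function of X, denoted as Y = e^{jωX}, with ω unspecified
Φ_X(ω) can also be interpreted as the Fourier Transform (FT) of the PDF fX(x) of the random variable X with the sign of ω reversed
$$X(f) = F\{x(t)\} = \int_{-\infty}^{\infty} x(t) e^{-j2\pi f t}\,dt, \qquad x(t) = F^{-1}\{X(f)\} = \int_{-\infty}^{\infty} X(f) e^{j2\pi f t}\,df$$
$$f_X(x) \leftrightarrow \Phi_X(\omega)$$
If Φ_X(ω) is known, then fX(x) can be found from the inverse FT with the sign of ω reversed
$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi_X(\omega) e^{-j\omega x}\,d\omega$$
CF is especially useful in evaluating the moments of RVs when n > 2
Consider the following,
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx, \quad E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx, \quad E[X^3] = \int_{-\infty}^{\infty} x^3 f_X(x)\,dx, \quad E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$$
Now consider the derivatives of the CF, Φ_X(ω), evaluated at ω = 0
$$\left.\frac{d}{d\omega}\Phi_X(\omega)\right|_{\omega=0} = \left.\int_{-\infty}^{\infty} jx f_X(x) e^{j\omega x}\,dx\right|_{\omega=0} = \int_{-\infty}^{\infty} jx f_X(x)\,dx = jE[X]$$
$$\left.\frac{d^2}{d\omega^2}\Phi_X(\omega)\right|_{\omega=0} = \left.\int_{-\infty}^{\infty} (jx)^2 f_X(x) e^{j\omega x}\,dx\right|_{\omega=0} = j^2\int_{-\infty}^{\infty} x^2 f_X(x)\,dx = j^2 E[X^2]$$
$$\left.\frac{d^n}{d\omega^n}\Phi_X(\omega)\right|_{\omega=0} = \left.\int_{-\infty}^{\infty} (jx)^n f_X(x) e^{j\omega x}\,dx\right|_{\omega=0} = j^n\int_{-\infty}^{\infty} x^n f_X(x)\,dx = j^n E[X^n]$$
It can be seen that
$$E[X] = \frac{1}{j}\left[\frac{d}{d\omega}\Phi_X(\omega)\right]_{\omega=0}, \qquad E[X^2] = \frac{1}{j^2}\left[\frac{d^2}{d\omega^2}\Phi_X(\omega)\right]_{\omega=0}$$
Hence,
$$E[X^n] = \frac{1}{j^n}\left[\frac{d^n}{d\omega^n}\Phi_X(\omega)\right]_{\omega=0}$$
This implies that if we know the CF of a RV, we can easily find the n-th moment of the RV.
The Characteristic Function of a random variable always exists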
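The moment formula E[X^n] = (1/j^n) dⁿΦ_X/dωⁿ at ω = 0 can be illustrated numerically. The sketch below assumes an exponential RV with rate λ = 2, whose CF is the standard result λ/(λ - jω), and approximates the derivatives by central finite differences. The names `cf_exponential` and `moment_from_cf`, the value of λ, and the step size h are all assumptions made for this illustration.

```python
def cf_exponential(omega: float, lam: float) -> complex:
    """Characteristic function of an exponential RV with rate lam: lam / (lam - j*omega)."""
    return lam / (lam - 1j * omega)

def moment_from_cf(cf, n: int, h: float = 1e-3) -> float:
    """Approximate E[X^n] = (1/j^n) * d^n cf / d omega^n at omega = 0, for n = 1 or 2."""
    if n == 1:
        deriv = (cf(h) - cf(-h)) / (2.0 * h)              # central first difference
    elif n == 2:
        deriv = (cf(h) - 2.0 * cf(0.0) + cf(-h)) / h**2   # central second difference
    else:
        raise ValueError("this sketch only handles n = 1 and n = 2")
    return (deriv / (1j ** n)).real

lam = 2.0  # assumed rate parameter
cf = lambda w: cf_exponential(w, lam)

m1 = moment_from_cf(cf, 1)   # expected to be close to 1/lam
m2 = moment_from_cf(cf, 2)   # expected to be close to 2/lam^2
print(m1, m2, m2 - m1 ** 2)
```

The last printed value approximates the variance 1/λ², which agrees with the result 1/b² obtained by direct integration for the exponential density.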
Example 37
Find the characteristic function of the exponential RV with PDF given by
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Moment Generating Function - 1
The Moment Generating Function (MGF) of a RV X is given by E[e^{tX}] and is denoted as M_X(t). Hence
$$M_X(t) = E[e^{tX}] = \begin{cases} \displaystyle\int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx, \ t \ge 0, & \text{continuous} \\ \displaystyle\sum_{k=1}^{n} e^{t x_k}\, P[X = x_k], & \text{discrete} \end{cases}$$
Expanding the exponential as a power series and taking the expectation implies that
$$M_X(t) = 1 + tE[X] + \frac{t^2 E[X^2]}{2!} + \frac{t^3 E[X^3]}{3!} + \cdots$$
Hence,
$$M_X(0) = 1, \qquad M_X^{(n)}(0) = E[X^n]$$
Moment Generating Function - 2
MGF is the same as the characteristic function with the j-term in the exponent removed
MGF is used more often - since CF is related to Fourier Transform
MGF may not always exist, e.g., find the MGF of f(x) = 2/x³
Like the Characteristic Function we find that
$$E[X^n] = \left[\frac{d^n}{dt^n} M_X(t)\right]_{t=0}$$
$$\sigma_X^2 = M_X''(0) - \left[M_X'(0)\right]^2$$
Property:
$$Y = aX + b \;\Rightarrow\; M_Y(t) = e^{bt} M_X(at)$$
$$M_Y(t) = E[e^{Yt}] = E[e^{t(aX + b)}] = E[e^{bt} e^{atX}] = e^{bt} E[e^{(at)X}] = e^{bt} M_X(at)$$
Example 38
Probability Generating Function - 1
The Probability Generating Function (PGF) defined for a nonnegative discrete random variable X is given by
$$G_X(z) = E[z^X] = \sum_{x=0}^{\infty} p_X(x)\, z^x$$
The PGF is essentially the z-transform of the pmf of X with z replaced by z^{-1}.
If we know the PGF, we can find the probability mass function
$$p_X(k) = P[X = k] = \frac{1}{k!}\left[\frac{d^k}{dz^k} G_X(z)\right]_{z=0}$$
The PGF can also be used to compute moments
$$\left.\frac{d}{dz} G_X(z)\right|_{z=1} = \left.\sum_{k=0}^{\infty} k\,P[X = k]\, z^{k-1}\right|_{z=1} = \sum_{x=0}^{\infty} x\,P[X = x] = E[X]$$
Probability Generating Function - 2
$$\left.\frac{d^2}{dz^2} G_X(z)\right|_{z=1} = \left.\sum_{x=0}^{\infty} P[X = x]\, x(x-1)\, z^{x-2}\right|_{z=1} = \sum_{x=0}^{\infty} x(x-1)\,P[X = x] = E[X(X-1)] = E[X^2] - E[X]$$
$$\left.\frac{d^n}{dz^n} G_X(z)\right|_{z=1} = E[X(X-1)\cdots(X-n+1)]$$
This is sometimes called the factorial moment
We can also compute the variance using the PGF as follows
$$\sigma_X^2 = \left[\frac{d^2}{dz^2} G_X(z) + \frac{d}{dz} G_X(z) - \left(\frac{d}{dz} G_X(z)\right)^2\right]_{z=1}$$
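For a RV with finitely many values the PGF is a polynomial, so its derivatives can be taken exactly. The sketch below is an illustration under assumptions: the pmf is that of a Binomial(3, 1/2) RV chosen for the example, and `derivative` and `evaluate` are hypothetical helpers operating on coefficient lists.

```python
from fractions import Fraction
from math import factorial

# Assumed pmf for illustration: Binomial(3, 1/2), so pmf[k] = P[X = k]
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

def derivative(coeffs):
    """Coefficients of d/dz of the polynomial sum(coeffs[k] * z**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, z):
    """Evaluate the polynomial sum(coeffs[k] * z**k) at z."""
    return sum(c * z ** k for k, c in enumerate(coeffs))

g1 = evaluate(derivative(pmf), 1)               # G'(1)  = E[X]
g2 = evaluate(derivative(derivative(pmf)), 1)   # G''(1) = E[X(X-1)]
variance = g2 + g1 - g1 ** 2

# Recover P[X = 2] from the second derivative at z = 0
p2 = evaluate(derivative(derivative(pmf)), 0) / factorial(2)

print(g1, g2, variance, p2)
```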
Laplace Transform
Laplace Transform (LT) of a positive RV X with PDF fX(x) is
$$L_X(s) = E[e^{-sX}] = \int_{0}^{\infty} e^{-sx} f_X(x)\,dx$$
where s is a complex number with positive real part
The Inverse Laplace Transform (ILT) can be obtained as follows
$$f_X(x) = \frac{1}{j2\pi}\int_{c-j\infty}^{c+j\infty} L_X(s) e^{sx}\,ds$$
We can compute the moments of a RV from the LT
$$E[X^n] = (-1)^n \left[\frac{d^n}{ds^n} L_X(s)\right]_{s=0}$$
It is also possible to invert the above equation to get
$$L_X(s) = \sum_{n=0}^{\infty} \frac{E[X^n]}{n!} (-s)^n$$
This means that the LT and fX(x) can be computed in principle from the knowledge of the moments
Example 46
Example 47
Tail Inequalities
It is always better to be approximately right, than
precisely wrong.
- Unknown Engineer
249
Tail Inequalities - 1
Probabilities of the form P[X ≥ k] and P[|X| ≥ k] are known as Tail Probabilities
Sometimes we want to estimate (upper bound) these probabilities without actually evaluating them
The following 3 bounds provide us with various estimates of the Tail Probabilities
1. Markov Inequality
2. Chebyshev's Inequality
3. Chernoff Inequality
1. Markov Inequality:
If X is a RV that takes nonnegative values, then for any value k > 0
$$P[X \ge k] \le \frac{E[X]}{k} \qquad \text{(1st order bound)}$$
Proof:
$$E[X] = \int_{0}^{\infty} x f_X(x)\,dx = \int_{0}^{k} x f_X(x)\,dx + \int_{k}^{\infty} x f_X(x)\,dx$$
$$\ge \int_{k}^{\infty} x f_X(x)\,dx \ge \int_{k}^{\infty} k f_X(x)\,dx = kP[X \ge k]$$
Hence,
$$E[X] \ge kP[X \ge k]$$
This simple inequality is surprisingly useful and various other well known inequalities are derived from it
2. Chebyshev's Inequality:
Chebyshev Inequality (CI) gives a conservative estimate of the probability that a random variable X assumes a value within a given distance k of its mean
Let X be a RV with mean μ and variance σ². Then for any value k > 0, at most σ²/k² of the probability is distributed outside the interval μ - k < X < μ + k. That is
$$P[|X - \mu| \ge k] \le \frac{\sigma^2}{k^2} \qquad \text{(2nd order bound)}$$
Proof:
Chebyshev Inequality is a consequence of the Markov Inequality
Since (X - μ)² ≥ 0, we can apply Markov Inequality such that
$$P[(X - \mu)^2 \ge k^2] \le \frac{E[(X - \mu)^2]}{k^2}$$
But since (X - μ)² ≥ k² iff |X - μ| ≥ k, then
$$P[|X - \mu| \ge k] \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$$
That is, if we pick a value of a RV arbitrarily, we can state the minimum probability that the random value falls within a given limit
The significance of CI is that it emphasizes the general importance of the standard deviation of a RV
Sometimes the following forms of CI are used
$$P[|X - \mu| < k] \ge 1 - \frac{\sigma^2}{k^2} \quad \text{or} \quad P[|X - \mu| \ge k\sigma] \le \frac{1}{k^2} \quad \text{or} \quad P[|X - \mu| < k\sigma] \ge 1 - \frac{1}{k^2}$$
CI holds for all distributions, positive or negative, and cannot be improved
It provides intuition about the meaning of the variance of a RV
This is because it shows that wide deviations from the mean E[X] are unlikely if the variance σ² is small, e.g., let var[X] = σ² and k = nσ
$$P[|X - \mu| \ge n\sigma] \le \frac{\sigma^2}{n^2\sigma^2} = \frac{1}{n^2}$$
Although the Chebyshev Inequality is correct, the upper bound is not tight; i.e., it usually differs from the actual value
3. Chernoff Inequality:
If X is a RV, then for any value k > 0
$$P[e^{tX} \ge k] \le \frac{M_X(t)}{k}$$
where M_X(t) is the Moment Generating Function
That is, Chernoff bound requires the knowledge of the MGF
$$\begin{cases} P[X \ge k] \le e^{-kt} M_X(t), & t > 0, \ k > E[X] \\ P[X \le k] \le e^{-kt} M_X(t), & t < 0, \ k < E[X] \end{cases}$$
Chernoff Inequality is a much tighter bound than Chebyshev Inequality but more complex
Thus, we expect the Chernoff bound to be tighter than the Markov bound
It applies to any RV whether positive or not
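To see how the three bounds compare, the sketch below evaluates them for one assumed case: an exponential RV with unit rate (mean 1, variance 1, MGF 1/(1 - t) for t < 1) and threshold k = 10. The choice of distribution, the threshold, and the grid search over t are illustrative assumptions; for smaller thresholds the ordering of the Chebyshev and Chernoff bounds can differ.

```python
import math

k = 10.0  # assumed threshold, for illustration only

# Exact tail of a unit-rate exponential RV: P[X >= k] = exp(-k)
exact = math.exp(-k)

# Markov: P[X >= k] <= E[X] / k, with E[X] = 1
markov = 1.0 / k

# Chebyshev: P[X >= k] <= P[|X - 1| >= k - 1] <= var / (k - 1)^2, with var = 1
chebyshev = 1.0 / (k - 1.0) ** 2

# Chernoff: P[X >= k] <= min over 0 < t < 1 of exp(-k t) * M_X(t), M_X(t) = 1 / (1 - t)
chernoff = min(
    math.exp(-k * t) / (1.0 - t)
    for t in (i / 1000.0 for i in range(1, 1000))
)

print(exact, chernoff, chebyshev, markov)
```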
Laws of Large Numbers - 1
Two laws of large numbers deal with the behavior of the sample mean $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ as n becomes arbitrarily large
$\operatorname{Var}[\hat{\mu}_n] \to 0$ as $n \to \infty$ suggests that the PDF of $\hat{\mu}_n$ becomes narrower and narrower and approaches a delta function
Weak Law of Large Numbers (WLLN)
Let X1, X2, ..., Xn be a sequence of iid RVs, each with mean μ
Then for ε > 0
$$\lim_{n \to \infty} P[|\hat{\mu}_n - \mu| < \varepsilon] = 1$$
Since ε is arbitrary, in the limit, the density of $\hat{\mu}_n$ is concentrated at μ
WLLN is an easy consequence of the Chebyshev inequality
Strong Law of Large Numbers (SLLN)
Consider a sequence of independent and identically distributed (iid) RVs, X1, X2, ..., Xn, each with mean μ
Then for ε > 0
$$P\!\left[\lim_{n \to \infty} |\hat{\mu}_n - \mu| > \varepsilon\right] = 0 \quad \text{or} \quad P\!\left[\lim_{n \to \infty} \hat{\mu}_n = \mu\right] = 1$$
This means that $\hat{\mu}_n \to \mu$ as $n \to \infty$
SLLN is the basis for justifying simulations and analysis of all experimental results
Central Limit Theorem (CLT)
The CLT is one of the most remarkable results in probability theory
It is concerned with the PDF of the sum of independent RVs
It states that the sum of a large number of independent RVs (any distribution) has a distribution that is approximately Gaussian under certain conditions
Let X1, X2, ..., Xn be a sequence of iid RVs, each with finite mean μ and finite variance σ²
Let Sn = X1 + X2 + ... + Xn, n ≥ 1, and let Zn be a sequence of unit variance, zero mean RVs, defined as
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
Then,
$$\lim_{n \to \infty} P[Z_n \le z] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-x^2/2}\,dx, \quad \text{i.e., } N(0,1)$$
That is, for all n, E[Zn] = 0, Var[Zn] = 1
Hence even as n → ∞ the mean and variance of Zn will not change
In other words, the CDF of the normalized sum approaches a Gaussian CDF no matter what the distribution of the component RVs
This concept is very important in Engineering, for example:
Electrical noise is often the result of superposition of voltages due to a large number of charge carriers
Turbulent boundary-layer pressure variations on an aircraft skin are the result of superposition of minute pressures due to numerous eddies
Random errors in experimental measurements are due to many irregularities
In all these cases, Gaussian approximation is valid
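A small simulation can make the CLT statement concrete. The sketch below is illustrative only: it assumes uniform(0, 1) component RVs (mean 1/2, variance 1/12), n = 30 terms per sum, 20000 trials, and a fixed seed; `normalized_sum` is a hypothetical helper that forms Zn = (Sn - nμ)/(σ√n).

```python
import math
import random

def normalized_sum(n: int, rng: random.Random) -> float:
    """Z_n = (S_n - n*mu) / (sigma * sqrt(n)) for a sum of n uniform(0,1) RVs."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [normalized_sum(30, rng) for _ in range(20000)]

mean_z = sum(samples) / len(samples)
var_z = sum((z - mean_z) ** 2 for z in samples) / len(samples)
frac_below_1 = sum(z <= 1.0 for z in samples) / len(samples)

# For N(0,1): mean 0, variance 1, and P[Z <= 1] is about 0.8413
print(mean_z, var_z, frac_below_1)
```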
Transformation of a Random Variable

The laws of probability, so true in general, so fallacious in particular.
Edward Gibbon

Transformation of a Random Variable - 1
Frequently, one encounters the need to derive the probability distribution of a function of one or more RVs.
[Figure: block diagram with input X, fX(x), passing through g(.) to give output Y, fY(y); g: X → Y]
If input X is a RV, output Y is also a RV
Suppose that the CDF or PDF of one RV X is given, we wish to compute the CDF or PDF of another RV Y = g(X), where g is a function
In general, we call the above black box a transformation or data processing
Transformation may be classified as memoryless or with memory. Only memoryless cases are treated in this class
Transformation of a Random Variable - 2
Y is induced by X such that Y = g(X), where g(.) is a real valued function
[Figure: sample space (S, F, P) mapped by X to (A, E(A), PX) with range SX, and by Y to (B, E(B), PY) with range SY]
Transform of Distribution Function - 1
Given CDF of X, we want to find the CDF of a related RV Y = g(X)
The basic idea here is to relate the event A = {Y ≤ y} to an equivalent event that involves X, B = {X ≤ g^{-1}(y)} (for an increasing function g)
$$F_Y(y) = P[Y \le y] = P[g(X) \le y] = P[X \le g^{-1}(y)] = F_X\!\left(g^{-1}(y)\right)$$
Steps:
1) Given Y = g(X), solve for x in terms of y: X = g^{-1}(y)
2) Substitute into the above equation
Hence
$$F_Y(y) = F_X\!\left(g^{-1}(y)\right)$$
In general g(.) may not always be invertible

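Some Important CDF Transforms

1. Linear Transformation: Y = aX + b
a, b are constants. We know the CDF of X
[Figure: line Y = aX + b showing the level y and the corresponding point x = (y - b)/a]
Case 1: a > 0
$$\{Y \le y\} \Leftrightarrow \left\{X \le \frac{y-b}{a}\right\}$$
$$F_Y(y) = P\!\left[X \le \frac{y-b}{a}\right] = F_X\!\left(\frac{y-b}{a}\right)$$
Case 2: a < 0
$$\{Y \le y\} \Leftrightarrow \left\{X \ge \frac{y-b}{a}\right\}$$
$$F_Y(y) = P\!\left[X \ge \frac{y-b}{a}\right] = 1 - F_X\!\left(\frac{y-b}{a}\right)$$
Hence
$$F_Y(y) = \begin{cases} F_X\!\left(\dfrac{y-b}{a}\right), & a > 0 \\ 1 - F_X\!\left(\dfrac{y-b}{a}\right), & a < 0 \end{cases}$$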
2. Square Function: Y = X²
Case 1: y ≥ 0
$$F_Y(y) = P[-\sqrt{y} \le X \le \sqrt{y}] = F_X(\sqrt{y}) - F_X(-\sqrt{y})$$
Case 2: y < 0
There is no value of X for which x² < y. Hence F_Y(y) = P[∅] = 0
$$F_Y(y) = \begin{cases} F_X(\sqrt{y}) - F_X(-\sqrt{y}), & y \ge 0 \\ 0, & y < 0 \end{cases}$$
Transformation of Density Function - 1
[Figure: block diagram with input X, fX(x), passing through g(x) to give output Y, fY(y)]
Given the PDF of one RV X, we want to find the PDF of a related RV Y. In general
$$f_Y(y) = \sum_{k=1}^{n} \frac{f_X(x_k)}{\left|\dfrac{d}{dx} g(x)\right|_{x = x_k}}$$
where xk, k = 1, 2, ..., n are the real roots of the equation y = g(x), expressed in terms of y
For a one-to-one transformation,
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = \frac{f_X(x)}{\left|\dfrac{dy}{dx}\right|} = \frac{f_X(x)}{\left|\dfrac{d}{dx} g(x)\right|}$$
Steps:
1) Given y = g(x), solve for x in terms of y
2) Find $\dfrac{dy}{dx} = \dfrac{d}{dx} g(x)$
3) Substitute into the formula and simplify
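The square-function rule can be checked on a simple case. The sketch below assumes X uniform on (0, 1), so that F_Y(y) = F_X(√y) - F_X(-√y) = √y and f_Y(y) = 1/(2√y) for 0 < y < 1. The helper names, the sample size, and the seed are hypothetical choices for the illustration.

```python
import math
import random

def cdf_x(x: float) -> float:
    """CDF of X ~ uniform(0, 1) (assumed input distribution)."""
    return min(max(x, 0.0), 1.0)

def cdf_y(y: float) -> float:
    """CDF of Y = X^2: F_Y(y) = F_X(sqrt(y)) - F_X(-sqrt(y)) for y >= 0, else 0."""
    if y < 0:
        return 0.0
    root = math.sqrt(y)
    return cdf_x(root) - cdf_x(-root)

def pdf_y(y: float) -> float:
    """PDF of Y = X^2 from the root formula: f_X(sqrt(y)) / |2 sqrt(y)| on 0 < y < 1."""
    return 1.0 / (2.0 * math.sqrt(y)) if 0.0 < y < 1.0 else 0.0

rng = random.Random(1)  # fixed seed so the run is repeatable
samples = [rng.random() ** 2 for _ in range(50000)]
empirical = sum(s <= 0.25 for s in samples) / len(samples)

print(cdf_y(0.25), empirical, pdf_y(0.25))
```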
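Some Important PDF Transformations

1) Linear Transformation: Y = aX + b
(a, b are constants and we know the PDF of X)
$$x = \frac{y-b}{a}, \qquad \frac{dy}{dx} = \frac{d}{dx}(aX + b) = a$$
$$f_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y-b}{a}\right)$$
2) Square Transformation: Y = aX², a > 0
$$x = \pm\sqrt{\frac{y}{a}}, \qquad \frac{dy}{dx} = \frac{d}{dx}\,aX^2 = 2ax = \pm 2a\sqrt{\frac{y}{a}} = \pm 2\sqrt{ay}, \qquad \left|\frac{dx}{dy}\right| = \frac{1}{2\sqrt{ay}}$$
$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{ay}}\left[f_X\!\left(\sqrt{\dfrac{y}{a}}\right) + f_X\!\left(-\sqrt{\dfrac{y}{a}}\right)\right], & y \ge 0 \\ 0, & y < 0 \end{cases}$$
3) Ratio Function: Y = a/X
$$x = \frac{a}{y}, \qquad \frac{dy}{dx} = g'(x) = -\frac{a}{x^2} = -\frac{y^2}{a}$$
$$f_Y(y) = \frac{|a|}{y^2} f_X\!\left(\frac{a}{y}\right)$$
4) Exponent Function: Y = exp(aX)
$$\ln y = ax \;\Rightarrow\; x = \frac{1}{a}\ln y, \qquad \frac{dx}{dy} = \frac{1}{ay}$$
$$f_Y(y) = \frac{1}{|a|\,y} f_X\!\left(\frac{1}{a}\ln y\right), \quad y > 0$$
5) Cosine Function: Y = cos(X)
X is a RV uniform in the interval [0, 2π), i.e., $f_X(x) = \frac{1}{2\pi}$
For -1 < y < 1, g(x) = cos(x) has two solutions:
$$x_0 = \cos^{-1}(y), \ 0 \le x_0 \le \pi; \qquad x_1 = 2\pi - \cos^{-1}(y), \ \pi \le x_1 \le 2\pi$$
$$f_Y(y) = f_X\!\left(\cos^{-1}(y)\right)\left|\frac{d\cos^{-1}(y)}{dy}\right| + f_X\!\left(2\pi - \cos^{-1}(y)\right)\left|\frac{d\left(2\pi - \cos^{-1}(y)\right)}{dy}\right|$$
$$= \frac{1}{\sqrt{1-y^2}}\left[f_X\!\left(\cos^{-1}(y)\right) + f_X\!\left(2\pi - \cos^{-1}(y)\right)\right] = \frac{1}{\sqrt{1-y^2}}\left[\frac{1}{2\pi} + \frac{1}{2\pi}\right]$$
$$f_Y(y) = \frac{1}{\pi\sqrt{1-y^2}}, \quad -1 < y < 1$$
By integration
$$F_Y(y) = \begin{cases} 0, & y < -1 \\ \dfrac{1}{2} + \dfrac{\sin^{-1}(y)}{\pi}, & -1 \le y \le 1 \\ 1, & y > 1 \end{cases}$$
The same density transformation holds for the sine function
The cosine and sine RVs have an arcsine distribution function
For an interval of (-∞, ∞), the sine or cosine will have infinitely many solutions x0, x1, x2, x3, x4, ..., e.g.,
$$Y = a\sin X \;\Rightarrow\; x_n = \sin^{-1}\!\left(\frac{y}{a}\right), \qquad \left|\frac{dy}{dx}\right| = |a\cos x_n| = \sqrt{a^2 - y^2}$$
$$f_Y(y) = \frac{1}{\sqrt{a^2 - y^2}} \sum_{n} f_X(x_n), \quad |y| < a$$
6) Tangent Function: Y = tan(X)
$$x_n = \tan^{-1} y, \qquad \frac{dy}{dx} = \frac{1}{\cos^2 x} = 1 + y^2$$
$$f_Y(y) = \frac{1}{1 + y^2} \sum_{n} f_X(x_n)$$

Multiple Random Variables
(2 Random Variables)

All knowledge degenerates into probability.
David Hume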
Two Random Variables - 1
When there is more than one RV, we talk about joint events from the same sample space
Any ordered pair of numbers (x, y) can be considered as a point in the xy plane
Let A = {X ≤ x} and B = {Y ≤ y}
Events A and B refer to the sample space S, while events {X ≤ x} and {Y ≤ y} refer to the joint sample space SJ
In the diagram below, notice that event A ∩ B defined in sample space S corresponds to the joint event {X ≤ x} and {Y ≤ y}
$$\{X \le x\} \cap \{Y \le y\} = \{X \le x, Y \le y\}$$
[Figure: Comparisons of events in S and SJ. Outcomes s1, s2 in S map to points (X(s), Y(s)) in SJ; the events A = {X ≤ x}, B = {Y ≤ y} and A ∩ B are shown in both spaces]
This new sample space SJ is called the range sample space or 2-D product space, but we will just call it joint sample space
In the study of multiple RVs, we characterize events

by the following:
Joint Cumulative Distribution Function
Joint Probability Density Function
Concept of joint PDF is an extension of joint
probability
Marginal Density and Distribution Function
Given joint PDF or CDF, find the PDF or CDF of
one of the RVs
Joint Expectation of 2 Random Variables
Conditional Expectation of Random Variables
Independence of one Random Variable and another
277
Correlation of Random Variables

The relationship between the 2 RVs in terms of
their means
Covariance of Random Variables
The relationship between the 2 RVs in terms of
their variances
Correlation Coefficient
The normalized 2nd order joint central moments
Functions of two Random Variables
Transformations of Random Variables
As in one RV, multiple RVs can also be
transformed
More difficult to compute, etc.
278
Joint Cumulative Distribution Function - 1

Considering only two Random Variables X and Y
If X and Y are RVs, then the joint cdf of X and Y, for both continuous and discrete RVs, is given by
$$F_{XY}(x, y) = P[X \le x, Y \le y]$$
FXY(a,b) is the probability that X and Y lie in the semi-infinite region {X ≤ a, Y ≤ b} of the (x, y) plane
[Figure: Joint CDF, showing the semi-infinite region below and to the left of the point (a, b)]
Properties:
The properties of the joint CDF are similar to those of the single variable
1) $0 \le F_{XY}(x, y) \le 1$
2) $F_{XY}(x, y)$ is a nondecreasing function of both x and y
3) $F_{XY}(-\infty, -\infty) = F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0$
This means that it is impossible for X or Y or both to assume a value less than -∞ (boundary conditions)
4) $F_{XY}(\infty, \infty) = 1$
It is certain that X and Y assume a value less than ∞ (boundary conditions)
5) $F_{XY}(a_2, b_2) + F_{XY}(a_1, b_1) - F_{XY}(a_1, b_2) - F_{XY}(a_2, b_1) = P[a_1 < X \le a_2, b_1 < Y \le b_2] \ge 0, \quad a_1 < a_2, \ b_1 < b_2$
This is the probability of this rectangle
[Figure: rectangle in the (x, y) plane with corners (a1,b1), (a2,b1), (a1,b2), (a2,b2)]
6) $F_{XY}(x, \infty) = P[X \le x, Y \le \infty] = F_X(x)$
7) $F_{XY}(\infty, y) = P[X \le \infty, Y \le y] = F_Y(y)$
Note:
The first 5 properties are just the 2-dimensional extension of properties of one random variable
Properties 3, 4, and 5 may be used to test whether a given function is a valid joint CDF
As in the case of a single RV, joint CDF can be used to compute probabilities of unions and intersections of semi-infinite rectangles
Computing Probabilities with Joint CDF - 1
1) $P[X \le a, Y \le b] = F_{XY}(a, b)$
2) $P[X > a, Y > b] = 1 - F_X(a) - F_Y(b) + F_{XY}(a, b)$
3) $P[a_1 < X \le a_2, b_1 < Y \le b_2] = F_{XY}(a_2, b_2) + F_{XY}(a_1, b_1) - F_{XY}(a_1, b_2) - F_{XY}(a_2, b_1)$
4) $P[a_1 < X \le a_2, Y \le b] = F_{XY}(a_2, b) - F_{XY}(a_1, b)$
5) $P[X \le a, b_1 < Y \le b_2] = F_{XY}(a, b_2) - F_{XY}(a, b_1)$
Computing Probabilities with Joint CDF - 2
6) $P[a_1 \le X \le a_2, b_1 < Y \le b_2] = F_{XY}(a_2, b_2) + F_{XY}(a_1, b_1) - F_{XY}(a_1, b_2) - F_{XY}(a_2, b_1) + P[X = a_1, b_1 < Y \le b_2]$
where
$$P[X = a, b_1 < Y \le b_2] = \lim_{n\to\infty} F_{XY}\!\left(a + \tfrac{1}{n}, b_2\right) - \lim_{n\to\infty} F_{XY}\!\left(a + \tfrac{1}{n}, b_1\right) - \lim_{n\to\infty} F_{XY}\!\left(a - \tfrac{1}{n}, b_2\right) + \lim_{n\to\infty} F_{XY}\!\left(a - \tfrac{1}{n}, b_1\right)$$
7) $P[a_1 < X < a_2, b_1 < Y \le b_2] = F_{XY}(a_2, b_2) + F_{XY}(a_1, b_1) - F_{XY}(a_2, b_1) - F_{XY}(a_1, b_2) - P[X = a_2, b_1 < Y \le b_2]$
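The rectangle formula and the complementary-corner formula can be exercised on a simple joint CDF. The sketch below assumes two independent uniform(0, 1) RVs, so that F_XY(x, y) is the product of the clamped coordinates; the function names and the chosen corner values are hypothetical.

```python
def clamp01(v: float) -> float:
    """Clamp v to [0, 1]; this is the CDF of a uniform(0, 1) RV."""
    return min(max(v, 0.0), 1.0)

def joint_cdf(x: float, y: float) -> float:
    """Assumed joint CDF of two independent uniform(0, 1) RVs."""
    return clamp01(x) * clamp01(y)

def rectangle_probability(F, a1, a2, b1, b2):
    """P[a1 < X <= a2, b1 < Y <= b2] = F(a2,b2) + F(a1,b1) - F(a1,b2) - F(a2,b1)."""
    return F(a2, b2) + F(a1, b1) - F(a1, b2) - F(a2, b1)

def upper_corner_probability(F, a, b):
    """P[X > a, Y > b] = 1 - F_X(a) - F_Y(b) + F_XY(a, b), with marginals F(a, inf), F(inf, b)."""
    inf = float("inf")
    return 1.0 - F(a, inf) - F(inf, b) + F(a, b)

p_rect = rectangle_probability(joint_cdf, 0.2, 0.5, 0.1, 0.9)
p_corner = upper_corner_probability(joint_cdf, 0.2, 0.1)
print(p_rect, p_corner)
```

For independent uniforms the rectangle probability is just the area (0.5 - 0.2)(0.9 - 0.1), which gives an easy value to compare against.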
Marginal Distribution Functions

In the study of several RVs, the statistics of each RV can be obtained from the joint RV. This is known as the Marginal function
The marginal CDFs of the RVs X and Y are
$$F_X(x) = F_{XY}(x, \infty) = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f_{XY}(\xi, y)\,dy\,d\xi$$
$$F_Y(y) = F_{XY}(\infty, y) = \int_{-\infty}^{y}\int_{-\infty}^{\infty} f_{XY}(x, \eta)\,dx\,d\eta$$
Joint PDF
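Joint Probability Density Function - 1
The joint density of X and Y is defined as
$$f_{XY}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{XY}(x, y)$$
It is assumed that X & Y are jointly continuous, else the derivative may not exist
It follows that
$$F_{XY}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(\xi, \eta)\,d\eta\,d\xi$$
Joint Probability Density Function - 2
Properties:
1) $f_{XY}(x, y) \ge 0$
2) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = F_{XY}(\infty, \infty) = 1$
3) $F_{XY}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(\xi, \eta)\,d\eta\,d\xi$
4) $F_X(x) = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f_{XY}(\xi, \eta)\,d\eta\,d\xi$
5) $F_Y(y) = \int_{-\infty}^{y}\int_{-\infty}^{\infty} f_{XY}(\xi, \eta)\,d\xi\,d\eta$
6) $P[a_1 < X \le a_2, b_1 < Y \le b_2] = \int_{b_1}^{b_2}\int_{a_1}^{a_2} f_{XY}(x, y)\,dx\,dy$
7) $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$
8) $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$
Note:
Properties 1 and 2 are sufficient to test the validity of a joint PDF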
Computing Probabilities with Joint PDF

As in joint CDF, the joint PDF can be used to compute the

probabilities of random variables
PA
zz
f XY ( x, y)dxdy
z z f ax, yfdxdy
2) P a X b, c Y d z z f a x, yfdxdy
5) P a X b, c Y z z f a x, yfdxdy
6) P X a, c Y d z z f XY a x, yfdxdy
1) P a X b, c Y d
b d
a c
XY
b d
a c
XY
Joint PMF
b d
a c
XY
b d
a c
XY
a c
XY
a d
a c
Care should be exercised with the limits of the integration when

discrete or mixed RVs are involved.
You may have to integrate a (.) on the boundary
289
290
Joint Probability Mass Function

Considering only two discrete Random Variables X and Y
The joint pmf of X and Y is given by
$$p_{XY}(x, y) = P[X = x, Y = y]$$
pXY(a,b) is the probability that X and Y are equal to some value (a, b)
The properties of the joint PMF are similar to those of the single variable
Marginal Distribution/Density Functions
In the study of several RVs, the statistics of each RV can be obtained from the joint RV. This is known as the Marginal function
The marginal CDFs of the RVs X and Y are
$$F_X(x) = F_{XY}(x, \infty) = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f_{XY}(\xi, y)\,dy\,d\xi$$
$$F_Y(y) = F_{XY}(\infty, y) = \int_{-\infty}^{y}\int_{-\infty}^{\infty} f_{XY}(x, \eta)\,dx\,d\eta$$
The marginal PDFs of the RVs X and Y are
$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy = \frac{d}{dx} F_{XY}(x, \infty)$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx = \frac{d}{dy} F_{XY}(\infty, y)$$
Relationship Between X and Y - 1

Statistical properties of two Random Variables
Joint Expectation (i.e., joint moments)
Covariance of Random Variables
Correlation of X and Y
Correlation Coefficient, etc.
Conditional Expectation and Variance
1. Independence of X and Y
Statistical independence can be depicted in terms of joint distributions, joint densities, and joint probability functions
Recall that if events A and B are independent,
$$P[A \cap B] = P[A]\,P[B]$$
Hence
$$P[X \le x, Y \le y] = P[X \le x]\,P[Y \le y]$$
$$F_{XY}(x, y) = F_X(x)\,F_Y(y)$$
Also
$$f_{XY}(x, y) = f_X(x)\,f_Y(y)$$
This implies that if X and Y are independent, their jpdf and jcdf factor into 2 marginal densities or distributions, respectively
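2. Joint Moments of X and Y

The joint expected value (joint moment) of two RVs X and Y is defined as
$m_{ij} = E[X^i Y^j] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^i y^j f_{XY}(x, y)\,dx\,dy$ (continuous)
$m_{ij} = E[X^i Y^j] = \sum_n \sum_k x_n^i\, y_k^j\, p_{XY}(x_n, y_k)$ (discrete)
The sum i + j is called the order of the moment. Thus
$m_{10} = E[X]$, $m_{01} = E[Y]$ are the 1st-order moments
$m_{20} = E[X^2]$, $m_{02} = E[Y^2]$, $m_{11} = E[XY]$ are the 2nd-order moments
If X and Y are independent, then $P[X \le x, Y \le y] = P[X \le x]\,P[Y \le y]$ and
$m_{ij} = E[X^i Y^j] = \int_{-\infty}^{\infty} x^i f_X(x)\,dx \int_{-\infty}^{\infty} y^j f_Y(y)\,dy = E[X^i]\,E[Y^j]$

Given a function z = g(x, y), we can first compute the PDF of Z and then compute the mean of Z,
$E[Z] = \int_{-\infty}^{\infty} z f_Z(z)\,dz$
or we can compute it directly as follows:
$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) f_{XY}(x, y)\,dx\,dy$ (continuous)
$E[g(X, Y)] = \sum_i \sum_j g(x_i, y_j)\, p_{XY}(x_i, y_j)$ (discrete)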
Correlation of X and Y

The correlation of X and Y is defined as
$R_{XY} = m_{11} = E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x y f_{XY}(x, y)\,dx\,dy$
That is, it is the joint moment with i = j = 1.
Note:
When $R_{XY} = E[X]\,E[Y]$, X and Y are said to be uncorrelated.
Independence $\Rightarrow$ uncorrelatedness
Uncorrelatedness does not always imply independence
This means that it is possible for X and Y to be uncorrelated and yet not independent (except for jointly Gaussian RVs).
If $R_{XY} = 0$, then X and Y are orthogonal ($X \perp Y$).

The ij-th joint central moment of X and Y is given by
$\mu_{ij} = E[(X - \mu_X)^i (Y - \mu_Y)^j] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)^i (y - \mu_Y)^j f_{XY}(x, y)\,dx\,dy$
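The remark that uncorrelated RVs need not be independent can be illustrated numerically. The sketch below is not from the notes: it assumes X uniform on (-1, 1) and Y = X^2, a standard textbook pair for which E[XY] = E[X]E[Y] = 0 although Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200_000)   # assumed X ~ U(-1, 1), so E[X] = 0
y = x**2                                    # Y is a deterministic function of X

r_xy = np.mean(x * y)                       # estimate of R_XY = E[XY] = E[X^3] = 0
cov_xy = r_xy - np.mean(x) * np.mean(y)     # estimate of Cov(X, Y)

# Dependence check: compare P[X > 0.5, Y > 0.25] with P[X > 0.5] P[Y > 0.25]
p_joint = np.mean((x > 0.5) & (y > 0.25))
p_prod = np.mean(x > 0.5) * np.mean(y > 0.25)

print(cov_xy, p_joint, p_prod)
```

The covariance estimate is close to zero, while the joint probability (about 0.25) clearly differs from the product of the marginals (about 0.125), so the pair is uncorrelated but dependent.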
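Conditional Distribution and Density

In practice, the outcomes of many experiments are not independent.
For example, the output Y of a communication channel is usually dependent on the input X in order to convey the proper information.
From probability, we know that
$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$
The definitions of the conditional CDF and PDF can be obtained directly from conditional probability.

Conditional PMF - 1

Discrete
The conditional pmf of X given that $Y = y_j$ is given by
$P[X = x_i \mid Y = y_j] = \frac{P[X = x_i, Y = y_j]}{P[Y = y_j]} = \frac{p_{XY}(x_i, y_j)}{p_Y(y_j)}$   (*)
Similarly,
$P[Y = y_j \mid X = x_i] = \frac{P[X = x_i, Y = y_j]}{P[X = x_i]} = \frac{p_{XY}(x_i, y_j)}{p_X(x_i)}$   (**)
The conditional PMF satisfies all the properties of a PMF.

Conditional PMF - 2

If X and Y are independent,
$P[X = x_i \mid Y = y_j] = \frac{P[X = x_i]\,P[Y = y_j]}{P[Y = y_j]} = P[X = x_i] = p_X(x_i)$
$P[Y = y_j \mid X = x_i] = \frac{P[X = x_i]\,P[Y = y_j]}{P[X = x_i]} = P[Y = y_j] = p_Y(y_j)$
The conditional CDF of X given that $Y = y_j$ is
$F_X(x \mid y_j) = P[X \le x \mid Y = y_j] = \frac{P[X \le x, Y = y_j]}{P[Y = y_j]}$
Similarly
$F_Y(y \mid x_i) = P[Y \le y \mid X = x_i] = \frac{P[Y \le y, X = x_i]}{P[X = x_i]}$

Conditional Density - 1

For continuous RVs, the denominators of (*) and (**) are zero, i.e.,
$P[Y = y_j] = P[X = x_i] = 0$
Hence (*) and (**) are undefined for continuous RVs. Fortunately, the numerators are also zero.
We say that (*) and (**) are limiting cases for continuous RVs.
For X and Y jointly continuous, we obtain the following:
$P[X \le x \mid y < Y \le y + \Delta y] = \frac{\int_{-\infty}^{x} \int_{y}^{y+\Delta y} f_{XY}(\alpha, \beta)\,d\beta\,d\alpha}{\int_{y}^{y+\Delta y} f_Y(\beta)\,d\beta}$
Using the mean value theorem, $\int_a^b g(z)\,dz = (b - a)\,g(c)$ for some $a \le c \le b$,
$P[X \le x \mid y < Y \le y + \Delta y] = \frac{\Delta y \int_{-\infty}^{x} f_{XY}(\alpha, y')\,d\alpha}{\Delta y\, f_Y(y'')}, \quad y \le y', y'' \le y + \Delta y$
$F_X(x \mid y) = \lim_{\Delta y \to 0} P[X \le x \mid y < Y \le y + \Delta y] = \frac{\int_{-\infty}^{x} f_{XY}(\alpha, y)\,d\alpha}{f_Y(y)}$
Consequently
$f_X(x \mid y) = \frac{d}{dx} F_X(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}$
Also,
$f_Y(y \mid x) = \frac{d}{dy} F_Y(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}$
If X and Y are independent, $f_{XY}(x, y) = f_X(x)\,f_Y(y)$, so that
$f_X(x \mid y) = f_X(x), \quad f_Y(y \mid x) = f_Y(y)$
Combining the two conditional densities gives Bayes' rule for densities:
$f_X(x \mid y) = \frac{f_Y(y \mid x)\, f_X(x)}{f_Y(y)}, \quad f_Y(y \mid x) = \frac{f_X(x \mid y)\, f_Y(y)}{f_X(x)}$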
Conditional Expectation - 1

Conditional expectation of Y given X = x is given by
$E[Y \mid x] = \int_{-\infty}^{\infty} y f_Y(y \mid x)\,dy$ (continuous)
$E[Y \mid x] = \sum_j y_j\, p_Y(y_j \mid x)$ (discrete)
Note that E[Y|x] is defined at a given point X = x.
E[Y|x] is the center of mass associated with the conditional PDF/PMF.
Since E[Y|x] is a function of X, it is itself a RV with its own probability distribution.

Conditional Expectation - 2

Theorem 1:
For any random variables X and Y, $E[E[Y \mid X]] = E[Y]$
Proof
$E[E[Y \mid x]] = \int_{-\infty}^{\infty} E[Y \mid x]\, f_X(x)\,dx$
$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f_Y(y \mid x)\, f_X(x)\,dx\,dy$
$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, \frac{f_{XY}(x, y)}{f_X(x)}\, f_X(x)\,dx\,dy$
$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f_{XY}(x, y)\,dx\,dy = E[Y]$
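Theorem 1 can be checked on a small discrete example. The joint pmf table below is hypothetical (made up for illustration); the sketch computes E[Y | X = x_i] for each x_i, averages it over the marginal pmf of X, and compares the result with E[Y] computed directly.

```python
import numpy as np

# Hypothetical joint pmf p_XY(x_i, y_j): rows index x in {0, 1}, columns index y in {1, 2, 3}
x_vals = np.array([0.0, 1.0])
y_vals = np.array([1.0, 2.0, 3.0])
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.30, 0.10, 0.20]])

p_x = p_xy.sum(axis=1)                 # marginal pmf of X
p_y = p_xy.sum(axis=0)                 # marginal pmf of Y
p_y_given_x = p_xy / p_x[:, None]      # conditional pmf p_Y(y_j | x_i), one row per x_i

e_y_given_x = p_y_given_x @ y_vals     # E[Y | X = x_i] for each x_i
lhs = np.sum(e_y_given_x * p_x)        # E[E[Y | X]]
rhs = np.sum(y_vals * p_y)             # E[Y]

print(e_y_given_x, lhs, rhs)
```

Both sides evaluate to the same number, as the proof requires; note that E[Y | X = x_i] itself changes with x_i, which is why it is treated as a random variable.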
Joint Central Moment - 1

A) Covariance:
Covariance measures the relationship between the variations of X and Y about their means.
The 2nd-order joint central moment is known as the covariance of X and Y, i.e.,
$C_{XY} = Cov(X, Y) = \mu_{11} = E[(X - \mu_X)(Y - \mu_Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{XY}(x, y)\,dx\,dy$
Expanding,
$Cov(X, Y) = E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y]$
$= E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y$
$= E[XY] - \mu_Y \mu_X - \mu_X \mu_Y + \mu_X \mu_Y$
Thus
$Cov(X, Y) = E[XY] - \mu_X \mu_Y = R_{XY} - \mu_X \mu_Y$
Note:
If X and Y are either independent or uncorrelated, then
$E[XY] = E[X]\,E[Y] \Rightarrow Cov(X, Y) = 0$
If X and Y are orthogonal, then $R_{XY} = 0$ and
$Cov(X, Y) = -E[X]\,E[Y]$
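The identity Cov(X, Y) = R_XY - mu_X mu_Y can be checked on simulated data. The sketch below assumes an illustrative model that is not from the notes (X Gaussian with mean 2, Y = 0.5 X plus independent unit-variance noise) and also normalizes the covariance by the two standard deviations, which gives the correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(2.0, 1.0, n)                 # assumed X ~ N(2, 1)
y = 0.5 * x + rng.normal(0.0, 1.0, n)       # Y depends linearly on X plus independent noise

r_xy = np.mean(x * y)                       # correlation R_XY = E[XY]
cov_xy = r_xy - x.mean() * y.mean()         # Cov(X, Y) = R_XY - mu_X mu_Y
rho = cov_xy / (x.std() * y.std())          # covariance normalized by sigma_X sigma_Y

print(cov_xy, rho)
```

For this model the theoretical values are Cov(X, Y) = 0.5 and a normalized value of 0.5 / sqrt(1.25), roughly 0.447; the estimates should land close to both.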

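B) Correlation Coefficient ($\rho$)
The normalized 2nd-order joint central moment is called the correlation coefficient:
$\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}, \quad -1 \le \rho_{XY} \le 1$
By definition,
$\rho_{XY} = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right]$
where
$\sigma_X^2 = Var[X] = E[X^2] - \mu_X^2$
$\sigma_Y^2 = Var[Y] = E[Y^2] - \mu_Y^2$
Note that if X and Y are uncorrelated, $\rho_{XY} = 0$.

Transformation in Two Dimension

"If nature has taught us anything it is that the impossible is probable"
- Ilyas Kassam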
One Function of 2 RVs - 1

Transformation from two RVs to one Random Variable
Given 2 RVs X and Y, we form a new RV Z such that
$Z = g(X, Y)$
The event of interest is $\{Z \le z\}$.
Let $R_z$ denote the region of the XY plane such that $\{Z \le z\} = \{g(x, y) \le z\}$. Then
$F_Z(z) = P[Z \le z] = P[(X, Y) \in R_z] = \iint_{R_z} f_{XY}(x, y)\,dx\,dy$
$f_Z(z)\,dz = P[z < Z \le z + dz] = \iint_{\Delta R_z} f_{XY}(x, y)\,dx\,dy$

1. Distribution/Density of Sum of 2 RVs:
$Z = aX + bY$
[Figure: the line $z = ax + by$ in the XY plane, with intercepts $z/a$ on the X axis and $z/b$ on the Y axis; the region of interest is $ax + by \le z$.]
This is an important case because it is frequently found in the analysis of physical systems.
For b > 0,
$F_Z(z) = P[Z \le z] = P[aX + bY \le z] = \int_{-\infty}^{\infty} \int_{-\infty}^{(z - ax)/b} f_{XY}(x, y)\,dy\,dx$
We fix a value of x and then let y vary from $-\infty$ to $(z - ax)/b$. Differentiating,
$f_Z(z) = \frac{d}{dz} F_Z(z) = \frac{1}{b} \int_{-\infty}^{\infty} f_{XY}\left(x, \frac{z - ax}{b}\right) dx$
A special case of interest is when X and Y are independent:
$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{(z - ax)/b} f_X(x)\, f_Y(y)\,dy\,dx$
$f_Z(z) = \frac{1}{b} \int_{-\infty}^{\infty} f_X(x)\, f_Y\left(\frac{z - ax}{b}\right) dx$   ...... A1
Note:
A1 is a convolution integral like the ones encountered in linear systems and communications.
This means that if two RVs are independent, then the density of their sum (a = b = 1) is equal to the convolution of their marginal densities.
Proficiency in evaluating A1 is very important in Electrical Engineering.

Applications of A1 are seen a lot in the analysis of linear systems.
For example, consider the system shown below, whereby the received signal is the convolution of the input signal plus noise with the impulse response.
[Figure: Signal + noise (s + n) -> h(t) -> Receiver Output]
The convolution of two functions is often calculated using the Fourier Transform (FT), which is related to the Characteristic Function (CF) of a random variable:
$a(t) * b(t) \leftrightarrow A(f)\,B(f)$
Since the CF is closely related to the FT, for Z = X + Y with X and Y independent we may write
$\Phi_Z(\omega) = \Phi_X(\omega)\,\Phi_Y(\omega)$
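Equation A1 with a = b = 1 can be evaluated numerically on a grid. The sketch below assumes X and Y are independent and uniform on [0, 1) (an illustrative choice) and approximates the convolution integral with `numpy.convolve` scaled by the grid spacing; the result is the familiar triangular density on [0, 2).

```python
import numpy as np

n, dx = 1000, 0.001
x = np.arange(n) * dx                # grid on [0, 1) with spacing dx
f_x = np.ones(n)                     # assumed f_X(x) = 1 on [0, 1)
f_y = np.ones(n)                     # assumed f_Y(y) = 1 on [0, 1)

f_z = np.convolve(f_x, f_y) * dx     # Riemann-sum approximation of the convolution integral
z = np.arange(f_z.size) * dx         # grid for Z = X + Y, covering [0, 2)

area = f_z.sum() * dx                # total probability, should be close to 1
peak_z = z[np.argmax(f_z)]           # the triangle peaks near z = 1

print(area, peak_z, f_z.max())
```

Changing `f_x` and `f_y` to rectangles of different widths produces a trapezoid instead of a triangle.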
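Examples of Convolution

Convolution of two rectangles
[Figure: a rectangular $f_X(x)$ convolved with a rectangular $f_Y(y)$ gives a trapezoidal $f_Z(z)$ extending from a + c to b + d.]
If (b - a) = (d - c), then
[Figure: two rectangles of equal width, $f_X(x)$ and $f_Y(y)$, convolve to a triangular $f_Z(z)$.]
These situations arise a lot in communications.

Transformation in 2 Dimensions (1 function of 2 RVs)
Sum of Two RVs
Product of Two RVs
Ratio of Two RVs
Minimum and Maximum Functions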
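Expected Value of Sum of 2 RVs - 1

Let
$Z = X + Y$
such that
$E[Z] = E[X + Y] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + y) f_{XY}(x, y)\,dx\,dy$
$= \iint x f_{XY}(x, y)\,dx\,dy + \iint y f_{XY}(x, y)\,dx\,dy$
$= \int_{-\infty}^{\infty} x f_X(x)\,dx + \int_{-\infty}^{\infty} y f_Y(y)\,dy$
That is
$E[Z] = E[X + Y] = E[X] + E[Y]$
For arbitrary constants a and b
$E[Z] = E[aX + bY] = aE[X] + bE[Y]$

Variance of Sum of 2 RVs - 1

Let
$Z = X + Y, \quad \mu_Z = \mu_X + \mu_Y$
such that
$Var[Z] = E[(Z - \mu_Z)^2] = E[((X - \mu_X) + (Y - \mu_Y))^2]$
Expanding,
$Var[Z] = E[(X - \mu_X)^2] + E[(Y - \mu_Y)^2] + 2E[(X - \mu_X)(Y - \mu_Y)]$
Hence
$Var[Z] = Var[X] + Var[Y] + 2\,Cov(X, Y)$
$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2 + 2C_{XY}$
$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\,\sigma_X \sigma_Y$

Variance of Sum of 2 RVs - 2

If X and Y are uncorrelated, then $C_{XY} = 0$ and hence
$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$
For arbitrary constants a and b
$Var[aX + bY] = a^2\,Var[X] + b^2\,Var[Y] + 2ab\,Cov(X, Y)$

Characteristic Function of Sum of X and Y

$Z = aX + bY$
$\Phi_Z(\omega) = E[e^{j\omega(aX + bY)}] = \Phi_{XY}(a\omega, b\omega)$
If X and Y are independent,
$\Phi_Z(\omega) = E[e^{j\omega aX}]\,E[e^{j\omega bY}] = \Phi_X(a\omega)\,\Phi_Y(b\omega)$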
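The formula Var[aX + bY] = a^2 Var[X] + b^2 Var[Y] + 2ab Cov(X, Y) can be confirmed on simulated data. The sketch below uses an assumed, deliberately correlated pair (Y = 0.3 X plus independent noise) and arbitrary constants a and b; none of these values come from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(0.0, 2.0, n)                  # assumed X with variance 4
y = 0.3 * x + rng.normal(0.0, 1.0, n)        # correlated with X by construction
a, b = 2.0, -1.5                             # arbitrary constants

z = a * x + b * y
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
var_formula = a**2 * x.var() + b**2 * y.var() + 2 * a * b * cov_xy
var_direct = z.var()                         # variance of Z computed directly

print(var_formula, var_direct)
```

The two numbers agree to rounding error because the identity holds for sample moments as well; dropping the covariance term would give a visibly different (and wrong) value here.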
Moment Generating Function of Sum of X and Y

$Z = aX + bY$
$M_Z(t) = E[e^{t(aX + bY)}] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t(ax + by)} f_{XY}(x, y)\,dx\,dy$
If X and Y are independent,
$M_Z(t) = \int_{-\infty}^{\infty} e^{tax} f_X(x)\,dx \int_{-\infty}^{\infty} e^{tby} f_Y(y)\,dy = M_X(at)\,M_Y(bt)$
which reduces to $M_X(t)\,M_Y(t)$ for a = b = 1.
Note
The technique for the sum of two RVs is applicable to the difference of two RVs.

Matrix Formulation for Functions of 2 RVs

Let X1 & X2 be jointly continuous RVs with joint PDF $f_{X_1 X_2}(x_1, x_2)$.
We want to find the PDF of a random variable such as Z = X1 + X2.
Define two new RVs Y1 and Y2 as functions of X1 and X2:
$Y_1 = g_1(x_1, x_2); \quad Y_2 = g_2(x_1, x_2)$
Assume that the functions g1 & g2 satisfy the following conditions:
y1 = g1(x1, x2) and y2 = g2(x1, x2) can be uniquely solved for x1 and x2 in terms of y1 and y2, with the solution given by
$x_1 = h_1(y_1, y_2); \quad x_2 = h_2(y_1, y_2)$
i.e., h1 and h2 are the inverse functions of g1 and g2.
h1 and h2 have continuous partial derivatives at all points (y1, y2), such that the Jacobian
$J_{h_1 h_2}(y_1, y_2) = \begin{vmatrix} \partial h_1/\partial y_1 & \partial h_1/\partial y_2 \\ \partial h_2/\partial y_1 & \partial h_2/\partial y_2 \end{vmatrix} \ne 0$
Under these conditions, Y1 and Y2 are jointly continuous with joint pdf (the fundamental equation)
$f_{Y_1 Y_2}(y_1, y_2) = f_{X_1 X_2}(x_1, x_2)\,|J_{h_1 h_2}(y_1, y_2)|$
Finally, express the right-hand side in terms of y1 and y2 by substituting x1 = h1(y1, y2) and x2 = h2(y1, y2).

Sum of 2 independent RVs
$Z = X + Y$
Define two new functions
$z = x + y = g_1(x, y)$
$w = x = g_2(x, y)$
Determine the inverse functions
$x = w = h_1(w, z)$
$y = z - x = h_2(w, z) = z - w$
Compute the Jacobian
$J_{h_1 h_2}(w, z) = \begin{vmatrix} \partial h_1/\partial w & \partial h_1/\partial z \\ \partial h_2/\partial w & \partial h_2/\partial z \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ -1 & 1 \end{vmatrix} = 1$
Apply the fundamental equation
$f_{WZ}(w, z) = f_{XY}(x, y)\,|1| = f_{XY}(w, z - w)$
Integrate out w and return to the original variables:
$f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(w, z - w)\,dw = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\,dx$
For independent X and Y,
$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\,dx = \int_{-\infty}^{\infty} f_X(z - y)\, f_Y(y)\,dy$
This is known as the convolution integral.
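Product of 2 Random Variables

$Z = XY$
$z = xy = g_1(x, y)$
$w = x = g_2(x, y)$
Inverse functions:
$x = w = h_1(w, z)$
$y = z/w = h_2(w, z)$
Jacobian:
$J_{h_1 h_2}(w, z) = \begin{vmatrix} \partial h_1/\partial w & \partial h_1/\partial z \\ \partial h_2/\partial w & \partial h_2/\partial z \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ -z/w^2 & 1/w \end{vmatrix} = \frac{1}{w}$
Apply the fundamental equation
$f_{WZ}(w, z) = f_{XY}(x, y)\left|\frac{1}{w}\right| = f_{XY}\left(w, \frac{z}{w}\right)\frac{1}{|w|}$
$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f_{XY}\left(x, \frac{z}{x}\right) dx$
If X and Y are independent
$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f_X(x)\, f_Y\left(\frac{z}{x}\right) dx = \int_{-\infty}^{\infty} \frac{1}{|y|}\, f_X\left(\frac{z}{y}\right) f_Y(y)\,dy$

Ratio of 2 Random Variables

$Z = X/Y$
$z = x/y = g_1(x, y)$
$w = y = g_2(x, y)$
Inverse functions:
$x = zw = h_1(w, z)$
$y = w = h_2(w, z)$
Jacobian:
$J_{h_1 h_2}(w, z) = \begin{vmatrix} \partial h_1/\partial w & \partial h_1/\partial z \\ \partial h_2/\partial w & \partial h_2/\partial z \end{vmatrix} = \begin{vmatrix} z & w \\ 1 & 0 \end{vmatrix} = -w$
Apply the fundamental equation
$f_{WZ}(w, z) = f_{XY}(x, y)\,|w| = f_{XY}(zw, w)\,|w|$
$f_Z(z) = \int_{-\infty}^{\infty} |y|\, f_{XY}(zy, y)\,dy$
If X and Y are independent
$f_Z(z) = \int_{-\infty}^{\infty} |y|\, f_X(zy)\, f_Y(y)\,dy = \int_{-\infty}^{\infty} \frac{|x|}{z^2}\, f_X(x)\, f_Y\left(\frac{x}{z}\right) dx$

Maximum Function
$Z = \max(X, Y)$
Minimum Function
$Z = \min(X, Y)$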
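The ratio formula for independent X and Y, $f_Z(z) = \int |y| f_X(zy) f_Y(y)\,dy$, can be evaluated numerically. The sketch below assumes X and Y are independent standard normal RVs (an illustrative choice); for that case the ratio is known to be Cauchy with density $1/(\pi(1 + z^2))$, which gives a reference to compare against.

```python
import numpy as np

def std_normal_pdf(u):
    """Standard normal density, used here for both f_X and f_Y (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

y = np.linspace(-10.0, 10.0, 20_001)      # integration grid for y
dy = y[1] - y[0]

def ratio_pdf(z):
    # f_Z(z) = integral of |y| f_X(z y) f_Y(y) dy, approximated by a Riemann sum
    integrand = np.abs(y) * std_normal_pdf(z * y) * std_normal_pdf(y)
    return np.sum(integrand) * dy

z_points = np.array([0.0, 0.5, 1.0, 2.0])
numeric = np.array([ratio_pdf(z) for z in z_points])
cauchy = 1.0 / (np.pi * (1.0 + z_points**2))   # known density of the ratio of two standard normals

print(numeric)
print(cauchy)
```

The same grid-based approach works for the product formula by replacing the integrand with $f_X(x) f_Y(z/x)/|x|$, taking care to exclude x = 0 from the grid.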
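Jointly Gaussian Random Variables

Gaussian RVs are important because they show up in every area of engineering and science.
Suppose the PDFs of 2 Gaussian RVs X and Y are
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right]$
$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\left[-\frac{(y - \mu_y)^2}{2\sigma_y^2}\right]$
If X and Y are jointly Gaussian, then the joint PDF (bivariate Gaussian density) is
$f_{XY}(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho_{xy}^2}} \exp\left\{-\frac{\left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho_{xy}\left(\frac{x - \mu_x}{\sigma_x}\right)\left(\frac{y - \mu_y}{\sigma_y}\right) + \left(\frac{y - \mu_y}{\sigma_y}\right)^2}{2(1 - \rho_{xy}^2)}\right\}$
where $|\rho_{xy}| < 1$.
The PDF is centered at $(\mu_x, \mu_y)$ and its shape depends on the values of $\sigma_x$, $\sigma_y$ and $\rho_{xy}$.
Since, in general,
$f_{XY}(x, y) \ne f_X(x)\, f_Y(y)$
X and Y are not independent.
But observe that if $\rho_{xy} = 0$, then $f_{XY}(x, y) = f_X(x)\, f_Y(y)$.
We can conclude that any uncorrelated jointly Gaussian RVs are also independent.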
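The factorization at $\rho_{xy} = 0$ can be seen by evaluating the bivariate Gaussian density at a point. The parameter values and the evaluation point in this sketch are arbitrary illustrative choices, not values from the notes.

```python
import numpy as np

def gauss_pdf(u, mu, sigma):
    """One-dimensional Gaussian density."""
    return np.exp(-(u - mu)**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def bivariate_pdf(x, y, mu_x, mu_y, s_x, s_y, rho):
    """Bivariate Gaussian density with correlation coefficient rho, |rho| < 1."""
    zx = (x - mu_x) / s_x
    zy = (y - mu_y) / s_y
    q = (zx**2 - 2.0 * rho * zx * zy + zy**2) / (2.0 * (1.0 - rho**2))
    return np.exp(-q) / (2.0 * np.pi * s_x * s_y * np.sqrt(1.0 - rho**2))

x0, y0 = 0.7, -0.4                                    # arbitrary evaluation point
params = dict(mu_x=1.0, mu_y=-1.0, s_x=2.0, s_y=0.5)  # arbitrary means and standard deviations

product = gauss_pdf(x0, 1.0, 2.0) * gauss_pdf(y0, -1.0, 0.5)   # f_X(x0) f_Y(y0)
joint_rho0 = bivariate_pdf(x0, y0, rho=0.0, **params)          # equals the product
joint_rho6 = bivariate_pdf(x0, y0, rho=0.6, **params)          # differs from the product

print(product, joint_rho0, joint_rho6)
```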

Statistical Properties of Sum of Two RVs
Expectation of sums of Two RVs
Characteristic Function of sums of Two RVs
etc.

Multiple Random Variables
(More than 2 Random Variables)

"When you have eliminated the impossible, whatever remains, however improbable, must be the truth."
- Sir Arthur Conan Doyle

Jointly Gaussian Random Variables
Functions of More than 2 Random Variables (a vector)
Multiple Random Variables (more than 2 RVs)
Large Numbers and their properties
Central limit theorem
Multiple Random Variables - 1

Events involving more than two RVs
An extension from two RVs to N RVs can be made without much difficulty using the concept of a one-dimensional vector or a matrix.
Let X1, ..., XN be the components of an N-dimensional vector RV, i.e.,
$\mathbf{X} = [X_1, X_2, \dots, X_N]$

A. Joint Distribution of Vector Random Variables
For N random variables X1, X2, ..., XN (continuous or discrete), the joint CDF is defined as
$F_X(x_1, \dots, x_N) = P[X_1 \le x_1, X_2 \le x_2, \dots, X_N \le x_N]$
Properties are similar to the case of two RVs:
1) $0 \le F_X(x_1, \dots, x_N) \le 1, \quad \mathbf{X} \in R^N$
2) $F_X(\infty, \infty, \dots, \infty) = 1$
3) $F_X(x_1, \dots, x_N) \to 0$ when $x_k \to -\infty$ for some k = 1, 2, ..., N
4) $F_X(x_1, \dots, x_N)$ is continuous from the right
5) $F_X(x_1, \dots, x_N)$ is nondecreasing
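B. Marginal CDF of Vector Random Variables
If we replace certain arguments of $F_X(x_1, \dots, x_N)$ by $\infty$, we obtain the joint CDF of the remaining variables.
For example,
$F_X(x_1, \dots, x_{N-1}) = F_X(x_1, \dots, x_{N-1}, \infty)$
$F_X(x_1, x_2) = F_X(x_1, x_2, \infty, \dots, \infty)$
$F_X(x_1, x_4) = F_X(x_1, \infty, \infty, x_4, \infty, \dots, \infty)$

C. Joint Density of Vector Random Variables
The joint PDF of X = [X1, X2, ..., XN] is defined as
$f_X(x_1, \dots, x_N) = \frac{\partial^N F_X(x_1, \dots, x_N)}{\partial x_1\, \partial x_2 \cdots \partial x_N}$
If we know the joint PDF, then
$F_X(x_1, \dots, x_N) = \int_{-\infty}^{x_N} \int_{-\infty}^{x_{N-1}} \cdots \int_{-\infty}^{x_1} f_X(t_1, \dots, t_N)\,dt_1\,dt_2 \cdots dt_N$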
D. Marginal PDF of Vector Random Variables
Marginal densities are obtained by integrating out the variables that are not required:
$f_X(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x_1, \dots, x_N)\,dx_2 \cdots dx_N$
More generally, the marginal joint PDF of any k of the N RVs can be found by integrating the PDF over the remaining N - k variables.
For example, for N = 4
$f_X(x_2, x_4) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_X(x_1, x_2, x_3, x_4)\,dx_1\,dx_3$

E. Independence of Vector Random Variables
The N random variables X1, X2, ..., XN are independent if the events $\{X_1 \le x_1\}, \{X_2 \le x_2\}, \dots, \{X_N \le x_N\}$ are independent.
This implies that
$F_X(x_1, \dots, x_N) = F_X(x_1)\,F_X(x_2) \cdots F_X(x_N)$
$f_X(x_1, \dots, x_N) = f_X(x_1)\,f_X(x_2) \cdots f_X(x_N)$
$p_X(x_1, \dots, x_N) = p_X(x_1)\,p_X(x_2) \cdots p_X(x_N)$
It follows that any subset of the $X_i$ is a set of independent random variables.
For example, if N = 3 and X1, X2, X3 are independent, then
$f_X(x_1, x_2, x_3) = f_X(x_1)\,f_X(x_2)\,f_X(x_3)$
$f_X(x_1, x_2) = f_X(x_1)\,f_X(x_2)$
$f_X(x_1, x_3) = f_X(x_1)\,f_X(x_3)$
$f_X(x_2, x_3) = f_X(x_2)\,f_X(x_3)$
However, if the $X_k$ are independent in pairs, they are not necessarily independent.
It is possible that
$f_X(x_1, x_2) = f_X(x_1)\,f_X(x_2)$
$f_X(x_1, x_3) = f_X(x_1)\,f_X(x_3)$
$f_X(x_2, x_3) = f_X(x_2)\,f_X(x_3)$
but
$f_X(x_1, x_2, x_3) \ne f_X(x_1)\,f_X(x_2)\,f_X(x_3)$

F. Conditional joint density of Vector RV
The conditional density of vector RVs is given by
$f_X(x_N, \dots, x_{k+1} \mid x_k, \dots, x_1) = \frac{f_X(x_1, \dots, x_k, \dots, x_N)}{f_X(x_1, \dots, x_k)}$
For example, when N = 3, one obtains
$f_X(x_1 \mid x_2, x_3) = \frac{f_X(x_1, x_2, x_3)}{f_X(x_2, x_3)}$
We can rewrite the expression as follows:
$f_X(x_1, \dots, x_k, \dots, x_N) = f_X(x_N, \dots, x_{k+1} \mid x_k, \dots, x_1)\, f_X(x_1, \dots, x_k)$
This implies that we can use the chain rule to write the joint pdf as
$f_X(x_1, \dots, x_N) = f_X(x_N \mid x_1, \dots, x_{N-1})\, f_X(x_1, \dots, x_{N-1})$
$= f_X(x_N \mid x_1, \dots, x_{N-1})\, f_X(x_{N-1} \mid x_1, \dots, x_{N-2})\, f_X(x_1, \dots, x_{N-2})$
$= f_X(x_N \mid x_1, \dots, x_{N-1})\, f_X(x_{N-1} \mid x_1, \dots, x_{N-2}) \cdots f_X(x_2 \mid x_1)\, f_X(x_1)$
Correspondingly,
$F_X(x_N, \dots, x_{k+1} \mid x_k, \dots, x_1) = \int_{-\infty}^{x_N} \cdots \int_{-\infty}^{x_{k+1}} f_X(t_N, \dots, t_{k+1} \mid t_k, \dots, t_1)\,dt_{k+1} \cdots dt_N$
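The statement that pairwise independence does not guarantee mutual independence can be made concrete with a small discrete example. The construction below is a standard one and is not taken from the notes: X1 and X2 are assumed to be independent fair bits and X3 is their exclusive-or.

```python
import itertools

# Assumed example: X1, X2 are fair bits, X3 = X1 XOR X2.
outcomes = [(a, b, a ^ b) for a, b in itertools.product((0, 1), repeat=2)]
p = 1.0 / len(outcomes)                      # each of the 4 outcomes has probability 1/4

def prob(event):
    """Probability of the set of outcomes for which event(outcome) is True."""
    return sum(p for o in outcomes if event(o))

# Check P[Xi = u, Xj = v] = P[Xi = u] P[Xj = v] for every pair (i, j) and all values u, v
pairwise_ok = all(
    abs(prob(lambda o: o[i] == u and o[j] == v)
        - prob(lambda o: o[i] == u) * prob(lambda o: o[j] == v)) < 1e-12
    for i, j in ((0, 1), (0, 2), (1, 2))
    for u in (0, 1) for v in (0, 1)
)

p_joint = prob(lambda o: o == (1, 1, 1))     # (1, 1, 1) never occurs since 1 XOR 1 = 0
p_product = (prob(lambda o: o[0] == 1)
             * prob(lambda o: o[1] == 1)
             * prob(lambda o: o[2] == 1))     # product of the three marginals

print(pairwise_ok, p_joint, p_product)
```

Every pair factors, yet the triple probability is 0 while the product of the marginals is 1/8, so the three RVs are not mutually independent.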
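G. Joint Expectation of Vector RV
The joint expectation of a vector random variable is given by
$E[(X_1, X_2, \dots, X_N)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1, x_2, \dots, x_N)\, f_X(x_1, \dots, x_N)\,dx_1 \cdots dx_N$
or in vector notation we have
$E[\mathbf{X}_N] = \int \cdots \int \mathbf{X}_N\, f_X(\mathbf{X}_N)\,d\mathbf{X}_N$
For N random variables X1, X2, ..., XN and some function of these random variables g(X1, X2, ..., XN), the expected value is given by
$E[g(X_1, X_2, \dots, X_N)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_N)\, f_X(x_1, \dots, x_N)\,dx_1 \cdots dx_N$
For N random variables X1, X2, ..., XN, the (n1 + n2 + ... + nN)-order joint moments are defined by
$E[X_1^{n_1} X_2^{n_2} \cdots X_N^{n_N}] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1^{n_1} x_2^{n_2} \cdots x_N^{n_N}\, f_X(x_1, \dots, x_N)\,dx_1 \cdots dx_N$
where n1, n2, ..., nN are all positive integers.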
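H. Joint Central Moments and Variance of Vector RV
For N random variables X1, X2, ..., XN, the (n1 + n2 + ... + nN)-order joint central moments are defined by
$E[(X_1 - \mu_{X_1})^{n_1} (X_2 - \mu_{X_2})^{n_2} \cdots (X_N - \mu_{X_N})^{n_N}] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1 - \mu_{X_1})^{n_1} (x_2 - \mu_{X_2})^{n_2} \cdots (x_N - \mu_{X_N})^{n_N}\, f_X(x_1, \dots, x_N)\,dx_1 \cdots dx_N$

I. Characteristic Functions of Vector Random Variable
$\Phi_X(\omega_1, \omega_2, \dots, \omega_N) = E[e^{j(\omega_1 X_1 + \omega_2 X_2 + \cdots + \omega_N X_N)}]$
$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{j(\omega_1 x_1 + \omega_2 x_2 + \cdots + \omega_N x_N)}\, f_X(x_1, \dots, x_N)\,dx_1 \cdots dx_N$
If independent,
$\Phi_X(\omega_1, \omega_2, \dots, \omega_N) = E[e^{j\omega_1 X_1}]\,E[e^{j\omega_2 X_2}] \cdots E[e^{j\omega_N X_N}] = \Phi_{X_1}(\omega_1)\,\Phi_{X_2}(\omega_2) \cdots \Phi_{X_N}(\omega_N)$

J. Moment Generating Function of Vector Random Variable
$M_X(t_1, t_2, \dots, t_N) = E[e^{t_1 X_1 + t_2 X_2 + \cdots + t_N X_N}]$
$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{t_1 x_1 + t_2 x_2 + \cdots + t_N x_N}\, f_X(x_1, \dots, x_N)\,dx_1 \cdots dx_N$
If independent,
$M_X(t_1, t_2, \dots, t_N) = E[e^{t_1 X_1}]\,E[e^{t_2 X_2}] \cdots E[e^{t_N X_N}] = M_{X_1}(t_1)\,M_{X_2}(t_2) \cdots M_{X_N}(t_N)$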
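K. N Jointly Gaussian Random Variables
N random variables X1, X2, ..., XN are called jointly Gaussian if their density function can be written as
$f_X(\mathbf{X}) = \frac{1}{(2\pi)^{N/2}\,|K|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{X} - \mathbf{m})^T K^{-1} (\mathbf{X} - \mathbf{m})\right]$
where X and m are column vectors and K is the covariance matrix, all defined by
$\mathbf{X} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \quad \mathbf{m} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_N \end{bmatrix}, \quad K = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1N} \\ C_{21} & C_{22} & \cdots & C_{2N} \\ \vdots & \vdots & & \vdots \\ C_{N1} & C_{N2} & \cdots & C_{NN} \end{bmatrix}$
The elements of the covariance matrix are given by
$C_{ij} = \begin{cases} \sigma_{X_i}^2, & i = j \\ \rho_{ij}\,\sigma_{X_i}\sigma_{X_j}, & i \ne j \end{cases}$
For example, for N = 2
$K = \begin{bmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$

Analysis on Random Variables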
Important Properties of Large RVs - 1

Assume that we have N random variables, X1, X2, ..., XN.
Mean of the Sum of RVs
$E\left[\sum_{k=1}^{N} a_k X_k\right] = \sum_{k=1}^{N} a_k E[X_k]$
The expectation of the sum of N RVs is the sum of their expectations.
Mean of the Product of RVs
$E\left[\prod_{k=1}^{N} a_k X_k\right] = \prod_{k=1}^{N} a_k E[X_k]$
Independence is assumed.
Variance of the Sum of RVs
$Var\left[\sum_{i=1}^{N} a_i X_i\right] = E\left\{\sum_{j=1}^{N} a_j (X_j - E[X_j]) \sum_{k=1}^{N} a_k (X_k - E[X_k])\right\}$
$= \sum_{j=1}^{N} \sum_{k=1}^{N} a_j a_k\, E[(X_j - E[X_j])(X_k - E[X_k])]$
$= \sum_{j=1}^{N} a_j^2\, Var(X_j) + \sum_{j=1}^{N} \sum_{k \ne j} a_j a_k\, Cov(X_j, X_k)$

Sample Average (SA)

Suppose that random variables X1, X2, ..., XN are independent and identically distributed (iid), each with mean $\mu_X$ and variance $\sigma_X^2$.
The sample average is defined as
$\hat{\mu}_N = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N} \sum_{k=1}^{N} X_k$
$\hat{\mu}_N$ is also called the arithmetic mean or normalized sum.

Properties of Sample Average
The expectation of $\hat{\mu}_N$ is given by
$E[\hat{\mu}_N] = E\left[\frac{1}{N} \sum_{k=1}^{N} X_k\right] = \frac{1}{N} \sum_{k=1}^{N} E[X_k] = \frac{1}{N}\, N \mu_X = \mu_X$

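This means that the mean value of $\hat{\mu}_N$ is the same as the mean value of the RV $X_k$.
The variance of $\hat{\mu}_N$ is given by
$Var[\hat{\mu}_N] = Var\left[\frac{1}{N} \sum_{k=1}^{N} X_k\right] = E\left[\left(\frac{1}{N} \sum_{k=1}^{N} X_k\right)^2\right] - \mu_X^2$
$= \frac{1}{N^2} \sum_{j=1}^{N} \sum_{k=1}^{N} E[X_j X_k] - \mu_X^2$
$= \frac{1}{N^2} \sum_{j=1}^{N} E[X_j^2] + \frac{1}{N^2} \sum_{j=1}^{N} \sum_{k \ne j} E[X_j X_k] - \mu_X^2$
but
$E[X_j^2] = \sigma_X^2 + \mu_X^2, \quad E[X_j X_k] = \mu_X^2 \text{ for } j \ne k$
hence
$Var[\hat{\mu}_N] = \frac{1}{N^2}\, N(\sigma_X^2 + \mu_X^2) + \frac{1}{N^2}\, N(N - 1)\mu_X^2 - \mu_X^2 = \frac{\sigma_X^2}{N}$
This means that the variance of $\hat{\mu}_N$ is 1/N times the variance of the RV $X_k$.
$Var[\hat{\mu}_N] \to 0$ as $N \to \infty$
This implies that the probability that $\hat{\mu}_N$ differs from the true mean by more than any fixed amount approaches zero as N becomes larger and larger.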
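The result $Var[\hat{\mu}_N] = \sigma_X^2 / N$ can be checked by simulation. The sketch below assumes Gaussian iid samples with illustrative parameter values; it also compares the empirical probability of a deviation of at least a with the Chebyshev bound $\sigma_X^2 / (N a^2)$ discussed in the notes.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, mu, N, trials = 4.0, 1.0, 50, 20_000       # illustrative values
samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))   # iid draws, one row per experiment

sample_avg = samples.mean(axis=1)        # one sample average per experiment
var_est = sample_avg.var()               # empirical variance of the sample average
var_theory = sigma2 / N                  # sigma_X^2 / N

a = 0.5
p_dev = np.mean(np.abs(sample_avg - mu) >= a)   # empirical P[|sample average - mu| >= a]
chebyshev = sigma2 / (N * a**2)                 # Chebyshev upper bound on that probability

print(var_est, var_theory, p_dev, chebyshev)
```

The empirical variance matches $\sigma_X^2 / N$ closely, and the observed deviation probability sits well below the Chebyshev bound, which is valid for any distribution but usually loose.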

In fact, this is the premise of the Chebyshev inequality, which states that
$P[|\hat{\mu}_N - E[\hat{\mu}_N]| \ge a] \le \frac{Var[\hat{\mu}_N]}{a^2}$
Substituting
$P[|\hat{\mu}_N - \mu_X| \ge a] \le \frac{\sigma_X^2}{N a^2}$
The complement will be
$P[|\hat{\mu}_N - \mu_X| < a] \ge 1 - \frac{\sigma_X^2}{N a^2}$

Stochastic Processes
(a.k.a. Random Processes)

"It is remarkable that a science which began with the consideration of games of chance should have become the most important object of human knowledge."
- Pierre-Simon Laplace, 1812

Definition and Specifications of one Random Process
Sample distribution and density functions of Random Processes
Some important Random Processes (independent increment)
Statistical properties of Random Processes
Expectation of Random Processes
Variance of Random Processes
Autocorrelation function and its properties
Power Spectral Density of a Random Process and its properties

Definitions
Stochastic Processes - 1

Recall that a RV is a rule for assigning a number on the real line to every outcome of an experiment S.
This assignment is only a function of the outcome of that experiment.
Often, random data collected from an experiment are functions of time.
If the time factor is included in our experiment, then a Random Process (RP) arises.
Let $\xi$ denote the random outcome of an experiment.
To every such outcome suppose a waveform $X(t, \xi)$ is assigned.
The collection of such waveforms forms a stochastic process.
The set of $\{\xi_k\}$ and the time index t can be continuous or discrete (countably infinite or finite) as well.
For fixed $\xi_k \in S$ (the set of all experimental outcomes), $X(t, \xi)$ is a specific time function.
In other words, a Random Process is a rule for assigning to every outcome $\xi$ of an experiment a function of time, $X(t, \xi)$.
A RP can be viewed as a function of two variables: an event $\xi$, and time t.
Consider a random experiment specified by outcomes $s_i$ from some sample space S.
[Figure: outcomes $s_1, s_2, \dots, s_n$ in the sample space S are mapped to sample functions $x_1(t), x_2(t), \dots, x_n(t)$ (each a realization, sample path, or sample function), drawn over an observation interval that contains the instants $t_k$ and $t_{k+1}$.]
For a fixed time $t_k$ inside the observation interval, a set of sample functions
$\{X(t, \xi_j)\}, \quad j = 1, 2, \dots, n$
are observed, where $\xi_j$ is a member of S.
To every outcome $\xi$ we assign, according to some rule, a time function $X(t, \xi)$.
For a specific event, say $\xi_j$, $X(t, \xi_j)$ signifies a single time function.
Since a RP is a function of two variables, t and $\xi$, one or both of these may be chosen to be fixed.
If the fixed values are denoted by a subscript, we obtain:
$X(t, \xi_j) = x_j(t)$ is a deterministic time function or sample function
$X(t_j, \xi) = X(t_j)$ is a random variable
$X(t, \xi) = X(t)$ is a random process
$X(t_i, \xi_j) = x_j(t_i)$ is a real number
From the above illustration, we can conclude that given a RP, if we sample at a given time, we obtain a RV.
For fixed t, $X_1 = X(t_1, \xi_i)$ is a random variable.
This time-indexed family of random variables $\{X(t_k, \xi_1), \dots, X(t_k, \xi_n)\}$, written X(t), is known as a RP.
The ensemble of all such realizations $X(t, \xi)$ over time represents the stochastic process X(t).
A stochastic process X(t) is a collection of time functions corresponding to various outcomes of an experiment.
Equivalently, a RP is the mapping of the outcome of a random experiment to a function of time.
e.g. $X(t) = a\cos(\omega_0 t + \varphi)$, where $\varphi$ is a uniformly distributed random variable in $(0, 2\pi)$, represents a stochastic process.
To distinguish a RV from a RP, we note that
1. the outcome of a RV is mapped into a number on the real line
2. the outcome of a RP is mapped into a function of time, t
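The random-phase sinusoid mentioned above makes the "function of two variables" view easy to see in code. In this sketch (the amplitude, frequency, grid, and number of outcomes are illustrative assumptions) each row of `paths` is one sample function, while each column is a random variable observed across outcomes.

```python
import numpy as np

rng = np.random.default_rng(4)
a, w0 = 1.0, 2.0 * np.pi                    # amplitude and angular frequency (illustrative values)
t = np.linspace(0.0, 2.0, 201)              # time grid with spacing 0.01
phi = rng.uniform(0.0, 2.0 * np.pi, 5000)   # one random phase per outcome

paths = a * np.cos(w0 * t[None, :] + phi[:, None])   # row j is the sample function for outcome j

one_path = paths[0]                  # fixed outcome: a deterministic function of t
rv_at_t = paths[:, 50]               # fixed time t = 0.5: a random variable across outcomes
ensemble_mean = paths.mean(axis=0)   # estimate of E[X(t)] at every t, close to 0

print(paths.shape, np.max(np.abs(ensemble_mean)))
```

Fixing both the row and the column, for example `paths[3, 50]`, gives a single real number, matching the four cases listed in the notes.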
Examples of Stochastic Processes abound in nature, e.g.,

1. Stock market fluctuations
2. Brownian motion
3. Information signals such as voice (speech), TV, computer
data sequence, electrical noise, etc.
4. Brain/heart waves
(electroencephalogram/electrocardiograms)
5. Various queuing systems
6. Sound (or music) signals
7. Random sinusoidal signal
8. Buffer content of Network Routers
9. Network Link Utilization
10.Random binary sequences, etc
Classification of Random Processes:
A RP can be classified as discrete-time or continuous-time:
Continuous RP (uncountable collection of RVs)
Discrete RP (countable collection of RVs)

Specifying a Random Process
Question: How do we characterize the probabilistic behavior of a RP?
Answer: We must specify the joint CDF/PDF for an infinite number of RVs!
Since this is not possible, we must select a subset of k RVs and then specify the joint probabilities.
The idea is that events of interest do not necessarily involve all the RVs.
Loosely speaking, a RP is just an infinite bunch of RVs with slightly different notation, one for each time, t.
Let X1, ..., Xk be k RVs obtained by sampling the random process X(t, s) at times t1, t2, ..., tk, i.e.,
$X_1 = X(t_1, s),\; X_2 = X(t_2, s),\; \dots,\; X_k = X(t_k, s)$
then the k-dimensional joint CDF is given by
$F_X(x_1, \dots, x_k; t_1, \dots, t_k) = P[X(t_1) \le x_1, X(t_2) \le x_2, \dots, X(t_k) \le x_k]$
If the RP is continuous, then the k-dimensional joint PDF can be obtained as follows:
$f_X(x_1, \dots, x_k; t_1, \dots, t_k) = \frac{\partial^k F_X(x_1, \dots, x_k; t_1, \dots, t_k)}{\partial x_1\, \partial x_2 \cdots \partial x_k}$
Similarly, when the RP is discrete, the joint PMF is
$p_X(x_1, \dots, x_k) = P[X_1 = x_1, X_2 = x_2, \dots, X_k = x_k]$

From the general definition we can obtain specific cases:
A) First-order distribution of the random process X(t) is
$F_X(x, t) = P[X(t) \le x]$
Notice that $F_X(x, t)$ depends on t, since for a different t we obtain a different RV.
B) First-order density of the random process X(t) is
$f_X(x, t) = \frac{\partial F_X(x, t)}{\partial x}$
C) Second-order distribution of the random process
For t = t1 and t = t2, X(t) represents two different random variables X1 = X(t1) and X2 = X(t2) respectively. Their joint distribution is given by
$F_X(x_1, x_2; t_1, t_2) = P[X(t_1) \le x_1, X(t_2) \le x_2]$
D) Second-order density of the random process X(t) is
$f_X(x_1, x_2; t_1, t_2) = \frac{\partial^2 F_X(x_1, x_2; t_1, t_2)}{\partial x_1\, \partial x_2}$
E) This can be extended to k-th order distribution or density functions. For example, the nth-order density function of a RP X(t) is
$f_X(x_1, x_2, \dots, x_n; t_1, t_2, \dots, t_n)$
As in random variables, the marginal cdf and pdf of a RP are given by
$F_X(x_1; t_1) = F_X(x_1, \infty; t_1, t_2)$
$f_X(x_1; t_1) = \int_{-\infty}^{\infty} f_X(x_1, x_2; t_1, t_2)\,dx_2$
It is important to mention that these descriptions are partial descriptions, since full descriptions are not possible.
Complete specification of the stochastic process X(t) requires the knowledge of
$f_X(x_1, x_2, \dots, x_n; t_1, t_2, \dots, t_n)$
for all $t_i$, i = 1, 2, ..., n, and for all n.

Statistical Properties of Random Processes

"The concept of randomness and coincidence will be obsolete when people can finally define a formulation of patterned interaction between all things within the universe."
- Toba Beta

Properties of Random Processes - 1
A. Mean of X(t)
First-order Random Processes (i.e., function of one random process)
The mean of a random process X(t) is given by
$m_X(t) = \mu_X(t) = E[X(t)] = \int_{-\infty}^{\infty} x f_X(x, t)\,dx$
In general, the mean of a process can depend on the time index t.

Time Average of X(t)
This is an alternative way of computing the mean function of a RP X(t), by averaging it over the time interval [-T, T] or some period.
The time average is defined by
$\langle X(t) \rangle = \frac{1}{2T} \int_{-T}^{T} x(t)\,dt$
or
$\langle X(t) \rangle = \frac{1}{T} \int_{-T/2}^{T/2} x(t)\,dt$
B. Variance of X(t)
The variance of a random process X(t) is given by
$\sigma_X^2(t) = Var[X(t)] = E[(X(t) - \mu_X(t))^2] = E[X^2(t)] - (E[X(t)])^2$

C. Autocorrelation of X(t)
The Autocorrelation Function (ACF) of a RP X(t) is denoted as either $R_{XX}(t_1, t_2)$ or $R_X(t_1, t_2)$ or $R_{XX}(t, t + \tau)$.
The autocorrelation of a random process X(t) is given by
$R_{XX}(t_1, t_2) = E[X(t_1)\,X^*(t_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2^*\, f_X(x_1, x_2; t_1, t_2)\,dx_1\,dx_2$
It follows that
$R_{XX}(t_1, t_2) = (E[X(t_2)\,X^*(t_1)])^* = R_{XX}^*(t_2, t_1)$
The value of $R_X(t_1, t_2)$ when t1 = t2 = t is the average power of X(t):
$R_{XX}(t, t) = E[|X(t)|^2] \ge 0$
More generally, the autocorrelation of a random process X(t) is a nonnegative-definite function.
Note that, for real X(t),
$E[(X(t_1) \pm X(t_2))^2] = R_X(t_1, t_1) \pm 2R_X(t_1, t_2) + R_X(t_2, t_2)$

Properties of Autocorrelation Function
1. The mean squared value of X(t) is
$R_X(0) = E[X^2(t)]$
2. For a real WSS process, $R_X(\tau)$ is an even function, i.e.,
$R_X(\tau) = R_X(-\tau)$
This implies that we may also define the autocorrelation function as
$R_X(\tau) = E[X(t)\,X(t - \tau)]$
Proof:
$R_X(-\tau) = E[X(t)\,X(t - \tau)] = E[X(t' + \tau)\,X(t')] = R_X(\tau)$, with $t' = t - \tau$
3. For a WSS process, $R_X(\tau)$ is maximum at the origin, i.e.,
$R_X(0) \ge |R_X(\tau)|$
Proof:
Consider
$E[(X(t + \tau) \pm X(t))^2] \ge 0$
but
$E[(X(t + \tau) \pm X(t))^2] = E[X^2(t + \tau)] + E[X^2(t)] \pm 2E[X(t + \tau)\,X(t)] = 2R_X(0) \pm 2R_X(\tau)$
Hence
$R_X(0) \ge |R_X(\tau)|$
4. If X(t) has a dc component, then $R_X(\tau)$ will have a constant component.
For example, if X(t) = A, then
$R_X(\tau) = E[X(t)\,X(t + \tau)] = E[A^2] = A^2$
5. If X(t) has a periodic component, then $R_X(\tau)$ will also have a periodic component with the same period.
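Properties 2, 3 and 5 can be observed on the random-phase sinusoid $X(t) = a\cos(\omega_0 t + \varphi)$ with $\varphi$ uniform on $(0, 2\pi)$, whose autocorrelation is $R_X(\tau) = (a^2/2)\cos(\omega_0 \tau)$. The sketch below estimates the ACF by ensemble averaging; the amplitude, frequency, and the reference time t = 0.3 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
a, w0 = 2.0, 2.0 * np.pi                       # illustrative amplitude and angular frequency
phi = rng.uniform(0.0, 2.0 * np.pi, 50_000)    # random phase, one per realization

def acf(t, tau):
    # Ensemble estimate of E[X(t) X(t + tau)] for X(t) = a cos(w0 t + phi)
    return np.mean(a * np.cos(w0 * t + phi) * a * np.cos(w0 * (t + tau) + phi))

taus = np.linspace(-1.0, 1.0, 41)                      # lag grid; taus[20] is tau = 0
r_est = np.array([acf(0.3, tau) for tau in taus])      # t = 0.3 chosen arbitrarily
r_theory = 0.5 * a**2 * np.cos(w0 * taus)              # (a^2 / 2) cos(w0 tau)

print(np.max(np.abs(r_est - r_theory)))
```

The estimate is even in the lag, peaks at zero lag with value close to $a^2/2$, and is periodic with the same period as X(t); repeating the estimate at a different reference time gives the same curve, consistent with the process being WSS.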
D. Autocovariance of X(t)
The autocovariance of a RP X(t) is given by
$C_X(t_1, t_2) = E[(X(t_1) - \mu_X(t_1))(X(t_2) - \mu_X(t_2))]$
$= E[X(t_1)X(t_2) - \mu_X(t_1)X(t_2) - \mu_X(t_2)X(t_1) + \mu_X(t_1)\mu_X(t_2)]$
$= E[X(t_1)X(t_2)] - \mu_X(t_1)\,\mu_X(t_2)$
Thus
$C_X(t_1, t_2) = R_X(t_1, t_2) - \mu_X(t_1)\,\mu_X(t_2)$
The value of $C_X(t_1, t_2)$ when t1 = t2 = t is the variance of X(t), i.e.,
$C_X(t, t) = Var[X(t)] = E[(X(t) - \mu_X(t))^2]$

E. Correlation Coefficient
The correlation coefficient of a RP X(t) is given by
$\rho_X(t_1, t_2) = \frac{C_X(t_1, t_2)}{\sqrt{C_X(t_1, t_1)\,C_X(t_2, t_2)}}$
where
$|\rho_X(t_1, t_2)| \le 1$
Power Spectral Density

Power Spectral Density - 1

Power Spectral Density (PSD) is used to describe and estimate the properties of an observed experiment in the frequency domain.
It describes the distribution of the signal power in the frequency domain.
Knowledge of the Fourier Transform (FT) is important in the understanding of the frequency-domain description of RPs.
Recall that the FT of a sample function x(t) of a random process X(t) is defined as
$X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-j2\pi ft}\,dt$
and the inverse FT is given by
$x(t) = \int_{-\infty}^{\infty} X(f)\,e^{j2\pi ft}\,df$
But the above equations cannot be computed for realistic samples of all RPs, since X(t) may have infinite energy and may not have a Fourier Transform.
A limited definition is required, assuming an ergodic process.
In order to find the FT, it is necessary to modify the function and limit the samples to some observation interval, say [-T, T].
For a RP X(t), let $X_T(t)$ be defined as that portion of the sample function X(t) that exists between -T and T, i.e.,
$X_T(t) = \begin{cases} x(t), & -T \le t \le T \\ 0, & \text{else} \end{cases}$
Hence
$X_T(f) = \int_{-T}^{T} x_T(t)\,e^{-j2\pi ft}\,dt$
From this definition, the PSD, denoted by $S_X(f)$, is given by
$S_X(f) = \lim_{T \to \infty} \frac{1}{2T}\, E[|X_T(f)|^2]$
For a single record of finite length, this suggests the estimate
$S_X(f) \approx \frac{1}{2T}\, |X_T(f)|^2$
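Another definition of the PSD is obtained from the ACF.
For a stationary RP X(t), the PSD $S_X(f)$ is the Fourier Transform of the ACF:
$S_X(f) = F\{R_X(\tau)\}$
That is,
$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j2\pi f\tau}\,d\tau$ (continuous)
$S_X(f) = \sum_{k=-\infty}^{\infty} R_X(k)\,e^{-j2\pi kf}$ (discrete)
Conversely
$R_X(\tau) = F^{-1}\{S_X(f)\} = \int_{-\infty}^{\infty} S_X(f)\,e^{j2\pi f\tau}\,df$
This is the Wiener-Khintchine Theorem.
If we know the autocorrelation function, we can compute the PSD and vice versa (transform pairs):
$R_X(\tau) \leftrightarrow S_X(f)$

Properties of Power Spectral Density - 1

1. $S_X(f)$ is a nonnegative function of f, $S_X(f) \ge 0$
2. $S_X(f)$ is a real-valued and even function of f, $S_X(f) = S_X(-f)$:
$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j2\pi f\tau}\,d\tau$
$= \int_{-\infty}^{\infty} R_X(\tau)\,[\cos 2\pi f\tau - j\sin 2\pi f\tau]\,d\tau$
$= \int_{-\infty}^{\infty} R_X(\tau)\cos 2\pi f\tau\,d\tau - j\int_{-\infty}^{\infty} R_X(\tau)\sin 2\pi f\tau\,d\tau$
$= \int_{-\infty}^{\infty} R_X(\tau)\cos 2\pi f\tau\,d\tau$
since $R_X(\tau)$ is even and the sine integral vanishes.
3. $S_X(f)$ uniquely determines $R_X(\tau)$:
$R_X(\tau) = \int_{-\infty}^{\infty} S_X(f)\,e^{j2\pi f\tau}\,df$

Properties of Power Spectral Density - 2

4. If X(t) is stationary, then the power content is determined from the PSD as follows:
$R_X(0) = E[X^2(t)] = \int_{-\infty}^{\infty} S_X(f)\,df$
This is the area under the PSD curve. It is also known as the Average Power.
Conversely,
$S_X(0) = \int_{-\infty}^{\infty} R_X(\tau)\,d\tau$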
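The transform-pair relation and properties 1 and 2 can be checked numerically for a specific autocorrelation. The sketch below assumes $R_X(\tau) = e^{-|\tau|}$ (an illustrative choice, not from the notes), whose Fourier transform is $S_X(f) = 2 / (1 + (2\pi f)^2)$, and evaluates the defining integral with a Riemann sum.

```python
import numpy as np

tau = np.linspace(-40.0, 40.0, 80_001)      # lag grid; R_X is negligible beyond |tau| = 40
dtau = tau[1] - tau[0]
r_x = np.exp(-np.abs(tau))                  # assumed ACF R_X(tau) = exp(-|tau|)

def psd(f):
    # S_X(f) = integral of R_X(tau) exp(-j 2 pi f tau) dtau, by a Riemann sum
    return np.sum(r_x * np.exp(-2j * np.pi * f * tau)) * dtau

freqs = np.array([-1.0, -0.2, 0.0, 0.2, 1.0])
s_num = np.array([psd(f) for f in freqs])
s_theory = 2.0 / (1.0 + (2.0 * np.pi * freqs) ** 2)    # closed-form transform of exp(-|tau|)

print(s_num.real)
print(s_theory)
```

The imaginary part is zero to rounding error, the values are nonnegative and symmetric in f, and the value at f = 0 equals the area under $R_X(\tau)$, in line with the last property listed above.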
Classes of Random Processes
Strict Sense Stationary
Wide Sense Stationary
Ergodic Processes
Cyclostationary Processes

Classes of Random Processes - 1

A. Stationary Random Processes
X(t) is said to be stationary if its statistical properties are time independent.
This means that an observation at times $(t_0, t_1)$ is statistically the same as an observation at times $(t_0 + \tau, t_1 + \tau)$. That is,
$F_{X(t_1) \cdots X(t_k)}(x_1, \dots, x_k) = F_{X(t_1 + \tau) \cdots X(t_k + \tau)}(x_1, \dots, x_k)$
$f_{X(t_1) \cdots X(t_k)}(x_1, \dots, x_k) = f_{X(t_1 + \tau) \cdots X(t_k + \tau)}(x_1, \dots, x_k)$
Intuitively, the statistics of a stationary process do not depend on the time origin.
The concept of stationarity of a RP is similar to the idea of Steady State in the analysis of the response of electrical circuits.
Statistical properties are invariant with respect to time translation.

Two Main Types:
1. Strict Sense Stationary (SSS)
A random process X(t) is said to be stationary in the strict sense if its statistical properties are time independent.
That is, the processes X(t) and X(t + c) have the same statistics for any value of c.
The CDF of X(t) is the same as the CDF of X(t + c):
$F_{X(t)}(x) = F_{X(t + c)}(x) = F_X(x)$
The PDF of X(t) is the same as the PDF of X(t + c):
$f_{X(t)}(x) = f_{X(t + c)}(x) = f_X(x)$
Hence a process is nth-order SSS if, for any c,
$f_X(x_1, x_2, \dots, x_n; t_1, t_2, \dots, t_n) = f_X(x_1, x_2, \dots, x_n; t_1 + c, t_2 + c, \dots, t_n + c)$
where the left side represents the joint PDF of the RVs
$X_1 = X(t_1),\; X_2 = X(t_2),\; \dots,\; X_n = X(t_n)$
and the right side corresponds to the joint pdf of the RVs
$X_1' = X(t_1 + c),\; X_2' = X(t_2 + c),\; \dots,\; X_n' = X(t_n + c)$
for all $t_i$, i = 1, 2, ..., n, n = 1, 2, ..., and any c.
To check for SSS we need to find all the CDFs or PDFs as a function of time and then determine all the moments.
By definition, SSS implies that all the moments are equal and do not depend on the time origin.
Also, all the joint moments are equal and do not depend on time.
But this is very difficult, if not impossible.
2. Wide Sense Stationary (WSS)
The condition on a SSS random process is very restrictive.
It is difficult to prove except in limited cases.
For RPs with unlimited observation times, proof of SSS is virtually impossible.
A limited definition of stationarity known as WSS is used instead.
A RP X(t) is said to be stationary in the wide sense if it meets the following two conditions:
1. Its mean is constant
$E[X(t)] = E[X(t + c)] = \mu_X$
2. Its autocorrelation (or autocovariance) depends only on $\tau = t_1 - t_2$
$R_X(t_1, t_2) = R_X(\tau) = E[X(t + \tau)\,X(t)]$
That is, the autocorrelation does not depend on the actual values of t1 and t2, but only on the difference $\tau = t_1 - t_2$.
A RP that does not satisfy the requirements of a stationary RP (SSS or WSS) is said to be non-stationary.
Note:
If a process is SSS, then it is also WSS.
The converse is not true except when the process is Gaussian, i.e., for a Gaussian Process, WSS also implies SSS.
[Figure: nested sets, with SSS processes inside WSS processes inside all Stochastic Processes.]

B. Cyclostationary Random Process
A random process X(t) is said to be cyclostationary if both its mean and autocorrelation are periodic in time with period T, i.e.,
$m_X(t) = E[X(t + kT)]$
$R_X(t_1, t_2) = R_X(t_1 + kT, t_2 + kT)$
C. Ergodic Random Process

Some stationary RPs possess the property that almost every member of the ensemble exhibits the same statistical behavior as the whole ensemble
By examining only one typical sample function, it is possible to determine the statistical behavior of the whole process
Such processes are said to be ergodic
If the statistical average is equal to the time average, the random process is said to be ergodic
This implies that it is sufficient to examine one realization of the process and find its time average, rather than considering a large number of realizations and averaging over all of them
1. Ergodic in the mean
A stationary RP is ergodic in the mean if
$\langle X(t) \rangle = E[X(t)]$
2. Ergodic in autocorrelation
A stationary RP is ergodic in autocorrelation if
$\langle X(t_1) X(t_2) \rangle = R_X(t_1, t_2)$
where $\langle \cdot \rangle$ denotes the time average over a single sample function
A process that does not possess these properties is non-ergodic
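The comparison between time averages and ensemble averages can be sketched numerically. The example below is a minimal illustration under assumed choices: a random-phase sinusoid (ensemble mean 0, autocorrelation 0.5 cos(w * tau)) and a random-constant process X(t) = A with A ~ N(0, 1) (ensemble mean 0). The helper `time_average` and all parameter values are hypothetical.

```python
import math
import random

def time_average(x, t_max=10.0, dt=0.001):
    """Approximate (1 / 2T) * integral of x(t) over [-T, T] with a Riemann sum."""
    n = int(round(2.0 * t_max / dt))
    return sum(x(-t_max + k * dt) for k in range(n)) / n

rng = random.Random(1)
omega = 2.0 * math.pi            # period 1, so [-T, T] holds whole periods
tau = 0.1

# Random-phase sinusoid: one realization is fixed by a single draw of theta.
theta = rng.uniform(0.0, 2.0 * math.pi)

def sinusoid(t):
    return math.cos(omega * t + theta)

mean_time_avg = time_average(sinusoid)
corr_time_avg = time_average(lambda t: sinusoid(t) * sinusoid(t + tau))

# Random-constant process X(t) = A: stationary, but each realization is flat.
a = rng.gauss(0.0, 1.0)
const_time_avg = time_average(lambda t: a)
```

For the sinusoid, the time averages of a single realization reproduce the ensemble values, which is the ergodic behavior described above. For the random-constant process the time average simply returns the drawn value `a` instead of the ensemble mean 0, so that process is stationary but not ergodic in the mean.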
Some Important Random Processes
A. Independent Increment Process (IIP)

X(t) is said to have independent increments if, for any k and any choice of sampling instants $t_1 < t_2 < \cdots < t_k$, the RVs defined by
$Y_1 = X(t_2) - X(t_1)$
$Y_2 = X(t_3) - X(t_2)$
$\vdots$
$Y_{k-1} = X(t_k) - X(t_{k-1})$
are independent RVs

i.e., it possesses independent increments if the changes in the value of the process over non-overlapping time intervals are independent
Examples of independent increment processes are:

Poisson process
Wiener process
If X(t) and Y(t) are such that the RVs $X(t_1), \ldots, X(t_n)$ and $Y(t_1), \ldots, Y(t_n)$ are mutually independent, then the processes are independent
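The independent-increment property can be probed with a small simulation. The sketch below uses a Gaussian random walk as an assumed discrete-time stand-in for the Wiener process; the function names and parameter values are hypothetical. It only checks that increments over non-overlapping intervals are uncorrelated, which is a necessary consequence of independence rather than a proof of it.

```python
import math
import random

def sample_correlation(u, v):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    var_u = sum((a - mu) ** 2 for a in u) / n
    var_v = sum((b - mv) ** 2 for b in v) / n
    return cov / math.sqrt(var_u * var_v)

def simulate_walks(n_paths=4000, n_steps=30, seed=2):
    """Each path is a running sum of independent N(0, 1) steps, with X(0) = 0."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        x, path = 0.0, [0.0]
        for _ in range(n_steps):
            x += rng.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths

paths = simulate_walks()
t1, t2, t3 = 10, 20, 30                      # sampling instants t1 < t2 < t3
y1 = [p[t2] - p[t1] for p in paths]          # increment over (t1, t2]
y2 = [p[t3] - p[t2] for p in paths]          # increment over (t2, t3]
corr_increments = sample_correlation(y1, y2)
corr_samples = sample_correlation([p[t1] for p in paths],
                                  [p[t2] for p in paths])
```

The increments Y1 and Y2 should show a correlation near zero, while the samples X(t1) and X(t2) themselves remain strongly correlated (about sqrt(t1 / t2) for this walk).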
Markov Process (continuous-time Markov chains)

A RP X(t) is said to be Markov if the future of the process, given the present, is independent of the past
This means that a Markov process is a stochastic process whose past history has no influence on the future if the present is specified
i.e., for any k and any choice of sampling instants $t_1 < t_2 < \cdots < t_k$,
$P[X(t_k) \le x_k \mid X(t_{k-1}) = x_{k-1}, \ldots, X(t_1) = x_1] = P[X(t_k) \le x_k \mid X(t_{k-1}) = x_{k-1}]$
A RP that has independent increments is also a Markov process

Other processes of interest include:
1. Gaussian Processes
2. Brownian Process
3. Renewal Process
4. Regenerative Processes
Multiple Random Processes
Multiple Random Processes - 1
As with random variables, multiple random processes (RPs) are an extension of single random processes
Multiple processes arise naturally when dealing with 2 or more RPs defined on the same probability space
Of course, a complete description, which requires specifying the joint statistical behavior for all time samples, is not possible
We will restrict our study to second-order processes (2 RPs X(t) and Y(t)), which are considered to be stationary
The following are characteristics of second-order random processes
A. Cross-Correlation Function (CCF)

CCF describes the relationship between two RPs X(t) and Y(t) and is given by
$R_{XY}(t_1, t_2) = E[X(t_1) Y(t_2)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, y\, f_{XY}(x, y; t_1, t_2)\, dx\, dy$
It is assumed that X(t) and Y(t) are jointly stationary

Note that $R_{XY}(t_1, t_2) = R_{YX}(t_2, t_1)$
Properties of CCF - 1
1. For two WSS processes X(t) and Y(t)
$R_{XY}(\tau) = E[X(t) Y(t+\tau)]$
$R_{YX}(\tau) = E[Y(t) X(t+\tau)]$
Thus
$R_{XY}(\tau) = R_{YX}(-\tau)$
Note that the above equation simply indicates symmetry. It does not necessarily indicate that the CCF is even
The ACF of a RP is even, but the CCF in general is not
2. For 2 WSS processes X(t) and Y(t), the CCF is bounded as follows
$|R_{XY}(\tau)| \le \sqrt{R_X(0) R_Y(0)}$
CCF does not necessarily have its maximum at $\tau = 0$. The maximum can occur anywhere, but its value is limited
3. For two WSS processes X(t) and Y(t), the CCF is also bounded as
$|R_{XY}(\tau)| \le \frac{1}{2} [R_X(0) + R_Y(0)]$
To demonstrate, consider $E[(X(t) \pm Y(t+\tau))^2] \ge 0$
$E[X^2(t)] + E[Y^2(t+\tau)] \pm 2 E[X(t) Y(t+\tau)] \ge 0$
$R_X(0) + R_Y(0) \pm 2 R_{XY}(\tau) \ge 0$
$|R_{XY}(\tau)| \le \frac{1}{2} [R_X(0) + R_Y(0)]$
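The two bounds on the CCF, and the remark that its maximum need not sit at zero lag, can be checked on simulated data. The sketch below assumes a hypothetical pair of discrete-time processes in which Y is a delayed, scaled copy of white Gaussian X plus independent noise; the estimator `cross_corr` and all parameter values are illustrative choices, and time averages are used in place of ensemble averages.

```python
import math
import random

def cross_corr(x, y, lag):
    """Time-average estimate of R_XY(lag) = E[X(n) Y(n + lag)], divided by N (biased)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        j = i + lag
        if 0 <= j < n:
            total += x[i] * y[j]
    return total / n

rng = random.Random(3)
n, delay = 5000, 3
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Y is a delayed, scaled copy of X plus independent noise (hypothetical model).
y = [(0.8 * x[i - delay] if i >= delay else 0.0) + 0.5 * rng.gauss(0.0, 1.0)
     for i in range(n)]

r_x0 = cross_corr(x, x, 0)       # R_X(0), mean-square value of X
r_y0 = cross_corr(y, y, 0)       # R_Y(0), mean-square value of Y
lags = list(range(-10, 11))
r_xy = {k: cross_corr(x, y, k) for k in lags}

geometric_bound = math.sqrt(r_x0 * r_y0)      # bound from property 2
arithmetic_bound = 0.5 * (r_x0 + r_y0)        # bound from property 3
peak_lag = max(lags, key=lambda k: abs(r_xy[k]))
```

In this model the CCF peaks at the delay (lag 3) rather than at lag 0, and every estimated value stays below the geometric-mean bound, which in turn never exceeds the arithmetic-mean bound.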
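4. If two RPs X(t) and Y(t) are statistically independent, then
$R_{XY}(\tau) = E[X(t) Y(t+\tau)] = E[X(t)] E[Y(t+\tau)]$
$= E[Y(t)] E[X(t+\tau)]$
$= E[Y(t) X(t+\tau)]$
$= R_{YX}(\tau)$
Now, if at least one of the processes is zero mean, then $R_{XY}(\tau) = 0$
5. Two RPs X(t) and Y(t) are said to be uncorrelated if
$R_{XY}(\tau) = m_X m_Y$
6. Generally, the correlation matrix of 2 RPs X(t) and Y(t) is given by
$\mathbf{R}_{XY}(t_1, t_2) = \begin{bmatrix} R_X(t_1, t_2) & R_{XY}(t_1, t_2) \\ R_{YX}(t_1, t_2) & R_Y(t_1, t_2) \end{bmatrix}$
If X(t) and Y(t) are WSS, then
$\mathbf{R}_{XY}(\tau) = \begin{bmatrix} R_X(\tau) & R_{XY}(\tau) \\ R_{YX}(\tau) & R_Y(\tau) \end{bmatrix}$
Note that in the ACF the value at zero equals the mean square value, but in the CCF the value at zero has no special significance
7. Two RPs X(t) and Y(t) are said to be orthogonal if $R_{XY}(\tau) = 0$
8. Sum of two Random Processes: Z(t) = X(t) + Y(t)
$R_Z(\tau) = R_X(\tau) + R_Y(\tau) + R_{XY}(\tau) + R_{YX}(\tau)$
$S_Z(f) = S_X(f) + S_Y(f) + S_{XY}(f) + S_{YX}(f)$
If $R_{XY}(\tau) = R_{YX}(\tau)$, then
$S_Z(f) = S_X(f) + S_Y(f) + 2 S_{XY}(f)$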
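9. Equality of two random processes

Two processes X(t) and Y(t) are said to be equal if their respective time samples are equal, i.e.,
$X(t, \zeta) = Y(t, \zeta)$, for all $\zeta$
Two processes X(t) and Y(t) are equal in the mean-square sense if
$E[|X(t) - Y(t)|^2] = 0$, for all t
Time Cross-Correlation Function - 1
B. Time Cross-Correlation Function (TCCF)

The time cross-correlation functions are defined as
$\mathcal{R}_{XY}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)\, y(t+\tau)\, dt$
$\mathcal{R}_{YX}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} y(t)\, x(t+\tau)\, dt$
If the two processes are jointly ergodic, then the time averages equal the ensemble averages
Hence
$\mathcal{R}_{XY}(\tau) = R_{XY}(\tau)$, $\mathcal{R}_{YX}(\tau) = R_{YX}(\tau)$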
C. Cross-Covariance (CC)

The cross-covariance of two processes X(t) and Y(t) is defined as
$C_{XY}(t_1, t_2) = E\{[X(t_1) - m_X(t_1)][Y(t_2) - m_Y(t_2)]\}$
$= R_{XY}(t_1, t_2) - m_X(t_1) m_Y(t_2)$
X(t) and Y(t) are uncorrelated if
$C_{XY}(t_1, t_2) = 0$
D. Cross-Power Spectral Density (CPSD)

For two RPs, it is possible to define the cross-power spectral density
The cross-power spectral density is defined as
$S_{XY}(f) = \begin{cases} \int_{-\infty}^{\infty} R_{XY}(\tau)\, e^{-j 2\pi f \tau}\, d\tau, & \text{continuous} \\ \sum_{k=-\infty}^{\infty} R_{XY}(k)\, e^{-j 2\pi k f}, & \text{discrete} \end{cases}$
Random Processes in Linear Systems

The most important questions of life are, for the most part, really only problems of probability.
- Pierre Simon Laplace

Random Processes and Linear Systems

Many physical systems involve the processing of random signals/processes, e.g.,
Prediction: predicting future values in terms of past values
Filtering and Smoothing: recovering signals corrupted by noise
Modulation: converting signals from low frequency to high frequency
All signal processing operations involve the transformation of signals from one time or frequency function to another

If the input of a system is random, the output is also bound to be random
Most of the analysis in Electrical Engineering involves understanding the relationships between the input and output of a linear system
With this knowledge, the engineer can design the systems
It is assumed that the students in this class are already familiar with the usual methods of analyzing linear systems in the time or frequency domain
Most theoretical problems in EE can be summarized as follows:
[Figure: block diagram, Input -> Linear Network -> Output]
Input               Linear Network      Output
x(t)                h(t)                y(t)
x[n]                h[n]                y[n]
X(e^{jw})           H(e^{jw})           Y(e^{jw})
X(f)                H(f)                Y(f)
X(z)                H(z)                Y(z)
R_X(tau), S_X(f)                        R_Y(tau), S_Y(f)
Representations shown in the figure: Time Function, Difference Equation, Pole-Zero Plot, H-Function, Random Process
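Now, given a random input X(t) to a linear system, we can find all the statistical characteristics of the output Y(t) in terms of the input X(t)
If the system is Linear Time Invariant (LTI), then the response of the system to an arbitrary input is given by
$y(t) = h(t) * x(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t-\tau)\, d\tau = \int_{-\infty}^{\infty} h(t-\tau)\, x(\tau)\, d\tau$
where $h(\tau)$ is the impulse response
An LTI system is completely specified by its impulse response
If the input of an LTI system is a random process X(t), then the output is also a random process given by
$Y(t) = \int_{-\infty}^{\infty} h(\tau)\, X(t-\tau)\, d\tau = \int_{-\infty}^{\infty} h(t-\tau)\, X(\tau)\, d\tau$
Some of the statistical properties of the output are given as follows:
Mean:
$E[Y(t)] = E\left[\int_{-\infty}^{\infty} h(\tau)\, X(t-\tau)\, d\tau\right] = \int_{-\infty}^{\infty} h(\tau)\, E[X(t-\tau)]\, d\tau$
Since $E[X(t-\tau)] = E[X(t)] = m_x$ for a stationary input,
$E[Y(t)] = m_x \int_{-\infty}^{\infty} h(\tau)\, d\tau$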
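The mean result can be sanity-checked in discrete time, where the integral of h becomes a sum of filter taps. The sketch below is an assumed discrete-time analogue, not the lecture's own example: the FIR taps `h`, the input mean `m_x`, and the helper `lti_output` are hypothetical, and a time average over one long realization stands in for the ensemble mean.

```python
import random

def lti_output(h, x):
    """Causal FIR convolution y[n] = sum_k h[k] * x[n - k], for n >= len(h) - 1."""
    m = len(h)
    return [sum(h[k] * x[n - k] for k in range(m)) for n in range(m - 1, len(x))]

rng = random.Random(4)
m_x = 2.0                                   # input mean
h = [1.0, 0.5, 0.25]                        # hypothetical impulse response
x = [m_x + rng.gauss(0.0, 1.0) for _ in range(20000)]
y = lti_output(h, x)

mean_y_estimate = sum(y) / len(y)           # time-average estimate of E[Y]
mean_y_formula = m_x * sum(h)               # m_x times the sum of the taps
```

With these values the formula gives 2.0 * 1.75 = 3.5, and the estimate from the simulated output should land close to it.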

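Autocorrelation Function:
$R_Y(\tau) = E[Y(t) Y(t+\tau)] = E\left[\int h(s)\, X(t-s)\, ds \int h(r)\, X(t+\tau-r)\, dr\right]$
$= \iint E[X(t-s)\, X(t+\tau-r)]\, h(s)\, h(r)\, ds\, dr$
$= \iint R_X(\tau+s-r)\, h(s)\, h(r)\, ds\, dr$
Mean Squared Value:
$E[Y^2(t)] = E\left[\int X(t-s)\, h(s)\, ds \int X(t-r)\, h(r)\, dr\right]$
$= E\left[\int ds \int X(t-s)\, X(t-r)\, h(s)\, h(r)\, dr\right]$
$= \int ds \int E[X(t-s)\, X(t-r)]\, h(s)\, h(r)\, dr$
But
$E[X(t-s)\, X(t-r)] = R_X(t-s-t+r) = R_X(r-s)$
Hence
$E[Y^2(t)] = \int ds \int R_X(r-s)\, h(s)\, h(r)\, dr$
Power Spectral Density:
$S_Y(f) = \int R_Y(\tau)\, e^{-j 2\pi f \tau}\, d\tau = \iiint h(s)\, h(r)\, R_X(\tau+s-r)\, e^{-j 2\pi f \tau}\, ds\, dr\, d\tau$
If we let $u = \tau + s - r$, we obtain
$S_Y(f) = \iiint h(s)\, h(r)\, R_X(u)\, e^{-j 2\pi f (u-s+r)}\, ds\, dr\, du$
$= \int h(s)\, e^{j 2\pi f s}\, ds \int h(r)\, e^{-j 2\pi f r}\, dr \int R_X(u)\, e^{-j 2\pi f u}\, du$
$= H^*(f)\, H(f)\, S_X(f)$
$= |H(f)|^2 S_X(f)$
where $H^*(f) = \int h(s)\, e^{j 2\pi f s}\, ds$ for a real impulse response
Cross Relationships between input and output processes:
$R_{XY}(\tau) = E[X(t)\, Y(t+\tau)]$
$= E\left[X(t) \int X(t+\tau-r)\, h(r)\, dr\right]$
$= \int E[X(t)\, X(t+\tau-r)]\, h(r)\, dr$
$= \int R_X(\tau-r)\, h(r)\, dr$
$= R_X(\tau) * h(\tau)$
By taking the Fourier Transform of the cross-correlation function, we obtain the cross-power spectral density of the input and output
$S_{XY}(f) = H(f)\, S_X(f)$
Since $R_{YX}(\tau) = R_{XY}(-\tau)$, we obtain
$S_{YX}(f) = S_{XY}^*(f) = H^*(f)\, S_X(f)$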
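The output autocorrelation and power spectral density relations can be cross-checked in discrete time, where the integrals become finite sums. The sketch below is a deterministic check under assumed inputs: the FIR taps `h` and the short input autocorrelation `r_x` are hypothetical, and the discrete-time Fourier transform replaces the continuous transform used in the derivation.

```python
import cmath
import math

h = [1.0, 0.5, 0.25]               # hypothetical real impulse response
r_x = {-1: 0.4, 0: 1.0, 1: 0.4}    # hypothetical input autocorrelation R_X(k)

def r_y(k):
    """Output autocorrelation: sum over s, r of h(s) h(r) R_X(k + s - r)."""
    return sum(h[s] * h[r] * r_x.get(k + s - r, 0.0)
               for s in range(len(h)) for r in range(len(h)))

def dtft(pairs, f):
    """Sum of value * exp(-j 2 pi f k) over (k, value) pairs."""
    return sum(v * cmath.exp(-2j * math.pi * f * k) for k, v in pairs)

def psd_both_ways(f):
    """Return S_Y(f) from R_Y directly, and from |H(f)|^2 * S_X(f)."""
    max_lag = len(h)  # here R_Y(k) = 0 for |k| > len(h), since R_X spans lags -1..1
    s_y = dtft([(k, r_y(k)) for k in range(-max_lag, max_lag + 1)], f)
    h_f = dtft(list(enumerate(h)), f)
    s_x = dtft(list(r_x.items()), f)
    return s_y.real, (abs(h_f) ** 2) * s_x.real

mean_square_y = r_y(0)             # E[Y^2] is the output autocorrelation at lag 0
```

Calling `psd_both_ways(f)` at any frequency should return two matching numbers, and `r_y(0)` gives the mean squared value of the output (1.8125 for these assumed inputs).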
