
# CME620 Stochastic Processes

## Lecture Guide

Prof. Okechukwu C. (Okey) Ugweje
Department of Telecommunications Engineering
Nigerian Turkish Nile University, Abuja
2016, Week 1

[Title-slide figures: a Gaussian density f_X(x) for X ~ N(a, σ) and a probability mass function p_X(k)]

## What We Will Study? - 1

Topics
1. Course Introduction
2. Set Theory and Venn Diagrams
   - Unions, Intersections, Complements, etc.
3. Probability Theory
   - Probability Space and Probability Measure
   - Axioms of Probability
   - Conditional Probability
   - Independence of Events (mutually exclusive events)
   - Partition - Law of Total Probability
   - Bayes' Rule
4. Definition and Characterization of One Random Variable
   - Probability Distribution Function (cdf) and its properties
   - Probability Density Function (pdf) and its properties
   - Probability Mass Function (pmf) and its properties
5. Conditional Distributions and Densities
6. Important Random Variables (Discrete and Continuous)
   - Discrete: Binomial, Bernoulli, Poisson, Hypergeometric
   - Continuous: Uniform, Exponential, Gaussian, Rayleigh, Nakagami

## What We Will Study? - 2

1. Statistical Properties of One Random Variable
   - Expected value (mean value)
   - Mean square value
   - Time average
   - Statistical average versus time average
   - Variance
2. Transformation of a Random Variable (cdf and pdf)
3. Calculating probabilities through the cdf and pdf

END OF COMBINED COURSES

"Probability is too important to be left to the mathematicians."
- Unknown Engineer

## Set Theory - 1

Definition:
A set is a collection of distinct objects called elements.
- Usually written as a list of elements enclosed in braces { }
- Since elements must be distinct, two or more elements in a set cannot be the same

Example 1:
{1,2,3} is a valid set, whereas {1,1,3} is not.
- A set can be made up of elements which are themselves sets
- A set can be finite or infinite

## Set Theory - 2

Example 2:
The set of all non-negative integers {0,1,2,3,...} is countably infinite, whereas the set of all real numbers in [0,1] is uncountably infinite.
- All sets are subsets of the sample space

Definition:
The union of two sets A and B (denoted A ∪ B) is the set that contains all elements in either A or B:
A ∪ B = {x : x ∈ A or x ∈ B}
For more than two sets, ⋃ᵢ₌₁ⁿ Aᵢ = A₁ ∪ A₂ ∪ ... ∪ Aₙ

## Set Theory - 3

Example 3:
If A = {1,2,4} and B = {1,3,5}, then A ∪ B = {1,2,3,4,5}

Definition:
A set A is a subset of a set B (denoted A ⊂ B) if all the elements of A are also in B.
- Only one occurrence of an element in a set is allowed

Example 5:
Set A = {1, 2} is a subset of set B = {1, 2, 3, 5}

Definition:
The intersection of two sets A and B (denoted A ∩ B) is the set that contains only the elements that appear in both sets:
A ∩ B = {x : x ∈ A and x ∈ B}

Sometimes it is easier to describe a set by describing what is not in the set. This leads to the concept of the complement.
In general, if S contains n elements, then S has 2ⁿ subsets.

Definition: The complement of a set A (denoted Aᶜ) is the set of all elements in the universe that are not in the set:
Aᶜ = {x : x ∉ A}

For collections of sets, complements obey De Morgan's laws:
(⋃ᵢ₌₁ⁿ Aᵢ)ᶜ = ⋂ᵢ₌₁ⁿ Aᵢᶜ and (⋂ᵢ₌₁ⁿ Aᵢ)ᶜ = ⋃ᵢ₌₁ⁿ Aᵢᶜ

Example 6:
If Ω = {1, 2, 3, 4, 5}, the complement of the set B = {1, 2, 3} is the set Bᶜ = {4, 5}

Example 4:
If A = {1, 2, 4} and B = {1, 3, 5}, then A ∩ B = {1}

## Set Theory - 4

## Set Theory - 5

Notice that Ωᶜ = ∅ and ∅ᶜ = Ω.
With the above definitions, we can describe complex collections of objects.
Some relationships between sets are important enough to have special names:

Definition: The sets A and B are said to be mutually exclusive (or disjoint) if they have no elements in common; i.e., A ∩ B = ∅
Definition: The sets A and B are said to be mutually exhaustive if together they contain all the elements of the universe; i.e., A ∪ B = Ω

## Set Theory - 6

Set Operators:
Ω = universal set
∅ = null set
∪ = union
∩ = intersection
⊂, ⊆ = subsets
∈ = element of
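These operations map directly onto Python's built-in set type. A minimal sketch (the example values are illustrative, not from the slides) checking the definitions above:

```python
A = {1, 2, 4}
B = {1, 3, 5}
omega = {1, 2, 3, 4, 5}           # universal set for this example

print(A | B)                      # union: {1, 2, 3, 4, 5}
print(A & B)                      # intersection: {1}
print(omega - B)                  # complement of B in omega: {2, 4}
print({1, 2} <= {1, 2, 3, 5})     # subset test: True
print(A.isdisjoint({3, 5}))       # mutually exclusive with {3, 5}? True

# De Morgan's laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
assert omega - (A | B) == (omega - A) & (omega - B)
assert omega - (A & B) == (omega - A) | (omega - B)
```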

## Venn Diagrams - 1

A Venn diagram is a geometric representation of sets.

Union (A ∪ B):
- All elements of A or B; the event occurs if at least one of A or B occurs
- Analogous to parallel systems
- Mathematical expression: A ∪ B = {x : x ∈ A or x ∈ B}

[Venn diagram: two overlapping circles A and B inside the sample space S, with the whole shaded region representing A ∪ B]

In a situation where one or more of the events Aₖ occurs, we have
⋃ₖ₌₁ⁿ Aₖ = A₁ ∪ A₂ ∪ ... ∪ Aₙ

## Venn Diagrams - 2

Also, for an infinite union of sets, we have
⋃ₖ₌₁^∞ Aₖ = A₁ ∪ A₂ ∪ ... ∪ Aₖ ∪ ...

Many more union relationships can be developed, especially when restrictions are placed on some sets.
Some useful union relationships:
A ∪ B = B ∪ A
A ∪ ∅ = A
A ∪ A = A
A ∪ S = S
A ∪ Aᶜ = S
(A ∪ B) ∪ C = A ∪ (B ∪ C)
A ∪ B = A if B ⊂ A

## Venn Diagrams - 3

Intersection (Product) (A ∩ B):
- Elements common to all sets; the event occurs only if all of the events occur
- Analogous to series systems
- Mathematical expression: A ∩ B = AB = {x : x ∈ A and x ∈ B}

If A ∩ B = ∅, then A and B are said to be mutually exclusive.

In a situation where the events occur in every experiment, we have
⋂ₖ₌₁ⁿ Aₖ = A₁ ∩ A₂ ∩ ... ∩ Aₙ
and for an infinite intersection
⋂ₖ₌₁^∞ Aₖ = A₁ ∩ A₂ ∩ ... ∩ Aₖ ∩ ...

[Venn diagrams: the overlap AB of two circles A and B, and the central region ABC of three circles A, B, C, inside S]

## Venn Diagrams - 4

Some useful intersection relationships:
A ∩ B = B ∩ A
A ∩ A = A
A ∩ S = A
A ∩ ∅ = ∅
(A ∩ B) ∩ C = A ∩ (B ∩ C)

## Venn Diagrams - 5

Partition: A partition of Ω is a collection of mutually exclusive subsets Aᵢ of Ω such that their union is Ω:
Aᵢ ∩ Aⱼ = ∅ for i ≠ j, and ⋃ᵢ Aᵢ = Ω

[Venn diagram: Ω divided into non-overlapping cells A₁, A₂, ..., Aᵢ, ..., Aₙ]

Complement:
Mathematical expression: Aᶜ = {x : x ∈ S and x ∉ A}
Some useful complement relationships:
A ∪ Aᶜ = S
A ∩ Aᶜ = ∅
(Aᶜ)ᶜ = A
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
Sᶜ = ∅, ∅ᶜ = S

## Venn Diagrams - 6

Difference:
Consists of the elements of set A not in set B:
A − B = A ∩ Bᶜ = A − (A ∩ B)
B − A = B ∩ Aᶜ

[Venn diagrams: the shaded region A − B, the region B − A, and two disjoint (mutually exclusive) sets in S]

## Venn Diagrams - 7

Subsets: B ⊂ A ⊂ S

[Venn diagram: circle B nested inside circle A, inside S]

De Morgan's Laws:
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ;  (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
(⋃ᵢ₌₁ⁿ Aᵢ)ᶜ = ⋂ᵢ₌₁ⁿ Aᵢᶜ;  (⋃ₖ₌₁^∞ Bₖ)ᶜ = ⋂ₖ₌₁^∞ Bₖᶜ

Distributive laws:
E ∩ (F ∪ G) = (E ∩ F) ∪ (E ∩ G)
E ∪ (F ∩ G) = (E ∪ F) ∩ (E ∪ G)

## Example 8 - Venn Diagram

Before launching a new academic program at the Federal University of Technology (FUT) Minna, the office of the Vice Chancellor conducted a survey of 130 engineering students to determine the suitability of one of the following names:
A: Communications Engineering;
B: Communication Systems Engineering; and
C: Communications Technology
The findings of the survey are summarized as follows: 51 liked name A; 63 liked name B; 47 liked name C; 25 liked names A and B; 18 liked names A and C; 23 liked names B and C; 10 liked names A and B and C.
a) Draw a Venn diagram representing the above survey, indicating all the necessary numbers on the diagram.
b) If a participating FUT Minna student is selected at random, what is the probability that he or she disliked all 3 program names?

a) The number in the sample space is 130 (i.e., N(S) = 130). By inclusion-exclusion, N(A ∪ B ∪ C) = 51 + 63 + 47 − 25 − 18 − 23 + 10 = 105, and the Venn diagram is labeled accordingly.

b) Let Z = "students that like none of the names". From the Venn diagram in (a), N(Z) = 130 − 105 = 25. Hence

P[Z] = N(Z)/N(S) = 25/130 ≈ 0.192
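A quick numeric check of the inclusion-exclusion count above (a sketch; all figures come from the survey in Example 8):

```python
# Survey totals from Example 8
n_A, n_B, n_C = 51, 63, 47
n_AB, n_AC, n_BC, n_ABC = 25, 18, 23, 10
total = 130

liked_at_least_one = n_A + n_B + n_C - n_AB - n_AC - n_BC + n_ABC
print(liked_at_least_one)                    # 105
print(total - liked_at_least_one)            # 25 disliked all three
print((total - liked_at_least_one) / total)  # 0.1923...
```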

## Probability Theory - 1

Probability theory is concerned with the solution of problems that involve uncertainty and randomness.
It is important in the solution of many engineering problems.
Many of today's practical systems work in a chaotic environment, and in order to design efficient, reliable and cost-effective systems, probabilistic models must be used.
Through random variables and random processes, we can talk about quantities and signals that are random in nature.

## Review of Probability Theory

"Probability is too important to be left to the mathematicians."
- Unknown Engineer

## Probability Theory - 2

For example:
- Data sent through a communication system is random, since the outcome at the receiver is not certain
- Noise, interference and fading introduced by the channel are random processes and can only be modeled as such
- The measure of performance (e.g., Bit Error Rate) is probabilistic, since it is an estimate of the received signal compared to the transmitted signal

## Some Applications - 1

Random Input Signals

[Block diagram: Input Signal (Forcing Function) → System → Output Signal]

The inputs of many physical systems involve a certain degree of uncertainty/unpredictability that justifies random treatment, e.g.,
- Speech/music signal input of a communication system
- Digits applied to a computer
- Random signals applied to an aircraft flight control system
- Random inputs to process control systems
- Steering wheel movements in an automobile power-steering system

## Some Applications - 2 & 3

Random Input Disturbances

[Block diagram: input signal s(t) plus noise n(t) feeding a system that produces the output signal]

Noise n(t) is almost always random in nature and calls for the use of probabilistic methods even if the signal s(t) is not, e.g.,
- Thermal noise: thermal motion of the conduction electrons in the amplifier input circuit
- Random variations in the number of electrons (or holes) passing through a transistor
Since there are millions of electrons, one cannot calculate the value of this kind of noise at every instant of time, but one can calculate its statistical averages.

## Some Applications - 4

Quality Control
- An important method of improving system reliability is to improve the quality of the individual elements.
- This is often done by an inspection process based on sampling, since it would be too costly to inspect every element.
- Thus, it is necessary to develop rules for inspecting elements selected at random. These rules are based on probabilistic models.

## Some Applications - 5

Information Theory (IT)
- Information theory deals with the information content of message signals such as printed pages, speech, graphical data, velocity, radiation intensity, etc.
- Since such messages and observations are unknown in advance and random in nature, they can only be described with probability/random processes.
- The communication channels are subject to random disturbances that limit their ability to convey information. To analyze them, probabilistic models are indispensable.

## Some Applications - 6

It is clear by now that almost any engineering endeavor involves some degree of uncertainty and randomness that makes the use of probability and stochastic concepts a fundamental requirement.
In communication systems, randomness is a CERTAINTY!!

## Probability Concepts

"We see that the theory of probability is at heart only common sense reduced to calculations ..."
- Pierre-Simon Laplace

Probability theory deals with the study of random phenomena:
- Experiments that do not yield the same outcome in repeated trials or observations under the same conditions
- Averages of phenomena occurring sequentially or simultaneously
- The observed averages approach a constant as the number of experiments increases
- When an experiment is performed, certain elementary events Aᵢ occur in different but completely uncertain ways

## Probability Spaces

The triple (S, A, P) is called the probability space, where
S = sample space
A = event space
P = a mapping function that assigns a probability to each event

## Sample Space (S)

The set of all possible outcomes of an experiment, trial, or observation.
- Individual outcomes are called elements or points in the sample space, S = {s₁, s₂, s₃, ...}
- The number of points in a sample space may be
  a) finite (or bounded)
  b) countably infinite (discrete: can be enumerated but does not end)
  c) simply infinite (continuous or unbounded)
- Sometimes, S can include outcomes that are impossible

Simple examples of sample spaces:
- Consider tossing a coin: S = {H, T}
- Consider tossing two coins: S = {TT, TH, HT, HH} = {00, 01, 10, 11}
- Consider tossing three coins: S = {(000), (001), (010), ..., (111)}
- Consider throwing a pair of dice: S = {(1,1), (1,2), ..., (1,6), ..., (6,5), (6,6)}
- Consider drawing two cards from a deck: S = {(1,2), (2,1), ..., (51,52)} = {(x,y) : 1 ≤ x ≤ 52, 1 ≤ y ≤ 52, x ≠ y}

## Example 10 - Sample Spaces

Tossing of 2 dice:
a) Dice are distinguishable
S₁ = {(1,1), (1,2), ..., (1,6); (2,1), (2,2), ..., (2,6); ...; (6,1), (6,2), ..., (6,6)}
   = 6 + 6 + 6 + 6 + 6 + 6 = 36 elements (or 6²)
b) Dice are indistinguishable
S₂ = {(1,1), (1,2), ..., (1,6); (2,2), ..., (2,6); (3,3), ..., (3,6); (4,4), (4,5), (4,6); (5,5), (5,6); (6,6)}
   = 6 + 5 + 4 + 3 + 2 + 1 = 21 elements
c) One may also use the tabular method
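A minimal sketch enumerating these sample spaces with the standard library:

```python
from itertools import product, combinations_with_replacement

# a) Distinguishable dice: ordered pairs
S1 = list(product(range(1, 7), repeat=2))
print(len(S1))   # 36

# b) Indistinguishable dice: unordered pairs (repeats allowed)
S2 = list(combinations_with_replacement(range(1, 7), 2))
print(len(S2))   # 21

# Three coins: all length-3 head/tail strings
S3 = list(product("HT", repeat=3))
print(len(S3))   # 8
```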

## Event - 1

In most experiments, we are interested in a specific outcome that satisfies a given condition.
The outcomes of interest define a subset of the sample space.

Definition:
An event, A, is a set of outcomes; a subset of the sample space.
- An event is any possible outcome of an experiment. It is the simplest random phenomenon.
- The event space is usually known as the information space.
- Each event has an associated quantity which characterizes the objective likelihood of occurrence of that event.
- That quantity is the probability of the event.

## Example 11 - Events

In the toss of 3 coins, we are interested in the occurrence of the following events:
A = {more heads than tails} = {(111), (011), (101), (110)}
B = {same outcome} = {(111), (000)}
C = {at least 2 heads} = {(011), (101), (110), (111)}
In throwing a pair of dice, consider the sum of the dots that show up:
S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
D = {sum is even} = {2, 4, 6, 8, 10, 12}

## Event - 2

Special events
There are two special events of interest:
1) Universal set (Ω or S)
- The set containing all elements; the totality of all elementary events ωᵢ, known a priori: Ω = {ω₁, ω₂, ..., ωₖ, ...}
- Also known as the certain event
2) Impossible (or null) event (∅)
- Never occurs, i.e., contains no outcome
- Arises when none of the outcomes satisfy the given condition

## Definition of Probability

Axiomatic probability - the widely accepted definition:
- Probability is based on a set of axioms or rules
- Based on the concept of a probability space: sample space, elements of a sample space, and set theory
- Axiomatic probability assigns a number to an event

## Axioms of Probability

"The theory of probability as a mathematical discipline can and should be developed from axioms in exactly the same way as geometry and algebra."
- Andrey Kolmogorov

## Axioms of Probability - 1

From the axiomatic definition, we say that the probability P(A) of an event A is a number assigned to the event satisfying the following axioms:

i: P[A] ≥ 0 (probability is a nonnegative number)
ii: P[S] = 1 (probability of the whole sample space is unity)
iii: If A ∩ B = ∅, then P[A ∪ B] = P[A] + P[B]

## Axioms of Probability - 2

Note:
(iii) states that if A and B are mutually exclusive (M.E.) events, the probability of their union is the sum of their probabilities, i.e.,
P[A ∪ B] = P[A] + P[B], if A and B cannot occur simultaneously.
This is the minimum number of axioms required to establish the remaining concepts of probability.
These axioms allow us to view events as objects with properties.

For a countable collection of events with Aᵢ ∩ Aⱼ = ∅ for all i ≠ j,
P[⋃ₖ₌₁^∞ Aₖ] = Σₖ₌₁^∞ P[Aₖ]

## Axioms of Probability - 3

The following conclusions follow from these axioms:
a) Since A ∪ Aᶜ = S, we have, using (ii), that
P[A ∪ Aᶜ] = P[S] = 1.
But since A ∩ Aᶜ = ∅, using (iii),
P[A ∪ Aᶜ] = P[A] + P[Aᶜ] = 1,
or P[Aᶜ] = 1 − P[A].

## Axioms of Probability - 4

b) Similarly, for any A, A ∩ ∅ = ∅ and A ∪ ∅ = A.
Hence it follows that P[A] = P[A ∪ ∅] = P[A] + P[∅].
But A ∩ ∅ = ∅, and thus P[∅] = 0.

These axioms provide us with consistent rules that any valid probability assignment must satisfy.

How does one compute P(A ∪ B)?
To compute the above probability, we should re-express A ∪ B in terms of M.E. sets so that we can make use of the probability axioms.
From the Venn diagram of A and B, we have
A ∪ B = A ∪ (Aᶜ ∩ B),
where A and Aᶜ ∩ B are clearly M.E. events.

## Axioms of Probability - 5

Thus, using axiom (iii),
P[A ∪ B] = P[A ∪ (Aᶜ ∩ B)] = P[A] + P[Aᶜ ∩ B].
Also,
B = B ∩ S = B ∩ (A ∪ Aᶜ) = (B ∩ A) ∪ (B ∩ Aᶜ),
so that, since B ∩ A and B ∩ Aᶜ are M.E. events,
P[B] = P[B ∩ A] + P[B ∩ Aᶜ], i.e., P[Aᶜ ∩ B] = P[B] − P[A ∩ B].
Hence,
P[A ∪ B] = P[A] + P[B] − P[A ∩ B].

## Axioms of Probability - 6

Additional useful properties (or rules) of probability theory, direct consequences of the axioms, can be developed as "corollaries".

Here are some useful corollaries:

Corollary 1:
P[Aᶜ] = 1 − P[A]
(since P[S] = P[A ∪ Aᶜ] = P[A] + P[Aᶜ] = 1)

Corollary 2:
P[A] ≤ 1

## Axioms of Probability - 7

Corollary 3:
P[∅] = 0

Corollary 4:
If A₁, A₂, ..., Aₙ are pairwise mutually exclusive, then
P[⋃ₖ₌₁ⁿ Aₖ] = Σₖ₌₁ⁿ P[Aₖ], n ≥ 2

## Axioms of Probability - 8

Corollary 5:
P[A ∪ B] = P[A] + P[B] − P[A ∩ B]

Proof: A ∪ B = A ∪ (Aᶜ ∩ B), so
P[A ∪ B] = P[A] + P[Aᶜ ∩ B].
Also B = (A ∩ B) ∪ (Aᶜ ∩ B), so
P[B] = P[A ∩ B] + P[Aᶜ ∩ B], i.e., P[Aᶜ ∩ B] = P[B] − P[A ∩ B].
Substituting yields
P[A ∪ B] = P[A] + P[B] − P[A ∩ B].

## Axioms of Probability - 9

The general expression for the probability of a union of events involves:
1) adding the probabilities of the single events,
2) subtracting the probabilities of the intersections of pairs of events,
3) adding the probabilities of the intersections of triples of events,
4) etc.

For three events:
P[A ∪ B ∪ C] = P[(A ∪ B) ∪ C]
= P[A ∪ B] + P[C] − P[(A ∪ B) ∩ C]
= P[A] + P[B] − P[A ∩ B] + P[C] − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C]
= P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C]

## Axioms of Probability - 10

As the number of events increases, the probability of the union of events becomes very cumbersome to compute.

## Axioms of Probability - 11

Corollary 6:
In general, for n events,
P[⋃ₖ₌₁ⁿ Aₖ] = Σₖ P[Aₖ] − Σ_{j<k} P[Aⱼ ∩ Aₖ] + ... + (−1)ⁿ⁺¹ P[A₁ ∩ ... ∩ Aₙ]

Corollary 7:
If A ⊂ B, then P[A] ≤ P[B].
Proof: B = A ∪ (Aᶜ ∩ B), so
P[B] = P[A] + P[Aᶜ ∩ B] ≥ P[A], since P[Aᶜ ∩ B] ≥ 0.

These axioms and corollaries provide us with the rules (or laws) for computing the probability of events.

## Example 12

Determine the probability of obtaining at least one 1 in 2 tosses of a six-sided die.

S = {11, 12, ..., 16, 21, ..., 26, ..., 66}, with 6² = 36 outcomes.
Let 1ₖ denote "a 1 on the k-th toss". Then
P[1₁ ∪ 1₂] = P[1₁] + P[1₂] − P[1₁ ∩ 1₂]
= 6/36 + 6/36 − 1/36
= 11/36

## Example 13

Determine the probability of obtaining at least one 1 in 3 tosses of a six-sided die.
S = {111, 112, ..., 121, ..., 211, ..., 666}, with 6³ = 216 outcomes.

P[1₁ ∪ 1₂ ∪ 1₃] = P[1₁] + P[1₂] + P[1₃]
− P[1₁ ∩ 1₂] − P[1₁ ∩ 1₃] − P[1₂ ∩ 1₃]
+ P[1₁ ∩ 1₂ ∩ 1₃]
= 1/6 + 1/6 + 1/6 − 1/36 − 1/36 − 1/36 + 1/216
= 91/216
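A brute-force check of Examples 12 and 13 by exhaustive enumeration (a sketch):

```python
from itertools import product

def p_at_least_one_one(n_tosses: int) -> float:
    """Probability of at least one '1' in n tosses of a fair die."""
    outcomes = list(product(range(1, 7), repeat=n_tosses))
    favorable = sum(1 in outcome for outcome in outcomes)
    return favorable / len(outcomes)

print(p_at_least_one_one(2))  # 11/36 ≈ 0.3056
print(p_at_least_one_one(3))  # 91/216 ≈ 0.4213
```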

## Probability Problems - 1

Probability problems are classified as discrete or continuous.
Discrete sample space:
- finite and countably infinite sample spaces
- defined on {S, F}: S = {a₁, a₂, a₃, ..., aₙ}; F = all subsets of S
- all elementary events are distinct
- all elementary events are mutually exclusive

## Probability Problems - 2

The probability assignment on a discrete sample space is the probability of the elementary events, and is called the probability mass function.
If the n elementary events are equiprobable, then
P[a₁] = P[a₂] = ... = P[aₙ] = 1/n
and for an event B = {a₁, ..., aₘ} containing m elements,
P[B] = m/n

## Probability Problems - 3

Continuous sample space:
- The sample space S is uncountable
- S is a domain on a line, plane, or volume, and events are points within the domain
- In other words, it is defined on a measurable region R
- F is a real-valued function defined on the region R (such an F gives rise to the probability density function)
- The events of interest consist of experiments on an interval of the real line, or on a 2-D region covered by a regular polygon, together with the complements, unions, and intersections of these events

[Figures: an interval on the x-axis, a 2-D region in the (x, y) plane, and a shaded subregion]

## Probability Problems - 4

- A voluminous region (3-D)
For such geometric problems, the probability is the ratio of the length, area, or volume of the domain A to that of the entire domain:

P[A] = L(A)/L(S) (length); P[A] = A(A)/A(S) (area); P[A] = V(A)/V(S) (volume)

But a better understanding of the continuous sample space is through the use of probability distribution and density functions.

## Example 14

(a) Find the probability that the sum is 8 in the toss of 2 dice.
(b) Find the probability of getting a 5, 7, or 8 in the toss of 2 dice.

Solution
Let S = {all possible outcomes}, with N(S) = 36. Define
E = {sum is 8}, F = {sum is 5}, T = {sum is 7}
P[E] = 5/36; P[F] = 4/36; P[T] = 6/36

[Table: the 6 × 6 grid of dice outcomes with cells marked F (sum 5), T (sum 7), and E (sum 8)]

Since F, T and E are mutually exclusive,
P[F, T or E] = P[F ∪ T ∪ E] = P[F] + P[T] + P[E] = 4/36 + 6/36 + 5/36 = 15/36

## Example 16

Two dice are thrown.
a) What is the probability that both show even numbers?
b) What is the probability that the sum is odd?

Solution
Let S = {all possible outcomes}, N(S) = 36. Let E = {both even} and O = {sum is odd}.
a) P[E] = 9/36 (distinguishable dice)
   P[E] = 6/21 (indistinguishable dice)
b) P[O] = 18/36

[Table: the 6 × 6 grid of outcomes with each cell marked O (odd sum) or E (even sum)]

## Example 15

A fair coin is tossed 3 times. What is the probability of each of the following?
A = {1st toss is a head}
B = {2nd toss is a head}
C = {exactly 2 heads are tossed in a row}

Solution
Let 1 = Head, 0 = Tail, and write each outcome as (X, Y, Z) for the three tosses:

| # | X | Y | Z | Events  |
|---|---|---|---|---------|
| 1 | 0 | 0 | 0 | -       |
| 2 | 0 | 0 | 1 | -       |
| 3 | 0 | 1 | 0 | B       |
| 4 | 0 | 1 | 1 | B, C    |
| 5 | 1 | 0 | 0 | A       |
| 6 | 1 | 0 | 1 | A       |
| 7 | 1 | 1 | 0 | A, B, C |
| 8 | 1 | 1 | 1 | A, B    |

P[A] = 4/8; P[B] = 4/8; P[C] = 2/8
P[A ∩ B] = 2/8; P[B ∩ C] = 2/8; P[A ∩ B ∩ C] = 1/8
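The same table can be generated and checked programmatically (a sketch; the "two heads in a row" test is written out explicitly):

```python
from itertools import product

outcomes = list(product((0, 1), repeat=3))   # 1 = head, 0 = tail

A = {w for w in outcomes if w[0] == 1}                    # 1st toss head
B = {w for w in outcomes if w[1] == 1}                    # 2nd toss head
C = {w for w in outcomes if w in {(1, 1, 0), (0, 1, 1)}}  # exactly 2 heads in a row

n = len(outcomes)
print(len(A) / n, len(B) / n, len(C) / n)   # 0.5 0.5 0.25
print(len(A & B) / n, len(B & C) / n)       # 0.25 0.25
print(len(A & B & C) / n)                   # 0.125
```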

## Conditional Probability - 1

In many cases, we have only partial knowledge of the outcome of an experiment.
Conditional probability describes the situation whereby the probability of one event is influenced by the occurrence of another event.
We denote this conditional probability by
P[A|B] = probability of event A given that B has occurred.

We define
P[A|B] = P[A ∩ B] / P[B], provided P(B) ≠ 0.

## Conditional Probability Theory

"The most important questions of life are, for the most part, really only problems of probability."
- Pierre-Simon Laplace

## Conditional Probability - 2

Note: the above definition satisfies all the probability axioms discussed earlier. That is,
(i) P[A|B] = P[A ∩ B]/P[B] ≥ 0, since P[A ∩ B] ≥ 0 and P[B] > 0
(ii) P[S|B] = P[S ∩ B]/P[B] = P[B]/P[B] = 1, since S ∩ B = B
(iii) Suppose A ∩ C = ∅. Then
P[A ∪ C | B] = P[(A ∪ C) ∩ B]/P[B] = P[(A ∩ B) ∪ (C ∩ B)]/P[B].
But (A ∩ B) ∩ (C ∩ B) = ∅, hence P[(A ∩ B) ∪ (C ∩ B)] = P[A ∩ B] + P[C ∩ B], so
P[A ∪ C | B] = P[A ∩ B]/P[B] + P[C ∩ B]/P[B] = P[A|B] + P[C|B],
satisfying all probability axioms.

Thus the definition of conditional probability is a legitimate probability measure.
P(A) is sometimes called the a priori probability; P(A|B) is sometimes called the a posteriori probability.

## Conditional Probability - 3

The idea of conditional probability can often be drawn out in the form of a tree diagram (probability tree).

[Figure: a probability tree with first-stage branches P[C], P[D] and second-stage branches P[A|C], P[B|C], P[A|D], P[B|D]]

Properties:
1. If B ⊂ A, then A ∩ B = B, and
P[A|B] = P[A ∩ B]/P[B] = P[B]/P[B] = 1,
since if B ⊂ A, the occurrence of B implies the automatic occurrence of the event A.
As an example, in a die experiment with A = {outcome is even} and B = {outcome is 2}, P[A|B] = 1.

## Conditional Probability - 4

2. If A ⊂ B, then A ∩ B = A, and
P[A|B] = P[A ∩ B]/P[B] = P[A]/P[B] ≥ P[A].
(In a die experiment, A = {outcome is 2} and B = {outcome is even}, so that A ⊂ B. The statement that B has occurred (the outcome is even) makes the odds for "outcome is 2" greater than without that information.)

3. We can use conditional probability to express the probability of a complicated event in terms of simpler related events.
Let A₁, A₂, ..., Aₙ be pairwise disjoint with union Ω, i.e., Aᵢ ∩ Aⱼ = ∅ for i ≠ j and ⋃ᵢ₌₁ⁿ Aᵢ = Ω.
Thus B = B ∩ (A₁ ∪ A₂ ∪ ... ∪ Aₙ) = (B ∩ A₁) ∪ (B ∩ A₂) ∪ ... ∪ (B ∩ Aₙ).

## Conditional Probability - 5

But (B ∩ Aᵢ) ∩ (B ∩ Aⱼ) = ∅ for i ≠ j, so that we have
P(B) = Σᵢ₌₁ⁿ P(B ∩ Aᵢ) = Σᵢ₌₁ⁿ P(B|Aᵢ) P(Aᵢ)

For 3 events, the conditional probability equation can also be written as follows (chain rule):
P(A ∩ B ∩ C) = P(A | B ∩ C) P(B ∩ C) = P(A | B ∩ C) P(B|C) P(C)

If in an experiment the events A and B can both occur, then
P[A ∩ B] = P[A] P[B|A]
Since the events A ∩ B and B ∩ A are equivalent, it follows that
P[A ∩ B] = P[B ∩ A] = P[B] P[A|B]

## Example 17

Let A and B be events with P[A] = 1/2, P[B] = 1/3 and P[A ∩ B] = 1/4. Find
a) P[A|B], b) P[B|A], c) P[A ∪ B], d) P[Aᶜ|Bᶜ], e) P[Bᶜ|Aᶜ]

Solution
a) P[A|B] = P[A ∩ B]/P[B] = (1/4)/(1/3) = 3/4

b) P[B|A] = P[B ∩ A]/P[A] = (1/4)/(1/2) = 1/2

c) P[A ∪ B] = P[A] + P[B] − P[A ∩ B] = 1/2 + 1/3 − 1/4 = 7/12

d) P[Aᶜ|Bᶜ] = P[Aᶜ ∩ Bᶜ]/P[Bᶜ]
But P[Bᶜ] = 1 − P[B] = 1 − 1/3 = 2/3, and since Aᶜ ∩ Bᶜ = (A ∪ B)ᶜ,
P[Aᶜ ∩ Bᶜ] = 1 − P[A ∪ B] = 1 − 7/12 = 5/12.
Hence P[Aᶜ|Bᶜ] = (5/12)/(2/3) = 5/8.

e) P[Bᶜ|Aᶜ] = P[Bᶜ ∩ Aᶜ]/P[Aᶜ] = (5/12)/(1/2) = 5/6

## Example 18

A test for cancer is 90% effective. That is, 90% of those with the disease react positively. Also, 5% of those without the disease react positively. If 1% of the patients have cancer, what is the probability that a patient who reacts positively has cancer?

Let
C⁺ = {has cancer}; C⁻ = {no cancer}; R = {positive reaction}
Therefore,
P[R|C⁺] = 0.9; P[R|C⁻] = 0.05; P[C⁺] = 0.01; P[C⁻] = 0.99

Since C⁺ and C⁻ partition the sample space, Bayes' rule gives

P[C⁺|R] = P[R|C⁺] P[C⁺] / (P[R|C⁺] P[C⁺] + P[R|C⁻] P[C⁻])
= (0.9)(0.01) / ((0.9)(0.01) + (0.05)(0.99))
≈ 0.154
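A quick numeric check of Example 18 (a sketch of the two-hypothesis Bayes computation):

```python
def posterior(prior: float, p_pos_given_disease: float,
              p_pos_given_healthy: float) -> float:
    """P[disease | positive test] via Bayes' rule over a two-set partition."""
    evidence = (p_pos_given_disease * prior
                + p_pos_given_healthy * (1.0 - prior))
    return p_pos_given_disease * prior / evidence

print(posterior(0.01, 0.90, 0.05))   # ≈ 0.1538
```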

## Independence

"If there is a 50-50 chance that something can go wrong, then 9 times out of 10 it will."
- Paul Harvey

## Independence - 1

If the occurrence of an event B does not alter the probability of occurrence of event A, then A and B are said to be independent.
Definition: A and B are said to be independent if
P[A ∩ B] = P[A] P[B]
It is easy to show that if A and B are independent, then
(A, Bᶜ), (Aᶜ, B) and (Aᶜ, Bᶜ)
are all independent pairs.

## Independence - 2

Suppose A and B are independent. Then
P[A|B] = P[A ∩ B]/P[B] = P[A] P[B]/P[B] = P[A]
Thus if A and B are independent, the event that B has occurred does not shed any more light on the event A.
It makes no difference to A whether B has occurred or not.
Three events A, B and C are said to be independent iff
P[A ∩ B ∩ C] = P[A] P[B] P[C], and
P[A ∩ B] = P[A] P[B], and
P[A ∩ C] = P[A] P[C], and
P[B ∩ C] = P[B] P[C]
All the pairwise intersections must be checked.

## Independence - 3

From the multiplication rule,
P[A ∩ B] = P[A|B] P[B] and P[A ∩ B] = P[B|A] P[A],
so that
P[B|A] = P[B ∩ A]/P[A] = P[A ∩ B]/P[A]
and
P[A|B] = P[B|A] P[A] / P[B]   (Bayes' Theorem)

## Example 19

In an experiment, one card is selected from an ordinary deck of cards. Define event A as "select a king", B as "select a jack or queen", and C as "select a heart". Are A, B and C independent?

[Figure: the 52-card deck laid out by suit (Club, Diamond, Heart, Spade) and rank (Ace, 2, ..., 10, Jack, Queen, King)]

## Example - Describing Events from a Deck of Cards

For each suit the sample space consists of ace, two, ..., ten, jack, queen, king, indicated as {1, 2, ..., 13}.
Let A = {king is drawn}, B = {club is drawn}.
Describe the events:
a) A ∪ B = {either king or club (or both, i.e., the king of clubs)}
b) A ∩ B = {both king and club (the king of clubs)}
c) Since B = {clubs}, Bᶜ = {not club} = {hearts, diamonds, spades};
hence A ∪ Bᶜ = {king, or hearts, or diamonds, or spades}
d) Aᶜ ∪ Bᶜ = {not king, or not club} = {any card except the king of clubs}
e) A − B = {king but not club}. This is the same as A ∩ Bᶜ = {king and not club}
f) Aᶜ − Bᶜ = {not king, and club} = {any club except the king}
g) (A ∩ B) ∪ (A ∩ Bᶜ) = {king and club} or {king and not club} = {king}
This can be seen by expanding: (A ∩ B) ∪ (A ∩ Bᶜ) = A ∩ (B ∪ Bᶜ) = A

## Example 19 - Solution

P[A] = 4/52; P[B] = 8/52 (jack or queen); P[C] = 13/52

Pairwise intersections:
P[A ∩ B] = 0 (a card cannot be both a king and a jack-or-queen)
P[A ∩ C] = 1/52 (the king of hearts)
P[B ∩ C] = 2/52 (the jack and queen of hearts)

Compare with the products:
P[A] P[C] = (4/52)(13/52) = 1/52 = P[A ∩ C]  →  A and C are independent as a pair
P[B] P[C] = (8/52)(13/52) = 2/52 = P[B ∩ C]  →  B and C are independent as a pair
P[A] P[B] = (4/52)(8/52) = 32/52² ≠ 0 = P[A ∩ B]  →  A and B are NOT independent

Thus, A, B and C are NOT independent.
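A sketch that checks these pairwise conditions by enumerating the deck (here ranks 11, 12, 13 stand for jack, queen, king):

```python
from fractions import Fraction
from itertools import product

deck = list(product(range(1, 14), ["club", "diamond", "heart", "spade"]))

A = {c for c in deck if c[0] == 13}            # kings
B = {c for c in deck if c[0] in (11, 12)}      # jacks or queens
C = {c for c in deck if c[1] == "heart"}       # hearts

def p(event):
    return Fraction(len(event), len(deck))

print(p(A & C) == p(A) * p(C))   # True: A, C independent
print(p(B & C) == p(B) * p(C))   # True: B, C independent
print(p(A & B) == p(A) * p(B))   # False: A, B not independent
```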

## What You Should Learn in this Lecture

Counting Techniques & Markov Chains
- Partition Law
- Bayes' Rule
- Law of Total Probability
- Introduction to Markov Chains
- Counting Techniques
- Sampling of Different Kinds:
  1. Sampling with replacement and with ordering
  2. Sampling without replacement and with ordering
  3. Sampling without replacement and without ordering
  4. Sampling with replacement and without ordering
- Binomial Coefficient and Theorem

"The 50-50-90 rule: Anytime you have a 50-50 chance of getting something right, there's a 90% probability you'll get it wrong."
- Andy Rooney

## Partition (Law of Total Probability)

"The true logic of this world is the calculus of probabilities."
- James Clerk Maxwell

## Partition - 1

If a region is divided into non-overlapping (mutually exclusive) parts, the parts are said to partition the region.
A partition of a set B is a set {B₁, B₂, ..., Bₙ} having the following properties:
i) Bⱼ ⊂ B, j = 1, 2, ..., n
ii) Bⱼ ∩ Bₖ = ∅, k, j = 1, 2, ..., n, k ≠ j
iii) B = B₁ ∪ B₂ ∪ ... ∪ Bₙ
A partition of a set B is a set of subsets of B [property i] that are disjoint [property ii] and mutually exhaustive [property iii].

## Partition - 2

Every element of B is a member of one and only one of the subsets in the partition.
In the diagram below, the sets {A ∩ Bᵢ} partition A, and from property (ii):

[Figure: the sample space divided into cells B₁, B₂, B₃, ..., Bₙ₋₁, Bₙ]

A = A ∩ S
  = A ∩ (B₁ ∪ B₂ ∪ ... ∪ Bₙ)
  = (A ∩ B₁) ∪ (A ∩ B₂) ∪ ... ∪ (A ∩ Bₙ)

so that
P[A] = P[A ∩ B₁] + P[A ∩ B₂] + ... + P[A ∩ Bₙ] = Σₖ₌₁ⁿ P[A ∩ Bₖ]

## Partition - 3

The expression above says that the total probability of an event can be obtained by summing over the set of mutually exclusive and exhaustive ways in which the event can occur.
But since
P[A ∩ B] = P[A|B] P[B] and P[A ∩ B] = P[B|A] P[A],
we may write the probability as follows:
P[A] = P[A|B₁] P[B₁] + P[A|B₂] P[B₂] + ... + P[A|Bₙ] P[Bₙ]
or equivalently
P[A] = Σₖ₌₁ⁿ P[A ∩ Bₖ] = Σₖ₌₁ⁿ P[A|Bₖ] P[Bₖ]

## Partition - 4

Hence, if the events B₁, B₂, ..., Bₙ constitute a partition of the sample space S such that P[Bₖ] ≠ 0, k = 1, 2, ..., n, then for any event A of S,
P[A] = Σₖ₌₁ⁿ P[A ∩ Bₖ] = Σₖ₌₁ⁿ P[A|Bₖ] P[Bₖ]

## Example 20

There are 30% Freshmen, 25% Sophomores, 25% Juniors and 20% Seniors in the IEEE student organization. 50%, 30%, 10%, and 2% of the Freshman, Sophomore, Junior and Senior IEEE members, respectively, are enrolled in Random Signals. If a member of IEEE is selected at random, what is the probability that the member is enrolled in Random Signals?
Let E = selected member is enrolled in Random Signals
E₁ = selected member is a freshman
E₂ = selected member is a sophomore
E₃ = selected member is a junior
E₄ = selected member is a senior

There are 4 cells in the partition, so
P[E] = P[E|E₁] P[E₁] + P[E|E₂] P[E₂] + P[E|E₃] P[E₃] + P[E|E₄] P[E₄]
     = (0.50)(0.30) + (0.30)(0.25) + (0.10)(0.25) + (0.02)(0.20)
     = 0.254
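Example 20 as a one-liner over the partition (a sketch):

```python
# (P[E | class], P[class]) pairs from Example 20
partition = [(0.50, 0.30), (0.30, 0.25), (0.10, 0.25), (0.02, 0.20)]

p_enrolled = sum(p_cond * p_class for p_cond, p_class in partition)
print(p_enrolled)   # 0.254
```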

## Bayes' Rule

"Everything should be made as simple as possible, but not one bit simpler."
- Albert Einstein

## Bayes' Rule - 1

Bayes' Rule:
If the events B₁, B₂, ..., Bₙ constitute a partition of the sample space S such that P[Bₖ] ≠ 0, k = 1, 2, ..., n, then for any event A in S such that P[A] ≠ 0,

P[Bₖ|A] = P[A ∩ Bₖ] / P[A] = P[A|Bₖ] P[Bₖ] / Σⱼ₌₁ⁿ P[A|Bⱼ] P[Bⱼ]

Proof:
By the definition of conditional probability,
P[Bₖ|A] = P[A ∩ Bₖ] / P[A]

## Bayes' Rule - 2

Now apply conditional probability to the numerator and the partition law (total probability law) to the denominator to obtain
P[Bₖ|A] = P[A|Bₖ] P[Bₖ] / Σⱼ₌₁ⁿ P[A|Bⱼ] P[Bⱼ]

## CME621 Stochastic Processes

## Markov Chains

"Our brains are just not wired to do probability problems very well."
- Persi Diaconis

## Markov Chains - 1

Markov chains deal with sequences of dependent experiments:
the outcome of a given experiment determines which experiment is performed next.
Consider a sequence of experiments X₁, X₂, ..., Xₙ.
We interpret Xₙ as the state of the system at time n, and we say that the system is in state xₙ at time n if Xₙ = xₙ.
Hence we seek the conditional probability
P[X_{n+1} = x_{n+1} | Xₙ = xₙ, X_{n−1} = x_{n−1}, ..., X₁ = x₁, X₀ = x₀]
If the structure of the process {Xₙ, n = 0, 1, 2, ...} is such that the conditional probability distribution of X_{n+1} depends only on the value of Xₙ and is independent of all previous values, we say that the process is a Markov chain.
Hence
P_{ij} = P[X_{n+1} = j | Xₙ = i], i, j = 0, 1, 2, ...

## Markov Chains - 2

The sequence of random experiments is said to form a Markov chain if, each time the system is in state i, there is some fixed probability P_{ij} that it will next move to state j.
Since the P_{ij} are conditional probabilities, they satisfy the probability requirements
P_{ij} ≥ 0 and Σⱼ P_{ij} = 1, i = 0, 1, 2, ...
The values
P_{ij} = P[X_{n+1} = j | Xₙ = i], i, j = 0, 1, 2, ...
are called transition probabilities.
It is convenient to arrange the transition probabilities in matrix form, giving rise to the transition matrix

P = [ P₀₀  P₀₁  ...  P₀M ]
    [ P₁₀  P₁₁  ...  P₁M ]
    [ ...               ]
    [ PM₀  PM₁  ...  PMM ]

## Markov Chains - 3

Knowledge of the transition probabilities and of the distribution of X₀ enables us to compute all probabilities of interest.
For instance, the joint probability of X₀, X₁, ..., Xₙ is
P[Xₙ = jₙ, X_{n−1} = j_{n−1}, ..., X₁ = j₁, X₀ = j₀]
= P[Xₙ = jₙ | X_{n−1} = j_{n−1}] ··· P[X₁ = j₁ | X₀ = j₀] P[X₀ = j₀]

## Example 28

A sequential experiment involves repeatedly drawing a ball from one of two boxes, noting the number on the ball, and replacing the ball in its box. Box 0 contains one ball with the number 1 and two balls with the number 0, and Box 1 contains five balls with the number 1 and one ball with the number 0. The box from which the first draw is made is selected at random by flipping a fair coin: Box 0 is used if the outcome is heads and Box 1 if the outcome is tails. Thereafter, the box used in a sub-experiment corresponds to the number on the ball selected in the previous sub-experiment.

Solution
The sample space of this experiment consists of sequences of 0s and 1s.
Each possible sequence corresponds to a path through the "trellis" diagram. The nodes in the diagram denote the box used in the nth sub-experiment, and the labels on the branches denote the outcome of a sub-experiment. Thus the path 0011 corresponds to the sequence: the coin toss was heads, so the first draw was from Box 0; the outcome of the first draw was 0, so the second draw was from Box 0; the outcome of the second draw was 1, so the third draw was from Box 1; and the outcome of the third draw was 1, so the fourth draw would be from Box 1.

Find P[0011]:
P[0011] = P[1|1] P[1|0] P[0|0] P[0]
        = (5/6)(1/3)(2/3)(1/2) = 5/54
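A sketch of Example 28 as a two-state Markov chain (states = boxes; the drawn number equals the next box, so the box path is the Markov state path):

```python
from fractions import Fraction as F

# Transition probabilities P[j | i]: next box j given current box i
P = {0: {0: F(2, 3), 1: F(1, 3)},    # Box 0: two 0-balls, one 1-ball
     1: {0: F(1, 6), 1: F(5, 6)}}    # Box 1: one 0-ball, five 1-balls
pi0 = {0: F(1, 2), 1: F(1, 2)}       # fair coin chooses the first box

def path_prob(path):
    """Probability of a box sequence, e.g. (0, 0, 1, 1)."""
    prob = pi0[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

print(path_prob((0, 0, 1, 1)))   # 5/54
```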

## Counting Techniques

"But to us, probability is the very guide of life."
- Bishop J. Butler

## Counting Techniques - 1

Since the probability of an event is the number of outcomes in that event divided by the total number of outcomes, the calculation of probability sometimes reduces to counting the number of outcomes of an event.
Hence, a technique to count the number of outcomes of an event and the number in the sample space is necessary for large experiments.
Suppose there are n objects in all and we are going to make k selections. The questions that determine the counting technique are:
- Are the objects similar or not (distinguishable)?
- Can objects be chosen more than once, i.e., do we choose with or without replacement?
- Are we concerned with ordering?

## Counting Techniques - 2

We will phrase this random selection (sampling) process in terms of how:
a) balls can be allocated to or drawn from a container
b) cards can be drawn from a deck of cards

## Counting Techniques - 3

The four sampling schemes are:
1. Sampling with replacement and with ordering
2. Sampling without replacement and with ordering
3. Sampling without replacement and without ordering
4. Sampling with replacement and without ordering

## Counting Techniques - 4

1. Sampling with Replacement and with Ordering
- Make k selections from a set A containing n distinct objects
- Each of the k selections from the n objects is independent (i.e., n possible outcomes for each of the k draws), so the total number of distinct elements is Nₖ(S) = nᵏ
- Since ordering is important, the experiment produces an ordered k-tuple (x₁, x₂, ..., xₖ), where xᵢ ∈ A
- Hence the probability of each particular k-tuple is Pₖ = 1/nᵏ

## Counting Techniques - 5

2. Sampling without Replacement and with Ordering
- Since no object is chosen more than once, a choice cannot be repeated
- This type of sampling is popularly known as PERMUTATION
- Permutation: the arrangement of a set of elements into a particular order, e.g.,
  {1,2,3} → {123, 132, 213, 231, 321, 312}
- For a large set, it may not be possible to enumerate the ordered set
Suppose there are
- n₁ independent ways of doing the 1st operation
- n₂ independent ways of doing the 2nd operation
- ...
- nₖ independent ways of doing the k-th operation

## Counting Techniques - 6

Then the total number of ways (distinct ordered k-tuples) of performing this operation is N(S) = n₁ n₂ ··· nₖ.
N(S) can also be interpreted as follows (for k sets of elements):
- the 1st set contains n₁ elements: a₁₁, a₁₂, ..., a₁ₙ₁
- the 2nd set contains n₂ elements: a₂₁, a₂₂, ..., a₂ₙ₂
- ...
- the k-th set contains nₖ elements: aₖ₁, aₖ₂, ..., aₖₙₖ
If we arrange the elements such that each arrangement contains only one element from each set, then N(S) = n₁ n₂ ··· nₖ arrangements of this nature will be obtained.

## Counting Techniques - 7

In general, the number of arrangements of n distinct elements taken n at a time is called a permutation and is denoted P(n, n).
This is equivalent to choosing n different elements to fill n different positions:
- n₁ = n choices for the 1st position
- n₂ = n − 1 choices for the 2nd position
- ...
- nₙ = 1 choice for the last position
Hence
P(n, n) = n(n − 1)(n − 2) ··· (n − n + 1) = n!
In a permutation, each distinct element appears only once in each arrangement.

## Counting Techniques - 8

In permutation, we count the selection of ball i followed by ball j as being different from the selection of ball j followed by ball i; i.e., ({i,j} ≠ {j,i}).
Often we are interested in a limited number of the total elements, i.e., the permutation of n objects taken k at a time:
P(n, k) = n(n − 1)(n − 2) ··· (n − k + 1)
        = [n(n − 1)(n − 2) ··· (n − k + 1)] (n − k)! / (n − k)!
        = n! / (n − k)!
Also written as ₙPₖ = P(n, k).
The number of permutations of n distinct objects arranged in a circle is (n − 1)!

Counting Techniques - 10

## Suppose that a club consist of 25 members and that a

President and Secretary are to be chosen from the
membership. How many ways can the positions be filled?

n
n! ~ n
2n or n! ~ 2 nn 1/ 2en
e
n!
1
lim
n
n
n
2n
e
0! 1

ej

P 25, 2

ej

25!
25 24 600
25 2 !

## So far we had assumed distinct elements.

When the elements in a set are not distinct, the number of
permutations is affected
In this case, the number of permutations of n elements
taking n at a time, when k1 are of one kind, k2 is of another
kind, km is another kind of counting called Multinomial
Coefficient
111

## Federal University of Technology, Minna

112

## Counting Techniques - 11

Multinomial Coefficient
Suppose n distinct elements are divided into k different groups (k ≥ 2), where, for j = 1, ..., k, the j-th group contains exactly nⱼ elements with n₁ + n₂ + ... + nₖ = n.
We want to determine the number of ways in which the n elements can be divided into the k groups, i.e., how many ways can n distinguishable balls be distributed into k different boxes so that there are nⱼ balls in box j?
Choosing the groups successively,
N(S) = C(n, n₁) C(n − n₁, n₂) C(n − n₁ − n₂, n₃) ··· C(n − n₁ − ... − nₖ₋₁, nₖ)

## Counting Techniques - 12

Hence
(n; n₁, n₂, ..., nₖ) = n! / (n₁! n₂! ··· nₖ!)
This is the number of arrangements of elements of two or more distinct types.

## Counting Techniques - 13

The number of distinct permutations of n things of which n₁ are of one kind, n₂ of a second kind, ..., nₖ of the k-th kind is
n! / (n₁! n₂! ··· nₖ!)
The number of ways of partitioning a set of n objects into r cells with n₁ elements in the first cell, n₂ in the second cell, ..., nᵣ elements in the r-th cell is
(n; n₁, n₂, ..., nᵣ) = n! / (n₁! n₂! ··· nᵣ!)

## Counting Techniques - 14

This is equivalent to partitioning the n-element set into m subsets B₁, B₂, ..., Bₘ such that Bⱼ is assigned kⱼ elements, satisfying k₁ + k₂ + ... + kₘ = n.
That is, if the same element appears more than once in the same permutation, then an interchange of those elements will not produce a different permutation.
For 2, 3, ... like elements, divide the total number of permutations by 2!, 3!, ....
The multinomial coefficient appears in the multinomial theorem, which can be stated as follows:

Definition: For any numbers x₁, x₂, ..., xₘ and any positive integer n,
(x₁ + x₂ + ... + xₘ)ⁿ = Σ_{k₁+k₂+...+kₘ=n} [n! / (k₁! k₂! ··· kₘ!)] x₁^{k₁} x₂^{k₂} ··· xₘ^{kₘ}

and the coefficients satisfy
(n; k₁, k₂, ..., kₘ) = C(n, k₁) C(n − k₁, k₂) ··· = n! / (k₁! k₂! ··· kₘ!)

## Counting Techniques - 15

3. Sampling without Replacement and without Ordering
- Same as sampling without replacement and with ordering, except that the actual order of the selections is not important
- The selection of ball i followed by ball j is the same as the selection of ball j followed by ball i ({i,j} = {j,i})
- Choosing k objects out of n objects, order not important, without replacement, amounts to dividing the n objects into two categories: those that are selected and those that are not
- To obtain the number of combinations, we divide P(n, k) by the number of possible arrangements of the k objects
This technique is commonly known as combination, defined as:
ₙCₖ = C(n, k) = P(n, k)/k! = n! / (k!(n − k)!)

## Counting Techniques - 16

4. Sampling with Replacement and without Ordering
- Suppose we choose k objects from n distinct objects
- Each time we choose an object, we record that the object was selected and then replace it
- We want to determine how many times each object has been selected

[Figure: an urn and n numbered bins tallying the k selections, e.g., xx | xxxxxx | ... | x]

The experiment amounts to counting how many ways k "stars" and n − 1 "bars" can be put in order:
N(S) = C(n − 1 + k, k) = (n − 1 + k)! / ((n − 1)! k!)
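The four sampling formulas in code (a sketch using the standard library; requires Python 3.8+ for math.comb/math.perm):

```python
from math import comb, perm, factorial

n, k = 6, 3

print(n ** k)                 # 1. with replacement, with ordering: 216
print(perm(n, k))             # 2. without replacement, with ordering: 120
print(comb(n, k))             # 3. without replacement, without ordering: 20
print(comb(n - 1 + k, k))     # 4. with replacement, without ordering: 56

# Multinomial coefficient: e.g. 10 objects split into groups of 5, 3, 2
print(factorial(10) // (factorial(5) * factorial(3) * factorial(2)))  # 2520
```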

## Repeated Trials - 1

Given n experiments ε₁, ε₂, ..., εₙ and their associated event spaces Fᵢ and probabilities Pᵢ, i = 1, ..., n, let
ε = ε₁ × ε₂ × ··· × εₙ
represent their Cartesian product, whose elementary events are the ordered n-tuples (ξ₁, ξ₂, ..., ξₙ), where ξᵢ ∈ εᵢ.
Events in this combined space are of the form
A₁ × A₂ × ··· × Aₙ,
where Aᵢ ∈ Fᵢ, together with their unions and intersections.
If all these n experiments are independent, and Pᵢ(Aᵢ) is the probability of the event Aᵢ in Fᵢ, then
P(A₁ × A₂ × ··· × Aₙ) = P₁(A₁) P₂(A₂) ··· Pₙ(Aₙ)

We will discuss techniques to analyze such problems with an example.

## Bernoulli Trials - 2

A Bernoulli trial consists of repeated independent and identical experiments, each of which has only two outcomes, A or Aᶜ, with P(A) = p and P(Aᶜ) = 1 − p = q.
The probability of exactly k occurrences of A in n such trials is
Pₙ(k) = C(n, k) pᵏ qⁿ⁻ᵏ
Let Xₖ = "A occurs exactly k times in n trials". Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, ..., n, either X₀ or X₁ or ... or Xₙ must occur, and the Xᵢ, Xⱼ are mutually exclusive. Thus
P(X₀ ∪ X₁ ∪ ... ∪ Xₙ) = Σₖ₌₀ⁿ C(n, k) pᵏ qⁿ⁻ᵏ = (p + q)ⁿ = 1

## Bernoulli Trials - 3

Suppose that for a given n and p we want to find the most likely value of k.
The most probable value of k is the number which maximizes Pₙ(k); a plot of Pₙ(k) against k (e.g., for n = 12, p = 1/2) rises to a peak and then falls.
To obtain this value, consider the ratio
Pₙ(k−1)/Pₙ(k) = [n! p^{k−1} q^{n−k+1} (n−k)! k!] / [(n−k+1)! (k−1)! n! pᵏ qⁿ⁻ᵏ] = k q / ((n − k + 1) p)
Thus Pₙ(k) ≥ Pₙ(k−1) if k(1 − p) ≤ (n − k + 1)p, i.e., if k ≤ (n + 1)p.
Thus Pₙ(k), as a function of k, increases until
k = (n + 1)p
if (n + 1)p is an integer; otherwise the most likely value is the largest integer kₘₐₓ less than (n + 1)p.
This kₘₐₓ represents the most likely number of successes (or heads) in n trials.
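A sketch of the binomial pmf and the most-likely-k rule:

```python
from math import comb, floor

def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 12, 0.5
pmf = [binom_pmf(n, k, p) for k in range(n + 1)]
print(max(range(n + 1), key=lambda k: pmf[k]))   # argmax of Pn(k): 6
print(floor((n + 1) * p))                        # (n+1)p rule: 6
print(sum(pmf))                                  # total probability: ~1.0
```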

## Bernoulli's Theorem - 1

Let A denote an event whose probability of occurrence in a single trial is p. If k denotes the number of occurrences of A in n independent trials, then
P[ |k/n − p| ≥ ε ] ≤ pq/(nε²)
The equation above states that the frequency definition of probability of an event, k/n, and its axiomatic definition, p, can be made compatible to any degree of accuracy.

Proof:
To prove Bernoulli's theorem, we need two identities. Note that with Pₙ(k) = C(n, k) pᵏ qⁿ⁻ᵏ, direct computation gives
Σₖ₌₀ⁿ k Pₙ(k) = Σₖ₌₁ⁿ [n!/((n−k)! (k−1)!)] pᵏ qⁿ⁻ᵏ
= Σᵢ₌₀ⁿ⁻¹ [(n−1)!/((n−1−i)! i!)] p^{i+1} q^{n−1−i}
= np (p + q)^{n−1} = np

## Bernoulli's Theorem - 2

Proceeding in a similar manner, it can be shown that
Σₖ₌₀ⁿ k² Pₙ(k) = Σₖ₌₁ⁿ k [n!/((n−k)! (k−1)!)] pᵏ qⁿ⁻ᵏ = n²p² + npq
Using both identities,
Σₖ₌₀ⁿ (k − np)² Pₙ(k) = Σₖ k² Pₙ(k) − 2np Σₖ k Pₙ(k) + n²p² = npq   (#)
Note that the event |k/n − p| ≥ ε is equivalent to (k − np)² ≥ n²ε².

## Bernoulli's Theorem - 3

We can bound the left side of (#) from below as follows:
npq = Σₖ₌₀ⁿ (k − np)² Pₙ(k) ≥ Σ_{|k−np|≥nε} (k − np)² Pₙ(k) ≥ n²ε² Σ_{|k−np|≥nε} Pₙ(k) = n²ε² P[ |k − np| ≥ nε ]
Hence
P[ |k/n − p| ≥ ε ] ≤ pq/(nε²)

## Bernoulli's Theorem - 4

Thus the theorem states that the probability of event A from the axiomatic framework can be computed from the relative frequency definition quite accurately, provided the number of experiments is large enough.
Since kₘₐₓ is the most likely value of k in n trials, from the above discussion, as n → ∞ the plots of Pₙ(k) tend to concentrate more and more around kₘₐₓ.
Note that for a given ε > 0, pq/(nε²) can be made arbitrarily small by letting n become large.
Thus, for very large n, we can make the fractional occurrence (relative frequency) k/n of the event A as close as desired to the actual probability p of the event A in a single trial.
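A small simulation sketch of the bound (a pseudorandom check, not part of the proof; the empirical value quoted in the comment is approximate):

```python
import random

def empirical_tail(n, p, eps, runs=20000):
    """Estimate P[|k/n - p| >= eps] by simulation."""
    hits = 0
    for _ in range(runs):
        k = sum(random.random() < p for _ in range(n))
        if abs(k / n - p) >= eps:
            hits += 1
    return hits / runs

n, p, eps = 1000, 0.5, 0.05
print(empirical_tail(n, p, eps))      # small, on the order of 0.002
print(p * (1 - p) / (n * eps**2))     # Bernoulli's-theorem bound: 0.1
```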

Note
The expression
(x + y)ⁿ = Σₖ₌₀ⁿ C(n, k) xᵏ yⁿ⁻ᵏ
is known as the Binomial Theorem, and C(n, k) is the binomial coefficient.

## Some Useful Binomial Identities

Symmetry:
C(n, k) = C(n, n − k)

Factorial identity:
C(n, k) = (n/k) C(n − 1, k − 1)

Pascal's rule:
C(n, k) = C(n − 1, k) + C(n − 1, k − 1)

Product identity:
C(n, k) C(k, j) = C(n, j) C(n − j, k − j)

Boundary values:
C(n, 0) = C(n, n) = 1

Computational method (recursive):
C(n, j) = C(n, j − 1) · (n − j + 1)/j

Pascal's Triangle (row n lists C(n, 0), C(n, 1), ..., C(n, n)):
C(0,0)
C(1,0) C(1,1)
C(2,0) C(2,1) C(2,2)
C(3,0) C(3,1) C(3,2) C(3,3)
...
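These identities are easy to spot-check (a sketch with illustrative values):

```python
from math import comb

n, k, j = 10, 4, 2
assert comb(n, k) == comb(n, n - k)                                # symmetry
assert comb(n, k) == comb(n - 1, k) + comb(n - 1, k - 1)           # Pascal's rule
assert k * comb(n, k) == n * comb(n - 1, k - 1)                    # factorial identity
assert comb(n, k) * comb(k, j) == comb(n, j) * comb(n - j, k - j)  # product identity
print("all identities hold")
```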

## Random Variable

"The degree of understanding of a phenomenon is inversely proportional to the number of variables used for its description."
- Unknown Physicist

## Definition of Random Variable - 1

Random Variables (RVs) are functions defined on the sample space (S or Ω) of a probability space.
Consider the experiment of flipping a coin twice.
The sample space of the experiment is S = {HH, HT, TH, TT}.
From this sample space, we can identify 2⁴ = 16 events, as follows:
{HH}, {HT}, {TH}, {TT}
{HH, HT}, {HH, TH}, {HH, TT}, {HT, TH}, {HT, TT}, {TH, TT}
{HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT}
{HH, HT, TH, TT} and {∅}

## Definition of Random Variable - 2

We would like to perform several analyses on these events and their probabilities.
However, working with symbols such as H (Head) and T (Tail) is not convenient.
Thus, we can associate real numbers with these events.

## Definition of Random Variable - 3

[Figure: the outcomes HH, HT, TH, TT mapped to points on the real line, e.g., 2, 1, 1, 0 heads]

## Definition of Random Variable - 4

These quantities of interest (real-valued functions defined on the sample space) are known as random variables.
When random outcomes are mapped (or transformed) into numerical values (real numbers), a random variable is obtained.
Often, we are interested in a derived outcome, such as the sum of two dice, and not in the separate values on the dice.
E.g., we may want to know that the sum is 7, but we are not interested in the actual outcomes such as (1,6), (2,5), (3,4).

A mapping of S = {HH, HT, TH, TT} into the real line R¹: a set A ⊂ S maps to an interval I ⊂ R¹, with
P[X ∈ I] = P[A], where A = {s₁, s₂, ..., sₖ} is the set of outcomes mapped into I.

[Figure: outcomes sᵢ in S, with a subset A mapped by X(·; sᵢ) onto an interval I of the real line]

Random Variables (RVs) map the outcomes of a random experiment to points on the real line, R.

## Definition of Random Variable - 5

Definition:
Suppose that (S, F, P) is a probability space in which S is not necessarily countable. A random variable, X, defined on this space is a function from S into the real line such that the set {ω | X(ω) ≤ x} ∈ F for every real x.
A random variable, X, defined on the probability space is a function that assigns a real number X(ω) to every random outcome ω ∈ S.
Translated, a random variable is a real-valued function that associates a real number with each element in the sample space.

Note:
- The function that assigns a value to each outcome is fixed and deterministic, e.g., the number of heads in three tosses of a coin
- However, the outcome of the experiment is not known in advance
- No matter how carefully a process is run, an experiment is performed, or a measurement is taken, there will be variability when the action is repeated
- If the outcome ω is already a numerical value, then we can make the assignment X(ω) = ω

## Example 20

Examples of random variables are:
- the population of a city or country
- the time of failure of a machine
- the stress level in a structure
- the current or voltage level in an electric circuit
- the gas pressure in a pipeline, etc.

## Example 21

A) Toss a coin 3 times; define X = number of heads.
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
X maps these outcomes to {3, 2, 2, 2, 1, 1, 1, 0}, respectively.
Thus, X has range S_X = {0, 1, 2, 3}.

B) Throw a pair of dice. Let Z = sum, M = product.
ω = (1, 6) is one possible outcome.
Thus, Z(ω) = 7 and M(ω) = 6.

## Example 22

Toss a coin 10 times. Let X = number of heads.
S_X = {0, 1, 2, ..., 10} is the range of X.
ω = (H, T, T, T, H, H, H, T, H, T) is one possible outcome, and N(S) = 2¹⁰ = 1024.
For this ω, X(ω) = 5.
Derived random variables:
Z = X², so Z(ω) = 25
G = sin X, so G(ω) = sin 5
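A sketch of Example 21(A): the random variable as a deterministic map on the sample space, and the pmf it induces:

```python
from itertools import product
from fractions import Fraction

S = list(product("HT", repeat=3))        # 8 equally likely outcomes
X = {w: w.count("H") for w in S}         # X = number of heads

# Induced pmf on the range S_X = {0, 1, 2, 3}
pmf = {}
for w, x in X.items():
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(S))
print(pmf)   # {3: 1/8, 2: 3/8, 1: 3/8, 0: 1/8}
```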

## Cumulative Distribution Function - 1

"If there is a 50-50 chance that something can go wrong, then 9 times out of 10 it will."
- Paul Harvey

The cumulative distribution function (CDF), or simply distribution function, of a random variable X is defined as
F_X(x) = P[X ≤ x]   (continuous)
F_X(x) = Σₖ P[X = xₖ] u(x − xₖ)   (discrete, where u(·) is the unit step)

## Cumulative Distribution Function - 2

Discrete probability distribution:
- A RV is discrete if its set of possible outcomes is countable
- A discrete RV assumes each of its values with a certain probability
- In discrete probability, the statement "the probability that the random variable X is equal to x", written as P[X = x], is given a numerical value by the probability function P
- P[X = x] is the P value assigned to the event {ω | X(ω) = x}

F_X(x) = P[X ≤ x]
The event of interest {X ≤ x} is the semi-infinite interval (−∞, x] on the real line, R.
The CDF is a probability and satisfies all the axioms and corollaries of probability!

[Figure: a continuous CDF rising smoothly from 0 to 1, and a discrete (staircase) CDF with steps at x = 0, 1, 2]

Both the continuous and discrete CDFs shown above have similar shapes in that they start from zero and build up to 1, from left to right, always increasing.

Properties of CDF - 1

3. Mixed Probability Distribution
A RV is mixed if its set of possible outcomes is partly countable and partly uncountable
A mixed RV assumes some of its values with nonzero point probability (jumps in the CDF) and distributes the rest of its probability continuously

1) 0 ≤ F_X(x) ≤ 1
2) F_X(∞) = 1
3) F_X(-∞) = 0
   Since all real numbers are > -∞, the event {X ≤ -∞} is empty
4) F_X(x) is a non-decreasing function: F_X(a) ≤ F_X(b) if a < b

[Figure: a staircase CDF rising through 1/4 and 3/4 to 1]

5) F_X(x) is continuous from the right, i.e., for any b and for h > 0,
   F_X(b) = lim_{h→0} F_X(b + h) = F_X(b⁺)

## Summary Properties of CDF

F_X(x) = P[X ≤ x],                     continuous
F_X(x) = Σ_k P[X = x_k] u(x - x_k),    discrete

1) 0 ≤ F_X(x) ≤ 1
2) F_X(∞) = 1
3) F_X(-∞) = 0
4) F_X(x) is a non-decreasing function: F_X(a) ≤ F_X(b) if a < b
5) F_X(x) is continuous from the right
6) P[a < X ≤ b] = F_X(b) - F_X(a)
7) P[X = b] = F_X(b) - F_X(b⁻), which equals 0 if F_X(x) is continuous at b
8) P[X > x] = 1 - F_X(x)

If the CDF is continuous at the end points x = a and x = b, then
P[a ≤ X ≤ b] = P[a < X ≤ b] = P[a ≤ X < b] = P[a < X < b]

Example 23
Compute F_X(x) if X = # of heads in 2 tosses of a fair coin
S = {HH, HT, TH, TT}

Solution
x < 0:      # of heads < 0 is impossible, so F_X(x) = 0
0 ≤ x < 1:  # of heads = 0 (event {TT}), so F_X(x) = 1/4
1 ≤ x < 2:  # of heads ≤ 1 (events {TT, HT, TH}), so F_X(x) = 3/4
x ≥ 2:      F_X(x) = 1

Example 24
Given that
F_X(x) = 1 - e^{-2x},  x ≥ 0
F_X(x) = 0,            x < 0
Determine if the function F_X(x) is a valid CDF

Solution
(2) F_X(∞) = 1
(3) F_X(-∞) = 0
(4) F_X(x1) ≤ F_X(x2) for x1 < x2 (non-decreasing)
(5) F_X(x⁺) = F_X(x) (continuous from the right)
YES, F_X(x) is a valid CDF

Properties 2, 3, 4 and 5 are used to show that a given function is a valid CDF
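The staircase CDF of Example 23 is easy to check by simulation. A minimal sketch, assuming NumPy is available; the sample size and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
tosses = rng.integers(0, 2, size=(100_000, 2))  # 1 = head, 0 = tail
X = tosses.sum(axis=1)                          # X = number of heads

# Empirical F_X(x) = fraction of trials with X <= x
for x in [-0.5, 0.5, 1.5, 2.5]:
    print(f"F_X({x:4.1f}) ~ {np.mean(X <= x):.3f}")
# Expected from Example 23: 0.000, 0.250, 0.750, 1.000
```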

Example 25
The CDF of a RV X is given by
F_X(x) = 0,        x < 0
F_X(x) = x²/16,    0 ≤ x < 4
F_X(x) = 1,        4 ≤ x

Compute P[1/2 < X ≤ 3/2]
P[1/2 < X ≤ 3/2] = F_X(3/2) - F_X(1/2)
                 = (1/16)(3/2)² - (1/16)(1/2)²
                 = 9/64 - 1/64 = 1/8

Useful interval probabilities in terms of the CDF:
1) P[a < X ≤ b] = F_X(b) - F_X(a)
2) P[a ≤ X ≤ b] = F_X(b) - F_X(a) + P[X = a]
3) P[a ≤ X < b] = F_X(b) - F_X(a) + P[X = a] - P[X = b]
4) P[a < X < b] = F_X(b) - F_X(a) - P[X = b]
5) P[a ≤ X] = 1 - F_X(a) + P[X = a]
6) P[X > a] = 1 - P[X ≤ a] = 1 - F_X(a)

Example 26
F_X(x) = 1 - e^{-x},  0 ≤ x
F_X(x) = 0,           else

(a) Find the probability that X > 0.5
P[X > 0.5] = 1 - P[X ≤ 0.5] = 1 - F_X(0.5) = e^{-0.5} ≈ 0.6065

(b) Find the probability that X ≤ 1/4
P[X ≤ 1/4] = F_X(1/4) = 1 - e^{-1/4} ≈ 0.2212

(c) Find the probability that 0.3 < X ≤ 0.7
P[0.3 < X ≤ 0.7] = F_X(0.7) - F_X(0.3) = e^{-0.3} - e^{-0.7} ≈ 0.2442

## Probability Density Function (PDF)

Everything should be made as simple as possible, but not one bit simpler.
- Albert Einstein
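The three probabilities of Example 26 follow directly from the CDF. A minimal sketch using only the Python standard library:

```python
import math

def F(x: float) -> float:
    """CDF of Example 26: F_X(x) = 1 - exp(-x) for x >= 0, else 0."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

print(1 - F(0.5))        # (a) P[X > 0.5]        ~ 0.6065
print(F(0.25))           # (b) P[X <= 1/4]       ~ 0.2212
print(F(0.7) - F(0.3))   # (c) P[0.3 < X <= 0.7] ~ 0.2442
```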

## The probability density function (PDF) of a random variable X, denoted by f_X(x), is defined as

f_X(x) = dF_X(x)/dx                     (continuous)
f_X(x) = Σ_k P[X = x_k] δ(x - x_k)      (discrete, where the pmf is p_X(x_k) = P[X = x_k])

The PDF f_X(x) measures how likely a random variable is to lie near a particular value, i.e., how fast the CDF is increasing
f_X(x) represents the density of probability at the point x
If the derivative of F_X(x) exists, then f_X(x) exists
The derivative of F_X(x) does not exist at points where F_X(x) is not continuous

Properties of PDF

Continuous:
1) f_X(x) ≥ 0
2) ∫_{-∞}^{∞} f_X(x) dx = F_X(∞) = 1
3) F_X(x) = ∫_{-∞}^{x} f_X(t) dt
4) P[a < X ≤ b] = F_X(b) - F_X(a) = ∫_a^b f_X(x) dx

Discrete:
1) 0 ≤ P[X = x_k] ≤ 1
2) Σ_k P[X = x_k] = 1
3) F_X(x) = Σ_{x_k ≤ x} P[X = x_k]
4) P[a < X ≤ b] = Σ_{a < x_k ≤ b} P[X = x_k]

If we let a = b, we obtain P[X = a] = ∫_a^a f_X(x) dx = 0
That is, the probability that a continuous RV will assume any fixed value is zero
Hence for a continuous RV, P[X ≤ a] = P[X < a] = F_X(a)

Properties 1 & 2 are sufficient to determine if a given function is a valid PDF
Notice that integration in the continuous case is simply replaced by summation in the discrete case

Example 27
Determine if the function f_X(x) is a valid pdf:
f_X(x) = |x|,  -1 ≤ x ≤ 1
f_X(x) = 0,    else

Solution
∫_{-1}^{1} |x| dx = ∫_{-1}^{0} (-x) dx + ∫_{0}^{1} x dx = 1/2 + 1/2 = 1
Since f_X(x) ≥ 0 and the integral equals 1, the pdf is valid

More interval probabilities via the pdf:
1) P[a < X ≤ b] = ∫_a^b f_X(x) dx
2) P[a ≤ X ≤ b] = ∫_{a⁻}^{b} f_X(x) dx
3) P[a ≤ X < b] = ∫_{a⁻}^{b⁻} f_X(x) dx
4) P[a < X < b] = ∫_a^{b⁻} f_X(x) dx
5) P[a < X] = ∫_a^{∞} f_X(x) dx
Note: for any real number a, a⁻ < a < a⁺, with a⁻, a⁺ arbitrarily close to a

Example 28
For the given pdf below, find P[|X| ≤ v]:
f_X(x) = c e^{-|x|},  -∞ < x < ∞

Solution
First find c:
1 = ∫_{-∞}^{∞} c e^{-|x|} dx = 2 ∫_{0}^{∞} c e^{-x} dx = 2c  →  c = 1/2

P[|X| ≤ v] = ∫_{-v}^{v} (1/2) e^{-|x|} dx = 2 ∫_{0}^{v} (1/2) e^{-x} dx = 1 - e^{-v}

Conditional CDF and Conditional PDF

Conditional Distribution
From the definition of conditional probability, we obtain the definition of the conditional CDF:

F_X(x|B) = P[X ≤ x | B] = P[{X ≤ x} ∩ B] / P[B] = P[A ∩ B] / P[B]

where A is the event {X ≤ x}

Conditional Density
f_X(x|B) = dF_X(x|B)/dx

Properties:
1) 0 ≤ F(x|B) ≤ 1
2) F(∞|B) = 1;  ∫_{-∞}^{∞} f(x|B) dx = 1
3) F(-∞|B) = 0;  F(x|B) = ∫_{-∞}^{x} f(y|B) dy
4) F(x|B) is non-decreasing: F(a|B) ≤ F(b|B), if a ≤ b
5) F(x|B) is continuous from the right: F(x⁺|B) = F(x|B)
6) P[x1 < X ≤ x2 | B] = F(x2|B) - F(x1|B) = ∫_{x1}^{x2} f(y|B) dy

## Discrete Random Variables - 1

Bernoulli RV, Binomial RV, Negative Binomial RV, Poisson RV, Hypergeometric RV, Zeta RV
Discrete RVs are specified by their probability mass function (pmf)

## Bernoulli Random Variable

A Bernoulli trial is a probabilistic experiment that can have one of two outputs, classified as either success or failure, in which the probability of success is p
We refer to p as the Bernoulli probability parameter
It is sometimes referred to as an indicator function of the RV X:
I_X(ω) = 1 if ω ∈ A, 0 otherwise

S_X = {0, 1}
p_X(1) = P[X = 1] = p,   p_X(0) = P[X = 0] = 1 - p

The Bernoulli RV corresponds to selecting one item (k = 1) with probability p of success

f_X(x) = 1 - p at x = 0;  p at x = 1

F_X(x) = 0,      x < 0
F_X(x) = 1 - p,  0 ≤ x < 1
F_X(x) = 1,      x ≥ 1

## For example, if you roll a die until a 6 appears, let X = number of rolls. The probability mass function of X is

p_X(k) = P[X = k] = (5/6)^{k-1} (1/6),  k = 1, 2, ...

Some modifications of Bernoulli trial sequences result in other well known distributions:
Binomial, Geometric, Pascal, and Negative binomial
These RVs are based on sequences of independent Bernoulli trials
Binomial RV: number of successes in n trials
Geometric RV: number of trials up to the first success
Negative binomial RV: number of trials up to the r-th success
Pascal RV: integer version of the negative binomial

## Binomial Random Variable

Consider n experiments, each of which results in success with probability p or failure with probability 1 - p
Let X = number of successes
For a sample consisting of n independent selections, with replacement, the binomial RV, B(n, p), is the number of successes:

p_X(k) = P[X = k] = C(n, k) p^k (1 - p)^{n-k},  k = 0, 1, ..., n

where C(n, k) = number of different sequences of the n outcomes leading to k successes and n - k failures
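A short sketch of the binomial pmf, using only the Python standard library (math.comb):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P[X = k] for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(3, 10, 0.5))                          # P[3 heads in 10 tosses] = 0.1171875
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))  # the pmf sums to 1
```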

## Geometric Random Variable, (G1, p)

Perform an experiment until one success occurs
Given a sequence of independent Bernoulli trials, the geometric RV counts the trials up to and including the first success
If X = the number of trials, then the geometric distribution is given by

p_X(k) = P[X = k] = p (1 - p)^{k-1},  k = 1, 2, ...

## Negative Binomial Random Variable, (Gr, p)

Perform an experiment until a total of r successes occur
It counts the number of trials needed to obtain the r-th success
If X = the number of trials required, then

p_X(k) = P[X = k] = C(k-1, r-1) p^r (1 - p)^{k-r},  k = r, r+1, ...
p_X(k) = 0,                                          k = 0, 1, ..., r-1

For the r-th success to occur on trial k, there must be r - 1 successes in the first k - 1 trials
A negative binomial RV is the sum of r independent geometric RVs:
G_{r,p} = G¹_{1,p} + G²_{1,p} + ... + G^r_{1,p}

## Discrete Random Variables - 7

## Poisson Random Variable

Used to determine the number of occurrences of an event in a certain time interval, e.g., a rate of arrival, growth, or decay
A Poisson RV X with parameter λ, taking on one of the values 0, 1, 2, ..., is given by

p_X(k) = P[X = k] = (λ^k / k!) e^{-λ},  k = 0, 1, 2, ...

It is assumed that
λ = average number of occurrences per unit of time
Items are uniformly scattered
Occurrences of items are independent
Never have two items at the same time

## A Poisson RV is a limiting case of the Binomial RV

For n large and p small, with λ = np:
C(n, k) p^k (1 - p)^{n-k} → (λ^k / k!) e^{-λ}

Proof:
P[X = k] = C(n, k) p^k (1 - p)^{n-k}
         = [n! / ((n-k)! k!)] (λ/n)^k (1 - λ/n)^{n-k}
         = [n(n-1)...(n-k+1) / n^k] (λ^k / k!) (1 - λ/n)^n (1 - λ/n)^{-k}

For large n and moderate λ:
(1 - λ/n)^n → e^{-λ},   (1 - λ/n)^{-k} → 1,   n(n-1)...(n-k+1)/n^k → 1

Hence
P[X = k] = (λ^k / k!) e^{-λ}

Typical Poisson quantities:
The number of wrong telephone numbers dialed in a day
The number of customers entering a post office on a given day
The number of radioactive particles discharged in a fixed interval of time

The pdf and CDF of the Poisson RV can be written as
f_X(x) = e^{-λ} Σ_{k≥0} (λ^k / k!) δ(x - x_k)
F_X(x) = e^{-λ} Σ_{k≥0} (λ^k / k!) u(x - x_k)
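The limiting behavior above can be seen numerically. A minimal sketch using the standard library; n = 1000 and p = 0.003 (so λ = 3) are illustrative values:

```python
from math import comb, exp, factorial

n, p = 1000, 0.003
lam = n * p                       # lambda = 3
for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = lam**k * exp(-lam) / factorial(k)
    print(f"k={k}: binomial={binom:.5f}  poisson={poisson:.5f}")
# The two columns agree to about three decimal places
```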

## Hypergeometric Random Variable

Take a random sample of size n from a population of a + b elements, with a successes and b failures
The number of successes in such a sample is a Hypergeometric RV
Let X = number of successes

A) Sampling with replacement gives the binomial: X ~ b(x; n, a/(a+b))

B) Sampling without replacement gives X ~ h(x; n, a, b)

Hence the distribution function can be written as

p_X(x) = P[X = x] = C(a, x) C(b, n-x) / C(a+b, n),  x = 0, 1, ..., a

## Summary of Common Discrete Distributions

1. Bernoulli: X takes the values (0, 1), and
P(X = 0) = q,  P(X = 1) = p

2. Binomial: X ~ B(n, p)
P(X = k) = C(n, k) p^k q^{n-k},  k = 0, 1, ..., n

3. Poisson: X ~ P(λ)
P(X = k) = e^{-λ} λ^k / k!,  k = 0, 1, 2, ...

4. Hypergeometric:
P(X = k) = C(m, k) C(N-m, n-k) / C(N, n),  max(0, m+n-N) ≤ k ≤ min(m, n)

5. Geometric: X ~ g(p)
P(X = k) = p q^k,  k = 0, 1, 2, ...,  q = 1 - p

6. Negative Binomial: X ~ NB(r, p)
P(X = k) = C(k-1, r-1) p^r q^{k-r},  k = r, r+1, ...

7. Discrete-Uniform:
P(X = k) = 1/N,  k = 1, 2, ..., N

## Some Commonly used Random Variables

Continuous Random Variables:
Uniform RV, Gaussian (Normal) RV, Cauchy RV, Rayleigh RV, Nakagami RV, Beta RV, Chi-squared RV, Pareto RV, Exponential RV, Gamma RV, Laplacian RV, Rician RV, Weibull RV, Log-normal RV, Erlang RV, Student-t and Fisher F distributions, etc.

Continuous RVs are specified by their probability density function (pdf)

Statistics, likelihoods, and probabilities mean everything to men, nothing to God.
- Richelle E. Goodrich

## Uniform and Exponential Random Variables

A uniform RV is given by
f_X(x) = 1/(b - a),  a ≤ x ≤ b
f_X(x) = 0,          otherwise

F_X(x) = 0,                 x < a
F_X(x) = (x - a)/(b - a),   a ≤ x < b
F_X(x) = 1,                 b ≤ x

An exponential RV (with location a and rate λ) is given by
f_X(x) = λ e^{-λ(x - a)},  x ≥ a
f_X(x) = 0,                x < a

F_X(x) = 1 - e^{-λ(x - a)},  x ≥ a
F_X(x) = 0,                  x < a

## Rayleigh Random Variable

f_X(x) = (x - a) e^{-(x - a)²/2},   x ≥ a
f_X(x) = 0,                         x < a

F_X(x) = 1 - e^{-(x - a)²/2},       x ≥ a
F_X(x) = 0,                         x < a

(commonly with a = 0)

The Rayleigh RV with parameter σ = 1 corresponds to the chi distribution with 2 degrees of freedom
The square of a Rayleigh RV with parameter σ corresponds to an exponential RV with parameter 1/(2σ²)
The Rayleigh PDF and CDF are commonly used in communications

## Memoryless Property of the Exponential RV

The exponential function often arises in practice and is used to describe the amount of time until some specific event occurs, e.g.,
Amount of time until a phone call is received
Amount of time until an earthquake occurs
Models the reliability of electronic components
The exponential RV is the only continuous distribution characterized by lack of memory (memoryless). This means that

P[X > s + t | X > t] = P[X > s]

Proof:
From the conditional probability definition, one obtains
P[X > s + t | X > t] = P[X > s + t, X > t] / P[X > t]
                     = P[X > s + t] / P[X > t]
                     = P[X > s] P[X > t] / P[X > t]     (since e^{-λ(s+t)} = e^{-λs} e^{-λt})
                     = P[X > s]
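The memoryless property can also be checked by simulation. A minimal sketch, assuming NumPy; lam, s, and t are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, s, t = 0.5, 2.0, 3.0
X = rng.exponential(scale=1 / lam, size=1_000_000)

lhs = np.mean(X[X > t] > s + t)      # P[X > s+t | X > t]
rhs = np.mean(X > s)                 # P[X > s]
print(lhs, rhs, np.exp(-lam * s))    # all three ~ 0.3679
```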

## A Rice RV X with parameters ν, σ² > 0 is described by the pdf

f_X(x) = (x/σ²) exp[-(x² + ν²)/(2σ²)] I₀(xν/σ²),   x ≥ 0
f_X(x) = 0,                                         x < 0

where I₀(x) = zeroth order modified Bessel function of the 1st kind
The Rice PDF was developed in the 1940s in the study of noise in communication channels
Its CDF is given by

F_X(x) = 1 - Q(ν/σ, x/σ)

where Q(a, b) = ∫_b^∞ x exp[-(x² + a²)/2] I₀(ax) dx is the Marcum Q-function

## Nakagami Random Variable

The Nakagami-m RV is parameterized by
Ω = E[X²]  and  m = (E[X²])² / Var(X²),  m ≥ 1/2

The Nakagami pdf is given by

f_X(x) = [2 m^m / (Γ(m) Ω^m)] x^{2m-1} exp(-m x²/Ω),   x ≥ 0
f_X(x) = 0,                                             x < 0

m = 1 gives the Rayleigh PDF
m = 0.5 gives the one-sided Gaussian

## Gamma Random Variable

The continuous RV X has the Gamma distribution, with parameters α and β, if its density function is given by

f_X(x) = x^{α-1} e^{-x/β} / (β^α Γ(α)),   x > 0, α > 0, β > 0
f_X(x) = 0,                               x ≤ 0

where
Γ(α) = ∫_0^∞ x^{α-1} e^{-x} dx
F_X(x) = G(α, x/β) / Γ(α),  with G(·,·) the incomplete Gamma function

Note that
Γ(1/2) = √π,   Γ(m) = (m - 1)! for m = 1, 2, ...

## Cauchy Random Variable

f_X(x) = (α/π) / (α² + x²),   -∞ < x < ∞
F_X(x) = 1/2 + (1/π) tan⁻¹(x/α)

## Erlang Random Variable

Is a special case of the Gamma RV with parameter α = n (n is a positive integer)

f_X(x) = λ^n x^{n-1} e^{-λx} / (n - 1)!,   x ≥ 0

F_X(x) = 1 - e^{-λx} Σ_{k=0}^{n-1} (λx)^k / k!

## Laplace Random Variable

A Laplace RV X with parameter λ is described by

f_X(x) = (λ/2) e^{-λ|x - a|},   -∞ < x < ∞

F_X(x) = (1/2) e^{λ(x - a)},        x ≤ a
F_X(x) = 1 - (1/2) e^{-λ(x - a)},   a < x

## Weibull Random Variable

The continuous RV X has the Weibull distribution, with parameters α and β, if its density function is given by

f_X(x) = αβ (x - a)^{β-1} exp[-α(x - a)^β],   x ≥ a, α > 0, β > 0
f_X(x) = 0,                                    x < a

F_X(x) = 1 - exp[-α(x - a)^β],   x ≥ a
F_X(x) = 0,                      x < a

## Beta Random Variable

A Beta RV X with parameters α, β is described by

f_X(x) = [Γ(α + β) / (Γ(α) Γ(β))] x^{α-1} (1 - x)^{β-1},   0 < x < 1
f_X(x) = 0,                                                 otherwise

F_X(x) = 0 for x < 0;  I_x(α, β) for 0 ≤ x < 1;  1 for x ≥ 1
where I_x(α, β) is the incomplete (regularized) Beta function

## Chi-squared Random Variable

The continuous RV X has a chi-squared distribution with n degrees of freedom if its density function is given by

f_X(x) = [1 / (2^{n/2} Γ(n/2))] x^{n/2 - 1} exp(-x/2),   x ≥ 0

F_X(x) = G(n/2, x/2) / Γ(n/2)

where n is a positive integer and G(a, b) is the incomplete gamma function
Note that χ²(n) is the Gamma distribution with α = n/2 and β = 2

## Fisher F-Random Variable

This distribution arises in problems of testing hypotheses in which 2 or more normal distributions are compared

f_X(x) = [Γ((m+n)/2) / (Γ(m/2) Γ(n/2))] m^{m/2} n^{n/2} x^{m/2 - 1} (mx + n)^{-(m+n)/2},   x > 0

## Pareto Random Variable

f_X(x) = α / x^{α+1},   x ≥ 1
f_X(x) = 0,             otherwise

## Student-t Random Variable

f_T(t) = [Γ((ν+1)/2) / (√(νπ) Γ(ν/2))] (1 + t²/ν)^{-(ν+1)/2},   -∞ < t < ∞

Gaussian (Normal) RV - 1

## Normal distribution - If a continuous random variable has a distribution that is symmetric and bell-shaped, we call it a normal distribution

Based on the law of probability
Everything is possible because
The sheer existence of possibility
Confirms the existence
Of impossibility.
- Dejan Stojanovic

## Standard Normal Distribution: μ = 0 and σ = 1

Gaussian (Normal) RV - 3

f_X(x) = [1/√(2πσ²)] exp[-(x - μ)² / (2σ²)],   -∞ < x < ∞, σ > 0

is said to be a Gaussian or Normal density function
It is commonly denoted as N(μ, σ²), written X ~ N(μ, σ²), where
μ = mean (average) value
σ = standard deviation, and
σ² = variance

[Figure: the empirical rule for the normal curve -
68% of data are within 1 standard deviation of the mean,
95% within 2 standard deviations,
99.7% within 3 standard deviations]

Gaussian (Normal) RV - 4 and 5

Once μ & σ are specified, the Gaussian curve is uniquely determined
The Gaussian PDF is symmetric about x = μ

[Figure: Gaussian curves with μ1 = μ2 and σ1 < σ2; with μ1 < μ2 and σ1 = σ2; and with μ1 < μ2 and σ1 < σ2]

Characteristics of the Normal Curve
The curve is bell-shaped and symmetrical.
The mean, median, and mode are all equal.
The highest frequency is in the middle of the curve.
The frequency gradually tapers off as the scores approach the ends of the curve.
The curve approaches, but never meets, the abscissa at both high and low ends.

It is the most important of all densities and models more different random occurrences than any other PDF
It is the most widely used model of noise in communication systems

Gaussian (Normal) RV - 7

## It is so important that it is the only density in the world

to earn a place in a banknote (a German Banknote)

given by

FX ( x) P[ X x]
x

## Importance of Gaussian PDF stems from the central limit theorem

which states that the sum of RVs (or average of the sum) of almost
any type of RV approaches Gaussian density as n
Gaussian density is encountered in all areas of engineering and
science
(c) Prof. Okey Ugweje

199

(t ) 2
exp
dt
2
2

2
1

## This integral cannot be evaluated in closed form

However, because of its importance, FX(x) for the
Gaussian RV have been tabulated by means of
numerical integration and approximation techniques

## Federal University of Technology, Minna

200

Gaussian (Normal) RV - 8 to 11

The tabulated function is the normalized (standardized) Gaussian RV, denoted N(0, 1)
That is, a standard Normal RV has zero mean (μ = 0) and unit variance (σ² = 1):
X ~ N(μ, σ²)  →  (X - μ)/σ ~ N(0, 1)

The CDF of the standard Normal RV is

Φ(x) = (1/√(2π)) ∫_{-∞}^{x} exp(-y²/2) dy

By the substitution y = (t - μ)/σ, for X ~ N(μ, σ²):

F_X(a) = P[X ≤ a] = P[(X - μ)/σ ≤ (a - μ)/σ] = Φ((a - μ)/σ)

Also, by symmetry,
Φ(-x) = 1 - Φ(x)

Gaussian (Normal) RV - 12

The Q-function is the upper tail of the standard Gaussian density:

Q(x) = (1/√(2π)) ∫_{x}^{∞} exp(-y²/2) dy

Q(x) = 1 - Φ(x) = 1 - F_X(x) for a standard Normal RV

In many cases, the probability of error in a communication system is given directly in terms of Q(x)
Q(x) is often referred to as the upper tail of the Gaussian density function
For X ~ N(μ, σ²):  P[X > a] = Q((a - μ)/σ)

Gaussian (Normal) RV - 13 and 14

Useful Q-function identities:
Q(-x) = 1 - Q(x)
Q(x) = 1 - Φ(x)
Q(0) = 1/2

Calculating probability with the Q-function (area under the curve), for X ~ N(μ, σ²):
1) P[X > a] = Q((a - μ)/σ)
2) P[X ≤ a] = 1 - P[X > a] = 1 - Q((a - μ)/σ)
3) P[a < X ≤ b] = F_X(b) - F_X(a) = Q((a - μ)/σ) - Q((b - μ)/σ)

If the value of Q(k) is given, the value of k can be determined from the Q-function table directly, e.g.,
Q(k) = 0.2005  →  k = 0.84
Sometimes, linear interpolation may be necessary
E.g., if Q(k) = 0.02, then the value of k lies between 2.05 & 2.06:
Q(2.05) = 0.02018,  Q(2.06) = 0.01970
Hence, by interpolation, we obtain
k ≈ 2.05 + [(0.02018 - 0.02) / (0.02018 - 0.01970)] (2.06 - 2.05) ≈ 2.054
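In code, the table lookup and the interpolation can be replaced by the complementary error function and a root finder. A sketch assuming SciPy is installed (brentq is its standard scalar root finder):

```python
from math import erfc, sqrt
from scipy.optimize import brentq

def Q(x: float) -> float:
    """Upper tail of the standard Gaussian: Q(x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2))

print(Q(0.84))                               # ~0.2005, matching the table entry
k = brentq(lambda x: Q(x) - 0.02, 0.0, 10.0)
print(k)                                     # ~2.054, matching the interpolation
```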

Gaussian (Normal) RV - 15

The sum of n independent normal RVs N(μ, σ²) is a normal RV with mean nμ and variance nσ²:
Σ_{i=1}^{n} N(μ, σ²) ~ N(nμ, nσ²)

Any fixed linear transformation of a Gaussian RV is also a Gaussian RV:
a + b N(μ, σ²) ~ N(a + bμ, b²σ²)

The sum of squares of ν independent unit Gaussian RVs, N(0, 1), is a chi-squared RV (central type) with ν degrees of freedom:
Σ_{i=1}^{ν} N_i(0, 1)² ~ χ²(ν)

The ratio of two independent unit Gaussian RVs, N(0, 1), is the standard Cauchy
The sample mean of n independent and identically distributed RVs, each with mean m and variance σ², tends to be Gaussian distributed with mean m and variance σ²/n as n → ∞

Lognormal RV

X has a lognormal distribution if the RV Y = ln(X - a) has a normal distribution with mean b and standard deviation σ.
The resulting density function of X is given by

f_X(x) = [1 / (√(2π) σ (x - a))] exp{-[ln(x - a) - b]² / (2σ²)},   x > a
f_X(x) = 0,                                                        x ≤ a

Example 29a
If X is a normal RV with parameters μ = 3 and σ² = 9, find
a) P[2 < X < 5],  b) P[X > 0],  c) P[|X - 3| > 6]

(a)
P[2 < X < 5] = P[(2-3)/3 < (X-3)/3 < (5-3)/3] = P[-1/3 < z < 2/3]
             = Φ(2/3) - Φ(-1/3)
             = Φ(2/3) + Φ(1/3) - 1       (using Φ(-x) = 1 - Φ(x))
From the table, we obtain
P[2 < X < 5] = 0.7486 + 0.6293 - 1 = 0.3779

(b)
P[X > 0] = P[(X-3)/3 > (0-3)/3] = P[z > -1] = Φ(1) = 0.8413

(c)
P[|X - 3| > 6] = P[X - 3 < -6] + P[X - 3 > 6]
               = P[X < -3] + P[X > 9]
               = P[z < -2] + P[z > 2]
               = 2[1 - Φ(2)] = 2(1 - 0.9772) = 0.0456

Example 29b
The velocity V of the wind at a certain location is a normal RV with μ = 2 and σ = 5.
Determine P[-3 ≤ V ≤ 8].
P[-3 ≤ V ≤ 8] = P[(-3-2)/5 ≤ (V-2)/5 ≤ (8-2)/5] = P[-1 ≤ z ≤ 1.2]
              = [1 - Q(1.2)] - Q(1)
              = 1 - 0.1151 - 0.1587 = 0.7262
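Examples 29a and 29b can be verified directly, replacing the tables. A sketch assuming SciPy is installed:

```python
from scipy.stats import norm

X = norm(loc=3, scale=3)        # Example 29a: N(3, 9), sigma = 3
print(X.cdf(5) - X.cdf(2))      # P[2 < X < 5]    ~ 0.3779
print(X.sf(0))                  # P[X > 0]        ~ 0.8413
print(X.cdf(-3) + X.sf(9))      # P[|X-3| > 6]    ~ 0.0455

V = norm(loc=2, scale=5)        # Example 29b: N(2, 25), sigma = 5
print(V.cdf(8) - V.cdf(-3))     # P[-3 <= V <= 8] ~ 0.7262
```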

Statistical Properties of Random Variables

Rowe's Rule: the odds are six to five that the light at the end of the tunnel is the headlight of an oncoming train.
- Paul Dickson

Statistical characteristics or parameters are used to describe the behavior of random variables
These properties convey information about the shape of the density, its symmetry point (or center point), the variation about this point, etc.
Knowing enough of these properties, the behavior of a RV can be characterized
Some of these properties include the Mean, Variance, Characteristic Function, etc.
For example, the mean and variance are universally used to represent the overall properties of a RV and its PDF

## Expectation of a Random Variable - 1 and 2

The expected value of a RV X is defined as

E[X] = m_x = μ_X = ∫_{-∞}^{∞} x f_X(x) dx,   continuous
E[X] = Σ_{k=1}^{n} x_k p(x_k),               discrete

E[X] is also known as the Mean, Average Value, or First Moment
E[X] is probably the most important concept in probability theory and Random Processes - a must know concept
The concept of expectation is analogous to the physical concept of the center of gravity of a distribution

To explain this concept, consider 2 figures:
[Figure (a): a discrete pmf with masses f(x1), ..., f(x5) placed on a rod, balanced at E[X]; Figure (b): a continuous density over a rod]

For figure (a), the x-axis may be considered as a long weightless rod to which weights are attached
If weights equal to f(xj) are attached to this rod at each point xj, then the rod will be balanced iff it is supported at the point E[X]
For figure (b), the x-axis may be regarded as a long rod over which the mass varies continuously

Expectation of a RV - 3 and 4

If the density of the rod at each point is equal to f(x), then the center of gravity will be located at the point E[X], and the object will be balanced if supported at that point
E[X] will exist iff

∫ |x| f_X(x) dx < ∞

i.e., only when the integral converges absolutely
In general, if f_X(x) has one peak at X = x1 and is symmetric about x1, then E[X] = x1; otherwise the mean value need not lie at X = x1
Note that the notation E[X] is not a function of X

Properties of Expectation (Must Know)
1) E[c] = c, c is a constant
2) E[cX] = c E[X]
3) E[X + c] = E[X] + c
4) E[X + Y] = E[X] + E[Y]
5) E[X] ≤ E[Y], if P[X ≤ Y] = 1
6) |E[X]| ≤ E[|X|]
7) E[X1 + X2 + ... + XN] = E[N] E[X], if the Xi, i = 1, ..., N, are independent and identically distributed (iid) and N is a random number of terms independent of the Xi

Expectation of a Function

Given a function of a RV X, Y = g(X), we want to compute the mean E[Y]:
E[Y] = E[g(X)] = ?
Either first find the PDF of Y and then use the definition to find E[Y], or calculate the expectation directly using the definition:

E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx,   continuous
E[g(X)] = Σ_k g(x_k) P[X = x_k],       discrete

Example 30a
A RV X is uniformly distributed in the interval [a, b]; what is the expected value E[X]?

E[X] = ∫ x f_X(x) dx = (1/(b-a)) ∫_a^b x dx
     = (1/(b-a)) [x²/2] from a to b
     = (b² - a²) / (2(b-a))
     = (a + b)/2

Also

E[X²] = ∫ x² f_X(x) dx = (1/(b-a)) ∫_a^b x² dx = (1/(b-a)) [x³/3] from a to b
      = (b³ - a³) / (3(b-a))

Example 30b
A RV X is uniformly distributed in the interval [0, 10]; what is the expected value E[X]?

f_X(x) = 1/10,  0 ≤ x ≤ 10
f_X(x) = 0,     else

E[X] = ∫ x f_X(x) dx = ∫_0^{10} x (1/10) dx = [x²/20] from 0 to 10 = 5

Example 31
f_X(x) = K e^{-bx},  x ≥ 0
f_X(x) = 0,          x < 0

First we compute the value of K:
∫ f_X(x) dx = ∫_0^∞ K e^{-bx} dx = K/b = 1  →  K = b

Then we compute the mean of the random variable X (integrating by parts):
E[X] = ∫_0^∞ x b e^{-bx} dx = [-x e^{-bx}] from 0 to ∞ + ∫_0^∞ e^{-bx} dx = 0 + 1/b = 1/b

Example 32
A discrete RV X takes the values x_k = k², k = 1, 2, ..., 5, which occur with probabilities 0.4, 0.25, 0.15, 0.1, and 0.1, respectively. Find E[X].

Solution
E[X] = Σ_{k=1}^{5} x_k p(x_k)
     = (1)(0.4) + (2)²(0.25) + (3)²(0.15) + (4)²(0.1) + (5)²(0.1)
     = 0.4 + 1.0 + 1.35 + 1.6 + 2.5 = 6.85

## Mean Square Value

The mean square value is

E[X²] = ∫_{-∞}^{∞} x² f_X(x) dx,   continuous
E[X²] = Σ_k x_k² p(x_k),           discrete

This is analogous to the power of a signal
The root-mean-square (RMS) value is X_RMS = √(E[X²])

## N-th Moment / N-th Central Moment

E[X^n] = ∫_{-∞}^{∞} x^n f_X(x) dx

E[(X - μ_x)^n] = ∫_{-∞}^{∞} (x - μ_x)^n f_X(x) dx

Example 33
a) A RV X is uniformly distributed in the interval [a, b]; what is the mean square value E[X²]?

Solution:
E[X²] = ∫ x² f_X(x) dx = (1/(b-a)) ∫_a^b x² dx = (1/(b-a)) [x³/3] from a to b
      = (b³ - a³) / (3(b-a))

Example 33 (continued)
b) For f_X(x) = e^{-x}, x ≥ 0, using the reduction formula ∫ x^m e^{-ax} dx = -x^m e^{-ax}/a + (m/a) ∫ x^{m-1} e^{-ax} dx:

E[X] = ∫_0^∞ x e^{-x} dx = 1
E[X²] = ∫_0^∞ x² e^{-x} dx = 2

## Variance of a Random Variable - 1

Special cases of the central moments:
when n = 1, the first central moment is zero
when n = 2, the 2nd central moment is called the variance, i.e., the variance is the second central moment

The variance of a random variable X is given as

σ_x² = Var[X] = E[(X - μ_x)²]
     = ∫_{-∞}^{∞} (x - μ_x)² f_X(x) dx,   continuous
     = Σ_k (x_k - μ_x)² P[X = x_k],       discrete

The variance provides a measure of the spread or dispersion of the density around the mean

## Variance of a Random Variable - 2

A small value of the variance indicates that the probability density is tightly concentrated around the mean, and vice versa
The variance is the moment of inertia about the center of mass
Note that

Var[X] = σ_X² = E[(X - μ_x)²]
       = E[X² - 2Xμ_x + μ_x²]
       = E[X²] - 2μ_x E[X] + μ_x²
       = E[X²] - (E[X])²

Standard deviation: σ = √(Var[X])

Properties of Variance - 1
Let a and b be constants
1) Var[a] = 0
2) Var[aX + b] = a² Var[X]
   Proof: If E[X] = μ, then E[aX + b] = aμ + b, and
   Var[aX + b] = E[(aX + b - aμ - b)²] = E[a²(X - μ)²] = a² Var[X]
3) Var[X + Y] = Var[X] + Var[Y] + 2 E[(X - μ_x)(Y - μ_y)]

Properties of Variance - 2

Proof (variance of a sum):
Suppose that n = 2, E[X1] = μ1, E[X2] = μ2; then E[X1 + X2] = μ1 + μ2

Var[X1 + X2] = E[(X1 + X2 - μ1 - μ2)²]
             = E[(X1 - μ1)²] + E[(X2 - μ2)²] + 2 E[(X1 - μ1)(X2 - μ2)]
             = Var[X1] + Var[X2] + 2 E[(X1 - μ1)(X2 - μ2)]

If X1 and X2 are independent,
E[(X1 - μ1)(X2 - μ2)] = E[X1 - μ1] E[X2 - μ2] = (μ1 - μ1)(μ2 - μ2) = 0
Hence
Var[X1 + X2] = Var[X1] + Var[X2]

Example 34
A RV X is uniformly distributed in the interval [a, b]; what is the variance of the RV?

From Example 30a,
E[X] = (a + b)/2,   E[X²] = (b³ - a³) / (3(b-a))

The variance can be computed as
Var[X] = E[X²] - (E[X])²

Example 34 (continued)
Var[X] = (b³ - a³)/(3(b-a)) - [(a + b)/2]²
       = (b² + ab + a²)/3 - (a² + 2ab + b²)/4
       = (b - a)²/12

You can also use the brute force method:
Var[X] = E[(X - μ_x)²] = (1/(b-a)) ∫_a^b (x - (a+b)/2)² dx = (b - a)²/12

Example 35
f_X(x) = K e^{-bx},  x ≥ 0
f_X(x) = 0,          x < 0
a) From Example 31, K = b and μ_x = E[X] = 1/b
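A quick simulation check of Var[X] = (b - a)²/12. A minimal sketch assuming NumPy; a = 2 and b = 7 are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = 2.0, 7.0
X = rng.uniform(a, b, size=1_000_000)

print(X.mean(), (a + b) / 2)        # sample mean vs (a+b)/2 = 4.5
print(X.var(), (b - a)**2 / 12)     # sample variance vs 25/12 ~ 2.083
```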

Example 35 (continued)
b) Then we compute the variance of the random variable X:

σ_X² = E[(X - 1/b)²] = ∫_0^∞ (x - 1/b)² b e^{-bx} dx
     = ∫_0^∞ x² b e^{-bx} dx - (2/b) ∫_0^∞ x b e^{-bx} dx + (1/b²) ∫_0^∞ b e^{-bx} dx
     = 2/b² - 2/b² + 1/b²
     = 1/b²

## Functions that Give Moments

Medicine is a science of uncertainty and an art of probability.
- William Osler

## Functions that Give Moments

Because of the importance of the n-th moments (n-th order expected values), several other techniques can be used to evaluate them
These techniques are widely used in determining the moments of important distributions for large values of n
These alternative procedures exist for determining the moments of random variables, especially when n > 2
These procedures or functions are:
Characteristic Function
Moment Generating Function
Probability Generating Function
Laplace Transform
These transforms are handy when computing the statistical behavior of sums of large numbers of random variables

Characteristic Function - 1

The Characteristic Function (CF) of a random variable X is given by E[e^{jωX}] and is denoted by Φ_X(ω), such that

Φ_X(ω) = E[e^{jωX}] = ∫_{-∞}^{∞} e^{jωx} f_X(x) dx,   continuous
Φ_X(ω) = Σ_k e^{jωx_k} p_X(x_k),                      discrete

The characteristic function will exist only if the integral or the sum specified above converges
Φ_X(ω) can be interpreted as the expectation of a function of X, Y = e^{jωX}, with ω unspecified
Φ_X(ω) can also be interpreted as the Fourier Transform (FT) of the PDF f_X(x) of the random variable X, with the sign of ω reversed

Characteristic Function - 3

## If X() is known, then fx(x) can be found from the

inverse FT with sign of reversed

## Now consider the derivatives of the CF, X(), evaluated at = 0

f X ( x)

1
2

X ( )e

f X ( x) X ( )

jX

X ( f ) F x(t )

x(t )e j2ft dt

x(t ) F

X( f )

X ( )
jxf X ( x)e j X dx 0
d
0

jxf X ( x )dx jE X

X ( f )e j2ft df

## CF is especially useful in evaluation the moments of RVs when

n>2
Consider the following,

E X
xf X ( x)dx

2
d2

( )
jx f X ( x)e j X dx 0
d 2 X
0

j 2 x 2 f X ( x )dx j E X 2

z
z

E X
2

E X

x f X ( x)dx

n
dn

( )
jx f X ( x)e j X dx 0
d n X
0

x 3 f X ( x)dx

E X
n

## (c) Prof. Okey Ugweje

j n x n f X ( x )dx j E X n

x n f X ( x)dx

239

## Federal University of Technology, Minna

240

Characteristic Function - 4

E[X] = (1/j) dΦ_X(ω)/dω |_{ω=0}
E[X²] = (1/j²) d²Φ_X(ω)/dω² |_{ω=0}
Hence,
E[Xⁿ] = (1/jⁿ) dⁿΦ_X(ω)/dωⁿ |_{ω=0}

This implies that if we know the CF of a RV, we can easily find the n-th moment of the RV.
The Characteristic Function of a random variable always exists

Example 37
Find the characteristic function of the exponential RV with PDF given by
f_X(x) = λ e^{-λx},  x ≥ 0
f_X(x) = 0,          x < 0

Solution:
Φ_X(ω) = ∫_0^∞ λ e^{-λx} e^{jωx} dx = λ / (λ - jω)

## Moment Generating Function

The Moment Generating Function (MGF) of a RV X is given by E[e^{tX}] and is denoted as M_X(t). Hence

M_X(t) = E[e^{tX}] = ∫_{-∞}^{∞} e^{tx} f_X(x) dx,   continuous
M_X(t) = Σ_k e^{t x_k} P[X = x_k],                  discrete

The MGF is the same as the characteristic function with the j-term in the exponent removed
The MGF is used more often here, since the CF is tied to the Fourier Transform
The MGF may not always exist, e.g., try to find the MGF of f(x) = 2/x³, x ≥ 1
As with the Characteristic Function, expanding the exponential as a power series and taking the expectation gives

M_X(t) = 1 + t E[X] + t² E[X²]/2! + t³ E[X³]/3! + ...

Hence,
M_X(0) = 1
M_X⁽ⁿ⁾(0) = E[Xⁿ];  in particular, E[X] = M_X'(0)

Property:
Y = aX + b  →  M_Y(t) = E[e^{t(aX+b)}] = e^{bt} M_X(at)
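The moment-from-transform recipe of Example 37 can be verified symbolically. A sketch assuming SymPy is installed; the conds='none' argument simply suppresses the convergence conditions:

```python
import sympy as sp

x, w = sp.symbols('x omega', real=True)
lam = sp.symbols('lam', positive=True)

# CF of the exponential pdf: integrate lam*exp(-lam*x)*exp(j*w*x) over x >= 0
phi = sp.simplify(sp.integrate(lam * sp.exp(-lam * x) * sp.exp(sp.I * w * x),
                               (x, 0, sp.oo), conds='none'))
print(phi)                                             # lam/(lam - I*omega)

EX = sp.simplify(sp.diff(phi, w).subs(w, 0) / sp.I)    # (1/j) dPhi/dw at w = 0
EX2 = sp.simplify(sp.diff(phi, w, 2).subs(w, 0) / sp.I**2)
print(EX, EX2)                                         # 1/lam, 2/lam**2
```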

## Probability Generating Function - 1

The Probability Generating Function (PGF), defined for a nonnegative discrete random variable X, is given by

G_X(z) = E[z^X] = Σ_{x≥0} p_X(x) z^x

The PGF is essentially the z-transform of a RV X with the z replaced by z⁻¹
If we know the PGF, we can find the probability mass function:

p_X(k) = P[X = k] = (1/k!) dᵏG_X(z)/dzᵏ |_{z=0}

E[X] = dG_X(z)/dz |_{z=1} = Σ_{x≥0} x P[X = x]

## Probability Generating Function - 2

d²G_X(z)/dz² |_{z=1} = Σ_{x≥0} x(x-1) P[X = x] = E[X(X-1)] = E[X²] - E[X]

In general,
dⁿG_X(z)/dzⁿ |_{z=1} = E[X(X-1)...(X-n+1)]
These are sometimes called the factorial moments

We can also compute the variance using the PGF as follows:

Var[X] = d²G_X(z)/dz² |_{z=1} + dG_X(z)/dz |_{z=1} - [dG_X(z)/dz |_{z=1}]²
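As a concrete instance, the PGF of a Poisson(λ) RV is G(z) = exp(λ(z - 1)), and its derivatives at z = 1 return the familiar mean and variance. A sketch assuming SymPy is installed:

```python
import sympy as sp

z = sp.symbols('z')
lam = sp.symbols('lam', positive=True)
k = sp.symbols('k', integer=True, nonnegative=True)

G = sp.simplify(sp.summation(sp.exp(-lam) * lam**k / sp.factorial(k) * z**k,
                             (k, 0, sp.oo)))
print(G)                                  # exp(lam*(z - 1))

EX = sp.diff(G, z).subs(z, 1)             # G'(1) = E[X]
EX2 = sp.diff(G, z, 2).subs(z, 1) + EX    # E[X^2] = G''(1) + G'(1)
print(EX, sp.simplify(EX2 - EX**2))       # lam, lam  (mean and variance)
```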

Laplace Transform

The Laplace Transform (LT) of a positive RV X with PDF f_X(x) is

L_X(s) = E[e^{-sX}] = ∫_0^∞ e^{-sx} f_X(x) dx

where s is a complex number with positive real part
The Inverse Laplace Transform (ILT) can be obtained as follows:

f_X(x) = (1/j2π) ∫_{c-j∞}^{c+j∞} L_X(s) e^{sx} ds

The moments follow from the derivatives at s = 0:

E[Xⁿ] = (-1)ⁿ dⁿL_X(s)/dsⁿ |_{s=0}

It is also possible to invert the above to get the series

L_X(s) = Σ_{n≥0} E[Xⁿ] (-s)ⁿ / n!

This means that the LT, and hence f_X(x), can in principle be computed from knowledge of the moments

Tail Inequalities - 1

Probabilities of the form P[X ≥ k] and P[|X| ≥ k] are known as Tail Probabilities
Sometimes we want to estimate (upper bound) these probabilities without actually evaluating them
The following 3 bounds provide us with various estimates of the Tail Probabilities:
1. Markov Inequality
2. Chebyshev's Inequality
3. Chernoff Inequality

It is always better to be approximately right, than precisely wrong.
- Unknown Engineer

Tail Inequalities - 2 to 5

1. Markov Inequality:
If X is a RV that takes nonnegative values, then for any value k > 0

P[X ≥ k] ≤ E[X] / k      (1st order bound)

Proof:
E[X] = ∫_0^∞ x f_X(x) dx ≥ ∫_k^∞ x f_X(x) dx ≥ ∫_k^∞ k f_X(x) dx = k P[X ≥ k]
Hence, P[X ≥ k] ≤ E[X] / k
This simple inequality is surprisingly useful, and various other well known inequalities are derived from it

2. Chebyshev's Inequality:
The Chebyshev Inequality (CI) gives a conservative estimate of the probability that a random variable X assumes a value within k standard deviations of its mean
Let X be a RV with mean μ and variance σ². Then for any k > 0, at most 1/k² of the probability is distributed outside the interval μ - kσ < X < μ + kσ. That is,

P[|X - μ| ≥ kσ] ≤ 1/k²,   equivalently   P[|X - μ| ≥ k] ≤ σ²/k²

That is, if we pick a value of a RV arbitrarily, we can state the minimum probability that the random value falls within a given limit
The significance of CI is that it emphasizes the general importance of the standard deviation of a RV
Sometimes the following forms of CI are used:

P[|X - μ| < k] ≥ 1 - σ²/k²    or    P[|X - μ| < kσ] ≥ 1 - 1/k²

Proof:
Chebyshev's Inequality is a consequence of the Markov Inequality
Since (X - μ)² ≥ 0, we can apply the Markov Inequality:
P[(X - μ)² ≥ k²] ≤ E[(X - μ)²] / k² = σ²/k²
But since (X - μ)² ≥ k² iff |X - μ| ≥ k, then
P[|X - μ| ≥ k] ≤ σ²/k²

CI holds for all distributions, positive or negative, and cannot be improved in general
It provides intuition about the meaning of the variance of a RV: wide deviations from the mean E[X] are unlikely if the variance σ² is small, e.g., with Var[X] = σ² and k = nσ,
P[|X - μ| ≥ nσ] ≤ σ²/(nσ)² = 1/n²

Although the Chebyshev Inequality is correct, the upper bound is not tight; i.e., it is usually different from the actual value
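The looseness of the bound is easy to see numerically. A minimal sketch assuming NumPy, with an exponential RV (μ = σ = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = X.mean(), X.std()

for k in (1.5, 2.0, 3.0):
    actual = np.mean(np.abs(X - mu) >= k * sigma)
    print(f"k={k}: actual={actual:.4f}  Chebyshev bound={1/k**2:.4f}")
# e.g., for k = 2 the actual tail is ~0.05 against a bound of 0.25
```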

Tail Inequalities - 6

3. Chernoff Inequality:
Applying the Markov Inequality to e^{tX} gives P[e^{tX} ≥ a] ≤ M_X(t)/a, hence

P[X ≥ k] ≤ e^{-kt} M_X(t),  t > 0  (used for k ≥ E[X])
P[X ≤ k] ≤ e^{-kt} M_X(t),  t < 0  (used for k ≤ E[X])

where M_X(t) is the Moment Generating Function
That is, the Chernoff bound requires knowledge of the MGF
The Chernoff Inequality is a much tighter bound than the Chebyshev Inequality, but more complex
Thus, we expect the Chernoff bound to be tighter than the Markov bound
It applies to any RV, whether positive or not

## Strong Law of Large Numbers (SLLN)

Two laws of large numbers deal with the behavior of the sample mean μ̂_n as n becomes arbitrarily large
Var[μ̂_n] → 0 as n → ∞ suggests that the PDF of μ̂_n becomes narrower and narrower and approaches a delta function

Consider a sequence of independent and identically distributed (iid) RVs, X1, X2, ..., XN, each with mean μ
Then

P[lim_{n→∞} μ̂_n = μ] = 1,   or   P[lim_{n→∞} μ̂_n ≠ μ] = 0

This means that μ̂_n → μ as n → ∞
The SLLN is the basis for justifying simulations and the analysis of experimental results

## Weak Law of Large Numbers (WLLN) and Central Limit Theorem

Let X1, X2, ..., XN be a sequence of iid RVs, each with mean μ
Then for ε > 0

lim_{n→∞} P[|μ̂_n - μ| < ε] = 1

Since ε is arbitrary, in the limit the density of μ̂_n concentrates at μ
The WLLN is an easy consequence of the Chebyshev inequality

Central Limit Theorem (CLT)
The CLT is one of the most remarkable results in probability theory
It is concerned with the PDF of the sum of independent RVs
It states that the sum of a large number of independent RVs (of almost any distribution) has a distribution that is approximately Gaussian, under certain conditions

Let X1, X2, ..., XN be a sequence of iid RVs, each with finite mean μ and finite variance σ²
Let Sn = X1 + X2 + ... + Xn, n ≥ 1, and let Zn be the sequence of zero mean, unit variance RVs defined as

Zn = (Sn - nμ) / (σ√n)

Then,

lim_{n→∞} P[Zn ≤ z] = (1/√(2π)) ∫_{-∞}^{z} e^{-x²/2} dx,   i.e., N(0, 1)

That is, for all n, E[Zn] = 0 and Var[Zn] = 1; even as n → ∞ the mean and variance of Zn do not change
In other words, the CDF of the normalized sum approaches a Gaussian CDF no matter what the distribution of the component RVs
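A minimal CLT sketch, assuming NumPy; n = 30 uniform RVs per sum, and the empirical CDF of Zn is compared with the Gaussian CDF at two points:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n = 30
U = rng.random((200_000, n))                 # iid Uniform(0,1) rows
mu, sigma = 0.5, sqrt(1 / 12)                # mean/std of Uniform(0,1)
Zn = (U.sum(axis=1) - n * mu) / (sigma * sqrt(n))

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
for z in (0.0, 1.0):
    print(np.mean(Zn <= z), Phi(z))          # ~0.500 vs 0.500, ~0.841 vs 0.841
```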

## Laws of Large Numbers - 4

This concept is very important in Engineering, for example:
Electrical noise is often the result of the superposition of voltages due to a large number of charge carriers
Turbulent boundary-layer pressure variations on an aircraft skin are the result of the superposition of minute pressures due to numerous eddies
Random errors in experimental measurements are due to many irregularities

In all these cases, the Gaussian approximation is valid

Transformation of a Random Variable

The laws of probability, so true in general, so fallacious in particular.
- Edward Gibbon

## Frequently, one encounters the need to derive the probability distribution of a function of one or more RVs

Suppose that the CDF or PDF of a RV X is given; we wish to compute the CDF or PDF of another RV Y = g(X), where g is a function:

g: X → Y,   Y = g(X),   X = g⁻¹(Y)

[Diagram: X with density fX(x) enters a black box g(·); the output Y = g(X), with density fY(y), is also a RV; the sample spaces map as (S, F, P) → SX → SY]

In general, we call the above black box a transformation or data processing
Transformations may be classified as memoryless or with memory. Only memoryless cases are treated in this class
If the input X is a RV, the output Y is also a RV
The basic idea is to relate the event A = {Y ≤ y} to an equivalent event that involves X, B = {X ∈ g⁻¹((-∞, y])}

## Transform of Distribution Function

Given the CDF of X, we want to find the CDF of a related RV Y = g(X):

F_Y(y) = P[Y ≤ y] = P[g(X) ≤ y] = P[X ∈ g⁻¹((-∞, y])]

Steps:
1) Solve for x in the equation y = g(x), in terms of y
2) Substitute into the above equation

Some Important CDF Transforms

1. Linear Transformation: Y = aX + b
(a, b are constants; we know the CDF of X)

Case 1: a > 0
{Y ≤ y} = {X ≤ (y - b)/a}
F_Y(y) = P[X ≤ (y - b)/a] = F_X((y - b)/a)

Case 2: a < 0
{Y ≤ y} = {X ≥ (y - b)/a}
F_Y(y) = P[X ≥ (y - b)/a] = 1 - F_X(((y - b)/a)⁻)

Hence
F_Y(y) = F_X((y - b)/a),          a > 0
F_Y(y) = 1 - F_X(((y - b)/a)⁻),   a < 0

2. Square Function: Y = X²

Case 1: y ≥ 0
{Y ≤ y} = {-√y ≤ X ≤ √y}
F_Y(y) = P[-√y ≤ X ≤ √y] = F_X(√y) - F_X((-√y)⁻)

Case 2: y < 0
There is no value of X for which X² ≤ y; hence F_Y(y) = P[∅] = 0

F_Y(y) = F_X(√y) - F_X((-√y)⁻),   y ≥ 0
F_Y(y) = 0,                       y < 0

## Transformation of Density Function

X with f_X(x) → g(x) → Y with f_Y(y)

f_Y(y) = Σ_{k=1}^{n} f_X(x_k) / |g'(x_k)|

where x_k, k = 1, 2, ..., n, are the real roots of the equation y = g(x), expressed in terms of y
For a one-to-one transformation,

f_Y(y) = f_X(x) |dx/dy| = f_X(x) / |dy/dx| = f_X(x) / |g'(x)|

Steps:
1) Given y = g(x), solve for x in terms of y
2) Find |dx/dy|
3) Substitute into the formula and simplify

Some Important PDF Transformations

1) Linear Transformation: Y = aX + b
(a, b are constants and we know the PDF of X)
x = (y - b)/a,   |dy/dx| = |a|
f_Y(y) = (1/|a|) f_X((y - b)/a)

2) Square Function: Y = aX², a > 0
x = ±√(y/a),   dy/dx = 2ax,   |dx/dy| = 1/(2√(ay))
f_Y(y) = [f_X(√(y/a)) + f_X(-√(y/a))] / (2√(ay)),   y > 0
f_Y(y) = 0,                                          y ≤ 0

3) Exponential Function: Y = exp(aX)
y = e^{ax}  →  x = (1/a) ln y,   |dx/dy| = 1/(|a| y)
f_Y(y) = (1/(|a| y)) f_X((1/a) ln y),   y > 0

4) Cosine Function: Y = cos(X), with X uniform in the interval [0, 2π)
For -1 < y < 1, g(x) = cos(x) has two solutions:
x0 = cos⁻¹(y), 0 ≤ x0 ≤ π,   and   x1 = 2π - cos⁻¹(y)
|dy/dx| = |sin x| = √(1 - y²) at both roots
f_Y(y) = [f_X(cos⁻¹(y)) + f_X(2π - cos⁻¹(y))] / √(1 - y²)
       = [(1/2π) + (1/2π)] / √(1 - y²)
       = 1 / (π √(1 - y²)),   -1 < y < 1
f_Y(y) = 0,                   |y| ≥ 1

The same density transformation holds for the sine function
The cosine and sine RVs have an arcsine distribution
For an interval of (-∞, ∞), the sine or cosine will have infinitely many solutions x0, x1, x2, ...

5) Sine Function: Y = a sin(X)
dy/dx = a cos x = ±√(a² - y²)
f_Y(y) = [1/√(a² - y²)] Σ_n f_X(x_n),   |y| < a
By integration, the arcsine CDF is
F_Y(y) = 0,                          y < -a
F_Y(y) = 1/2 + (1/π) sin⁻¹(y/a),     |y| ≤ a
F_Y(y) = 1,                          y > a

6) Tangent Function: Y = tan(X)
x_n = tan⁻¹(y) + nπ,   dy/dx = 1/cos²x = 1 + y²
f_Y(y) = [1/(1 + y²)] Σ_n f_X(x_n)
For X uniform on (-π/2, π/2), f_X = 1/π there, so
f_Y(y) = 1 / (π(1 + y²))   (the standard Cauchy density)

## Two Random Variables - 1

Multiple Random Variables (2 Random Variables)

All knowledge degenerates into probability.
- David Hume

When there is more than one RV, we talk about joint events from the same sample space
Any ordered pair of numbers (x, y) can be considered as a point in the xy plane

[Figure: outcomes s1, s2, ..., sJ in S are mapped to points (X(s), Y(s)) in the joint sample space SJ]

Let A = {X ≤ x} and B = {Y ≤ y}
Events A and B refer to the sample space S, while events {X ≤ x} and {Y ≤ y} refer to the joint sample space SJ

{X ≤ x} ∩ {Y ≤ y} = {X ≤ x, Y ≤ y}

## In the diagram below, notice that the event A ∩ B defined in the sample space corresponds to the joint event {X ≤ x} and {Y ≤ y}

[Figure: events A = {X ≤ x} and B = {Y ≤ y} in S, with their intersection A ∩ B mapping into the joint sample space SJ]

This new sample space SJ is called the range sample space or 2-D product space, but we will just call it the joint sample space

In the study of multiple RVs, we characterize events by the following:
Joint Cumulative Distribution Function
Joint Probability Density Function
  The concept of joint PDF is an extension of joint probability
Marginal Density and Distribution Functions
  Given the joint PDF or CDF, find the PDF or CDF of one of the RVs
Joint Expectation of 2 Random Variables
Conditional Expectation of Random Variables
Independence of one Random Variable and another
Correlation of Random Variables
  The relationship between the 2 RVs in terms of their means
Covariance of Random Variables
  The relationship between the 2 RVs in terms of their variances
Correlation Coefficient
  The normalized 2nd order joint central moment
Functions of two Random Variables
Transformations of Random Variables
  As with one RV, multiple RVs can also be transformed; more difficult to compute, etc.

## Joint Cumulative Distribution Function - 1

Considering only two Random Variables X and Y:
If X and Y are RVs, then the joint CDF of X and Y is given by

F_XY(x, y) = P[X ≤ x, Y ≤ y]

F_XY(a, b) is the probability that X and Y lie in the semi-infinite region of the (x, y) plane below and to the left of the point (a, b)

Properties:
The properties of the joint CDF are similar to those of the single-variable CDF
1) 0 ≤ F_XY(x, y) ≤ 1
2) F_XY(x, y) is a nondecreasing function of both x and y
3) F_XY(-∞, -∞) = F_XY(-∞, y) = F_XY(x, -∞) = 0
   This means that it is impossible for X or Y or both to assume a value less than -∞ (boundary conditions)
4) F_XY(∞, ∞) = 1
5) P[a1 < X ≤ a2, b1 < Y ≤ b2] = F_XY(a2, b2) - F_XY(a1, b2) - F_XY(a2, b1) + F_XY(a1, b1) ≥ 0,  for a1 ≤ a2, b1 ≤ b2

[Figure: the rectangle with corners (a1, b1), (a2, b1), (a1, b2), (a2, b2) in the (x, y) plane]
6) F_XY(x, ∞) = F_X(x) = P[X ≤ x, Y ≤ ∞]
7) F_XY(∞, y) = F_Y(y) = P[X ≤ ∞, Y ≤ y]

Note:
The first 5 properties are just the 2-dimensional extension of the properties of one random variable
Properties 3, 4, and 5 may be used to test whether a given function is a valid joint CDF
As in the case of a single RV, the joint CDF can be used to compute probabilities of unions and intersections of semi-infinite rectangles

## Computing Probabilities with Joint CDF - 1

1) P[X ≤ a, Y ≤ b] = F_XY(a, b)
2) P[X > a, Y > b] = 1 - F_X(a) - F_Y(b) + F_XY(a, b)
3) P[a1 < X ≤ a2, b1 < Y ≤ b2] = F_XY(a2, b2) - F_XY(a1, b2) - F_XY(a2, b1) + F_XY(a1, b1)
4) P[a1 < X ≤ a2, Y ≤ b] = F_XY(a2, b) - F_XY(a1, b)
5) P[X ≤ a, b1 < Y ≤ b2] = F_XY(a, b2) - F_XY(a, b1)

## Computing Probabilities with Joint CDF - 2

When intervals include their endpoints, corner and edge probabilities must be added, e.g.,
6) P[a1 ≤ X ≤ a2, b1 < Y ≤ b2]
   = F_XY(a2, b2) - F_XY(a1, b2) - F_XY(a2, b1) + F_XY(a1, b1) + P[X = a1, b1 < Y ≤ b2]
where the added term accounts for the probability mass on the edge x = a1, obtained as a limit of the joint CDF from the left of a1

## Marginal Distribution Functions

In the study of several RVs, the statistics of each RV can be obtained from the joint RV. These are known as marginal functions
The marginal CDFs of the RVs X and Y are

F_X(x) = F_XY(x, ∞) = ∫_{-∞}^{x} ∫_{-∞}^{∞} f_XY(u, y) dy du

F_Y(y) = F_XY(∞, y) = ∫_{-∞}^{y} ∫_{-∞}^{∞} f_XY(x, v) dx dv

## Joint Probability Density Function - 1

The joint density of X and Y is defined as

f_XY(x, y) = ∂²F_XY(x, y) / (∂x ∂y)

It is assumed that X & Y are jointly continuous, else the derivative may not exist
It follows that

F_XY(x, y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f_XY(u, v) dv du

Properties:
1) f_XY(x, y) ≥ 0
2) ∫∫ f_XY(x, y) dx dy = F_XY(∞, ∞) = 1
3) F_XY(x, y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f_XY(u, v) dv du
4) F_X(x) = ∫_{-∞}^{x} ∫_{-∞}^{∞} f_XY(u, v) dv du
5) F_Y(y) = ∫_{-∞}^{y} ∫_{-∞}^{∞} f_XY(u, v) du dv
6) P[a1 < X ≤ a2, b1 < Y ≤ b2] = ∫_{b1}^{b2} ∫_{a1}^{a2} f_XY(x, y) dx dy
7) f_X(x) = ∫_{-∞}^{∞} f_XY(x, y) dy
8) f_Y(y) = ∫_{-∞}^{∞} f_XY(x, y) dx

Note:
Properties 1 and 2 are sufficient to test the validity of a joint PDF

As with the joint CDF, the joint PDF can be used to compute the probabilities of random variables, e.g.,
P[a < X ≤ b, c < Y ≤ d] = ∫_a^b ∫_c^d f_XY(x, y) dy dx
and similarly for intervals that include their endpoints

Care should be exercised with the limits of the integration when discrete or mixed RVs are involved.
You may have to integrate a δ(·) on the boundary

## Joint Probability Mass Function

Considering only two discrete Random Variables X and Y, the joint pmf of X and Y is given by

p_XY(x, y) = P[X = x, Y = y]

p_XY(a, b) is the probability that X and Y jointly take the value (a, b)
The properties of the joint PMF are similar to those of the single variable

## Marginal Density Functions

In the study of several RVs, the statistics of each RV can be obtained from the joint RV. This is known as a marginal function
The marginal PDFs of the RVs X and Y are

f_X(x) = ∫_{-∞}^{∞} f_XY(x, y) dy = dF_XY(x, ∞)/dx

f_Y(y) = ∫_{-∞}^{∞} f_XY(x, y) dx = dF_XY(∞, y)/dy

In what follows: Joint Expectation (i.e., joint moments), Covariance of Random Variables, Correlation of X and Y, Correlation Coefficient, and Conditional Expectation and Variance

## Relationship Between X and Y - 1 and 2

1. Independence of X and Y
Statistical independence can be expressed in terms of joint distributions, joint densities, and joint probability functions
Recall that if events A and B are independent, P[A ∩ B] = P[A] P[B]
Hence
P[X ≤ x, Y ≤ y] = P[X ≤ x] P[Y ≤ y]
F_XY(x, y) = F_X(x) F_Y(y)
f_XY(x, y) = f_X(x) f_Y(y)
P[X = x, Y = y] = P[X = x] P[Y = y]     (discrete)
This implies that if X and Y are independent, their joint pdf and joint cdf factor into the 2 marginal densities or distributions, respectively

2. Joint Moments of X and Y
The joint moments of two RVs X and Y are defined as

m_ij = E[X^i Y^j] = ∫∫ x^i y^j f_XY(x, y) dx dy,   continuous
m_ij = Σ_n Σ_k x_n^i y_k^j p_XY(x_n, y_k),         discrete

The sum i + j is called the order of the moment
Given a function z = g(x, y), we can first compute the PDF of Z and then compute the mean as E[Z] = ∫ z f_Z(z) dz, or we can compute it directly, as shown next

## Relationship Between X and Y - 3

E[g(x, y)] = ∫∫ g(x, y) f_XY(x, y) dx dy,        continuous
E[g(x, y)] = Σ_n Σ_k g(x_n, y_k) p_XY(x_n, y_k),  discrete

If X and Y are independent, then
m_ij = E[X^i Y^j] = ∫ x^i f_X(x) dx ∫ y^j f_Y(y) dy = E[X^i] E[Y^j]

The 2nd order moments are
m_20 = E[X²],   m_02 = E[Y²],   m_11 = E[XY]

Correlation of X and Y - 1

The correlation of X and Y is defined as

m_11 = R_XY = E[XY] = ∫∫ xy f_XY(x, y) dx dy

This is the joint moment with i = j = 1
It measures the relationship between X and Y through the mean of their product
If X and Y are independent, then
R_XY = E[X] E[Y]

Correlation of X and Y - 2

Note:
When R_XY = E[X] E[Y], X and Y are said to be uncorrelated
Independence ⇒ uncorrelatedness
Uncorrelatedness does not (always) imply independence
This means that it is possible for X and Y to be uncorrelated and yet not independent (except for jointly Gaussian RVs)
If R_XY = 0, then X and Y are orthogonal (X ⊥ Y)

The joint central moments are

μ_ij = E[(X - μ_X)^i (Y - μ_Y)^j] = ∫∫ (x - μ_X)^i (y - μ_Y)^j f_XY(x, y) dx dy

## Conditional Distributions

In practice, the outcomes of many experiments are not independent
For example, the output Y of a communication channel is usually dependent on the input X, in order to convey the proper information
From probability, we know that P[A|B] = P[A ∩ B] / P[B]
The definitions of joint conditional CDF and PDF can be directly obtained from conditional probability

Conditional PMF - 1 and 2

Discrete
The conditional pmf of X given that Y = y_j is given by

P[X = x_i | Y = y_j] = P[X = x_i, Y = y_j] / P[Y = y_j] = p_XY(x_i, y_j) / p_Y(y_j)     (*)

P[Y = y_j | X = x_i] = P[X = x_i, Y = y_j] / P[X = x_i] = p_XY(x_i, y_j) / p_X(x_i)     (**)

If X and Y are independent,
P[X = x_i | Y = y_j] = P[X = x_i] = p_X(x_i)
P[Y = y_j | X = x_i] = P[Y = y_j] = p_Y(y_j)

The conditional CDF of X given that Y = y_j is
F_X(x | y_j) = P[X ≤ x | Y = y_j] = P[X ≤ x, Y = y_j] / P[Y = y_j]
Similarly,
F_Y(y | x_i) = P[Y ≤ y | X = x_i] = P[Y ≤ y, X = x_i] / P[X = x_i]
The conditional PMF satisfies all the properties of a PMF

Conditional Density - 1 and 2

For continuous RVs, the denominators of (*) and (**) are zero, i.e.,
P[Y = y_j] = P[X = x_i] = 0
Hence (*) and (**) are undefined for continuous RVs. Fortunately, the numerators are also zero
We treat the continuous case as a limit. Conditioning on {y < Y ≤ y + Δy}:

P[X ≤ x | y < Y ≤ y + Δy] = ∫_{-∞}^{x} ∫_{y}^{y+Δy} f_XY(u, v) dv du / ∫_{y}^{y+Δy} f_Y(v) dv

As Δy → 0, using the mean value theorem (∫_a^b g(z) dz ≈ (b - a) g(c), a ≤ c ≤ b), this becomes

F_XY(x | y) = ∫_{-∞}^{x} f_XY(u, y) du / f_Y(y)

Consequently,

f_XY(x | y) = dF_XY(x | y)/dx = f_XY(x, y) / f_Y(y)
f_XY(y | x) = dF_XY(y | x)/dy = f_XY(x, y) / f_X(x)

Also (a Bayes-type relation for densities),

f_XY(x | y) = f_XY(y | x) f_X(x) / f_Y(y),      f_XY(y | x) = f_XY(x | y) f_Y(y) / f_X(x)

Conditional Expectation - 1

If X and Y are independent, then
f_XY(x, y) = f_X(x) f_Y(y),   f_XY(x | y) = f_X(x),   f_XY(y | x) = f_Y(y)

The conditional expectation of Y given X = x is

E[Y | x] = ∫ y f_Y(y | x) dy,       continuous
E[Y | x] = Σ_j y_j p_Y(y_j | x),    discrete

Note that E[Y|x] is defined at a given point X = x and is not defined for any other value of x
E[Y|x] is the center of mass associated with the conditional PDF/PMF
Since E[Y|X] is a function of X, it is itself a RV with its own probability distribution

Conditional Expectation - 2

Theorem 1:
For any random variables X and Y,   E[E[Y | X]] = E[Y]

Proof:
E[E[Y | X]] = ∫ E[Y | x] f_X(x) dx
            = ∫∫ y f_Y(y | x) f_X(x) dx dy
            = ∫∫ y f_XY(x, y) dx dy
            = E[Y]

A) Covariance:
Covariance measures the joint variability of X and Y about their means
The 2nd order joint central moment is known as the Covariance of X and Y, i.e.,

C_XY = Cov[X, Y] = E[(X - μ_X)(Y - μ_Y)] = ∫∫ (x - μ_X)(y - μ_Y) f_XY(x, y) dx dy

Expanding,
Cov[X, Y] = E[(X - μ_X)(Y - μ_Y)]
          = E[XY - μ_Y X - μ_X Y + μ_X μ_Y]
          = E[XY] - μ_Y E[X] - μ_X E[Y] + μ_X μ_Y
          = E[XY] - μ_X μ_Y
Thus
Cov[X, Y] = E[XY] - μ_X μ_Y = R_XY - μ_X μ_Y

Note:
If X and Y are either independent or uncorrelated, then E[XY] = E[X] E[Y], so Cov[X, Y] = 0
If X and Y are orthogonal, then R_XY = 0, so Cov[X, Y] = -E[X] E[Y]

B) Correlation Coefficient (ρ)
The normalized 2nd order joint central moment is called the Correlation Coefficient:

ρ_XY = Cov[X, Y] / (σ_X σ_Y) = E[((X - μ_X)/σ_X)((Y - μ_Y)/σ_Y)],   -1 ≤ ρ_XY ≤ 1

where
σ_X² = Var[X],   σ_Y² = Var[Y]
Note that if X and Y are uncorrelated, ρ_XY = 0

## One Function of 2 RVs - 1

Transformation in Two Dimensions

If nature has taught us anything it is that the impossible is probable
- Ilyas Kassam

Transformation from two RVs to 1 Random Variable:
Given 2 RVs X and Y, we form a new RV Z such that Z = g(X, Y)
The event of interest is {Z ≤ z}
Let R_Z denote the region of the XY plane such that {Z ≤ z} = {g(x, y) ≤ z}. Then

F_Z(z) = P[Z ≤ z] = P[g(X, Y) ∈ R_Z] = ∫∫_{R_Z} f(x, y) dx dy

f_Z(z) dz = P[z < Z ≤ z + dz] = ∫∫_{ΔR_Z} f(x, y) dx dy

312

## Sum of Two RVs: Z = aX + bY

This is an important case because it occurs frequentlyly in the analysis of physical systems. For b > 0,

$$F_Z(z) = P[Z \le z] = P[aX + bY \le z] = \int_{-\infty}^{\infty}\int_{-\infty}^{(z-ax)/b} f_{XY}(x,y)\,dy\,dx$$

We fix a value of x and then let y vary from -∞ to (z - ax)/b. If X and Y are independent, f_XY(x,y) = f_X(x)f_Y(y), and differentiating with respect to z gives

$$f_Z(z) = \frac{1}{b}\int_{-\infty}^{\infty} f_X(x)\, f_Y\!\left(\frac{z-ax}{b}\right) dx \qquad (A1)$$

For a = b = 1 this reduces to

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z-x)\,dx$$

(A1) is a convolution integral like the ones encountered in linear systems and communications. This means that if two RVs are independent, then the density of their sum is equal to the convolution of their marginal densities. Proficiency in evaluating (A1) is very important in electrical engineering.
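A numerical sketch of the convolution integral (A1), under the assumption that X and Y are independent Uniform(0,1) RVs, whose sum has the triangular density on [0, 2]:

```python
import numpy as np

dx = 0.001
grid = np.arange(0.0, 1.0, dx)
f_x = np.ones_like(grid)          # Uniform(0,1) density
f_y = np.ones_like(grid)

f_z = np.convolve(f_x, f_y) * dx  # Riemann-sum approximation of (A1)
z = np.arange(f_z.size) * dx

print(f_z.sum() * dx)             # ~ 1.0: f_Z integrates to one
print(f_z[int(round(1.0 / dx))])  # ~ 1.0: triangular peak at z = 1
```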

## Examples of Convolution

Note: f_Z(z) = dF_Z(z)/dz.

Applications of (A1) are seen a lot in the analysis of linear systems. For example, consider a system in which the received signal is the convolution of the input signal plus noise, (s+n), with the impulse response h(t).

[Figure: block diagram of signal-plus-noise (s+n) passing through a filter h(t); sketches of rectangular densities f_X(x) and f_Y(y) convolving into a trapezoidal f_Z(z) supported on [a+c, b+d].]

The convolution of two functions is often calculated using the Fourier Transform (FT), which is related to the Characteristic Function (CF) of a random variable:

$$a(t) * b(t) \longleftrightarrow A(f)\,B(f)$$

Since the CF is closely related to the FT, we may write

$$\Phi_Z(\omega) = \Phi_X(\omega)\,\Phi_Y(\omega)$$

These situations arise a lot in communications.
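An empirical check of Φ_Z(ω) = Φ_X(ω)Φ_Y(ω), a sketch using assumed independent Exponential(1) RVs and one arbitrary test frequency:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.exponential(1.0, n)     # illustrative choice of distributions
y = rng.exponential(1.0, n)
z = x + y

omega = 1.3                     # arbitrary test frequency
cf = lambda s, w: np.mean(np.exp(1j * w * s))   # empirical CF

print(cf(z, omega))                  # ~ product below
print(cf(x, omega) * cf(y, omega))   # CF of sum = product of CFs
```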

## Expected Value of the Sum of 2 RVs

Transformation in two dimensions (one function of 2 RVs) covers:
- Sum of two RVs
- Product of two RVs
- Ratio of two RVs
- Minimum and maximum functions

Let Z = X + Y, so that

$$E[Z] = E[X+Y] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x+y)\, f_{XY}(x,y)\,dx\,dy$$

$$= \iint x\, f_{XY}(x,y)\,dx\,dy + \iint y\, f_{XY}(x,y)\,dx\,dy
= \int x\, f_X(x)\,dx + \int y\, f_Y(y)\,dy$$

Hence

$$E[Z] = E[X+Y] = E[X] + E[Y]$$

For arbitrary constants a and b,

$$E[Z] = E[aX + bY] = aE[X] + bE[Y]$$

## Variance of the Sum of 2 RVs

Let Z = X + Y, with μ_Z = μ_X + μ_Y, such that

$$\mathrm{Var}(Z) = E\big[(Z - \mu_Z)^2\big]$$

Expanding,

$$\mathrm{Var}(Z) = E[(X - \mu_X)^2] + E[(Y - \mu_Y)^2] + 2E[(X - \mu_X)(Y - \mu_Y)]$$

That is,

$$\mathrm{Var}(Z) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y)$$

$$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2 + 2C_{XY} = \sigma_X^2 + \sigma_Y^2 + 2\sigma_X\sigma_Y\rho_{XY}$$

If X and Y are uncorrelated, then C_XY = 0 and hence

$$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$$

More generally, for arbitrary constants a and b,

$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X,Y)$$

## Characteristic Function of the Sum of X and Y

For Z = aX + bY,

$$\Phi_Z(\omega) = E\big[e^{j\omega(aX + bY)}\big] = \Phi_{XY}(a\omega, b\omega)$$

If X and Y are independent, then

$$\Phi_Z(\omega) = E\big[e^{ja\omega X}\big]\,E\big[e^{jb\omega Y}\big] = \Phi_X(a\omega)\,\Phi_Y(b\omega)$$

## Moment Generating Function of the Sum of X and Y

For Z = aX + bY,

$$M_Z(t) = E\big[e^{t(ax+by)}\big] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t(ax+by)}\, f_{XY}(x,y)\,dx\,dy$$

If X and Y are independent, then

$$M_Z(t) = \int e^{tax} f_X(x)\,dx \int e^{tby} f_Y(y)\,dy = M_X(at)\,M_Y(bt)$$

which for a = b = 1 is M_Z(t) = M_X(t) M_Y(t).

Note: the technique for the sum of two RVs is applicable to the difference of two RVs.

## Matrix Formulation for Functions of 2 RVs

Let X1 and X2 be jointly continuous RVs with joint PDF f_{X1X2}(x1, x2). We want, for example, the PDF of the random variable Z = X1 + X2. Define two new RVs Y1 and Y2 as functions of X1 and X2:

$$Y_1 = g_1(X_1, X_2); \qquad Y_2 = g_2(X_1, X_2)$$

Assume that the functions g1 and g2 satisfy the following conditions:

1. y1 = g1(x1, x2) and y2 = g2(x1, x2) can be uniquely solved for x1 and x2 in terms of y1 and y2, with the solution given by

$$x_1 = h_1(y_1, y_2); \qquad x_2 = h_2(y_1, y_2)$$

i.e., h1 and h2 are inverse functions of g1 and g2.

2. h1 and h2 have continuous partial derivatives at all points (y1, y2), with nonvanishing Jacobian

$$J_{h_1 h_2}(y_1, y_2) = \begin{vmatrix} \dfrac{\partial h_1}{\partial y_1} & \dfrac{\partial h_1}{\partial y_2} \\[6pt] \dfrac{\partial h_2}{\partial y_1} & \dfrac{\partial h_2}{\partial y_2} \end{vmatrix} \ne 0$$

Under these conditions, Y1 and Y2 are jointly continuous with joint pdf

$$f_{Y_1 Y_2}(y_1, y_2) = f_{X_1 X_2}\big(h_1(y_1,y_2),\, h_2(y_1,y_2)\big)\,\big|J_{h_1 h_2}(y_1, y_2)\big|$$

Finally, transform the joint PDF of Y1 and Y2 back in terms of the original variables.

## Sum of Two RVs via the Jacobian: Z = X + Y

Choose

$$z = x + y = g_1(x,y), \qquad w = x = g_2(x,y)$$

Determine the inverse functions:

$$x = w = h_1(w,z), \qquad y = z - x = z - w = h_2(w,z)$$

The Jacobian is

$$J_{h_1 h_2}(w,z) = \begin{vmatrix} \dfrac{\partial h_1}{\partial w} & \dfrac{\partial h_1}{\partial z} \\[6pt] \dfrac{\partial h_2}{\partial w} & \dfrac{\partial h_2}{\partial z} \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ -1 & 1 \end{vmatrix} = 1$$

Apply the fundamental equation and integrate out w, in terms of the original variables:

$$f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(w,\, z-w)\,dw = \int_{-\infty}^{\infty} f_{XY}(x,\, z-x)\,dx$$

If X and Y are independent,

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(z-y)\, f_Y(y)\,dy = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z-x)\,dx$$

This is known as the convolution integral.

## Product of 2 Random Variables: Z = XY

Choose

$$z = xy = g_1(x,y), \qquad w = x = g_2(x,y)$$

with inverse functions

$$x = w = h_1(w,z), \qquad y = z/w = h_2(w,z)$$

The Jacobian is

$$J_{h_1 h_2}(w,z) = \begin{vmatrix} 1 & 0 \\ -z/w^2 & 1/w \end{vmatrix} = \frac{1}{w}$$

so f_{WZ}(w,z) = f_{XY}(w, z/w) / |w|. Transform to the original variables and integrate:

$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f_{XY}\!\left(x, \frac{z}{x}\right) dx$$

If X and Y are independent,

$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f_X(x)\, f_Y\!\left(\frac{z}{x}\right) dx
= \int_{-\infty}^{\infty} \frac{1}{|y|}\, f_X\!\left(\frac{z}{y}\right) f_Y(y)\,dy$$

## Ratio of 2 Random Variables: Z = X/Y

Choose

$$z = x/y = g_1(x,y), \qquad w = y = g_2(x,y)$$

with inverse functions

$$x = wz = h_1(w,z), \qquad y = w = h_2(w,z)$$

The Jacobian is

$$J_{h_1 h_2}(w,z) = \begin{vmatrix} z & w \\ 1 & 0 \end{vmatrix} = -w$$

Applying the fundamental equation, f_{WZ}(w,z) = f_{XY}(wz, w)\,|w|, and

$$f_Z(z) = \int_{-\infty}^{\infty} |y|\, f_{XY}(zy,\, y)\,dy$$

If X and Y are independent,

$$f_Z(z) = \int_{-\infty}^{\infty} |y|\, f_X(zy)\, f_Y(y)\,dy$$

Other transformations of interest include the maximum and minimum functions, Z = max(X,Y) and Z = min(X,Y).
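As a sanity check of the ratio formula: for two assumed independent standard normal RVs, the integral above evaluates to the standard Cauchy density 1/(π(1+z²)), for which P(|Z| ≤ 1) = 1/2. A minimal simulation:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x = rng.normal(0, 1, n)
y = rng.normal(0, 1, n)
z = x / y                        # ratio of independent standard normals

# f_Z(z) = 1/(pi (1 + z^2)) implies P(|Z| <= 1) = 1/2
print(np.mean(np.abs(z) <= 1.0))   # ~ 0.5
```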

## Jointly Gaussian Random Variables

Gaussian RVs are important because they show up in every area of engineering and science. Suppose the marginal PDFs of two Gaussian RVs X and Y are

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\!\left[-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right], \qquad
f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\!\left[-\frac{(y-\mu_y)^2}{2\sigma_y^2}\right]$$

If X and Y are jointly Gaussian, then the joint (bivariate Gaussian) PDF is

$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho_{xy}^2}}
\exp\!\left\{-\frac{1}{2(1-\rho_{xy}^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2
- 2\rho_{xy}\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)
+ \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\}$$

where |ρ_xy| < 1. The PDF is centered at (μ_x, μ_y), and its shape depends on the values of σ_x, σ_y and ρ_xy.

Since in general

$$f_{XY}(x,y) \ne f_X(x)\,f_Y(y)$$

X and Y are not independent. But observe that if ρ_xy = 0, then f_XY(x,y) = f_X(x) f_Y(y). We can conclude that uncorrelated jointly Gaussian RVs are also independent.
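A small simulation of the bivariate Gaussian case with ρ_xy = 0 (the identity covariance matrix below is an assumed illustrative choice), showing that uncorrelated jointly Gaussian samples behave as independent:

```python
import numpy as np

rng = np.random.default_rng(4)

mean = [0.0, 0.0]
cov = [[1.0, 0.0],
       [0.0, 1.0]]          # rho_xy = 0
xy = rng.multivariate_normal(mean, cov, size=100_000)

x, y = xy[:, 0], xy[:, 1]
print(np.corrcoef(x, y)[0, 1])            # ~ 0
# Factorization check: P(X<=0, Y<=0) ~ P(X<=0) P(Y<=0) = 0.25
print(np.mean((x <= 0) & (y <= 0)))
```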

# Multiple Random Variables

"When you have eliminated the impossible, whatever remains, however improbable, must be the truth."
- Sir Arthur Conan Doyle

## What you should learn in this Lecture

- Statistical properties of the sum of two RVs
  - Expectation of sums of two RVs
  - Characteristic function of sums of two RVs, etc.
- Jointly Gaussian random variables
- Functions of more than 2 random variables (a vector)
- Multiple random variables (more than 2 RVs)
- Large numbers and their properties
- Central limit theorem

## Vector Random Variables

Events involving many RVs (more than two): an extension from two RVs to N RVs can be made without much difficulty using the concept of a one-dimensional vector or matrix. Let X1, ..., XN be the components of an N-dimensional vector RV, i.e.,

$$\mathbf{X} = [X_1, X_2, \ldots, X_N]$$

## A. Joint CDF of Vector Random Variables

For N random variables X1, X2, ..., XN, the joint CDF is defined as

$$F_{\mathbf{X}}(x_1, \ldots, x_N) = P[X_1 \le x_1, X_2 \le x_2, \ldots, X_N \le x_N] \quad \text{(continuous)}$$

with ≤ replaced by = for the joint pmf in the discrete case. Its properties are similar to the case of two RVs:

1. $0 \le F_{\mathbf{X}}(x_1, \ldots, x_N) \le 1$ for all $\mathbf{x} \in R^N$
2. $F_{\mathbf{X}}(\infty, \infty, \ldots, \infty) = 1$
3. $F_{\mathbf{X}}(x_1, \ldots, x_N) \to 0$ if any argument goes to $-\infty$
4. $F_{\mathbf{X}}(x_1, \ldots, x_N)$ is continuous from the right
5. $F_{\mathbf{X}}(x_1, \ldots, x_N)$ is nondecreasing

## B. Marginal CDF of Vector Random Variables

If we substitute ∞ for certain arguments of F_X(x1, ..., xN), we obtain the joint CDF of the remaining variables. For example,

$$F_{\mathbf{X}}(x_1, \ldots, x_{N-1}) = F_{\mathbf{X}}(x_1, \ldots, x_{N-1}, \infty)$$

## C. Joint Density of Vector Random Variables

The joint PDF of X = [X1, X2, ..., XN] is defined as

$$f_{\mathbf{X}}(x_1, \ldots, x_N) = \frac{\partial^N F_{\mathbf{X}}(x_1, \ldots, x_N)}{\partial x_1 \partial x_2 \cdots \partial x_N}$$

If we know the joint PDF, then

$$F_{\mathbf{X}}(x_1, \ldots, x_N) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_N} f_{\mathbf{X}}(u_1, \ldots, u_N)\,du_1 \cdots du_N$$

Marginal densities are obtained by integrating out the variables that are not required:

$$f_X(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, \ldots, x_N)\,dx_2 \cdots dx_N$$

More generally, the marginal joint PDF of any k of the N RVs can be found by integrating the joint PDF over the remaining N - k variables. For example, for N = 4,

$$f_X(x_2, x_4) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, x_2, x_3, x_4)\,dx_1\,dx_3$$

## D. Independence of N Random Variables

The N random variables X1, X2, ..., XN are independent if the events {X1 ≤ x1}, {X2 ≤ x2}, ..., {XN ≤ xN} are independent. This implies that

$$F_{\mathbf{X}}(x_1, \ldots, x_N) = F_X(x_1)\,F_X(x_2) \cdots F_X(x_N)$$
$$f_{\mathbf{X}}(x_1, \ldots, x_N) = f_X(x_1)\,f_X(x_2) \cdots f_X(x_N)$$
$$p_{\mathbf{X}}(x_1, \ldots, x_N) = p_X(x_1)\,p_X(x_2) \cdots p_X(x_N)$$

It follows that any subset of the Xi is a set of independent random variables. For example, for N = 3, if x1, x2, x3 are independent, then

$$f_X(x_1, x_2, x_3) = f_X(x_1)\,f_X(x_2)\,f_X(x_3)$$

and every pair factors as well:

$$f_X(x_1, x_2) = f_X(x_1)f_X(x_2), \quad f_X(x_1, x_3) = f_X(x_1)f_X(x_3), \quad f_X(x_2, x_3) = f_X(x_2)f_X(x_3)$$

However, if the Xk are independent in pairs, they are not necessarily jointly independent. It is possible that every pair factors as above, but

$$f_X(x_1, x_2, x_3) \ne f_X(x_1)\,f_X(x_2)\,f_X(x_3)$$

## E. Conditional Densities of Vector Random Variables

$$f_X(x_N, \ldots, x_{k+1} \mid x_k, \ldots, x_1) = \frac{f_X(x_1, \ldots, x_k, \ldots, x_N)}{f_X(x_1, \ldots, x_k)}$$

We can rewrite this expression as

$$f_X(x_1, \ldots, x_k, \ldots, x_N) = f_X(x_N, \ldots, x_{k+1} \mid x_k, \ldots, x_1)\, f_X(x_1, \ldots, x_k)$$

This implies that we can use the chain rule to write the joint pdf as

$$f_X(x_1, \ldots, x_N) = f_X(x_N \mid x_1, \ldots, x_{N-1})\, f_X(x_1, \ldots, x_{N-1})$$
$$= f_X(x_N \mid x_1, \ldots, x_{N-1})\, f_X(x_{N-1} \mid x_1, \ldots, x_{N-2})\, f_X(x_1, \ldots, x_{N-2})$$
$$= f_X(x_N \mid x_1, \ldots, x_{N-1})\, f_X(x_{N-1} \mid x_1, \ldots, x_{N-2}) \cdots f_X(x_2 \mid x_1)\, f_X(x_1)$$

Correspondingly, the conditional CDF F_X(x_N, ..., x_{k+1} | x_k, ..., x_1) is obtained by integrating the conditional density.

## F. Joint Expectation of Vector Random Variables

The joint expectation of a vector random variable is given by

$$E[x_1, x_2, \ldots, x_N] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1, x_2, \ldots, x_N)\, f_X(x_1, \ldots, x_N)\,dx_1 \cdots dx_N$$

or, in vector notation,

$$E[\mathbf{X}_N] = \int \cdots \int \mathbf{X}_N\, f_X(\mathbf{X}_N)\,d\mathbf{X}_N$$

For N random variables X1, X2, ..., XN and some function g(X1, X2, ..., XN) of these random variables, the expected value is given by

$$E[g(x_1, x_2, \ldots, x_N)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_N)\, f_X(x_1, \ldots, x_N)\,dx_1 \cdots dx_N$$

## G. Joint Moments of Vector Random Variables

For N random variables X1, X2, ..., XN, the (n1 + n2 + ... + nN)th-order joint moments are defined by

$$E\big[X_1^{n_1} X_2^{n_2} \cdots X_N^{n_N}\big]
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1^{n_1} x_2^{n_2} \cdots x_N^{n_N}\, f_X(x_1, \ldots, x_N)\,dx_1 \cdots dx_N$$

## H. Joint Central Moments and Variance of Vector RVs

Similarly, the (n1 + n2 + ... + nN)th-order joint central moments are defined by

$$E\big[(X_1 - \mu_1)^{n_1} (X_2 - \mu_2)^{n_2} \cdots (X_N - \mu_N)^{n_N}\big]
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1 - \mu_1)^{n_1} \cdots (x_N - \mu_N)^{n_N}\, f_X(x_1, \ldots, x_N)\,dx_1 \cdots dx_N$$

## I. Characteristic Functions of Vector Random Variables

$$\Phi_X(\omega_1, \omega_2, \ldots, \omega_N) = E\big[e^{j(\omega_1 X_1 + \omega_2 X_2 + \cdots + \omega_N X_N)}\big]
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{j(\omega_1 x_1 + \cdots + \omega_N x_N)}\, f_X(x_1, \ldots, x_N)\,dx_1 \cdots dx_N$$

If the Xi are independent,

$$\Phi_X(\omega_1, \omega_2, \ldots, \omega_N) = E\big[e^{j\omega_1 X_1}\big]\,E\big[e^{j\omega_2 X_2}\big] \cdots E\big[e^{j\omega_N X_N}\big]
= \Phi_{X_1}(\omega_1)\,\Phi_{X_2}(\omega_2) \cdots \Phi_{X_N}(\omega_N)$$

## J. Moment Generating Functions of Vector Random Variables

If the Xi are independent, the joint MGF factors:

$$M_X(t_1, t_2, \ldots, t_N) = E[e^{t_1 X_1}]\,E[e^{t_2 X_2}] \cdots E[e^{t_N X_N}] = M_X(t_1)\,M_X(t_2) \cdots M_X(t_N)$$

## K. N Jointly Gaussian Random Variables

N random variables X1, X2, ..., XN are called jointly Gaussian if their density function can be written as

$$f_X(\mathbf{X}) = \frac{1}{(2\pi)^{N/2}\,|\mathbf{K}|^{1/2}} \exp\!\left[-\tfrac{1}{2}(\mathbf{X}-\mathbf{m})^T \mathbf{K}^{-1} (\mathbf{X}-\mathbf{m})\right]$$

where X and m are column vectors and K is the covariance matrix, all defined by

$$\mathbf{X} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \qquad
\mathbf{m} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_N \end{bmatrix}, \qquad
\mathbf{K} = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1N} \\ C_{21} & C_{22} & \cdots & C_{2N} \\ \vdots & & & \vdots \\ C_{N1} & C_{N2} & \cdots & C_{NN} \end{bmatrix}$$

with entries

$$C_{ij} = \begin{cases} \sigma_{X_i}^2, & i = j \\ \mathrm{Cov}(X_i, X_j), & i \ne j \end{cases}$$

## Sums and Products of N Random Variables

Assume that we have N random variables X1, X2, ..., XN.

Mean of the sum of RVs:

$$E\!\left[\sum_{k=1}^{N} a_k X_k\right] = \sum_{k=1}^{N} a_k E[X_k]$$

Mean of the product of RVs (independence is assumed):

$$E\!\left[\prod_{k=1}^{N} X_k\right] = \prod_{k=1}^{N} E[X_k]$$

Variance of the sum of RVs:

$$\mathrm{Var}\!\left[\sum_{i=1}^{N} a_i X_i\right]
= E\!\left[\sum_{j=1}^{N} a_j\big(X_j - E[X_j]\big)\sum_{k=1}^{N} a_k\big(X_k - E[X_k]\big)\right]
= \sum_{j=1}^{N} a_j^2\,\mathrm{Var}(X_j) + \sum_{j=1}^{N}\sum_{\substack{k=1\\ k\ne j}}^{N} a_j a_k\,\mathrm{Cov}(X_j, X_k)$$
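A numerical sketch of a jointly Gaussian vector (the mean vector m and covariance matrix K below are made up but positive definite), which also verifies Var(aᵀX) = aᵀKa, a special case of the variance-of-sum formula above:

```python
import numpy as np

rng = np.random.default_rng(5)

m = np.array([1.0, -1.0, 0.5])              # assumed mean vector
K = np.array([[2.0, 0.6, 0.0],              # assumed covariance matrix
              [0.6, 1.0, 0.3],
              [0.0, 0.3, 1.5]])

X = rng.multivariate_normal(m, K, size=200_000)
print(X.mean(axis=0))                       # ~ m
print(np.cov(X, rowvar=False))              # ~ K (sample covariance)

a = np.array([1.0, 2.0, -1.0])              # weights of a linear combination
print((X @ a).var(), a @ K @ a)             # Var(a^T X) ~ a^T K a (= 8.7)
```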

## Important Properties of Large Numbers of RVs

Sample Average (SA):

Suppose that the random variables X1, X2, ..., XN are independent and identically distributed (iid), each with mean μ_X and variance σ_X². The sample average is

$$\hat{\mu}_N = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N}\sum_{k=1}^{N} X_k$$

Properties of the Sample Average:

The expectation of μ̂_N is

$$E[\hat{\mu}_N] = E\!\left[\frac{1}{N}\sum_{k=1}^{N} X_k\right] = \frac{1}{N}\sum_{k=1}^{N} E[X_k] = \mu_X$$

This means that the mean value of μ̂_N is the same as the mean value of the RVs X_k.

The variance of μ̂_N is

$$\mathrm{Var}(\hat{\mu}_N) = E\!\left[\left(\frac{1}{N}\sum_{k=1}^{N} X_k\right)^2\right] - \mu_X^2
= \frac{1}{N^2}\sum_{j=1}^{N}\sum_{k=1}^{N} E[X_j X_k] - \mu_X^2$$

but

$$E[X_j^2] = \sigma_X^2 + \mu_X^2, \qquad E[X_j X_k] = E[X_j]E[X_k] = \mu_X^2 \quad (j \ne k)$$

hence

$$\mathrm{Var}(\hat{\mu}_N) = \frac{1}{N^2}\Big[N(\sigma_X^2 + \mu_X^2) + N(N-1)\mu_X^2\Big] - \mu_X^2 = \frac{\sigma_X^2}{N}$$

This means that the variance of μ̂_N is 1/N times the variance of the RVs X_k, so

$$\mathrm{Var}(\hat{\mu}_N) \to 0 \quad \text{as } N \to \infty$$

In fact, this is the premise of the Chebyshev inequality, which states that

$$P\big[\,|\hat{\mu}_N - E[\hat{\mu}_N]| \ge a\,\big] \le \frac{\mathrm{Var}(\hat{\mu}_N)}{a^2}$$

Substituting,

$$P\big[\,|\hat{\mu}_N - \mu_X| \ge a\,\big] \le \frac{\sigma_X^2}{N a^2}$$

The complement is

$$P\big[\,|\hat{\mu}_N - \mu_X| < a\,\big] \ge 1 - \frac{\sigma_X^2}{N a^2}$$

This implies that the probability that μ̂_N deviates from the true mean by a or more approaches zero as N becomes larger and larger (the weak law of large numbers).
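A simulation of the sample-average properties above (iid Exponential(1) RVs are assumed here for illustration, so μ_X = σ_X² = 1):

```python
import numpy as np

rng = np.random.default_rng(6)

# Var of the sample average should shrink like sigma^2 / n
for n in (10, 100, 10_000):
    means = rng.exponential(1.0, (1_000, n)).mean(axis=1)
    print(n, means.var(), 1.0 / n)   # empirical variance ~ 1/n
```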

## What you should learn in this Lecture

- Definition and specification of a random process
- Sample distribution and density functions of random processes
- Some important random processes (independent increment)
- Statistical properties of random processes
  - Expectation of random processes
  - Variance of random processes
  - Autocorrelation function and its properties
  - Correlation coefficient
- Power spectral density of a random process and its properties

# Stochastic Processes (a.k.a. Random Processes)

"It is remarkable that a science which began with the consideration of games of chance should have become the most important object of human knowledge."
- Pierre Simon Laplace, 1812

## Definitions

Recall that a RV is a rule for assigning a number on the real line to each outcome of an experiment S.

Often, random data collected from an experiment are functions of time. If the time factor is included in our experiment, then a Random Process (RP) arises.

Let ξ denote the random outcome of an experiment. To every such outcome suppose a waveform X(t, ξ) is assigned. The collection of such waveforms forms a stochastic process. The set of outcomes {ξ_k} and the time index t can each be continuous or discrete (countably infinite or finite).

For fixed ξ_k ∈ S (the set of all experimental outcomes), X(t, ξ) is a specific time function. In other words, a Random Process is a rule for assigning to every outcome ξ of an experiment a function of time, X(t, ξ). A RP can thus be viewed as a function of two variables: the time t and the experiment outcome ξ.

Consider a random experiment specified by outcomes s_i from some sample space S. To every outcome ξ we assign, according to some rule, a time function X(t, ξ). For a specific event, say ξ_j, X(t, ξ_j) is a single time function, called a realization, sample path, or sample function.

[Figure: sample functions x1(t), x2(t), ..., xn(t) drawn over an observation interval, with sampling instants t_k and t_{k+1} marked.]

Since a RP is a function of two variables, t and ξ, one or both of these may be chosen to be fixed. If the fixed values are denoted by subscripts, we obtain:

- X(t_j, ξ) = X(t_j) is a random variable
- X(t, ξ) = X(t) is a random process
- X(t_i, ξ_j) is a real number

For a fixed time t_k inside the observation interval, a set of sample values

$$X_j(t_k, \xi), \quad j = 1, 2, \ldots, n$$

is observed, where ξ_j is a member of S.

From the above illustration we can conclude that, given a RP, if we sample it at a given time, we obtain a RV.

These time-indexed families of random variables {X(t_k, ξ_1), ..., X(t_k, ξ_n)} are known as a RP. The ensemble of all such realizations X(t, ξ) over time represents the stochastic process X(t). A stochastic process X(t) is a collection of time functions corresponding to the various outcomes of an experiment. Equivalently, a RP is the mapping of the outcome of a random experiment to a function of time.

e.g., X(t) = a cos(ω₀t + Θ), where Θ is a uniformly distributed random variable in (0, 2π), represents a stochastic process.

To distinguish a RV from a RP, we note that:
1. the outcome of a RV is mapped into a number on the real line;
2. the outcome of a RP is mapped into a function of time, t.

Examples of stochastic processes abound in nature, e.g.:
1. Stock market fluctuations
2. Brownian motion
3. Information signals such as voice (speech), TV, computer data sequences, electrical noise, etc.
4. Brain/heart waves (electroencephalograms/electrocardiograms)
5. Various queuing systems
6. Sound (or music) signals
7. Random sinusoidal signals
8. Buffer content of network routers
9. Random binary sequences, etc.

Classification of Random Processes: a RP can be classified as discrete-time or continuous-time:

- Continuous RP: an uncountable collection of RVs
- Discrete RP: a countable collection of RVs

Specifying a Random Process:

Question: How do we characterize the probabilistic behavior of a RP?
Answer: We must specify the joint CDF/PDF for an infinite number of RVs! Since this is not possible, we select a subset of k RVs and then specify their joint probabilities. The idea is that events of interest do not necessarily involve all of the RVs. Loosely speaking, a RP is just an infinite collection of RVs with slightly different notation, one for each time t.
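A minimal Python sketch of an ensemble for the random sinusoid example above, X(t) = a cos(ω₀t + Θ) with Θ ~ Uniform(0, 2π); fixing a time turns the process into a RV whose statistics we can estimate (parameter values assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

a, w0 = 1.0, 2 * np.pi
t = np.linspace(0.0, 2.0, 401)                # time grid
theta = rng.uniform(0.0, 2 * np.pi, 5_000)    # one angle per realization
X = a * np.cos(w0 * t[None, :] + theta[:, None])   # rows = sample functions

# Fixing a time t_k turns the process into a RV
print(X[:, 100].mean())      # ~ 0
print(X[:, 100].var())       # ~ a^2 / 2 = 0.5
```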

Let X1, ..., Xk be k RVs obtained by sampling the random process X(t, s) at times t1, t2, ..., tk, i.e.,

$$X_1 = X(t_1, s),\ X_2 = X(t_2, s),\ \ldots,\ X_k = X(t_k, s)$$

If the RP is continuous, the k-dimensional joint PDF can be obtained from the joint CDF as

$$f_X(x_1, \ldots, x_k;\, t_1, \ldots, t_k) = \frac{\partial^k F_X(x_1, \ldots, x_k;\, t_1, \ldots, t_k)}{\partial x_1 \partial x_2 \cdots \partial x_k}$$

and if discrete, the joint pmf is

$$p_X(x_1, \ldots, x_k) = P[X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k]$$

From the general definition we can obtain specific cases:

A) First-order distribution of the random process X(t):

$$F_X(x, t) = P[X(t) \le x]$$

Notice that F_X(x, t) depends on t, since for a different t we obtain a different RV.

B) First-order density of the random process X(t):

$$f_X(x, t) = \frac{\partial F_X(x, t)}{\partial x}$$

C) Second-order distribution of the random process:

For t = t1 and t = t2, X(t) represents two different random variables X1 = X(t1) and X2 = X(t2) respectively. Their joint distribution is given by

$$F_X(x_1, x_2;\, t_1, t_2) = P[X(t_1) \le x_1,\, X(t_2) \le x_2]$$

D) Second-order density of the random process X(t):

$$f_X(x_1, x_2;\, t_1, t_2) = \frac{\partial^2 F_X(x_1, x_2;\, t_1, t_2)}{\partial x_1 \partial x_2}$$

E) This can be extended to kth-order distribution or density functions. For example, the nth-order density function of a RP X(t) is f_X(x_1, x_2, ..., x_n; t_1, t_2, ..., t_n).

As with random variables, the marginal CDF and PDF of a RP are given by

$$F_X(x_1;\, t_1) = F_X(x_1, \infty;\, t_1, t_2), \qquad
f_X(x_1;\, t_1) = \int_{-\infty}^{\infty} f_X(x_1, x_2;\, t_1, t_2)\,dx_2$$

It is important to mention that these descriptions are partial descriptions, since full descriptions are not possible. Complete specification of the stochastic process X(t) requires knowledge of

$$f_X(x_1, x_2, \ldots, x_n;\, t_1, t_2, \ldots, t_n)$$

for all t_i, i = 1, 2, ..., n, and for all n.

# Properties of Random Processes

"The concept of randomness and coincidence will be obsolete when people can finally define a formulation of patterned interaction between all things within the universe."
- Toba Beta

A. Mean of X(t)

First-order statistics of a random process (i.e., functions of one sample of the process): the mean of a random process X(t) is given by

$$m_X(t) = \mu_X(t) = E[X(t)] = \int_{-\infty}^{\infty} x\, f_X(x, t)\,dx$$

Note that the mean is, in general, a function of the time index t.

Time average: an alternative way of computing the mean of a RP X(t) is by averaging it over the time interval [-T, T], or over some period T. The time average is defined by

$$\langle X(t) \rangle = \frac{1}{2T}\int_{-T}^{T} x(t)\,dt
\qquad \text{or} \qquad
\langle X(t) \rangle = \frac{1}{T}\int_{-T/2}^{T/2} x(t)\,dt$$

B. Variance of X(t)

The variance of a random process X(t) is given by

$$\sigma_X^2(t) = \mathrm{Var}[X(t)] = E\big[(X(t) - \mu_X(t))^2\big] = E[X^2(t)] - \big(E[X(t)]\big)^2$$

C. Autocorrelation of X(t)

The Autocorrelation Function (ACF) of a RP X(t) is denoted as R_XX(t1, t2), R_X(t1, t2), or R_XX(t, t+τ). The autocorrelation of a random process X(t) is given by

$$R_{XX}(t_1, t_2) = E[X(t_1)\,X^*(t_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, f(x_1, x_2;\, t_1, t_2)\,dx_1\,dx_2$$

It follows that

$$R_{XX}(t_1, t_2) = R_{XX}^*(t_2, t_1)$$

and at t1 = t2 = t the ACF equals the mean-square value of X(t):

$$R_{XX}(t, t) = E\big[|X(t)|^2\big] \ge 0$$
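An ensemble estimate of the ACF for the same random sinusoid (for which R_X(t₁, t₂) = ½cos(ω₀(t₁ - t₂)) can be derived by averaging over Θ; all parameter values are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)

w0 = 2 * np.pi
theta = rng.uniform(0, 2 * np.pi, 100_000)
t1, t2 = 0.3, 0.1
x1 = np.cos(w0 * t1 + theta)
x2 = np.cos(w0 * t2 + theta)

print(np.mean(x1 * x2))                  # ensemble average E[X(t1)X(t2)]
print(0.5 * np.cos(w0 * (t1 - t2)))      # theoretical ACF value
```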

The last expression implies that the autocorrelation of a random process X(t) is a positive (semi)definite function. Note that

$$E\big[(X(t_1) - X(t_2))^2\big] = R_X(t_1,t_1) - 2R_X(t_1,t_2) + R_X(t_2,t_2) \ge 0$$

D. Autocovariance of X(t)

The autocovariance of a RP X(t) is given by

$$C_X(t_1, t_2) = E\big[(X(t_1) - \mu_X(t_1))(X(t_2) - \mu_X(t_2))\big]
= E[X(t_1)X(t_2)] - \mu_X(t_1)\,\mu_X(t_2)$$

Thus

$$C_X(t_1, t_2) = R_X(t_1, t_2) - \mu_X(t_1)\,\mu_X(t_2)$$

The value of C_X(t1, t2) when t1 = t2 = t is the variance of X(t), i.e.,

$$C_X(t, t) = \mathrm{Var}[X(t)] = E\big[(X(t) - \mu_X(t))^2\big]$$

E. Correlation Coefficient

The correlation coefficient of a RP X(t) is given by

$$\rho_X(t_1, t_2) = \frac{C_X(t_1, t_2)}{\sqrt{C_X(t_1, t_1)\, C_X(t_2, t_2)}}, \qquad |\rho_X(t_1, t_2)| \le 1$$

F. Power Spectral Density (PSD)

PSD is used to describe and estimate the properties of an observed process in the frequency domain. It describes the distribution of the signal power over frequency. Knowledge of the Fourier Transform (FT) is important in understanding the frequency-domain description of RPs. Recall that the FT of a process X(t) is defined as

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi ft}\,dt$$

and the inverse FT is given by

$$x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j2\pi ft}\,df$$

But the above equations cannot be computed for realistic sample functions of all RPs; a limited definition is required, assuming an ergodic process.

In order to find the FT, it is necessary to modify the function and limit the samples to some observation interval, say [-T, T], since X(t) may have infinite energy and hence may not have a Fourier Transform. For a RP X(t), let X_T(t) be that portion of the sample function X(t) that exists between -T and T, i.e.,

$$X_T(t) = \begin{cases} x(t), & -T \le t \le T \\ 0, & \text{else} \end{cases}$$

Hence

$$X_T(f) = \int_{-T}^{T} x_T(t)\, e^{-j2\pi ft}\,dt$$

From this definition, the PSD, denoted S_X(f), is given by

$$S_X(f) = \lim_{T \to \infty} \frac{1}{2T}\, E\big[|X_T(f)|^2\big]$$

Another definition of PSD is obtained from the ACF. For a stationary RP X(t), the PSD S_X(f) is the Fourier Transform of the ACF (the Wiener-Khintchine theorem):

$$S_X(f) = \mathcal{F}\{R_X(\tau)\} =
\begin{cases} \displaystyle\int_{-\infty}^{\infty} R_X(\tau)\, e^{-j2\pi f\tau}\,d\tau, & \text{continuous} \\[6pt]
\displaystyle\sum_{k} R_X(k)\, e^{-j2\pi kf}, & \text{discrete} \end{cases}$$

Conversely,

$$R_X(\tau) = \mathcal{F}^{-1}\{S_X(f)\} = \int_{-\infty}^{\infty} S_X(f)\, e^{j2\pi f\tau}\,df$$

If we know the autocorrelation function, we can compute the PSD and vice versa (they are transform pairs).

Properties of the PSD:

1. S_X(f) is a nonnegative function of f: S_X(f) ≥ 0.
2. S_X(f) is real valued.
3. For a real-valued process, S_X(f) is an even function of f: S_X(-f) = S_X(f). Indeed,

$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\big[\cos 2\pi f\tau - j\sin 2\pi f\tau\big]\,d\tau
= \int_{-\infty}^{\infty} R_X(\tau)\cos 2\pi f\tau\,d\tau$$

since the odd (sine) component integrates to zero.

4. If X(t) is stationary, then the power content is determined from the PSD as follows:

$$R_X(0) = E[X^2(t)] = \int_{-\infty}^{\infty} S_X(f)\,df$$

This is the area under the PSD curve; it is also known as the average power. Conversely,

$$S_X(0) = \int_{-\infty}^{\infty} R_X(\tau)\,d\tau$$
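A discrete-time sketch of these definitions: for an assumed white sequence with R_X(k) = σ²δ(k), the Wiener-Khintchine theorem predicts a flat PSD S_X(f) = σ², which a periodogram estimate reproduces:

```python
import numpy as np

rng = np.random.default_rng(9)

sigma2 = 2.0
n = 4096
x = rng.normal(0.0, np.sqrt(sigma2), n)   # white: R_X(k) = sigma^2 delta(k)

periodogram = np.abs(np.fft.fft(x))**2 / n   # analogue of (1/2T)|X_T(f)|^2
print(periodogram.mean())                    # ~ sigma^2 (flat spectrum)
```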

## Classes of Random Processes

- Strict Sense Stationary
- Wide Sense Stationary
- Ergodic Processes
- Cyclostationary Processes

A. Stationary Random Processes

X(t) is said to be stationary if its statistical properties are time independent. This means that an observation over (t0, t1) is statistically the same as an observation over (t0+τ, t1+τ).

Intuitively, a stationary process has behavior independent of time. The concept of stationarity of a RP is similar to the idea of steady state in the analysis of the response of electrical circuits: statistical properties are invariant with respect to time translation.

Two Main Types:

1. Strict Sense Stationary (SSS)

A random process X(t) is said to be stationary in the strict sense if its statistical properties are time independent; that is, the processes X(t) and X(t+c) have the same statistics for any value of c:

- The CDF of X(t) is the same as the CDF of X(t+c):

$$F_{X(t)}(x) = F_{X(t+c)}(x)$$

- The PDF of X(t) is the same as the PDF of X(t+c):

$$f_{X(t)}(x) = f_{X(t+c)}(x)$$

Hence a process is nth-order SSS if, for any c,

$$f_X(x_1, x_2, \ldots, x_n;\, t_1, t_2, \ldots, t_n) = f_X(x_1, x_2, \ldots, x_n;\, t_1+c, t_2+c, \ldots, t_n+c)$$

where the left side represents the joint pdf of the RVs

$$X_1 = X(t_1),\ X_2 = X(t_2),\ \ldots,\ X_n = X(t_n)$$

and the right side corresponds to the joint pdf of the RVs

$$X_1' = X(t_1+c),\ X_2' = X(t_2+c),\ \ldots,\ X_n' = X(t_n+c)$$

for any t_i, i = 1, 2, ..., n, any n = 1, 2, ..., and any c.

To check for SSS we need to find all the CDFs or PDFs as functions of time and then determine all the moments. By definition this implies that all the moments are equal and do not depend on the time origin; likewise, all the joint moments are equal and do not depend on time.

2. Wide Sense Stationary (WSS)

The SSS condition is very restrictive and difficult to prove except in limited cases; for RPs with unlimited observation times, proof of SSS is virtually impossible. A weaker definition of stationarity, known as WSS, is therefore used.

A RP X(t) is said to be stationary in the wide sense if it meets the following two conditions:

1. Its mean is constant:

$$E[X(t)] = E[X(t+\tau)] = \mu_X$$

2. Its autocorrelation (or autocovariance) depends only on τ = t1 - t2:

$$R_X(\tau) = E[X(t)\,X(t+\tau)]$$

This means that the autocorrelation does not depend on the actual values of t1 and t2, but only on the difference τ = t1 - t2. A RP that does not satisfy the requirements of a stationary RP (SSS or WSS) is said to be non-stationary.

Note:

If a process is SSS, then it is also WSS. The converse is not true, except when the process is Gaussian; i.e., for a Gaussian process, WSS also implies SSS.

[Figure: Venn diagram; within the set of all stochastic processes, the SSS processes form a subset of the WSS processes.]

# Autocorrelation Function

"It is likely that unlikely things should happen."
- Aristotle

## Properties of Autocorrelation Function - 2

Department of Telecommunications Engineering

Proof:

bf

RX 0 E X 2 t

b f bf

RX E X t X t

but

RX RX

RX E X t X t
E X t X t
RX

## This implies that we may also define the

autocorrelation function as

## 3. For WSS process RX() is maximum at the origin i.e.,

bf b f

RX 0 RX

RX E X t X t
(c) Prof. Okey Ugweje

391

392

## Properties of Autocorrelation Function - 3

Department of Telecommunications Engineering

Consider

c b f b fh

E X t X t

## Other Classes of Random Processes

Department of Telecommunications Engineering

b f

bf

bf b f

E X t E X t E X t X t
2 RX 0 2 RX 0
2

Hence

RX 0 R X

## B. Cyclostationary Random Process

A random process X(t) is said to be cyclostationary if
both its mean and Autocorrelation are periodic in
time with period T, i.e.,

m X (t ) E X t kT
RX RX t kT

## 4. If X(t) has a dc component, then RX() will have a

constant component
For example, if X t A then

bf b f

RX E X t X t E A2 A2
5. If X(t) has a periodic component, then RX() will also
have a periodic component with the same period
(c) Prof. Okey Ugweje

393

C. Ergodic Random Process

Some stationary RPs possess the property that almost every member of the ensemble exhibits the same statistical behavior as the whole ensemble. For such processes, by examining only one typical sample function it is possible to determine the statistical behavior of the whole process. Such processes are said to be Ergodic: if the statistical (ensemble) average is equal to the time average, the random process is Ergodic.

This statement implies that it is sufficient to examine one realization of a process and find its time average, rather than considering a large number of realizations and averaging over all of them.

1. Ergodic in the mean: a stationary RP is ergodic in the mean if

$$\langle X(t) \rangle = E[X(t)]$$

2. Ergodic in autocorrelation: a stationary RP is ergodic in autocorrelation if

$$\langle X(t_1)\,X(t_2) \rangle = R_X(t_1, t_2)$$

A process that does not possess these properties is non-ergodic.
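A sketch of ergodicity in the mean for the random sinusoid (parameters assumed; one realization's time average is compared with the ensemble average):

```python
import numpy as np

rng = np.random.default_rng(10)

w0 = 2 * np.pi
t = np.linspace(0.0, 50.0, 50_000, endpoint=False)   # 50 full periods
theta0 = rng.uniform(0.0, 2 * np.pi)                 # one fixed outcome

time_avg = np.cos(w0 * t + theta0).mean()            # <X(t)> for one sample path
ens_avg = np.cos(w0 * 0.25 + rng.uniform(0, 2*np.pi, 200_000)).mean()
print(time_avg, ens_avg)                             # both ~ 0
```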

## Other Important Random Processes

A. Independent Increment Process (IIP)

X(t) is said to have independent (uncorrelated) increments if, for any k and any choice of sampling instants t1 < t2 < ... < tk, the RVs defined by

$$Y_1 = X(t_2) - X(t_1),\ Y_2 = X(t_3) - X(t_2),\ \ldots,\ Y_{k-1} = X(t_k) - X(t_{k-1})$$

are independent RVs; i.e., the process possesses independent increments if the changes in the value of the process over non-overlapping time intervals are independent.

Examples of independent increment processes are the Poisson process and the Wiener process.

If X(t) and Y(t) are such that the RVs X(t1), ..., X(tn) and Y(t1), ..., Y(tn) are mutually independent, then the processes are independent.

B. Markov Process

A RP X(t) is said to be Markov if the future of the process, given the present, is independent of the past. This means that a Markov process is a stochastic process whose past history has no influence on its future if its present is specified. That is, for any k and any choice of sampling instants t1 < t2 < ... < tk,

$$P[X(t_k) \le x_k \mid x_{k-1}, \ldots, x_1] = P[X(t_k) \le x_k \mid x_{k-1}]$$

A RP that has independent increments is also a Markov process. Other processes of interest include:
1. Gaussian processes
2. Brownian processes
3. Renewal processes
4. Regenerative processes

## Multiple Random Processes

As with random variables, multiple Random Processes (RPs) are an extension of single random processes. Multiple processes arise naturally when dealing with 2 or more RPs defined on the same probability space. Of course, a complete description, requiring the specification of all the joint statistical behavior for all time samples, is not possible. We will restrict our study to second-order statistics of two RPs X(t) and Y(t), which are considered to be jointly stationary. The following are characteristics of such second-order random processes.

## Cross-Correlation Function (CCF)

The CCF describes the relationship between two RPs X(t) and Y(t) and is given by

$$R_{XY}(t_1, t_2) = E[X(t_1)\,Y(t_2)]$$

It is assumed that X(t) and Y(t) are jointly stationary. Note that R_XY(t1, t2) = R_YX(t2, t1).

Properties of the CCF:

1. For jointly stationary processes,

$$R_{XY}(\tau) = E[X(t)Y(t+\tau)], \qquad R_{YX}(\tau) = E[Y(t)X(t+\tau)]$$

Thus

$$R_{XY}(\tau) = R_{YX}(-\tau)$$

This equation simply indicates a symmetry; it does not necessarily mean that the CCF is even. The ACF of a RP is even, but the CCF in general is not.

2. For two WSS processes X(t) and Y(t), the CCF is bounded as follows:

$$|R_{XY}(\tau)| \le \sqrt{R_X(0)\, R_Y(0)}$$

The CCF does not necessarily have its maximum at τ = 0; the maximum can occur anywhere, but its value is limited as above.

3. For two WSS processes X(t) and Y(t), the CCF is also bounded as

$$|R_{XY}(\tau)| \le \tfrac{1}{2}\big[R_X(0) + R_Y(0)\big]$$

To demonstrate, consider E[(X(t) ± Y(t+τ))²] ≥ 0:

$$E[X^2(t)] + E[Y^2(t+\tau)] \pm 2E[X(t)Y(t+\tau)] = R_X(0) + R_Y(0) \pm 2R_{XY}(\tau) \ge 0$$

4. If two RPs X(t) and Y(t) are statistically independent, then

$$R_{XY}(\tau) = E[X(t)Y(t+\tau)] = E[X(t)]\,E[Y(t+\tau)] = m_X m_Y = R_{YX}(\tau)$$

Note that for the ACF the value at zero equals the mean-square value, but for the CCF the value at zero has no special significance.

5. The correlations of two processes can be arranged in matrix form:

$$\mathbf{R}(t_1, t_2) = \begin{bmatrix} R_X(t_1,t_2) & R_{XY}(t_1,t_2) \\ R_{YX}(t_1,t_2) & R_Y(t_1,t_2) \end{bmatrix}$$

6. If X(t) and Y(t) are jointly WSS, then

$$\mathbf{R}(\tau) = \begin{bmatrix} R_X(\tau) & R_{XY}(\tau) \\ R_{YX}(\tau) & R_Y(\tau) \end{bmatrix}$$

7. Two RPs X(t) and Y(t) are said to be orthogonal if R_XY(τ) ≡ 0.

8. Sum of two random processes: if Z(t) = X(t) + Y(t), then

$$R_Z(\tau) = R_X(\tau) + R_Y(\tau) + R_{XY}(\tau) + R_{YX}(\tau)$$
$$S_Z(f) = S_X(f) + S_Y(f) + S_{XY}(f) + S_{YX}(f)$$

which equals S_X(f) + S_Y(f) + 2S_XY(f) when S_XY = S_YX.

9. Equality of two random processes: two processes X(t) and Y(t) are said to be equal if their respective time samples are equal, i.e.,

$$X(t, \xi) = Y(t, \xi) \quad \text{for all } \xi \text{ and all } t$$

They are equal in the mean-square sense if

$$E\big[(X(t) - Y(t))^2\big] = 0 \quad \text{for all } t$$

## Time Cross-Correlation and Cross-Covariance

The time cross-correlation functions are defined as

$$\mathcal{R}_{XY}(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x(t)\, y(t+\tau)\,dt, \qquad
\mathcal{R}_{YX}(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} y(t)\, x(t+\tau)\,dt$$

If the two processes are jointly ergodic, then

$$\mathcal{R}_{XY}(\tau) = R_{XY}(\tau), \qquad \mathcal{R}_{YX}(\tau) = R_{YX}(\tau)$$

Cross-Covariance (CC):

The cross-covariance of two processes X(t) and Y(t) is defined as

$$C_{XY}(t_1, t_2) = E\big[(X(t_1) - m_X(t_1))(Y(t_2) - m_Y(t_2))\big]
= R_{XY}(t_1, t_2) - m_X(t_1)\, m_Y(t_2)$$

X(t) and Y(t) are uncorrelated if

$$C_{XY}(t_1, t_2) = 0$$

## Cross-Power Spectral Density (CPSD)

For two RPs it is possible to define a cross power density. The cross-power spectral density is defined as

$$S_{XY}(f) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} R_{XY}(\tau)\, e^{-j2\pi f\tau}\,d\tau, & \text{continuous} \\[6pt] \displaystyle\sum_{k} R_{XY}(k)\, e^{-j2\pi kf}, & \text{discrete} \end{cases}$$

# Random Processes in Linear Systems

"The most important questions of life are, for the most part, really only problems of probability."
- Pierre Simon Laplace

Many physical systems involve the processing of random signals/processes, e.g.:

- Prediction: predicting future values in terms of past values
- Filtering and smoothing
- Modulation: converting signals from low frequency to high frequency

All signal processing operations involve the transformation of signals from one time or frequency function to another. If the input of a system is random, the output is also bound to be random. Much of the analysis in electrical engineering involves understanding the relationships between the input and output of a linear system; with this knowledge, the engineer can design such systems. It is assumed that students in this class are already familiar with the usual methods of analyzing linear systems in the time or frequency domain.

Now, given a random input X(t) to a linear system, we can find all the statistical characteristics of the output Y(t) in terms of the input X(t). If the system is Linear Time Invariant (LTI), then the response of the system to an arbitrary input is given by the convolution

$$y(t) = h(t) * x(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t-\tau)\,d\tau = \int_{-\infty}^{\infty} h(t-\tau)\, x(\tau)\,d\tau$$

[Figure: linear network with impulse response h(t); the input descriptions x(t), x[n], X(e^{jω}), X(f), X(z), R_X, S_X map to the output descriptions y(t), y[n], Y(e^{jω}), Y(f), Y(z), R_Y, S_Y through h[n], H(e^{jω}), H(f), H(z), whether the system is expressed as a time function, difference equation, pole-zero plot, or system function.]

## Random Processes and Linear Systems

If the input of an LTI system is a random process X(t), then the output is also a random process, given by

$$Y(t) = \int_{-\infty}^{\infty} h(\tau)\, X(t-\tau)\,d\tau = \int_{-\infty}^{\infty} h(t-\tau)\, X(\tau)\,d\tau$$

Some of the statistical properties of the output are given as follows.

Mean:

$$E[Y(t)] = E\!\left[\int_{-\infty}^{\infty} h(\tau)\, X(t-\tau)\,d\tau\right] = \int_{-\infty}^{\infty} h(\tau)\, E[X(t-\tau)]\,d\tau$$

For a stationary input, E[X(t-τ)] = E[X(t)] = m_x, so

$$m_Y = m_x \int_{-\infty}^{\infty} h(\tau)\,d\tau$$

Mean-squared value:

$$E[Y^2(t)] = E\!\left[\int X(t-s)\, h(s)\,ds \int X(t-r)\, h(r)\,dr\right]
= \int\!\!\int E[X(t-s)X(t-r)]\, h(s)\, h(r)\,ds\,dr$$

But

$$E[X(t-s)X(t-r)] = R_X\big((t-s) - (t-r)\big) = R_X(r-s)$$

Hence

$$E[Y^2(t)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_X(r-s)\, h(s)\, h(r)\,ds\,dr$$

Autocorrelation function:

$$R_Y(\tau) = E[Y(t)Y(t+\tau)] = E\!\left[\int h(s)\, X(t-s)\,ds \int h(r)\, X(t+\tau-r)\,dr\right]
= \int\!\!\int h(s)\, h(r)\, R_X(\tau+s-r)\,ds\,dr$$

Cross-correlation of input and output:

$$R_{XY}(\tau) = E[X(t)Y(t+\tau)] = E\!\left[X(t)\int X(t+\tau-r)\, h(r)\,dr\right]
= \int R_X(\tau-r)\, h(r)\,dr = R_X(\tau) * h(\tau)$$

Power spectral density:

$$S_Y(f) = \int_{-\infty}^{\infty} R_Y(\tau)\, e^{-j2\pi f\tau}\,d\tau
= \int\!\!\int\!\!\int h(s)\, h(r)\, R_X(\tau+s-r)\, e^{-j2\pi f\tau}\,ds\,dr\,d\tau$$

If we let u = τ + s - r, we obtain

$$S_Y(f) = \underbrace{\int h(s)\, e^{j2\pi fs}\,ds}_{H(-f)}\;
\underbrace{\int h(r)\, e^{-j2\pi fr}\,dr}_{H(f)}\;
\underbrace{\int R_X(u)\, e^{-j2\pi fu}\,du}_{S_X(f)}$$

$$S_Y(f) = H(-f)\, H(f)\, S_X(f) = |H(f)|^2\, S_X(f)$$

(for a real impulse response, H(-f) = H*(f)). By taking the Fourier Transform of the input-output cross-correlation function, we obtain the cross-power spectral density of the input and output:

$$S_{XY}(f) = H(f)\, S_X(f)$$

Since R_XY(τ) = R_YX(-τ), we obtain

$$S_{XY}(f) = S_{YX}^*(f) = H(f)\, S_X(f)$$
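A discrete-time numerical sketch of S_Y(f) = |H(f)|²S_X(f); the 2-tap moving-average filter h = [0.5, 0.5] and the unit-variance white input (flat S_X = 1) are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(11)

n = 1 << 16
x = rng.normal(0.0, 1.0, n)             # white input: S_X(f) = 1
h = np.array([0.5, 0.5])                # assumed filter taps
y = np.convolve(x, h)[:n]               # filtered output process

S_y_est = np.abs(np.fft.rfft(y))**2 / n # periodogram of the output
H = np.fft.rfft(h, n)                   # filter frequency response H(f)
S_y_theory = np.abs(H)**2               # |H(f)|^2 * S_X(f) with S_X = 1

for k in (1000, 8000, 16000):           # a few frequency bins
    est = S_y_est[k-200:k+200].mean()   # smooth the noisy periodogram
    print(est, S_y_theory[k])           # estimate ~ theory
```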