
SPIE PRESS | Field Guide

Field Guide to

Probability,
Random Processes, and
Random Data Analysis

Larry C. Andrews
Ronald L. Phillips

SPIE Field Guides


Volume FG22

John E. Greivenkamp, Series Editor

Bellingham, Washington USA


Library of Congress Cataloging-in-Publication Data

Andrews, Larry C.
Field guide to probability, random processes, and random
data analysis / Larry C. Andrews, Ronald L. Phillips.
p. cm. – (Field guide series)
Includes bibliographical references and index.
ISBN 978-0-8194-8701-8
1. Mathematical analysis. 2. Probabilities. 3. Random
data (Statistics) I. Phillips, Ronald L. II. Title.
QA300.A5583 2012
519.2–dc23
2011051386

Published by

SPIE
P.O. Box 10
Bellingham, Washington 98227-0010 USA
Phone: +1.360.676.3290
Fax: +1.360.647.1445
Email: books@spie.org
Web: http://spie.org

Copyright © 2012 Society of Photo-Optical Instrumentation Engineers (SPIE)

All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means without written permission of the publisher.

The content of this book reflects the work and thought of the author. Every effort has been made to publish reliable and accurate information herein, but the publisher is not responsible for the validity of the information or for any outcomes resulting from reliance thereon.

Printed in the United States of America.

First Printing
Introduction to the Series
Welcome to the SPIE Field Guides—a series of publications written directly for the practicing engineer or scientist. Many textbooks and professional reference books cover optical principles and techniques in depth. The aim of the SPIE Field Guides is to distill this information, providing readers with a handy desk or briefcase reference that provides basic, essential information about optical principles, techniques, or phenomena, including definitions and descriptions, key equations, illustrations, application examples, design considerations, and additional resources. A significant effort will be made to provide a consistent notation and style between volumes in the series.

Each SPIE Field Guide addresses a major field of optical science and technology. The concept of these Field Guides is a format-intensive presentation based on figures and equations supplemented by concise explanations. In most cases, this modular approach places a single topic on a page, and provides full coverage of that topic on that page. Highlights, insights, and rules of thumb are displayed in sidebars to the main text. The appendices at the end of each Field Guide provide additional information such as related material outside the main scope of the volume, key mathematical relationships, and alternative methods. While complete in their coverage, the concise presentation may not be appropriate for those new to the field.

The SPIE Field Guides are intended to be living documents. The modular page-based presentation format allows them to be easily updated and expanded. We are interested in your suggestions for new Field Guide topics as well as what material should be added to an individual volume to make these Field Guides more useful to you. Please contact us at fieldguides@SPIE.org.

John E. Greivenkamp, Series Editor


College of Optical Sciences
The University of Arizona



The Field Guide Series

Keep information at your fingertips with all of the titles in the Field Guide Series:
Field Guide to
Adaptive Optics, Robert Tyson & Benjamin Frazier
Atmospheric Optics, Larry Andrews
Binoculars and Scopes, Paul Yoder, Jr. & Daniel
Vukobratovich
Diffractive Optics, Yakov G. Soskind
Geometrical Optics, John Greivenkamp
Illumination, Angelo Arecchi, Tahar Messadi, & John
Koshel
Infrared Systems, Detectors, and FPAs, Second Edition,
Arnold Daniels
Interferometric Optical Testing, Eric Goodwin & Jim
Wyant
Laser Pulse Generation, Rüdiger Paschotta
Lasers, Rüdiger Paschotta
Microscopy, Tomasz Tkaczyk
Optical Fabrication, Ray Williamson
Optical Fiber Technology, Rüdiger Paschotta
Optical Lithography, Chris Mack
Optical Thin Films, Ronald Willey
Polarization, Edward Collett
Radiometry, Barbara Grant
Special Functions for Engineers, Larry Andrews
Spectroscopy, David Ball
Visual and Ophthalmic Optics, Jim Schwiegerling



Field Guide to Probability, Random Processes, and
Random Data Analysis

Developed in basic courses in engineering and science, mathematical theory usually involves deterministic phenomena. Such is the case for solving a differential equation that describes a linear system where both input and output are deterministic quantities. In practice, however, the input to a linear system, such as imaging or radar systems, can contain a "random" quantity that yields uncertainty about the output. Such systems must be treated by probabilistic rather than deterministic methods. For this reason, probability theory and random-process theory have become indispensable tools in the mathematical analysis of these kinds of engineering systems.
Topics included in this Field Guide are basic probability
theory, random processes, random fields, and random data
analysis. The analysis of random data is less well known
than the other topics, particularly some of the tests for
stationarity, periodicity, and normality.
Much of the material is condensed from the authors’
earlier text Mathematical Techniques for Engineers and
Scientists (SPIE Press, 2003). As is the case for other
volumes in this series, it is assumed that the reader has
some basic knowledge of the subject.

Larry C. Andrews
Professor Emeritus
Townes Laser Institute
CREOL College of Optics
University of Central Florida

Ronald L. Phillips
Professor Emeritus
Townes Laser Institute
CREOL College of Optics
University of Central Florida



Table of Contents

Glossary of Symbols and Notation x

Probability: One Random Variable 1
Terms and Axioms 2
Random Variables and Cumulative Distribution 3
Probability Density Function 4
Expected Value: Moments 5
Example: Expected Value 6
Expected Value: Characteristic Function 7
Gaussian or Normal Distribution 8
Other Examples of PDFs: Continuous RV 9
Other Examples of PDFs: Discrete RV 12
Chebyshev Inequality 13
Law of Large Numbers 14
Functions of One RV 15
Example: Square-Law Device 16
Example: Half-Wave Rectifier 17

Conditional Probabilities 18
Conditional Probability: Independent Events 19
Conditional CDF and PDF 20
Conditional Expected Values 21
Example: Conditional Expected Value 22

Probability: Two Random Variables 23
Joint and Marginal Cumulative Distributions 24
Joint and Marginal Density Functions 25
Conditional Distributions and Density Functions 26
Example: Conditional PDF 27
Principle of Maximum Likelihood 28
Independent RVs 29
Expected Value: Moments 30
Example: Expected Value 31
Bivariate Gaussian Distribution 32
Example: Rician Distribution 33
Functions of Two RVs 34
Sum of Two RVs 35
Product and Quotient of Two RVs 36
Conditional Expectations and Mean-Square Estimation 37

Sums of N Complex Random Variables 38
Central Limit Theorem 39
Example: Central Limit Theorem 40
Phases Uniformly Distributed on (−π, π) 41
Phases Not Uniformly Distributed on (−π, π) 42
Example: Phases Uniformly Distributed on (−α, α) 43
Central Limit Theorem Does Not Apply 45
Example: Non-Gaussian Limit 46

Random Processes 48
Random Processes Terminology 49
First- and Second-Order Statistics 50
Stationary Random Processes 51
Autocorrelation and Autocovariance Functions 52
Wide-Sense Stationary Process 53
Example: Correlation and PDF 54
Time Averages and Ergodicity 55
Structure Functions 56
Cross-Correlation and Cross-Covariance Functions 57
Power Spectral Density 58
Example: PSD 59
PSD Estimation 60
Bivariate Gaussian Processes 61
Multivariate Gaussian Processes 62
Examples of Covariance Function and PSD 63
Interpretations of Statistical Averages 64

Random Fields 65
Random Fields Terminology 66
Mean and Spatial Covariance Functions 67
1D and 3D Spatial Power Spectrums 68
2D Spatial Power Spectrum 69
Structure Functions 70
Example: PSD 71

Transformations of Random Processes 72
Memoryless Nonlinear Transformations 73
Linear Systems 74
Expected Values of a Linear System 75
Example: White Noise 76
Detection Devices 77
Zero-Crossing Problem 78

Random Data Analysis 79
Tests for Stationarity, Periodicity, and Normality 80
Nonstationary Data Analysis for Mean 81
Analysis for Single Time Record 82
Runs Test for Stationarity 83

Equation Summary 85
Bibliography 90
Index 91

Glossary of Symbols and Notation

a, x, u, etc. Random variable, process, or field


Bu (R ) Autocovariance or covariance function of
random field
C x (τ) Autocovariance or covariance function of
random process
C xy (τ) Cross-covariance function
CDF Cumulative distribution function
Cov Covariance
D x (τ) Structure function
E [.] Expectation operator
E [ g(x)| A ] Conditional expectation operator
f x ( x), f x ( x, t) Probability density function
f x ( x| A ) Conditional probability density
Fx ( x), Fx ( x, t) Cumulative distribution function
Fx ( x| A ) Conditional cumulative distribution
function
pFq Generalized hypergeometric function
h( t) Impulse response function
H (ω) Transfer function
I p ( x) Modified Bessel function of the first kind
J p ( x) Bessel function of the first kind
K p ( x) Modified Bessel function of the second
kind
m, m( t) Mean value
mk k’th standard statistical moment
n! Factorial function
PDF Probability density function
Pr Probability
P r (B | A ) Conditional probability
PSD Power spectral density
RV Random variable
R x (τ) Autocorrelation or correlation function
R xy (τ) Cross-correlation function
Rx (τ) Long-time-average correlation function
S x (ω), S u (κ) Power spectral density function
U ( x − a) Unit step function


Var Variance
Var[x| A ] Conditional variance
x( t) Time average
z∗ Complex conjugate of z
γ( c, x) Incomplete gamma function
Γ( x) Gamma function
δ( x − a) Dirac delta function (impulse function)
µk k’th central statistical moment
µ̂( t) Estimator of mean value
σ2 , σ2x Variance
τ Time difference t2 − t1
Φx ( s ) Characteristic function
|| Absolute value

∈ Belonging to
( a n ) Binomial coefficient
〈〉 Ensemble average
{} Event
∩ Intersection


Probability: One Random Variable


The origins of probability theory can be traced back to
correspondence between Blaise Pascal (1623–1662) and
Pierre Fermat (1601–1665) concerning gambling games.
Their theory, considered the first foundation of probability
theory, remained largely a tool reserved for games of
chance until Pierre S. Laplace (1749–1827) and Karl
Friedrich Gauss (1777–1855) applied it to other problems.
Further interest in probability was generated when it
was recognized that the probability of an event often
depends on preceding outcomes, e.g., in the kinetic theory
of gases and many social and biological phenomena. In
Russia, for example, the study of such linked chains
of events (now known as Markov chains or Markov
processes) was initiated in 1906 by Andrei A. Markov
(1856–1922), a student of Chebyshev. Important advances
in Markov processes were made by Andrei N. Kolmogorov
(1903–1987) in 1931. Kolmogorov is also credited with
establishing modern probability theory in 1933 by his
use of the theory of measure and integration, which
was advanced in the early twentieth century by Henri
Lebesgue (1875–1941) and Félix E. E. Borel (1871–1956).


Terms and Axioms

Some of the terms used in discussing random happenings


include the following.
Random experiment: An experiment with an uncertain
outcome (e.g., flipping a coin). A single instance of an
experiment is called a trial.
Event: A collection of possible outcomes of a random experiment.
Sample space: The entire set of possible outcomes (the
universal set).
Relative frequency approach: If N is the number of
equally likely outcomes of an experiment, and N A is the
number of outcomes favorable to event A , the relative
frequency of event A is simply
r(A) = N_A / N
If N is sufficiently large, then we associate r ( A ) with the
probability Pr( A ) of event A . If S is the universal set, it
follows that Pr(S ) = 1.
Axiomatic approach: With each event A ∈ S , where ∈
means contained in, we associate a number Pr( A ), called
the probability of A , such that the following axioms of
probability are satisfied:
Axiom 1. For every A ∈ S ,
0 ≤ Pr( A ) ≤ 1
Axiom 2. The entire universal sample space S has the
probability
Pr(S ) = 1
Axiom 3. For mutually exclusive events A and B,
Pr( A or B) = Pr( A ) + Pr(B)
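
The relative-frequency interpretation is easy to check numerically. The short Python sketch below is an added illustration (the die-roll event, sample size, and use of NumPy are assumptions, not part of the original text); it simulates fair-die rolls and compares the relative frequency of the event A = {roll is even} with the exact probability 1/2.

import numpy as np

rng = np.random.default_rng(seed=1)
N = 100_000                              # number of trials
rolls = rng.integers(1, 7, size=N)       # fair die: outcomes 1..6

N_A = np.count_nonzero(rolls % 2 == 0)   # outcomes favorable to A = {even roll}
r_A = N_A / N                            # relative frequency r(A)

print(f"r(A) = {r_A:.4f}  vs  Pr(A) = 0.5")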


Random Variables and Cumulative Distribution

A probability distribution shows the probabilities


observed in an experiment. The quantity observed in a
given trial of an experiment is a number called a random
variable (RV). In the following, RVs are designated by
boldface letters such as x and y.

• Discrete RV: a variable that can only take on certain


discrete values.
• Continuous RV: a variable that can assume any
value within a specified range (possibly infinite).

For a given RV x, there are three primary events to


consider involving probabilities:

{x ≤ a}, {a < x ≤ b}, {x > b}

For the general event {x ≤ x}, where x is any real number,


we define the cumulative distribution function (CDF)
as

Fx ( x) = Pr(x ≤ x), −∞ < x < ∞

The CDF is a probability and thus satisfies the following


properties:

1. 0 ≤ Fx ( x) ≤ 1, −∞ < x < ∞
2. Fx (a) ≤ Fx (b), for a < b
3. Fx (−∞) = 0, Fx (∞) = 1

We also note that

Pr(a < x ≤ b) = Fx ( b) − Fx (a)


Pr(x > x) = 1 − Fx ( x)


Probability Density Function

If x is a continuous RV, its probability density function (PDF) is related to its CDF by

fx(x) = dFx(x)/dx
Thus, the CDF can also be recovered from the PDF via
integration, i.e.,
Fx(x) = ∫_{−∞}^{x} fx(u) du

The shaded area in the figure represents the CDF; hence,

Pr(a < x ≤ b) = Fx(b) − Fx(a) = ∫_a^b fx(u) du

Because the probability Fx ( x) is nondecreasing, it follows


that

f x ( x ) ≥ 0, −∞ < x < ∞

Also, by virtue of axiom 2, we see that


∫_{−∞}^{∞} fx(x) dx = 1

That is, the total area under the PDF curve is always
unity.
For a discrete RV x that takes on values xk with
probabilities Pr(x = xk ), k = 1, 2, 3, . . . , it follows that

Fx(x) = Σ_{k=1}^{∞} Pr(x = xk) U(x − xk),   fx(x) = Σ_{k=1}^{∞} Pr(x = xk) δ(x − xk)

where U(x − a) is the unit step function, and δ(x − a) = dU(x − a)/dx is the Dirac delta function.
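
As a quick numerical illustration (not from the original text; the negative exponential distribution and integration grid are arbitrary choices), the sketch below integrates a PDF on a grid and checks that the result matches the closed-form CDF, and that the total area is unity.

import numpy as np

# Negative exponential PDF fx(x) = exp(-x) U(x); its CDF is Fx(x) = 1 - exp(-x)
x = np.linspace(0.0, 20.0, 20_001)
pdf = np.exp(-x)

# CDF by cumulative trapezoidal integration of the PDF
cdf_numeric = np.concatenate(([0.0], np.cumsum(np.diff(x) * (pdf[1:] + pdf[:-1]) / 2)))
cdf_exact = 1.0 - np.exp(-x)

print("max |numeric - exact| =", np.max(np.abs(cdf_numeric - cdf_exact)))
print("total area under PDF  =", cdf_numeric[-1])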


Expected Value: Moments

If x is a continuous RV, the expected value of any function of x, say g(x), is defined by

E[g(x)] = ∫_{−∞}^{∞} g(x) fx(x) dx

For a case when g(x) = x^n, n = 1, 2, 3, . . . , one obtains standard statistical moments

E[x^n] = m_n = ∫_{−∞}^{∞} x^n fx(x) dx,   n = 1, 2, 3, . . .

The first moment m_1 = m = ∫_{−∞}^{∞} x fx(x) dx is called the mean (or expected value) of the RV x. The mean is the value around which most other values of the RV tend to cluster.
Variance is related to the first and second moments by

Var(x) ≡ σx² = m2 − m1²

The related quantity σx = √Var(x) is the standard deviation, and roughly measures the dispersion or width of the PDF about the mean value.
Higher-order statistical moments m n , n = 3, 4, 5, . . . describe
characteristic behavior in the tails of the PDF.
Central moments describe the distribution around the mean m and are defined by

E[(x − m)^n] = µn = ∫_{−∞}^{∞} (x − m)^n fx(x) dx,   n = 2, 3, 4, . . .

Variance is the central moment Var(x) = µ2.


Central moments are related to the standard moments by

µn = Σ_{k=0}^{n} [(−1)^k n!/(k!(n − k)!)] m^k E[x^{n−k}],   n = 2, 3, 4, . . .


Example: Expected Value

Calculate moments E[x^n], n = 1, 2, 3, . . . of the Rayleigh distribution

fx(x) = (x/b²) e^{−x²/2b²} U(x)

where U ( x) is the step function. From the first two


moments, calculate the variance.
Solution: From definition of the moments, we have

E[x^n] = (1/b²) ∫_0^∞ x^{n+1} e^{−x²/2b²} dx   (let t = x²/2b²)
       = 2^{n/2} b^n ∫_0^∞ e^{−t} t^{n/2} dt
       = 2^{n/2} b^n Γ(1 + n/2),   n = 1, 2, 3, . . .

where Γ(x) is the gamma function. For special cases n = 1 and n = 2, we find that

E[x] = m1 = b√(π/2),   E[x²] = m2 = 2b²

From these two moments, we can calculate the variance

Var(x) = m2 − m1² = (2 − π/2) b²

In mathematics, a moment can be interpreted as a


quantitative measure of the shape of a set of points.
The “second moment,” for example, is widely used and
measures the “width” of a set of points, or distribution,
in one dimension. Other moments describe other aspects
of a distribution, such as how the distribution is skewed
from its mean. In general, the higher-order moments
describe the “tails” of the distribution. Even moments
describe the symmetry of the tails, and odd moments
describe the asymmetry of the tails.
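
As an illustrative cross-check (added here; the parameter b, sample size, and NumPy sampler are assumptions), the Python sketch below draws Rayleigh samples and compares the sample mean and variance with the closed-form results b√(π/2) and (2 − π/2)b².

import numpy as np

b = 1.5
rng = np.random.default_rng(seed=2)
x = rng.rayleigh(scale=b, size=1_000_000)   # Rayleigh samples with parameter b

print("sample mean    :", x.mean(), "  theory:", b * np.sqrt(np.pi / 2))
print("sample variance:", x.var(),  "  theory:", (2 - np.pi / 2) * b**2)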


Expected Value: Characteristic Function

A useful expected value of RV x, called the characteristic function of x, is defined by

Φx(s) = E[e^{isx}] = ∫_{−∞}^{∞} e^{isx} fx(x) dx

where the imaginary quantity i = √(−1). The PDF can be recovered from the characteristic function through the inverse relation

fx(x) = (1/2π) ∫_{−∞}^{∞} e^{−isx} Φx(s) ds

We recognize from these expressions that the PDF and


characteristic function actually satisfy Fourier transform
and inverse Fourier transform relations.

The Fourier transform and inverse transform pair are


not uniquely defined. For example, the constant 1/2π can
appear in front of either integral, and either integral
can be the Fourier transform. Once defined, however, the
transforms are unique.

One of the most practical properties of a characteristic function is its relation to the moments of RV x. For s = 0, first observe that

Φx(0) = ∫_{−∞}^{∞} fx(x) dx = 1

whereas in general, the standard moments are related by

E[x^n] = (−i)^n Φx^{(n)}(0),   n = 1, 2, 3, . . .

Thus, a characteristic function is sometimes called a


moment-generating function.
For a discrete RV x, the characteristic function is defined by

Φx(s) = Σ_k e^{isx_k} Pr(x = xk)
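
A minimal numerical sketch (an addition for illustration; the Gaussian parameters and sample size are assumptions) that estimates Φx(s) = E[e^{isx}] by a sample average and compares it with the closed-form Gaussian characteristic function exp(ims − σ²s²/2) quoted on the following page.

import numpy as np

m, sigma = 0.7, 1.3
rng = np.random.default_rng(seed=3)
x = rng.normal(m, sigma, size=500_000)

s = np.linspace(-3, 3, 13)
phi_mc = np.array([np.mean(np.exp(1j * si * x)) for si in s])   # E[exp(isx)] by averaging
phi_exact = np.exp(1j * m * s - 0.5 * sigma**2 * s**2)           # Gaussian characteristic function

print("max |Monte Carlo - exact| =", np.max(np.abs(phi_mc - phi_exact)))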


Gaussian or Normal Distribution

Perhaps the most important of all PDFs is the Gaussian (normal) distribution, defined by

fx(x) = [1/(σ√(2π))] exp[−(x − m)²/(2σ²)],   −∞ < x < ∞

where m and σ² are mean and variance, respectively. Its importance stems from the fact that the Gaussian PDF is a limit distribution for large sums of independent RVs. The related CDF is

Fx(x) = (1/2)[1 + erf((x − m)/(σ√2))],   erf(x) = (2/√π) ∫_0^x e^{−t²} dt

where erf(x) is the error function, and the associated characteristic function is

Φx(s) = exp(ims − σ²s²/2)

With zero mean (m = 0), the odd-order moments of x are all zero and the even-order moments are related to variance σ² by

E[x^{2n}] = [(2n)!/(2^n n!)] σ^{2n},   n = 1, 2, 3, . . .

Also, for a zero-mean Gaussian RV, it can be shown that

E[|x|^n] = 1·3···(n − 1) σ^n,          for n = 2k, k = 1, 2, 3, . . .
E[|x|^n] = √(2/π) 2^k k! σ^{2k+1},     for n = 2k + 1, k = 1, 2, 3, . . .
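
The moment formulas are easy to spot-check numerically. The sketch below (illustrative only; σ and the sample size are arbitrary) compares Monte Carlo estimates of E[x^{2n}] and E[|x|³] for a zero-mean Gaussian with the closed forms above.

import numpy as np
from math import factorial, sqrt, pi

sigma = 1.7
rng = np.random.default_rng(seed=4)
x = rng.normal(0.0, sigma, size=2_000_000)

for n in (1, 2, 3):                                   # even moments E[x^(2n)]
    theory = factorial(2 * n) / (2**n * factorial(n)) * sigma**(2 * n)
    print(f"E[x^{2*n}]  sample: {np.mean(x**(2*n)):.4f}   theory: {theory:.4f}")

k = 1                                                 # odd absolute moment, n = 2k + 1 = 3
theory_abs = sqrt(2 / pi) * 2**k * factorial(k) * sigma**(2 * k + 1)
print(f"E[|x|^3]  sample: {np.mean(np.abs(x)**3):.4f}   theory: {theory_abs:.4f}")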


Other Examples of PDFs: Continuous RV

Uniform distribution:

fx(x) = 1/(b − a),   a < x < b
fx(x) = 0,           otherwise
The mean and variance, respectively, are m = (a + b)/2 and σx² = (b − a)²/12. The CDF is

Fx(x) = 0,                 x ≤ a
Fx(x) = (x − a)/(b − a),   a < x ≤ b
Fx(x) = 1,                 x > b

The uniform distribution most commonly found in random signal processes is described by

fx(x) = 1/(2π),   −π < x < π
fx(x) = 0,        otherwise

Rayleigh distribution:

fx(x) = (x/b²) e^{−x²/2b²} U(x)

The mean of a Rayleigh variate is given by m = b√(π/2), and the variance is given by σx² = (4 − π)b²/2. The corresponding CDF is

Fx(x) = (1 − e^{−x²/2b²}) U(x)

Gamma distribution:

fx(x) = [x^{c−1}/Γ(c)] e^{−x} U(x),   c > 0

Parameter c represents both the mean and variance of gamma variate x. The CDF is

Fx(x) = [1/Γ(c)] γ(c, x) U(x),   γ(c, x) = ∫_0^x e^{−t} t^{c−1} dt

where γ(c, x) is the incomplete gamma function.


Other Examples of PDFs: Continuous RV (cont.)

The special case c = 1 produces the negative exponential PDF:

fx(x) = e^{−x} U(x)

Beta distribution:

fx(x) = x^{α−1}(1 − x)^{β−1}/B(α, β),   0 < x < 1
fx(x) = 0,                              otherwise

where B(α, β) = Γ(α)Γ(β)/Γ(α + β) is the beta function. The mean and variance are, respectively,

m = α/(α + β),   σx² = αβ/[(α + β)²(α + β + 1)]

whereas the CDF is proportional to the incomplete beta function Bx(α, β), namely,

Fx(x) = Bx(α, β)/B(α, β) = [x^α Γ(β)/B(α, β)] Σ_{n=0}^{∞} (−1)^n x^n/[Γ(β − n)(α + n) n!],   0 ≤ x ≤ 1

Cauchy distribution:

fx(x) = (α/π)/[α² + (x − θ)²],   −∞ < x < ∞

The mean and variance of the Cauchy distribution do not exist, but the parameter θ identifies the median and mode. The CDF is given by

Fx(x) = 1/2 + (1/π) tan^{−1}[(x − θ)/α],   −∞ < x < ∞

Although the moments of the Cauchy distribution do not exist, it does have the characteristic function

Φx(s) = exp(isθ − α|s|),   −∞ < s < ∞

K distribution:

fx(x) = [2α/Γ(α)] (αx)^{(α−1)/2} K_{α−1}(2√(αx)) U(x)


Other Examples of PDFs: Continuous RV (cont.)

The mean has been selected as unity for convenience, and the variance is Var(x) = 1 + 2/α. The function K_p(x) is the modified Bessel function of the second kind.

K distribution was originally proposed as a model for non-Rayleigh sea echo, but it is also an excellent model for predicting statistics in a variety of scattering problems.
Gamma–gamma distribution:

fx(x) = [2(αβ)^{(α+β)/2}/(Γ(α)Γ(β))] x^{(α+β)/2−1} K_{α−β}(2√(αβx)) U(x)

This distribution is a generalization of the K distribution and reduces to the K distribution when β = 1. The mean in this case has been selected as unity, and the variance is

Var(x) = 1/α + 1/β + 1/(αβ)

This model has been closely linked with the PDF of


irradiance fluctuations of an optical wave propagating
through atmospheric turbulence.
The CDF for the gamma–gamma PDF is

Fx(x) = {π/[sin[π(α − β)] Γ(α)Γ(β)]}
        × { [(αβx)^β/(βΓ(β − α + 1))] 1F2(β; β + 1, β − α + 1; αβx)
          − [(αβx)^α/(αΓ(α − β + 1))] 1F2(α; α + 1, α − β + 1; αβx) }

where 1F2 is a generalized hypergeometric function. The CDF for the K distribution here arises when β = 1.


Other Examples of PDFs: Discrete RV

Binomial distribution:

fn(x) = Σ_{k=0}^{n} [n!/(k!(n − k)!)] p^k q^{n−k} δ(x − k),   p + q = 1

Mean and variance are, respectively, m = np and σn² = npq. The corresponding CDF is defined by

Fn(x) = Σ_{k=0}^{n} [n!/(k!(n − k)!)] p^k q^{n−k} U(x − k),   p + q = 1

and the characteristic function of a binomially distributed RV is

Φn(s) = (p e^{is} + q)^n

Poisson distribution:

fn(x) = Σ_{k=0}^{∞} (m^k/k!) e^{−m} δ(x − k)

The mean and variance of a Poisson-distributed RV are both equal to m, and the CDF is defined by

Fn(x) = Σ_{k=0}^{∞} (m^k/k!) e^{−m} U(x − k)

If the number n of Bernoulli trials (e.g., like coin tossing)


is very large, and if the probability p of success in each
trial is very small, then the binomial probability can be
approximated by the Poisson probability.
If n is a Poisson-distributed RV, then its characteristic
function is

Φn ( s) = exp[ m( e is − 1)]
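
As a quick illustration of the Poisson approximation mentioned above (an added sketch; n, p, and the comparison range are arbitrary choices), the code compares binomial and Poisson probabilities for large n and small p.

from math import comb, exp, factorial

n, p = 1_000, 0.003          # many Bernoulli trials, small success probability
m = n * p                    # Poisson parameter (mean)

for k in range(0, 9):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-m) * m**k / factorial(k)
    print(f"k={k}:  binomial {binom:.5f}   Poisson {poisson:.5f}")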


Chebyshev Inequality

Let x be a continuous RV. Then, for any non-negative real-valued function g(x) and every k > 0, the probability of g(x) ≥ k is less than or equal to E[g(x)]/k, i.e.,

Pr[g(x) ≥ k] ≤ E[g(x)]/k
The general inequality given above is called the
Chebyshev inequality. A particularly useful form of this
inequality is given by the following.
Let m be the expected value or mean of RV x. The
inequality that we seek provides a bound on the
probability that x deviates from the mean m by more than
a certain amount ε. That probability is illustrated by the
shaded areas under the PDF shown in the following figure.
In symbols, the Chebyshev inequality that we seek takes the form

Pr(|x − m| ≥ ε) ≤ Var(x)/ε²
which depends only on ε and the variance of x.
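
A short empirical check of the bound (an illustrative addition; the exponential distribution and ε values are arbitrary choices): the observed tail probability Pr(|x − m| ≥ ε) should never exceed Var(x)/ε².

import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.exponential(scale=2.0, size=1_000_000)   # mean 2, variance 4
m, var = x.mean(), x.var()

for eps in (2.0, 4.0, 6.0):
    tail = np.mean(np.abs(x - m) >= eps)          # empirical Pr(|x - m| >= eps)
    print(f"eps={eps}:  Pr = {tail:.4f}  <=  Chebyshev bound {var / eps**2:.4f}")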


Law of Large Numbers

The law of large numbers is a concept that was


reconsidered and generalized for more than two centuries
by some of the most famous names in probability. In simple
terms, it is a rule that says: As the number of samples
increases, the average of these samples is likely to reach
the mean of the whole population.
To demonstrate, let k be a binomial RV that denotes the number of times an event A occurs in n trials. The relative frequency of its occurrence is defined by the ratio

F = k/n

Thus, F takes on values that are integral multiples of 1/n, 0 ≤ F ≤ 1. If we define p = Pr(A), then it follows that

E[F] = p,   Var(F) = p(1 − p)/n

Based on this last expression, it is clear that as n becomes large, the distribution of RV F clusters more closely about its expected value p.

If we apply the Chebyshev inequality from the previous page, we are led to the inequality known as the law of large numbers, namely,

Pr(|F − p| ≥ ε) ≤ p(1 − p)/(nε²)

In general, the law of large numbers is a theorem that


describes the result of performing the same experiment a
large number of times. It actually guarantees stable long-
term results associated with random events.
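
The sketch below (an added illustration; p, the trial counts, and ε are assumptions) shows the relative frequency F = k/n settling toward p as n grows, together with the bound p(1 − p)/(nε²).

import numpy as np

p, eps = 0.3, 0.02
rng = np.random.default_rng(seed=6)

for n in (100, 10_000, 1_000_000):
    # 2000 independent experiments, each with n Bernoulli trials
    k = rng.binomial(n, p, size=2000)
    F = k / n
    prob_dev = np.mean(np.abs(F - p) >= eps)           # empirical Pr(|F - p| >= eps)
    bound = p * (1 - p) / (n * eps**2)                  # law-of-large-numbers bound
    print(f"n={n:>9}:  Pr(|F-p|>=eps) = {prob_dev:.3f}   bound = {bound:.3f}")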


Functions of One RV

In many cases, an examination is necessary of what happens to RV x as it passes through various transformations, such as a random signal passing through a nonlinear device. Suppose that the output of some nonlinear device with input x can be represented by the new RV:

y = g(x)

If the PDF of x is known to be fx(x), and the function y = g(x) has a unique inverse, the PDF of y is related by

fy(y) = fx(x)/|g′(x)|

If the inverse of y = g(x) is not unique, and x1, x2, . . . , xn are all of the values for which y = g(x1) = g(x2) = · · · = g(xn), then the previous relation is modified to

fy(y) = fx(x1)/|g′(x1)| + fx(x2)/|g′(x2)| + · · · + fx(xn)/|g′(xn)|

Another method for finding the PDF of y involves the characteristic function. For example, given that y = g(x), the characteristic function for y can be found directly from the PDF for x through the expected value relation

Φy(s) = E[e^{isg(x)}] = ∫_{−∞}^{∞} e^{isg(x)} fx(x) dx

Consequently, the PDF for y can be recovered from characteristic function Φy(s) through the inverse relation

fy(y) = (1/2π) ∫_{−∞}^{∞} e^{−isy} Φy(s) ds


Example: Square-Law Device

The output of a square-law device is defined by the quadratic transformation

y = ax²,   a > 0

where x is the RV input. Find an expression for the PDF fy(y) given that we know fx(x).

Solution: We first observe that if y < 0, then y = ax² has no real solutions; hence, it follows that fy(y) = 0 for y < 0. For y > 0, there are two solutions to y = ax², given by

x1 = √(y/a),   x2 = −√(y/a)

where

g′(x1) = 2ax1 = 2√(ay),   g′(x2) = 2ax2 = −2√(ay)

In this case, we deduce that the PDF for RV y is defined by

fy(y) = [1/(2√(ay))] [fx(√(y/a)) + fx(−√(y/a))] U(y)

where U(y) is the unit step function.

It can also be shown that the CDF for y is

Fy(y) = [Fx(√(y/a)) − Fx(−√(y/a))] U(y)
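
A quick Monte Carlo sanity check of this result (added as an illustration; the choice a = 2 and a standard Gaussian input are assumptions): the histogram of y = ax² is compared with the derived density fy(y).

import numpy as np
from math import sqrt, pi

a = 2.0
rng = np.random.default_rng(seed=7)
x = rng.normal(0.0, 1.0, size=1_000_000)   # input RV with known PDF fx
y = a * x**2                               # square-law device output

def fx(v):                                 # standard Gaussian PDF
    return np.exp(-v**2 / 2) / sqrt(2 * pi)

yy = np.linspace(0.05, 8.0, 6)
fy = (fx(np.sqrt(yy / a)) + fx(-np.sqrt(yy / a))) / (2 * np.sqrt(a * yy))

hist, edges = np.histogram(y, bins=np.linspace(0.0, 8.5, 200), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for v, f in zip(yy, fy):
    print(f"y={v:.2f}:  formula {f:.4f}   histogram {hist[np.argmin(np.abs(centers - v))]:.4f}")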


Example: Half-Wave Rectifier

The output of a half-wave rectifier is y = xU(x), or

y = g(x) = x,   x > 0
y = g(x) = 0,   x < 0

Given that fx(x) = (1/2) e^{−|x|}, determine the CDF and PDF for RV y.
Solution: Note that y = g( x) is constant over the interval
x < 0. This means that all probability associated with
negative values of x will pile up at y = 0, leading to a delta
function for f y ( y) at the origin.
From the definition, we first calculate

Fx(x) = (1/2) ∫_{−∞}^{x} e^{−|u|} du = 1 − (1/2)e^{−x},   x ≥ 0
                                     = (1/2)e^{x},        x < 0

Because the function y = g(x) = x is monotonically increasing for x ≥ 0, we find that

Fy(y) = Fx(x) = 1 − (1/2)e^{−y},   y ≥ 0

For y < 0, we have an impossible event, so Fy(y) = 0. Hence, Fy(y) = [1 − (1/2)e^{−y}] U(y) and

fy(y) = dFy(y)/dy = (1/2)e^{−y} U(y) + [1 − (1/2)e^{−y}] δ(y)

From the property f(x)δ(x) = f(0)δ(x), the above equation reduces to

fy(y) = (1/2)e^{−y} U(y) + (1/2)δ(y)


Conditional Probabilities
The notion of conditional probability relates to the
likelihood of some event B happening, given that another
known event A has already happened. Conditional
probabilities are probabilities in their own right, and
therefore satisfy the basic axioms listed in the first
chapter.
Some of the basic concepts of probability theory presented
in that chapter are reformulated in this chapter in terms of
conditional concepts. Thus, it is necessary to introduce the
concepts of conditional CDF, conditional PDF, conditional
moments, and so on.


Conditional Probability: Independent Events

It is required at times to find the probability of event B


under the condition that event A has occurred, called the
conditional probability of B given A . This is denoted by
the symbol Pr(B | A ). The probability that events common
to both A and B occur is defined by

Pr( A ∩ B) = Pr(B | A )Pr( A )

where the symbol ∩ denotes intersection (i.e., common to both A and B). Thus, the conditional probability of B given A is defined by

Pr(B | A) = Pr(A ∩ B)/Pr(A),   Pr(A) ≠ 0

A related property is

Pr(A ∩ B) = Pr(B | A) Pr(A) = Pr(A | B) Pr(B)   (multiplication rule)

Consequently, the conditional probability of A given B is

Pr(A | B) = Pr(A ∩ B)/Pr(B),   Pr(B) ≠ 0

By writing the multiplication rule in a different form, one obtains what is known as Bayes' theorem:

Pr(A | B) = Pr(B | A) Pr(A)/Pr(B),   Pr(B) ≠ 0

If events A and B are such that

Pr(A ∩ B) = Pr(A) Pr(B)

they are called independent events. In such cases, it is also true that conditional probabilities reduce to ordinary probabilities, i.e.,

Pr(A | B) = Pr(A),   Pr(B | A) = Pr(B)
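
The multiplication rule and Bayes' theorem can be checked on a tiny simulated experiment. The sketch below (an added illustration; the two-dice events A and B are arbitrary choices) estimates the probabilities from samples and compares Bayes' theorem against the direct estimate.

import numpy as np

rng = np.random.default_rng(seed=8)
N = 1_000_000
d1 = rng.integers(1, 7, size=N)          # first die
d2 = rng.integers(1, 7, size=N)          # second die

A = (d1 + d2) >= 9                       # event A: the sum is at least 9
B = (d1 == 6)                            # event B: the first die shows 6

pA, pB = A.mean(), B.mean()
pB_given_A = (A & B).sum() / A.sum()     # conditional relative frequency
pA_given_B = (A & B).sum() / B.sum()

print("multiplication rule:", pB_given_A * pA, "vs Pr(A and B) =", (A & B).mean())
print("Bayes' theorem     :", pB_given_A * pA / pB, "vs Pr(A | B) =", pA_given_B)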


Conditional CDF and PDF

Given that event A has occurred, we define the conditional distribution function as

Fx(x | A) = Pr(x ≤ x | A) = Pr[(x ≤ x) ∩ A]/Pr(A)

The associated conditional PDF is

fx(x | A) = dFx(x | A)/dx

If A is the event A = {x ≤ a}, where a is constant, then

Fx(x | A) = Pr(x ≤ x | A) = Pr[(x ≤ x) ∩ (x ≤ a)]/Pr(A)

For the intersection of two events, defined by

(x ≤ x) ∩ (x ≤ a) = {x ≤ x},   x ≤ a
(x ≤ x) ∩ (x ≤ a) = {x ≤ a},   x > a

the related CDF and PDF are, respectively,

Fx(x | x ≤ a) = Pr(x ≤ x)/Pr(x ≤ a) = Fx(x)/Fx(a),   x ≤ a
Fx(x | x ≤ a) = Pr(x ≤ a)/Pr(x ≤ a) = 1,             x > a

fx(x | x ≤ a) = [dFx(x)/dx]/Fx(a) U(a − x) = [fx(x)/Fx(a)] U(a − x)

If event A is A = {a < x ≤ b}, then the corresponding CDF and PDF are

Fx(x | a < x ≤ b) = 0,                                 x ≤ a
Fx(x | a < x ≤ b) = [Fx(x) − Fx(a)]/[Fx(b) − Fx(a)],   a < x ≤ b
Fx(x | a < x ≤ b) = 1,                                 x > b

fx(x | a < x ≤ b) = {fx(x)/[Fx(b) − Fx(a)]} [U(x − a) − U(x − b)]


Conditional Expected Values

If x is a continuous RV depending on event A, the conditional expected value of any function of x, say g(x), is defined by

E[g(x) | A] = ∫_{−∞}^{∞} g(x) fx(x | A) dx

For the case when g(x) = x^n, n = 1, 2, 3, . . ., one obtains the standard conditional moments

E[x^n | A] = ∫_{−∞}^{∞} x^n fx(x | A) dx,   n = 1, 2, 3, . . .

Suppose that A is the event A = {a < x ≤ b}; then,

E[x^n | a < x ≤ b] = ∫_{−∞}^{∞} x^n fx(x | a < x ≤ b) dx
                   = [∫_a^b x^n fx(x) dx]/[Fx(b) − Fx(a)],   n = 1, 2, 3, . . .

Conditional variance of a RV x with respect to a conditioning event A is defined by

Var(x | A) = E[x² | A] − E²[x | A]

Similarly, the conditional central moments are

E[(x − m)^n | A] = ∫_{−∞}^{∞} (x − m)^n fx(x | A) dx,   n = 2, 3, 4, . . .

where now m = E[x | A]. As before, the variance is the second conditional central moment, namely,

Var(x | A) = E[(x − m)² | A]


Example: Conditional Expected Value

Given that x is a Gaussian RV with mean m and variance


σ2 , calculate the conditional moment E [x | x ≤ m].
Solution: The conditional PDF takes the form

fx(x | x ≤ m) = [fx(x)/Fx(m)] U(m − x)

where

fx(x) = [1/(σ√(2π))] exp[−(x − m)²/(2σ²)],   −∞ < x < ∞

and

Fx(m) = [1/(σ√(2π))] ∫_{−∞}^{m} exp[−(x − m)²/(2σ²)] dx = 1/2

Therefore, the conditional PDF becomes

fx(x | x ≤ m) = 2 fx(x) U(m − x)

By calculating the required statistical expectation, it can be shown that

E[x | x ≤ m] = ∫_{−∞}^{∞} x fx(x | x ≤ m) dx
             = [2/(σ√(2π))] ∫_{−∞}^{m} x exp[−(x − m)²/(2σ²)] dx

The evaluation of this last integral leads to

E[x | x ≤ m] = m − σ√(2/π)
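
A Monte Carlo check of this conditional mean (an added sketch; m, σ, and the sample size are arbitrary): averaging only those Gaussian samples that fall at or below the mean should give approximately m − σ√(2/π).

import numpy as np

m, sigma = 3.0, 2.0
rng = np.random.default_rng(seed=9)
x = rng.normal(m, sigma, size=2_000_000)

cond_mean = x[x <= m].mean()                      # E[x | x <= m] by conditioning the samples
print("Monte Carlo:", cond_mean)
print("theory     :", m - sigma * np.sqrt(2 / np.pi))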


Probability: Two Random Variables


Up to this point we have considered PDFs and CDFs
associated with a single RV. In some cases, however, a
single RV can be a function of two or more other RVs,
such as the sum z = x+y or product z = xy. We often make
calculations concerning RV z in such cases by developing
a joint PDF and/or CDF model of x and y. In doing so, it
is helpful to visualize the outcome of an experiment as a
point in the ξη plane with coordinates (x, y).
Concepts required for the analysis of joint PDFs or CDFs
of two or more RVs are a natural extension of those
for a single RV. We focus primarily on two RVs, with
the generalization to three or more RVs being fairly
straightforward.


Joint and Marginal Cumulative Distributions

If x and y are RVs, we associate the events {x ≤ x} and


{y ≤ y} with respective probabilities

Pr(x ≤ x) = Fx ( x), Pr(y ≤ y) = Fy ( y)

where Fx(x) and Fy(y) are CDFs of x and y. The event defined by the intersection

{x ≤ x} ∩ {y ≤ y} = {x ≤ x, y ≤ y}

is represented by a quadrant in the ξη plane having its vertex at point (x, y), as shown in the figure. The probability of this event, called the joint distribution function of x and y, is given by

Fxy(x, y) = Pr(x ≤ x, y ≤ y)

This joint CDF has four properties analogous to those for


a single RV, namely,
(1) 0 ≤ Fxy ( x, y) ≤ 1, −∞ < x < ∞, −∞ < y < ∞
(2) Fxy (−∞, y) = Fxy ( x, −∞) = Fxy (−∞, −∞) = 0
(3) Fxy (∞, ∞) = 1
(4) Fxy (∞, y) = Fy ( y), Fxy ( x, ∞) = Fx ( x)

The joint CDF Fxy(x, y) is a nondecreasing function when either x or y (or both) increase. Here, CDFs Fx(x) and Fy(y) are called marginal distributions.


Joint and Marginal Density Functions

The joint density function of RVs x and y is defined by

fxy(x, y) = ∂²Fxy(x, y)/∂x∂y

provided that the joint CDF Fxy(x, y) is known, continuous, and differentiable. The joint PDF is also commonly called the bivariate PDF.

If given the joint PDF, the joint CDF can be recovered by integrating the PDF over the shaded rectangular domain shown in the figure on the previous page, i.e.,

Fxy(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fxy(ξ, η) dξ dη

If we define event A = {(x, y) ∈ D}, where ∈ means contained in and D is some domain of the ξη plane, then the probability assigned to this event is

Pr(A) = ∬_D fxy(ξ, η) dξ dη

The marginal CDF of RV x can be found from the joint CDF through the relation

Fx(x) = Fxy(x, ∞) = ∫_{−∞}^{x} ∫_{−∞}^{∞} fxy(ξ, η) dξ dη

and by differentiating with respect to x, we obtain the marginal density function:

fx(x) = (∂/∂x) ∫_{−∞}^{x} ∫_{−∞}^{∞} fxy(ξ, η) dξ dη = ∫_{−∞}^{∞} fxy(x, η) dη

Similarly, the marginal density function of y is

fy(y) = ∫_{−∞}^{∞} fxy(ξ, y) dξ
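
As a numerical illustration of marginalization (an added sketch; the joint density, grid, and spacing are assumptions), the code integrates a joint PDF over y on a grid and compares the result with the known marginal of x.

import numpy as np

# Joint PDF of two independent unit-variance Gaussians: fxy(x, y) = fx(x) fy(y)
x = np.linspace(-6, 6, 601)
y = np.linspace(-6, 6, 601)
X, Y = np.meshgrid(x, y, indexing="ij")
fxy = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)

fx_marginal = np.trapz(fxy, y, axis=1)            # integrate the joint PDF over y
fx_exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print("max |marginal - exact| =", np.max(np.abs(fx_marginal - fx_exact)))
print("total probability      =", np.trapz(fx_marginal, x))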


Conditional Distributions and Density Functions

To discuss conditional distributions, we illustrate only the case where the conditioning event depends on y. In particular, given event A = {y ≤ y}, then

Pr(y ≤ y) = Fy(y)
Pr(x ≤ x, y ≤ y) = Fxy(x, y) = Fx(x | y ≤ y) Fy(y)

From these results, the conditional CDF is

Fx(x | y ≤ y) = Fxy(x, y)/Fy(y)

and the corresponding PDF is

fx(x | y ≤ y) = [1/Fy(y)] ∫_{−∞}^{y} fxy(x, η) dη

For event A = {a < y ≤ b}, it follows that

Fx(x | a < y ≤ b) = [Fxy(x, b) − Fxy(x, a)]/[Fy(b) − Fy(a)]

fx(x | a < y ≤ b) = [1/(Fy(b) − Fy(a))] ∫_a^b fxy(x, y) dy

The special event A = {y = y} leads to

Fx(x | y = y) = [1/fy(y)] ∫_{−∞}^{x} fxy(ξ, y) dξ

fx(x | y = y) ≡ fx(x | y) = fxy(x, y)/fy(y)

from which we also deduce the total probability

fx(x) = ∫_{−∞}^{∞} fx(x | y) fy(y) dy

Finally, the previous results lead to Bayes' theorem:

fy(y | x) = fx(x | y) fy(y)/fx(x)


Example: Conditional PDF

Determine the conditional PDF fxy(x, y | x² + y² < b²), given that

fxy(x, y) = [1/(2πσ²)] exp[−(x² + y²)/(2σ²)]

Solution: With D = {x² + y² < b²}, we first calculate

Pr(x² + y² < b²) = ∬_D fxy(x, y) dx dy = [1/(2πσ²)] ∬_D exp[−(x² + y²)/(2σ²)] dx dy

By changing to polar coordinates, the evaluation of this integral yields

Pr(x² + y² < b²) = [1/(2πσ²)] ∫_0^{2π} ∫_0^b e^{−r²/2σ²} r dr dθ = 1 − e^{−b²/2σ²}

Thus,

fxy(x, y | x² + y² < b²) = fxy(x, y) / ∬_D fxy(x, y) dx dy
                         = exp[−(x² + y²)/2σ²] / {2πσ²[1 − exp(−b²/2σ²)]}

A topic related to conditional probability is Fermat’s


principle of conjunctive probability. That is, the
probability that two events will both happen is AB,
where A is the probability that the first event will
happen, and B is the probability that the second event
will happen when the first event is known to have
happened.


Principle of Maximum Likelihood

The maximum-likelihood estimation (MLE) technique was originally developed by R. A. Fisher in the 1920s. It states that the desired probability distribution is the one that makes the observed data "most likely." This means that one must seek the value of the parameter vector that maximizes the likelihood function L(x | y).

MLE estimates need not exist nor be unique. For computational convenience, the MLE estimate is obtained by maximizing the log-likelihood function ln[L(x | y)].
The conditional PDF f x ( x | y) is sometimes called the
posterior density function of RV x.
The principle of maximum likelihood is equivalent to
Bayes’ theorem, i.e., it gives the best estimate of RV x,
given the observation y = y.
The maximum-likelihood estimate (MLE) x̂ is that value
for which the conditional PDF f x ( x | y) is maximum.

In statistics, the maximum-likelihood estimation


(MLE) technique is a method of estimating the
parameters of a statistical model. For a fixed set
of data and underlying statistical model, the method
of maximum likelihood selects values of the model
parameters that give the observed data the greatest
probability. Unlike least-squares estimation, which is
primarily a descriptive tool, MLE is a preferred
method of parameter estimation in statistics and is
an indispensable tool for many statistical modeling
techniques—in particular, for nonlinear modeling with
non-normal (i.e., non-Gaussian) data.
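
A minimal MLE sketch (added for illustration; the Gaussian model, true parameters, and grid search are assumptions): the log-likelihood of Gaussian data is evaluated over a grid of candidate means, and its maximizer is compared with the sample mean, which is the closed-form MLE for this model.

import numpy as np

rng = np.random.default_rng(seed=10)
true_m, sigma = 1.8, 0.9
y = rng.normal(true_m, sigma, size=5_000)          # observed data

def log_likelihood(m):
    # ln L(m | y) for a Gaussian model with known sigma (additive constants dropped)
    return -np.sum((y - m) ** 2) / (2 * sigma**2)

candidates = np.linspace(0.0, 4.0, 4001)
ll = np.array([log_likelihood(m) for m in candidates])
m_hat = candidates[np.argmax(ll)]                   # grid-search MLE

print("grid-search MLE :", m_hat)
print("sample mean     :", y.mean())                # closed-form MLE of the mean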


Independent RVs

Two RVs x and y are said to be statistically


independent if
Pr(x ≤ x, y ≤ y) = Pr(x ≤ x)Pr(y ≤ y)
or equivalently, in terms of CDFs,
Fxy ( x, y) = Fx ( x)Fy ( y)

It follows that in terms of density functions,


f xy ( x, y) = f x ( x) f y ( y)

Also, because f xy ( x, y) = f x ( x | y) f y ( y), for example, it can be


deduced that
f x ( x | y) = f x ( x )

Similarly, for statistically independent RVs x and y, it


follows that
f y ( y | x ) = f y ( y)

The formal definition of conditional independence


is based on the idea of conditional distributions. For
example, if the random variables x and y are continuous
and have a joint PDF, then x and y are conditionally
independent given z if
Fxy ( x, y | z) = Fx ( x | z)Fy ( y | z)

or, equivalently,
f xy ( x, y | z) = f x ( x | z) f y ( y | z)

In simple terms, two RVs x and y are said to be


statistically independent if x conveys no information
about y, and y conveys no information about x. If two
RVs are independent, information received about one of
the two does not change the assessment of the probability
distribution of the other.


Expected Value: Moments

If x and y are RVs, the expected value of the function g(x, y) is defined by

E[g(x, y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fxy(x, y) dx dy

Statistical moments analogous to those defined for a single RV are called joint moments of x and y. In particular,

m_jk = E[x^j y^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k fxy(x, y) dx dy,   j, k = 1, 2, 3, . . .

Of special importance is the moment m_11, given by

E[xy] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fxy(x, y) dx dy

called the correlation of RVs x and y. If x and y are statistically independent, then fxy(x, y) = fx(x) fy(y), and the correlation yields

E[xy] = E[x] E[y]

If E[xy] = 0, it is said that x and y are orthogonal.

The covariance of RVs x and y is

Cov(x, y) = E[xy] − E[x] E[y]

If Cov(x, y) = 0, then x and y are uncorrelated. If x and y are statistically independent, they are also uncorrelated; however, the converse might not be true.

Finally, the correlation coefficient of x and y is defined by

ρ = Cov(x, y)/(σx σy);   σx² = Var(x), σy² = Var(y)

Based on the inequality |Cov(x, y)| ≤ σx σy, it follows that −1 ≤ ρ ≤ 1.


Example: Expected Value

Determine the correlation coefficient between RVs x and y whose joint density function is

fxy(x, y) = x + y,   0 ≤ x ≤ 1, 0 ≤ y ≤ 1
fxy(x, y) = 0,       elsewhere

Solution: From the symmetry of the PDF, it follows that

E[x] = E[y] = ∫_0^1 ∫_0^1 x(x + y) dx dy = 7/12

Also,

E[xy] = ∫_0^1 ∫_0^1 xy(x + y) dx dy = 1/3

and consequently,

Cov(x, y) = E[xy] − E[x] E[y] = 1/3 − (7/12)² = −1/144

In a similar fashion,

σx² = σy² = ∫_0^1 ∫_0^1 (x − 7/12)²(x + y) dx dy = 11/144

from which the following can be deduced:

ρ = Cov(x, y)/(σx σy) = −1/11

Thus, x and y are negatively correlated. That is, when one variable is large, the other tends to be small.

The expected value is simply the limit of the sample mean as the sample size grows to infinity. More informally, it can be interpreted as the long-run average of the results of many independent repetitions of an experiment. The correlation coefficient is a measure of the strength of the linear relationship between two RVs.
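
A numerical cross-check of this example (an added sketch; the grid resolution is an assumption): the moments are computed by double numerical integration of fxy(x, y) = x + y over the unit square, reproducing ρ = −1/11 ≈ −0.0909.

import numpy as np

x = np.linspace(0, 1, 2001)
y = np.linspace(0, 1, 2001)
X, Y = np.meshgrid(x, y, indexing="ij")
f = X + Y                                               # joint PDF on the unit square

def integrate(g):
    return np.trapz(np.trapz(g, y, axis=1), x)          # double trapezoidal integration

Ex  = integrate(X * f)
Exy = integrate(X * Y * f)
varx = integrate((X - Ex) ** 2 * f)

rho = (Exy - Ex * Ex) / varx                            # by symmetry Ey = Ex and vary = varx
print("rho =", rho, " (exact -1/11 =", -1 / 11, ")")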


Bivariate Gaussian Distribution

The most widely used joint PDF is the bivariate Gaussian density function

fxy(x, y) = {1/[2πσx σy √(1 − ρ²)]}
            × exp{−[(x − x̄)²/σx² − 2ρ(x − x̄)(y − ȳ)/(σx σy) + (y − ȳ)²/σy²]/[2(1 − ρ²)]}

where x̄ and ȳ are the means, respectively, and ρ is the correlation coefficient. If both x and y have zero means and unit variances, the joint PDF reduces to the simpler form

fxy(x, y) = [1/(2π√(1 − ρ²))] exp{−(x² − 2ρxy + y²)/[2(1 − ρ²)]}

This form can always be obtained in practice by scaling the RVs according to

ξ = (x − x̄)/σx,   η = (y − ȳ)/σy

and redefining ξ = x and η = y.

Marginal density functions associated with the bivariate Gaussian distribution are

fx(x) = [1/(σx√(2π))] exp[−(x − x̄)²/(2σx²)]

and

fy(y) = [1/(σy√(2π))] exp[−(y − ȳ)²/(2σy²)]
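
The sketch below (illustrative; the parameter values are assumptions) draws samples from a bivariate Gaussian with specified means, variances, and correlation coefficient, and confirms that the sample marginals and correlation match the model.

import numpy as np

mx, my = 1.0, -2.0
sx, sy, rho = 1.5, 0.8, 0.6
cov = np.array([[sx**2, rho * sx * sy],
                [rho * sx * sy, sy**2]])

rng = np.random.default_rng(seed=11)
samples = rng.multivariate_normal([mx, my], cov, size=1_000_000)
x, y = samples[:, 0], samples[:, 1]

print("sample means     :", x.mean(), y.mean())
print("sample std devs  :", x.std(), y.std())
print("sample rho       :", np.corrcoef(x, y)[0, 1])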


Example: Rician Distribution

Suppose that x and y are statistically independent RVs, with joint Gaussian PDF

fxy(x, y) = [1/(2πσ²)] exp{−[(x − m)² + y²]/(2σ²)}

where m = m_x and σ² = σx² = σy². Converting to polar coordinates, the joint PDF takes the form

f_RΘ(R, Θ) = [R/(2πσ²)] exp{−[(R cos Θ − m)² + (R sin Θ)²]/(2σ²)}

Compute the marginal PDFs for R and Θ.

Solution: The marginal PDFs for R and Θ are defined by

f_R(R) = ∫_0^{2π} f_RΘ(R, Θ) dΘ,   f_Θ(Θ) = ∫_0^{∞} f_RΘ(R, Θ) dR

For R, the marginal PDF is the Rician distribution

f_R(R) = [U(R) R/(2πσ²)] exp[−(R² + m²)/(2σ²)] ∫_0^{2π} exp(mR cos Θ/σ²) dΘ
       = (R/σ²) exp[−(R² + m²)/(2σ²)] I_0(mR/σ²) U(R)

where U(R) is the step function and I_0(x) is the modified Bessel function. The PDF integral for Θ reduces to

f_Θ(Θ) = [1/(2πσ²)] exp(−m²/2σ²) ∫_0^{∞} R exp[−(R² − 2mR cos Θ)/(2σ²)] dR
       = [1/(2π)] exp(−m²/2σ²) {1 + (m/σ)√(π/2) exp[(m² cos² Θ)/(2σ²)] cos Θ [1 + erf(m cos Θ/(√2 σ))]}

where erf(x) is the error function.


Functions of Two RVs

Let us now develop the CDF and/or PDF for RV z when it is related to RVs x and y by equation z = g(x, y). Several methods are available for making such calculations.

Method 1. Event g(x, y) ≤ z is represented by the domain D in the ξη plane. Hence, the CDF and PDF are found from the relations

Fz(z) = ∬_D fxy(x, y) dx dy

fz(z) = (d/dz) ∬_D fxy(x, y) dx dy

Method 2. Here we use conditional statistics. Let us fix y, say y = y, so that RV z depends only on x. If g(x, y) is a monotone function of x with inverse x = g^{−1}(z, y), then the PDF for z is

fz(z) = ∫_{−∞}^{∞} fz(z | y) fy(y) dy

where fy(y) is the marginal PDF of y, and

fz(z | y) = fx(x | y)/|∂g(x, y)/∂x|   evaluated at x = g^{−1}(z, y)

Method 3. This last method is based on the characteristic function of z, defined by

Φz(s) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{isg(x,y)} fxy(x, y) dx dy

Inverting this expression, we obtain

fz(z) = (1/2π) ∫_{−∞}^{∞} e^{−isz} Φz(s) ds


Sum of Two RVs

One of the most important cases of the form z = g(x, y) involves a sum of two RVs, namely,

z = x + y

To use Method 1 on the previous page, note that the domain of integration D is defined by x + y ≤ z. Hence, the CDF of z is

Fz(z) = ∫_{−∞}^{∞} ∫_{−∞}^{z−y} fxy(x, y) dx dy

and the corresponding PDF is therefore

fz(z) = ∫_{−∞}^{∞} fxy(z − y, y) dy

If x and y are statistically independent, then the PDF becomes

fz(z) = ∫_{−∞}^{∞} fx(z − y) fy(y) dy = ∫_{−∞}^{∞} fx(x) fy(z − x) dx

Finally, if x and y take on only positive values, then this last expression reduces even further to

fz(z) = ∫_0^z fx(z − y) fy(y) dy = ∫_0^z fx(x) fy(z − x) dx,   z > 0

For the more general case of

z = ax + by

the corresponding PDF takes the form

fz(z) = (1/a) ∫_{−∞}^{∞} fxy((z − by)/a, y) dy
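
The independent-RV case is just a convolution of the two densities. The sketch below (an added illustration; the exponential densities and grid are assumptions) convolves two PDFs numerically and compares the result with a histogram of sampled sums.

import numpy as np

rng = np.random.default_rng(seed=12)
dx = 0.01
x = np.arange(0, 30, dx)

f1 = np.exp(-x)                     # PDF of x: negative exponential
f2 = 0.5 * np.exp(-0.5 * x)         # PDF of y: exponential with mean 2

fz = np.convolve(f1, f2) * dx       # fz(z) = integral of f1(z - y) f2(y) dy
z = np.arange(len(fz)) * dx

samples = rng.exponential(1.0, 500_000) + rng.exponential(2.0, 500_000)
hist, edges = np.histogram(samples, bins=np.arange(0, 15, 0.5), density=True)

for c, h in zip(0.5 * (edges[:-1] + edges[1:])[:6], hist[:6]):
    print(f"z={c:.2f}:  convolution {fz[np.argmin(np.abs(z - c))]:.4f}   histogram {h:.4f}")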


Product and Quotient of Two RVs

Let us consider the simple product

z = xy

Using Method 2,

fz(z | y) = fx(x | y)/|∂(xy)/∂x|   evaluated at x = z/y,   = (1/|y|) fx(z/y | y)

and, consequently,

fz(z) = ∫_{−∞}^{∞} fz(z | y) fy(y) dy = ∫_{−∞}^{∞} (1/|y|) fx(z/y | y) fy(y) dy

If the RVs are jointly Gaussian with zero means and unit variances, this last expression leads to

fz(z) = [1/(π√(1 − ρ²))] exp[ρz/(1 − ρ²)] K_0[|z|/(1 − ρ²)],   −1 < ρ < 1

where ρ is the correlation coefficient and K_0(x) is a modified Bessel function.

For the quotient

z = x/y

Method 2 leads to

fz(z | y) = fx(x | y)/|∂(x/y)/∂x|   evaluated at x = yz,   = |y| fx(yz | y)

and

fz(z) = ∫_{−∞}^{∞} fz(z | y) fy(y) dy = ∫_{−∞}^{∞} |y| fx(yz | y) fy(y) dy

If the RVs are jointly Gaussian, uncorrelated, with zero means and unit variances, this last expression leads to

fz(z) = 1/[π(z² + 1)]
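
A Monte Carlo illustration of the quotient result (an added sketch; sample size and evaluation points are arbitrary): the ratio of two independent standard Gaussians should follow the Cauchy density 1/[π(z² + 1)].

import numpy as np

rng = np.random.default_rng(seed=13)
x = rng.normal(size=2_000_000)
y = rng.normal(size=2_000_000)
z = x / y                                          # quotient of two independent Gaussians

hist, edges = np.histogram(z, bins=np.linspace(-6, 6, 121), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

for c in (-2.0, 0.0, 2.0):
    idx = np.argmin(np.abs(centers - c))
    print(f"z={c:+.1f}:  histogram {hist[idx]:.4f}   Cauchy {1 / (np.pi * (c**2 + 1)):.4f}")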

Conditional Expectations and Mean-Square Estimation

We first briefly examine some common conditional expectations involving functions of two RVs. From definition,

E[g(x, y) | A] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{x,y}(x, y | A) dx dy

and if A = {a < x < b}, then

E[g(x, y) | a < x < b] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{x,y}(x, y | a < x < b) dx dy
                       = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{x,y}(x, y)/[Fx(b) − Fx(a)] dx dy

Of particular importance is the conditional expectation

E{E[g(x, y) | x]} = E[g(x, y)]

If we wish to estimate RV y by a suitable function g(x), such that the mean-square estimation error defined by E[[y − g(x)]²] is minimum, then

g(x) = E[y | x]

In this context, the function g(x) = E[y | x] is known as a regression curve. Rather than using a general nonlinear function g(x), the problem is simplified when estimating y by the linear function g(x) = ax + b. Constants a and b that minimize the mean-square error are given by

a = ρσy/σx,   b = E[y] − aE[x]

and the resulting minimum error becomes

E[[y − (ax + b)]²] = σy²(1 − ρ²)


Sums of N Complex Random Variables


In the previous chapter, discussion was limited to functions of only two RVs. However, most of the concepts applied to two RVs can readily be extended to any number of RVs. In this chapter, the treatment of multiple RVs is limited to only sums.

Sums of RVs are quite common in practice. For example, when a random process is sampled, such as the random output of a detection device, the sum of samples leads to an expression of the form

z = x1 + x2 + · · · + xN = Σ_{k=1}^{N} xk

Of particular interest is the case where each RV xk has


the same density function and all of the RVs are mutually
independent. In the following, we consider cases when all
RVs are complex (namely phasors) as well as when they
are real.


Central Limit Theorem

Gaussian RVs appear so often in science and engineering applications partly due to the fact that large sums of RVs can frequently be approximated by a Gaussian RV.

Let x1, x2, . . . , xN be N independent and identically distributed RVs with mean values m and variances σ². The central limit theorem states that, under the conditions cited before, the RV z defined by the sum

z = [(x1 − m) + (x2 − m) + · · · + (xN − m)]/√N = (1/√N) Σ_{k=1}^{N} (xk − m)

whose expected value is zero and whose variance is σ², has a Gaussian PDF in the limit N → ∞.

Other versions of the central limit theorem also exist. For example, it is unnecessary for RVs xk to be identically distributed for the sum to be Gaussian. In this latter case, the means can be different, variances Var(xk) of each xk must remain within some fixed bounds, and the third absolute central moments E[|xk − E[xk]|³] must remain bounded. Then, RV

z = (1/√N) Σ_{k=1}^{N} (xk − E[xk])

in the limit N → ∞ will be Gaussian with zero mean, and

Var(z) = lim_{N→∞} (1/N) Σ_{k=1}^{N} Var(xk)

Finally, finite sums without the limit N → ∞ are most often found in practice. Nonetheless, if N is finite but sufficiently large, the sum can still be well approximated by a Gaussian distribution.


Example: Central Limit Theorem

For the sum of zero-mean, identically distributed RVs

z = (1/√N)(x1 + x2 + · · · + xN)

all with the same variance σ², use the characteristic function method to show that, as N → ∞, the limit distribution is Gaussian (central limit theorem).

Solution: Consider the limit of the characteristic function

lim_{N→∞} Φz(s) = lim_{N→∞} E^N[e^{isx/√N}]
                = lim_{N→∞} (1 + isE[x]/√N − s²E[x²]/2N + · · ·)^N

By assumption, E[x] = 0, and E[x²] = σ², so that

lim_{N→∞} Φz(s) = lim_{N→∞} (1 − s²σ²/2N + · · ·)^N

By use of binomial and Stirling formulas, i.e.,

(1 + x)^N = Σ_{k=0}^{N} [N!/(k!(N − k)!)] x^k,   N! ~ √(2πN) N^N e^{−N}, N → ∞

it can be shown that

lim_{N→∞} Φz(s) = lim_{N→∞} Σ_{k=0}^{N} [N!/(k!(N − k)! N^k)] (−s²σ²/2)^k

and

lim_{N→∞} N!/[(N − k)! N^k] = 1

Consequently,

lim_{N→∞} Φz(s) = Σ_{k=0}^{∞} (1/k!)(−s²σ²/2)^k = e^{−s²σ²/2}

which is the characteristic function of a Gaussian distribution, namely,

fx(x) = [1/(σ√(2π))] e^{−x²/2σ²},   −∞ < x < ∞
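
A simulation of the same statement (added for illustration; uniform summands and the chosen N are assumptions): scaled sums of N zero-mean uniform RVs are compared with a zero-mean Gaussian of the same variance through a few sample quantiles.

import numpy as np

rng = np.random.default_rng(seed=14)
N, trials = 50, 200_000

# zero-mean uniform RVs on (-1, 1): variance sigma^2 = 1/3
x = rng.uniform(-1.0, 1.0, size=(trials, N))
z = x.sum(axis=1) / np.sqrt(N)                 # scaled sum from the central limit theorem

sigma = np.sqrt(1.0 / 3.0)
gauss = rng.normal(0.0, sigma, size=trials)    # reference Gaussian with the same variance

for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"quantile {q:.2f}:  sum {np.quantile(z, q):+.3f}   Gaussian {np.quantile(gauss, q):+.3f}")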


Phases Uniformly Distributed on (−π, π)

A signal occurring in practice can often be represented by the real or imaginary part of

S(t) = A(t) exp{i[ωt + Φ(t)]}

where A is a random amplitude and Φ a random phase. Omitting the frequency ω and fixing time t, it is customary to write this signal as a random phasor

S = A e^{iΦ}

A sum of N such phasors

R e^{iΘ} = Σ_{k=1}^{N} A_k e^{iΦ_k}

can be represented as a random walk in the complex plane (see figure). If the real and imaginary parts of the sum satisfy conditions of the central limit theorem, then the real and imaginary parts of the sum will be approximately Gaussian for sufficiently large N, regardless of phase distributions.

In practice, phase Φ_k of each phasor in the sum is often uniformly distributed over some interval of 2π radians, say, (−π, π) or (0, 2π). In this case, the following has been shown:

E[R²] = Σ_{k=1}^{N} E[A_k²]

as well as the important result,

E[R² e^{2iΘ}] = E[(Σ_{k=1}^{N} A_k e^{iΦ_k})²] = Σ_{k=1}^{N} E[A_k²]
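
The sketch below (an added illustration; the amplitude distribution, N, and trial count are assumptions) simulates random phasor sums with phases uniform on (−π, π) and checks that E[R²] ≈ Σ E[A_k²], with the resultant phase Θ close to uniform.

import numpy as np

rng = np.random.default_rng(seed=15)
N, trials = 100, 200_000

A = rng.rayleigh(scale=1.0, size=(trials, N))               # random amplitudes A_k
phi = rng.uniform(-np.pi, np.pi, size=(trials, N))           # phases uniform on (-pi, pi)

phasor_sum = np.sum(A * np.exp(1j * phi), axis=1)            # R e^{i Theta}
R2 = np.abs(phasor_sum) ** 2

print("E[R^2] estimate         :", R2.mean())
print("N * E[A^2] from samples :", N * np.mean(A**2))
print("mean resultant phase (uniform phases give ~0):", np.angle(phasor_sum).mean())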


Phases Not Uniformly Distributed on (−π, π)

First consider a sum of N random phasors

R e^{iΘ} = Σ_{k=1}^{N} A_k e^{iΦ_k}

where the phase distribution of each phasor in the sum is uniformly distributed over 2π radians. Dividing the complex quantities into real and imaginary parts yields

x = R cos Θ = Σ_{k=1}^{N} A_k cos Φ_k

y = R sin Θ = Σ_{k=1}^{N} A_k sin Φ_k

If N is large and all A_k are identically distributed, then both x and y are approximately Gaussian distributed with zero means and variances σx² = σy² = (1/2)N⟨A²⟩. RVs x and y are uncorrelated in this case and hence are statistically independent (because x and y are Gaussian distributed). Also, for large N, amplitude R is Rayleigh distributed and phase Θ is uniformly distributed.

Suppose now that the phase Φk of each phasor in the prior sum is uniformly distributed over an interval smaller than 2π, or is not uniformly distributed at all. If N is large and all Ak are identically distributed, then both x and y are still approximately Gaussian distributed, but with nonzero means that might not be equal. The variances in this case might also not be equal, and the RVs x and y are often correlated.



Sums of N Complex Random Variables 43

Example: Phases Uniformly Distributed on (−α, α)

Suppose that each phase Φk in a sum of random phasors


is uniformly distributed over the interval (−α, α), where α < π. If the Ak are identically distributed, and the number of terms N is sufficiently large so that x = Σ_{k=1}^{N} Ak cos Φk and y = Σ_{k=1}^{N} Ak sin Φk are Gaussian,

1. find the means m x , m y and variances σ2x , σ2y , and


2. given that x = Rcos Θ, y = Rsin Θ, find the marginal
PDFs for R and Θ.

Solution: (1) The mean mx is defined by

mx = Σ_{k=1}^{N} 〈Ak cos Φk〉 = N〈A〉〈cos Φ〉

which leads to
mx = N〈A〉 (1/2α) ∫_{−α}^{α} cos Φ dΦ = N〈A〉 (sin α)/α = N〈A〉 sinc α
Similarly,
my = Σ_{k=1}^{N} 〈Ak sin Φk〉 = N〈A〉〈sin Φ〉 = 0

The respective variances are


σx² = N〈A²〉 (1/2α) ∫_{−α}^{α} cos²Φ dΦ − (N〈A〉 sinc α)²
    = (1/2)N〈A²〉(1 + sinc 2α) − N²〈A〉² sinc²α

and

σy² = N〈A²〉 (1/2α) ∫_{−α}^{α} sin²Φ dΦ = (1/2)N〈A²〉(1 − sinc 2α)
Note that for α = π, these results for means and variances
reduce to those on the previous page.
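A Monte Carlo sketch (Python/NumPy; Ak ≡ 1 and α = π/3 are illustrative assumptions) can be used to estimate the means and variances for comparison with the expressions above:

```python
import numpy as np

# Sketch: phasor sum with phases uniform on (-alpha, alpha), alpha < pi.
# A_k = 1 and alpha = pi/3 are illustrative assumptions.
rng = np.random.default_rng(2)
N, trials, alpha = 100, 200_000, np.pi / 3

phi = rng.uniform(-alpha, alpha, size=(trials, N))
x = np.cos(phi).sum(axis=1)
y = np.sin(phi).sum(axis=1)

sinc = lambda u: np.sinc(u / np.pi)          # np.sinc(t) = sin(pi t)/(pi t)
print("m_x (sample)     :", round(x.mean(), 2))
print("N<A> sinc(alpha) :", round(N * sinc(alpha), 2))        # predicted mean of x
print("m_y (sample)     :", round(y.mean(), 3))               # ~ 0
print("var x, var y     :", round(x.var(), 2), round(y.var(), 2))  # compare with text
print("0.5 N (1-sinc2a) :", round(0.5 * N * (1 - sinc(2 * alpha)), 2))  # predicted var of y
```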



44 Sums of N Complex Random Variables
Example: Phases Uniformly Distributed on (−α, α)
(cont.)

(2) The joint PDF for x and y in polar coordinates is

fRΘ(R, Θ) = [R/(2π√(σx²σy²))] exp[−(R cos Θ − m)²/(2σx²) − (R sin Θ)²/(2σy²)]

where m = m x , and thus, the marginal PDF for R yields


fR(R) = ∫_0^{2π} fRΘ(R, Θ) dΘ = [U(R) R e^{−D}/√(σx²σy²)] Σ_{k=0}^{∞} (−1)^k εk Ik(P) I2k(Q)

where U (R ) is the step function, I µ ( x) is the modified


Bessel function, and where
εk = 1 for k = 0,   εk = 2 for k ≠ 0
D = m²/(2σx²) + R²/(2σy²) + [(σx² − σy²)/(4σx²σy²)] R²
P = [(σx² − σy²)/(4σx²σy²)] R²   and   Q = mR/σx²
The solution is obtained by first expanding the integrand
in a series. Also, this PDF reduces to the Rician PDF when
σ2x = σ2y = σ2 . Similarly, the marginal PDF for Θ leads to
fΘ(Θ) = ∫_0^∞ fRΘ(R, Θ) dR
      = {K exp[−(1/2)B²(1 + K²)]/[2π(K²cos²Θ + sin²Θ)]} [1 + G√π e^{G²}(1 + erf G)]
where erf( x) is the error function, and
B = m/√(σx² + σy²),   K = √(σy²/σx²),   G = BK cos Θ √[(1 + K²)/(2K²cos²Θ + 2sin²Θ)]



Sums of N Complex Random Variables 45

Central Limit Theorem Does Not Apply

When the sum


z = [(x1 − m) + (x2 − m) + · · · + (xN − m)]/√N = (1/√N) Σ_{k=1}^{N} (xk − m)

is composed only of causal (nonnegative) or positive-definite RVs, the limit distribution cannot be strictly Gaussian (normal). A Gaussian RV takes values on (−∞, ∞), whereas a sum of positive-definite RVs extends only from 0 to ∞. However, if the mean value m is sufficiently large, the limit distribution in some cases can be approximated by a Gaussian PDF.

Another example of non-Gaussian statistics occurs for the


sum of N statistically independent phasors
N
X
R e iΘ = Ak e iΦk
k=1

when the number of terms is also a RV. This sum


represents a random walk in the complex plane with random step number N. In the limit of large mean step number 〈N〉, the resulting distribution generally is not Gaussian. For example, if all individual phases in the sum are uniformly distributed over 2π radians, then the resultant phase Θ will still be uniformly distributed over 2π radians, but the resultant amplitude R will not be Rayleigh distributed as it is for Gaussian statistics.

In addition, the central limit theorem cannot be applied


if there is a dominant term in the sum or the number of
terms is small.



46 Sums of N Complex Random Variables

Example: Non-Gaussian Limit

Consider the N -step two-dimensional walk


N
X
P= Ak e iΦk = R e iΘ
k=1

or equivalently,
N
X N
X
P= Ak cos Φk + i Ak sin Φk = R cos Θ + i R sin Θ
k=1 k=1

where all Ak are identically distributed, and phases Φk are


all uniformly distributed over 2π radians. Find the limit
distribution for random amplitude R, given that

1. step number N is not random, or

2. step number N is random with mean N and satisfies


negative binomial statistics.

Solution:
(1) The two-dimensional characteristic function for R is

Φ N ( s) = 〈exp[ i ( s 1 P1 + s 2 P2 )]〉 = 〈 J0 ( sA)〉 N = 〈 J0 ( sR)〉

where subscripts 1 and 2 refer to the real and imaginary parts, s = √(s1² + s2²), and J0(x) is the zero-order Bessel function. If each step amplitude A is renormalized by the factor √N, then in the limit N → ∞, it can be shown that
lim_{N→∞} ΦN(s) = exp(−(1/4) s²〈R²〉)
which, through the Bessel transform, corresponds to the
Rayleigh distribution
fR(R) = ∫_0^∞ sR J0(sR) exp(−(1/4) s²〈R²〉) ds = (2R/〈R²〉) exp(−R²/〈R²〉),   R > 0



Sums of N Complex Random Variables 47

Example: Non-Gaussian Limit (cont.)

(2) Now suppose step number N is random and belongs to


a family of negative binomial distributions, described by

N µ ¶
X N +α−1 α
f N ( x) = p (1 − p) N δ( x − k)
k=1
N

N µ ¶
( N /α) N
X N +α−1
= δ( x − k), α>0
k=1
N (1 + N /α) N +α

where p = α/(α + 〈N〉) and C(·, ·) denotes the binomial coefficient. When averaging 〈J0(sA)〉^N over the fluctuations in N and renormalizing the steps by √〈N〉, one obtains the characteristic function

ΦN(s) = {1 + (〈N〉/α)[1 − 〈J0(sA/√〈N〉)〉]}^{−α}

In the limit 〈N〉 → ∞, it can be shown that

lim_{〈N〉→∞} ΦN(s) = 1/(1 + s²〈R²〉/4α)^α

which (again using the Bessel transform) corresponds this


time to the K distribution

fR(R) = ∫_0^∞ sR J0(sR)/(1 + s²〈R²〉/4α)^α ds = [2b/Γ(α)] (bR/2)^α K_{α−1}(bR),   R > 0

where b = 2√(α/〈R²〉)
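The limit can be explored numerically with the following sketch (Python with NumPy/SciPy; unit step amplitudes, α = 3, and mean step number 〈N〉 = 100 are illustrative assumptions), which compares the sample amplitude histogram with the K distribution above:

```python
import numpy as np
from scipy.special import gamma, kv

# Sketch: random walk with a negative-binomial number of steps.
# Unit amplitudes A_k = 1, alpha = 3, and mean step number Nbar = 100 are assumptions.
rng = np.random.default_rng(3)
alpha, Nbar, trials = 3.0, 100, 20_000

p = alpha / (alpha + Nbar)
Nsteps = rng.negative_binomial(alpha, p, size=trials)      # random step numbers, mean ~ Nbar
R = np.empty(trials)
for i, n in enumerate(Nsteps):
    phi = rng.uniform(-np.pi, np.pi, size=n)
    R[i] = np.abs(np.exp(1j * phi).sum()) / np.sqrt(Nbar)  # steps renormalized by sqrt(Nbar)

b = 2.0 * np.sqrt(alpha / np.mean(R**2))
hist, edges = np.histogram(R, bins=np.linspace(0, 4, 41), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
fK = (2 * b / gamma(alpha)) * (b * centers / 2) ** alpha * kv(alpha - 1, b * centers)
print("bin centers        :", np.round(centers[:6], 2))
print("empirical PDF      :", np.round(hist[:6], 3))
print("K-distribution PDF :", np.round(fK[:6], 3))
```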



48

Random Processes
In practice, one is faced with the analysis of random
data obtained by sampling a single realization of a
random process. When analyzing the time history of
such data, a decision must be made about whether the
data are stationary or not. Strictly speaking, stationarity
is defined in terms of ensemble averages, but a slightly
different interpretation takes place with a single time
history of the process. Generally, one interprets the time
history to be stationary if the properties computed over a
short time interval do not change significantly from one
interval to the next.
A single stationary random process is generally described
by the following statistical quantities:
1. mean and mean-square values
2. PDFs
3. correlation functions
4. power spectral density

Pairs of random records from two different stationary


processes are described by the following joint statistical
quantities:
1. joint PDFs
2. cross-correlation functions
3. cross-power spectral densities



Random Processes 49

Random Processes Terminology

A random process is a natural generalization of the


RV concept introduced in the first four chapters. A
random process, also called a stochastic process, is a
collection of time functions and an associated probability
description. The entire collection of such functions is called
an ensemble. Ordinarily, we represent any particular
member of the ensemble simply by x( t), called a sample
function or realization. For a fixed value of time, say t1 ,
the quantity x1 = x( t1 ) can then be interpreted as a RV
(see first chapter).
A continuous random process is one in which RVs
x1 , x2 , . . . , can assume any value within a specified range
of possible values. A discrete random process is one
in which RVs can assume only certain isolated values
(possibly infinite in number). The treatment is based
primarily on continuous random processes.
If we imagine sampling the random process x( t) at a finite
number of times t1 , t2 , . . . , t n , then we obtain the collection
of RVs xk = x( t k ), k = 1, 2, . . . , n. The probability measure
associated with these RVs is described by the joint PDF of
order n:

f x ( x1 , t 1 ; x2 , t 2 ; . . . ; xn , t n )

In principle, the theory of a continuous random process can be developed by specifying the joint PDFs of all orders. However, this is an impossible task in practice, so ordinarily only first- and/or second-order distributions are used.

In probability theory, a stochastic process is the counterpart to a deterministic process. Even if the initial condition (or starting point) is known, there are many possible paths the process might follow, although some paths are more probable than others.



50 Random Processes

First- and Second-Order Statistics

A first-order PDF is denoted by the symbol f x ( x, t), and


the second-order PDF by f x ( x1 , t1 ; x2 , t2 ).
The function defined by

Fx ( x, t) = Pr[x( t) ≤ x]

is called the first-order distribution function of the


random process x( t). The corresponding first-order PDF is
related by

∂Fx ( x, t)
f x ( x, t) =
∂x

Similarly, the second-order distribution function and


corresponding PDF are defined, respectively, by

Fx ( x1 , t 1 ; x2 , t 2 ) = Pr[x( t 1 ) ≤ x1 , x( t 2 ) ≤ x2 ]

fx(x1, t1; x2, t2) = ∂²Fx(x1, t1; x2, t2)/∂x1∂x2

We note that Fx ( x1 , t1 ; ∞, t2 ) = Fx ( x1 , t1 ) and


Z ∞
f x ( x1 , t 1 ) = f x ( x1 , t 1 ; x2 , t 2 ) dx2
−∞

Conditional PDFs and distributions associated with


random processes can be defined in much the same
manner as RVs. For example, given that the process takes
on value x1 at time t1 , the conditional PDF of x2 = x( t2 )
is defined by

f x ( x1 , t 1 ; x2 , t 2 )
f x ( x2 , t 2 | x1 , t 1 ) =
f x ( x1 , t 1 )



Random Processes 51

Stationary Random Processes

Suppose that the first-order PDF does not depend on time,


i.e., f x ( x, t) = f x ( x), and further, that the second-order PDF
has the form
f x ( x1 , t 1 ; x2 , t 2 ) = f x ( x1 , x2 ; t 2 − t 1 )

for all t1 and t2 . That is, the second-order or joint PDF


depends only on the time difference τ = t2 − t1 but not
on the specific times t1 and t2 . If all marginal and joint
PDFs depend only on the time difference τ = t2 − t1 but
not on the specific time origin, this is called a stationary
random process. Such a process can also be described as
one whose moments are invariant under translations in
time.
Random noise produced by an electronic device is usually
considered to be a stationary process during the (finite)
interval of observation, as are many other random
processes that occur in engineering applications. In
general, if the parameters producing a random process do
not change significantly during the finite observation time,
one can often treat that process as stationary. Of course,
if any PDFs associated with a random process do change
with the choice of time origin, that random process is said
to be nonstationary.
This definition of a stationary process is generally too
restrictive to be of much use in practice. For this reason,
a weaker type of stationary process is often introduced,
called a wide-sense stationary process, which has more
practical appeal.

Nonstationary data may indicate an underlying deterministic or random phenomenon. Results deduced from such data can falsely suggest a relationship between two variables where none exists. To be useful, nonstationary data often need to be transformed into stationary increments.



52 Random Processes

Autocorrelation and Autocovariance Functions

Here the symbol 〈 〉 is used to denote an ensemble average.


The mean (i.e., the expected value or ensemble average)
of the random process x( t) is defined by
〈x(t)〉 = m(t) = ∫_{−∞}^{∞} x fx(x) dx

which can depend on time t. Similarly, the variance

Var[x( t)] ≡ σ2x ( t) = 〈x2 ( t)〉 − m2 ( t)

can also depend on time. If the random process is


stationary, then its mean value and variance are
independent of time; for example, 〈x( t)〉 = m.
Autocorrelation function: Let x1 and x2 denote RVs
taken from a real stationary process x( t) at times
t 1 and t 2 = t 1 + τ, respectively. The autocorrelation (or
correlation) function is defined by

Rx(t1, t2) ≡ Rx(τ) = 〈x(t1)x(t2)〉 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fx(x1, x2; τ) dx1 dx2

If x( t) is complex, then Rx (τ) = 〈x( t1 )x∗ ( t2 )〉, where ∗

denotes the complex conjugate of the quantity.


Autocovariance function: The autocovariance (or
covariance) function of a real stationary process is

C x ( t 1 , t 2 ) ≡ C x (τ) = 〈[x( t 1 ) − m][x( t 2 ) − m]〉

or equivalently,

C x (τ ) = R x ( τ ) − m 2

When the mean is zero, the autocovariance and


autocorrelation functions are identical. Also, when τ = 0,
the autocovariance reduces to Cx (0) = Var[x( t)] = σ2x .
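For sampled data, the autocovariance function is typically estimated directly from its definition. The sketch below (Python/NumPy; the first-order autoregressive test signal is an assumption made for illustration) estimates Cx(τ) from a single record and compares it with the known result for that signal:

```python
import numpy as np

# Sketch: estimate the autocovariance C_x(tau) of one sampled record.
# The AR(1) test signal is an illustrative assumption; its theoretical
# autocovariance is C_x(tau) = C_x(0) * a**tau.
rng = np.random.default_rng(4)
n, a = 100_000, 0.9
w = rng.normal(size=n)
x = np.empty(n)
x[0] = w[0]
for k in range(1, n):
    x[k] = a * x[k - 1] + w[k]

def autocov(x, max_lag):
    xc = x - x.mean()
    return np.array([np.mean(xc[: len(xc) - k] * xc[k:]) for k in range(max_lag + 1)])

C = autocov(x, 5)
print("estimated C_x(0..5):", np.round(C, 3))
print("theory C(0)*a^tau  :", np.round(C[0] * a ** np.arange(6), 3))
```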



Random Processes 53

Wide-Sense Stationary Process

To qualify as a strict stationary process, all marginal


and joint density functions of x( t) must be independent
of the choice of time origin. However, in most practical
situations it is impossible to analyze all marginal and joint
density functions beyond second order, so only processes
that are wide-sense stationary are sought.
Wide-sense stationary: If all that is known is that the mean value 〈x(t)〉 and variance σx² are constant, and that the covariance function Cx(τ) depends only on the time interval τ = t2 − t1, the random process x(t) is said to be stationary in the wide sense.
Strict stationary processes are automatically wide-sense
stationary, but the converse is not necessarily true. For
many wide-sense stationary processes, it is true that

C x ( τ) → 0 , |τ| → ∞

Analogous to the correlation coefficient defined in the


third chapter, we introduce the normalized covariance
function as
C x (τ )
ρ(τ) =
C x (0)
Because the maximum of the covariance function occurs
at τ = 0, it follows that

−1 ≤ ρ(τ) ≤ 1

The basic properties of wide-sense stationary processes


are as follows:

R x (0) = 〈x2 ( t)〉 ≥ 0, C x (0) = σ2x ≥ 0

R x (−τ) = R x (τ), C x (−τ) = C x (τ)

|R x (τ)| ≤ R x (0), |C x (τ)| ≤ C x (0)



54 Random Processes

Example: Correlation and PDF

Consider the random process x( t) = acos ω t + bsin ω t,


where ω is a constant and a and b are statistically
independent Gaussian RVs, satisfying
〈a〉 = 〈b〉 = 0, 〈a2 〉 = 〈b2 〉 = σ2

Determine
1. the correlation function for x( t), and
2. the second-order PDF for x1 and x2 .

Solution: (1) Because a and b are statistically independent


RVs, it follows that 〈ab〉 = 〈a〉〈b〉 = 0, and thus
R x ( t 1 , t 2 ) = 〈(acos ω t 1 + bsin ω t 1 )(acos ω t 2 + bsin ω t 2 )〉
= 〈a2 〉 cos ω t 1 cos ω t 2 + 〈b2 〉 sin ω t 1 sin ω t 2
= σ2 cos[ω( t 2 − t 1 )]

or
R x ( t 1 , t 2 ) = σ2 cos ωτ, τ = t2 − t1
(2) The expected value of the random process x( t) is 〈x( t)〉 =
〈a〉 cos ω t + 〈b〉 sin ω t = 0. Hence, σ2x = R x (0) = σ2 , and the
first-order PDF of x( t) is given by
fx(x, t) = [1/(σ√2π)] e^{−x²/2σ²}

The second-order PDF depends on the correlation coefficient between x1 and x2, which, because the mean is zero, can be calculated from

ρx(τ) = Rx(τ)/Rx(0) = cos ωτ

and consequently,

fx(x1, t1; x2, t2) = [1/(2πσ²|sin ωτ|)] exp[−(x1² − 2x1x2 cos ωτ + x2²)/(2σ² sin²ωτ)]



Random Processes 55

Time Averages and Ergodicity

The mean and covariance defined thus far represent


what are called ensemble averages. Such averages depend
on knowledge of the various marginal and joint PDFs
associated with the random process. In practice, however,
one usually deals with a single realization of the random
process over some time period T , say (−T /2, T /2). If x( t)
is a particular realization of a given random process, its
finite-time average (mean) is defined by the integral
Z T /2
1
x T ( t) = x( t) dt
T −T /2

where the subscript T denotes the finite-time interval. The


limit T → ∞ yields the long-time average (mean)
Z T /2
1
x( t) = lim x( t) dt
T →∞ T −T /2

The long-time-average correlation function is defined by


Z T /2
1
Rx (τ) = x( t)x( t + τ) = lim x( t)x( t + τ)dt
T →∞ T −T /2

An ergodic process is one for which ensemble averages can be replaced with time averages. For example,

x̄(t) = 〈x(t)〉,   R̄x(τ) = Rx(τ)

The basic properties of correlation and covariance


functions previously defined are also valid for time
averages if the process is ergodic.
Note that the theory of random processes is usually formu-
lated in terms of ensemble averages, but actual measure-
ments are ordinarily based on time measurements. There-
fore, the assumption that a stationary process is also er-
godic is basic in practice.



56 Random Processes

Structure Functions

Random processes in practice are usually approximated


with sufficient accuracy by stationary random functions
(in the wide sense at least). Nonetheless, there are many
instances in which the assumption of stationarity is not
warranted. This difficulty can often be alleviated if the
random process has stationary increments. That is,
focus can be on difference function x( t + t1 ) − x( t), which
might behave like a stationary process even though the
process x( t) is not stationary. Such functions have what is
called a slowly varying mean and can be described in terms
of structure functions rather than covariance functions.

It is customary in many instances to write a random


process as the sum

x( t) = m( t) + x1 ( t)

where m( t) is the mean of x( t), and x1 ( t) is the fluctuating


part with a mean value of zero.

Structure function:

The structure function associated with the random process


x( t) is

D x ( t 1 , t 2 ) = 〈[x( t 1 ) − x( t 2 )]2 〉 ≈ 〈[x1 ( t 1 ) − x1 ( t 2 )]2 〉

This shows the utility of using the structure function. If


the mean is not constant, but slowly varying, then the
difference of mean values at t1 and t2 is nearly zero.

If the random process x( t) is stationary, then the structure


function is directly related to the covariance function by

D x (τ) = 2[C x (0) − C x (τ)]



Random Processes 57

Cross-Correlation and Cross-Covariance Functions

When more than one random process occurs in an


application, it is customary to consider how the two
processes are related (if indeed they are related).

Cross-correlation function: Let x1 and y2 denote RVs


taken from two real stationary processes x( t) and y( t) at
times t1 and t2 = t1 + τ, respectively. The cross-correlation
function is defined by

R xy ( t 1 , t 2 ) = 〈x( t 1 )y( t 2 )〉
Z ∞Z ∞
= x1 y2 f xy ( x1 , t 1 ; y2 , t 2 ) dx1 d y2
−∞ −∞

where f xy ( x1 , t1 ; y2 , t2 ) is the joint PDF of x( t1 ) and y( t2 ).

Cross-covariance function: The cross-covariance func-


tion is defined by

C xy ( t 1 , t 2 ) = R xy ( t 1 , t 2 ) − 〈x( t 1 )〉〈y( t 2 )〉

The cross-correlation and cross-covariance functions


describe how one random process is related to the other.
If the two processes are statistically independent, then Cxy(t1, t2) ≡ 0. The random processes are said to be uncorrelated if Rxy(t1, t2) = 〈x(t1)〉〈y(t2)〉 for all t1 and t2. The processes are jointly stationary if their joint PDF does not depend on the choice of time origin and their cross-correlation function depends only on τ = t2 − t1.

The basic properties of jointly stationary random


processes are:

Rxy(−τ) = Ryx(τ),   Cxy(−τ) = Cyx(τ)

|Rxy(τ)| ≤ √[Rx(0)Ry(0)],   |Cxy(τ)| ≤ √[Cx(0)Cy(0)]

2|Rxy(τ)| ≤ Rx(0) + Ry(0)



58 Random Processes

Power Spectral Density

The Fourier transform of the (auto)covariance function


C x (τ) is called the power spectral density (PSD)
function. The covariance function quantifies the similarity of fluctuations separated by a time difference τ; the PSD quantifies the strength of those fluctuations as a function of frequency (roughly 1/τ for a lag τ). Essentially, both functions provide the same information about whatever random process is studied.

Because random processes do not satisfy the basic


condition for the existence of a Fourier transform, the
PSD cannot be obtained directly from the transform of the
random process. Instead, the PSD of a stationary random
process x( t) is defined by the Fourier transform integral
Z ∞
S x ( ω) = e− iωτ C x (τ) d τ
−∞

The covariance function can likewise be derived from the


PSD by the inverse relation
Z
1 ∞
C x (τ) = e iωτ S x (ω) d ω
2π −∞

The previous expressions are widely known as the


Wiener–Khinchin theorem. The quantity ω = 2π f is
angular frequency, where f denotes linear frequency.

Note that it is also common in practice to define the PSD


by the Fourier transform of the (auto)correlation function
rather than the covariance function. Doing so leads to
a PSD that is exactly the same as before, except for an
impulse function at dc (ω = 0).



Random Processes 59

Example: PSD

Given the stationary random process

x( t) = A cos(ω0 t + ϕ)

where ϕ is a RV uniformly distributed over (0, 2π), and A


and ω0 are constants, determine the PSD.

Solution: In this case the covariance function must first


be calculated. Based on the previous correlation and PDF
example, it follows that the mean value of x( t) is zero and
the covariance function is

Cx(τ) = (A²/2) cos ω0τ

Thus, by writing

cos ω0τ = (1/2)(e^{iω0τ} + e^{−iω0τ})

it follows that

Sx(ω) = (A²/4) ∫_{−∞}^{∞} [e^{−i(ω−ω0)τ} + e^{−i(ω+ω0)τ}] dτ = (πA²/2)[δ(ω − ω0) + δ(ω + ω0)]

where δ( x) is the Dirac delta function.

The power spectral density measures the frequency


content of a random process and helps identify
periodicities. The PSD shows at which frequencies the
variations of the random process are strong and at which
frequencies the variations are weak.



60 Random Processes

PSD Estimation

Let x( t) be a stationary random process and xT ( t) a


truncated sample function from it, defined by
½
x( t), | t| < T
x T ( t) =
0, | t | > T

such that

x( t) = lim xT ( t)
T →∞

If X T ( f ) is the Fourier transform of xT ( t), where f is linear


frequency, then it can be shown that (Parseval’s theorem)
Z Z
1 ∞ 1 ∞
x2T ( t) dt = | X T ( f )| 2 d f
2T −∞ 2T −∞

Thus, the left-hand side of the equation is the average


power of the sample function in time interval −T < t < T .
Consequently, in the limit T → ∞, it follows that

Z ∞ E [| X T ( f )|2 ]
x2 ( t ) = lim df
−∞ T →∞ 2T

From this last expression, it is deduced that the PSD can


be estimated from

Sx(f) = lim_{T→∞} E[|XT(f)|²]/(2T)

The power spectral density is usually estimated


by Fourier transform methods as above, but other
techniques such as Welch’s method and the maximum
entropy method can also be used.
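A minimal sketch of this estimator (Python/NumPy; the white-noise test signal, sampling rate, and segment length are illustrative assumptions) averages |XT(f)|²/(2T) over independent segments, which is essentially Welch's method without windowing:

```python
import numpy as np

# Sketch: estimate S_x(f) = lim E[|X_T(f)|^2]/(2T) by averaging periodograms
# over independent record segments.  The white-noise test signal and all
# parameters are illustrative assumptions.
rng = np.random.default_rng(5)
fs, seg_len, n_seg = 1000.0, 1024, 200        # sampling rate, samples/segment, segments
x = rng.normal(size=seg_len * n_seg)          # unit-variance white noise

segs = x.reshape(n_seg, seg_len)
X = np.fft.rfft(segs, axis=1) / fs            # approximates the Fourier transform X_T(f)
duration = seg_len / fs                       # record length (the quantity 2T in the text)
Sx = np.mean(np.abs(X) ** 2, axis=0) / duration
f = np.fft.rfftfreq(seg_len, d=1.0 / fs)

# For unit-variance white noise the two-sided PSD level should be ~ 1/fs = 0.001
print("estimated PSD level:", round(Sx[1:-1].mean(), 5))
```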



Random Processes 61

Bivariate Gaussian Processes

Because Gaussian random processes are so common in


practice, it is useful to develop a few properties peculiar to
this class. A real random process x( t) is called a Gaussian
process if RVs x( t1 ), x( t2 ), . . . , x( t n ) are jointly Gaussian for
any n and times t1 , t2 , . . . , t n . When only two such variables
exist, the random process is called a bivariate Gaussian
process.
Sums of Gaussian processes: Sums and/or differences
of Gaussian random processes are also Gaussian.
If x and y are jointly Gaussian RVs with zero means, their
joint PDF is the bivariate distribution

fxy(x, y) = [1/(2πσ1σ2√(1 − ρ²))] exp{−[x²/σ1² − 2ρxy/(σ1σ2) + y²/σ2²]/[2(1 − ρ²)]}

where

σ1² = 〈x²〉,   σ2² = 〈y²〉,   and   ρ = 〈xy〉/(σ1σ2)
The quantity ρ is the correlation coefficient.
If m x and m y denote the mean values of x and y,
respectively, then the bivariate normal PDF can be
deduced from the above by replacing x and y, respectively,
with the new variables (x − mx) and (y − my). Moreover, for the normalized variables u = (x − mx)/σ1 and v = (y − my)/σ2, the bivariate normal PDF becomes

fuv(u, v) = [1/(2π√(1 − ρ²))] exp[−(u² − 2ρuv + v²)/(2(1 − ρ²))]

Because a bivariate Gaussian process is completely


specified by its first- and second-order moments, it follows
that a wide-sense stationary Gaussian process is also
stationary in the strict sense.



62 Random Processes

Multivariate Gaussian Processes

The multivariate Gaussian distribution for n jointly


Gaussian RVs x1 , x2 , . . . , xn is
fx(x1, x2, . . . , xn) = [1/((2π)^{n/2} √|det(C)|)] exp[−(1/2)(x − m)^T C^{−1} (x − m)]

where det(·) is the determinant, and x and m are column


and mean column vectors, defined respectively by

x = (x1, x2, . . . , xn)^T,   m = (m1, m2, . . . , mn)^T

To avoid confusion in notation, RVs that appear as


matrix elements are not denoted by bold letters.

Also, (x − m)T denotes the transpose of column vector


(x − m), and C is the covariance (square) matrix

C = [ c11 c12 · · · c1n ;  c21 c22 · · · c2n ;  · · · ;  cn1 cn2 · · · cnn ]

where

cij = Cov(xi, xj) = 〈(xi − mi)(xj − mj)〉;   i, j = 1, 2, . . . , n

The notation C−1 denotes the matrix inverse. For a


bivariate Gaussian process, we have
C = [ σ1²  ρσ1σ2 ;  ρσ1σ2  σ2² ]
and if RVs x1 , x2 , . . . , xn are uncorrelated, then the
covariance matrix reduces to a diagonal matrix with
diagonal elements σ21 , σ22 , . . . , σ2n and all other elements
zero.
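Samples of a multivariate Gaussian with a prescribed mean vector and covariance matrix can be generated from uncorrelated standard normal RVs by a Cholesky factorization of C. The sketch below (Python/NumPy; the particular m and C are illustrative assumptions) verifies the sample statistics:

```python
import numpy as np

# Sketch: draw samples from a multivariate Gaussian N(m, C) via Cholesky.
# The specific mean vector and covariance matrix are illustrative assumptions.
m = np.array([1.0, 0.0, -2.0])
C = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 0.5]])

rng = np.random.default_rng(6)
L = np.linalg.cholesky(C)                 # C = L L^T
z = rng.normal(size=(100_000, 3))         # uncorrelated standard normals
x = m + z @ L.T                           # correlated Gaussian samples

print("sample mean      :", np.round(x.mean(axis=0), 3))
print("sample covariance:\n", np.round(np.cov(x, rowvar=False), 3))
```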



Random Processes 63

Examples of Covariance Function and PSD

The following are some common examples of covariance


functions and their corresponding PSDs.



64 Random Processes

Interpretations of Statistical Averages

Regardless of whether a random process x( t) is real or


complex, the PSD Sx (ω) is a real function. Moreover, if
x( t) is a real stationary process, the PSD is real and
even, i.e., Sx (−ω) = Sx (ω).
From an engineering point of view, the following
interpretation of statistical averages of ergodic processes
is useful.

1. The mean value 〈x( t)〉 is the dc component of the


signal.
2. The mean-squared value 〈x( t)〉2 = Rx (∞) is the
power in the dc component of the signal.
3. The second moment 〈x2 ( t)〉 = Rx (0) is the total
average power of the signal.
4. The variance σ2x = Cx (0) is the total average power
in the ac component of the signal.
5. The standard deviation σx is the root-mean-
square (RMS) value of the ac component of the
signal.

The interpretations given above are specific primarily


to electrical engineering. In other areas of application,
the interpretations of statistical averages can be quite
different.



65

Random Fields
A random function of a vector spatial variable R = ( x, y, z)
and possibly time t is called a random field. For the
complete description of a random field, it is necessary to
know its joint probability distributions of all orders, much
the same as for a random process. Because this is an
impossible task in practice, it is customary to describe
a random field in terms of its lowest-order statistical
moments. Consequently, the treatment of a random field
for the most part parallels that given in the fifth chapter
for a random process. For notational simplicity, it is useful
to suppress the time dependency of a random field and
treat it simply as a function of spatial variable R.



66 Random Fields

Random Fields Terminology

A random field is the natural generalization of a random


process, where time dependency is replaced by spatial
dependency. Like a random process, it is a collection of
functions and an associated probability description. The
entire collection of such functions is called an ensemble.
Ordinarily, one represents any particular member of the
ensemble by u(R) = u( x, y, z), called a sample function or
realization. For a fixed value of space, say R1 = ( x1 , y1 , z1 ),
the quantity u1 = u(R1 ) can then be interpreted as a RV.
Note that in this section, the bold letter R denotes the
spatial point ( x, y, z), not a RV.
A continuous random field is one in which RVs u1 , u2 , . . .,
can assume any value within a specified range of possible
values. A discrete random field is one in which RVs can
assume only certain isolated values (possibly infinite in
number).
If one imagines sampling random field u(R) at a finite
number of spatial points R1 , R2 , . . . , Rn , then the collection
of RVs uk = u(Rk ), k = 1, 2, . . . , n can be obtained. The
probability measure associated with these RVs can be
described by the joint PDF of order n. However, in practice
it is customary to consider only first- and second-order
statistics.
Typical random fields include, among others, atmospheric
quantities such as wind velocity, temperature, humidity,
and index of refraction. These are all random fluctuating
quantities that depend on a spatial variable R. To
describe these quantities in a statistical sense, one can
introduce the same statistical parameters as used for
describing random processes, viz., mean value, correlation
or covariance function, power spectral density, and so on.

Random fields are multidimensional whereas random


processes are one-dimensional. Thus, the statistics may
vary with position as well as time, leading to a
generalization of nonstationarity.



Random Fields 67

Mean and Spatial Covariance Functions

Assume that u(R) = u( x, y, z) is a complex random field.


The mean or expected value of the random field is

〈u(R)〉 = m(R)

where the brackets 〈 〉 denote an ensemble average.


The spatial autocovariance function, or simply the
covariance function, is a two-point statistic defined by
the ensemble average

Bu (R1 , R2 ) = 〈[u(R1 ) − m(R1 )][u∗ (R2 ) − m∗ (R2 )]〉

where ∗ denotes the complex conjugate.


Statistically homogeneous: It is said that a random
field u(R) is statistically homogeneous if its moments are
invariant under a spatial translation; that is, the mean
〈u(R)〉 = m is independent of the spatial position R, and the
covariance function depends only on the spatial separation
R = R2 − R1 . In this case, it follows that Bu (R1 , R2 ) =
Bu (R2 − R1 ), or equivalently,

Bu(R) = 〈u(R1) u∗(R1 + R)〉 − |m|²

Statistically isotropic: If the random field has invari-


ance properties with respect to rotations (no preferred di-
rection), it is called statistically isotropic. In this case, it
follows that Bu (R1 , R2 ) = Bu (R ), where R = |R2 − R1 | is the
scalar distance.
Note that the notion of statistical homogeneity is the
spatial counterpart of stationarity in time.



68 Random Fields

1D and 3D Spatial Power Spectrums

Analogous to random processes, it is customary to define


the Fourier transform of a covariance function as the
spatial power spectrum of a random field. In this case,
however, there exists 1D, 2D, or 3D power spectrums.
Because the 2D power spectrum is derived from the 3D
spectrum, we introduce it last.
1D spatial power spectrum: If u(R) is a statistically
homogeneous and isotropic complex random field with
zero mean, its covariance function can be expressed in the
Fourier integral form
Z ∞ Z ∞
Bu (R ) = e iκR Vu (κ) d κ = 2 cos(κR )Vu (κ) d κ
−∞ 0

where κ denotes the spatial frequency (in units of rad/m),


and Vu (κ) is the 1D spatial power spectrum of random field
u(R). The spatial power spectrum can be defined by the
inverse Fourier transform
Z Z
1 ∞ 1 ∞
Vu (κ) = e− iκR Bu (R ) dR = cos(κR )Bu (R ) dR
2π −∞ π 0

3D spatial power spectrum: If u(R) is a statistically


homogeneous random field with zero mean, its covariance
function can be expressed as
Ñ ∞
Bu (R) = e iK·R S u (K) d 3 κ
−∞

where K = (κ x , κ y , κ z ) is a vector spatial frequency, and


S u (K) is the 3D spatial power spectrum. By inverse
Fourier transforms, it follows that

µ ¶3 Ñ
1 ∞
S u (K) = e− iK·R Bu (R) d 3 R
2π −∞



Random Fields 69

2D Spatial Power Spectrum

For the special case in which a random field is


statistically homogeneous and isotropic, the prior 3D
Fourier transform relations reduce to
Su(κ) = [1/(2π²κ)] ∫_0^∞ Bu(R) sin(κR) R dR

Bu(R) = (4π/R) ∫_0^∞ Su(κ) sin(κR) κ dκ

where κ = |K| is the magnitude of the wave number vector


(vector spatial frequency).
Based on the previous relations, the 3D and 1D spatial
power spectrums are related by

Su(κ) = −[1/(2πκ)] dVu(κ)/dκ

2D spatial power spectrum. Given the 3D spatial power


spectrum Su (κ x , κ y , κ z ), the 2D spatial power spectrum in
the κ x κ y plane is defined by the Fourier transform relation

Z ∞
Fu (κ x , κ y , 0; z) = S u (κ x , κ y , κ z )cos( zκ z ) d κ z
−∞

By properties of the Fourier transform, it follows that


Z
1 ∞
S u (κ x , κ y , κ z ) = Fu (κ x , κ y , 0; z)cos( zκ z ) dz
2π −∞

Because the spatial power spectrum Su (κ x , κ y , κ z ) is an


even function (by definition), these last relations have
been expressed as Fourier cosine transforms.



70 Random Fields

Structure Functions

When a random field is not statistically homogeneous,


but the mean of the field only varies by small amounts
over separation distances of interest, it can be useful to
characterize the random field by the structure function
rather than the covariance function.
Consider a random field represented in the form

u(R) = m(R) + u1 (R)

where m(R) is a nonconstant mean, and u1 (R) is


statistically homogeneous with mean 〈u1 (R)〉 = 0 for
all R. Random fields that permit a decomposition
into a varying mean and a statistically homogeneous
fluctuation are called locally homogeneous, which is the
spatial equivalent of a random process with stationary
increments.
The structure function for a locally homogeneous
random field u(R) is defined by

D u (R1 , R2 ) ≡ D u (R) ≈ 〈[u1 (R1 ) − u1 (R1 + R)]2 〉

and is related to the 3D spatial power spectrum by


Ñ ∞
D u (R) = 2 S u (K)[1 − cos(K · R)] d 3 κ
−∞

For the special case where a random field is statistically


homogeneous and isotropic, it follows that

Du(R) = 2[Bu(0) − Bu(R)] = 8π ∫_0^∞ κ² Su(κ) [1 − sin(κR)/(κR)] dκ

and
Su(κ) = [1/(4π²κ²)] ∫_0^∞ [sin(κR)/(κR)] (d/dR)[R² (d/dR)Du(R)] dR



Random Fields 71

Example: PSD

Given that the structure function for a particular


statistically homogeneous and isotropic random field u(R )
is defined by
· µ ¶¸
R
D u (R ) = 2 1 − exp −
R0
find the corresponding 3D PSD.
Solution: The relation between the 3D PSD and the
structure function is
Su(κ) = [1/(4π²κ²)] ∫_0^∞ [sin(κR)/(κR)] (d/dR)[R² (d/dR)Du(R)] dR
      = [1/(2π²κ²)] ∫_0^∞ [sin(κR)/(κR)] (d/dR){R² (d/dR)[1 − exp(−R/R0)]} dR
which reduces to
Su(κ) = [1/(2π²κ³R0)] ∫_0^∞ e^{−R/R0} (2 − R/R0) sin(κR) dR
On evaluation of this last integral, the result is
Su(κ) = R0³/[π²(1 + R0²κ²)²]
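The reduction can be verified numerically (a Python/NumPy sketch; the value R0 = 2 and the chosen κ values are arbitrary illustrative assumptions):

```python
import numpy as np

# Sketch: numerical check that the integral expression for S_u(kappa)
# reproduces the closed form R0^3 / [pi^2 (1 + R0^2 kappa^2)^2].
R0 = 2.0
R = np.linspace(0.0, 60.0, 200_001)      # exp(-R/R0) is negligible beyond R = 60
for kappa in (0.1, 1.0, 5.0):
    integrand = np.exp(-R / R0) * (2.0 - R / R0) * np.sin(kappa * R)
    S_num = np.trapz(integrand, R) / (2.0 * np.pi**2 * kappa**3 * R0)
    S_exact = R0**3 / (np.pi**2 * (1.0 + R0**2 * kappa**2) ** 2)
    print(f"kappa={kappa}: numerical={S_num:.6f}  closed form={S_exact:.6f}")
```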

Along similar lines, if the structure function for a


particular statistically homogeneous and isotropic random
field u(R ) is defined in terms of a Gaussian function by
D u (R ) = 2[1 − exp(−R 2 /R 02 )]

the corresponding 3D PSD takes the form of a Gaussian function

Su(κ) = [R0³/(8π√π)] exp(−R0²κ²/4)



72

Transformations of Random Processes


Signals embedded in random noise are ordinarily passed
through some filtering devices and analyzed by nonlinear
operations, such as mixing and rectification. In this case,
one can imagine a random process x( t) as the input to some
receiver device and y( t) as the output random process. The
relationship between x( t) and y( t) can be described by a
transformation represented by
y( t) = T[x( t)]
The symbol T is an operator that describes the relation
between input and output processes.



Transformations of Random Processes 73

Memoryless Nonlinear Transformations

Consider a system in which the output y( t1 ) at time t1


depends only on the input x( t1 ) and not on any other past
or future values of x( t). If the system is designated by the
relation
y( t) = g[x( t)]
where y = g( x) is a function assigning a unique value of
y to each value of x, it is said that the system effects a
memoryless transformation. Because the function g( x)
does not depend explicitly on time t, it can also be said
that the system is time invariant. For example, if g( x) is
not a function of time t, it follows that the output of a time
invariant system to the input x( t + ε) can be expressed as
y( t + ε) = g[x( t + ε)]

If input and output are both sampled at times t1 , t2 , . . . , t n


to produce the samples x1 , x2 , . . . , xn and y1 , y2 , . . . , yn ,
respectively, then
yk = g(xk ), k = 1, 2, . . . , n

This relation is a transformation of the RVs x1 , x2 , . . . , xn


into a new set of RVs y1 , y2 , . . . , yn . It then follows that the
joint density of the RVs y1 , y2 , . . . , yn can be found directly
from the corresponding density of the RVs x1 , x2 , . . . , xn
through the above relationship.

Memoryless processes or fields have no memory of other


events in location or time. In probability and statistics,
memorylessness is a property of certain probability
distributions—the exponential distributions of non-
negative real numbers and the geometric distributions
of non-negative integers. That is, these distributions are
derived from Poisson statistics and as such are the only
memoryless probability distributions.



74 Transformations of Random Processes

Linear Systems

Relationships involving correlation/covariance and PSD


functions between the input and output of a linear
system are important in a variety of engineering
applications.

Linear systems are characterized by their unit impulse


response function h( t) and Fourier transform H (ω),
called the transfer function. If x( t) is a sample function
of a stationary random process, the random linear system
output y( t) is related by
Z ∞ Z ∞
y( t) = h(ξ)x( t − ξ) d ξ = h( t − η)x(η) d η
−∞ −∞

A physically realizable system is one that is causal,


i.e., where h( t) = 0, t < 0. Hence, the previous relation
reduces to
Z ∞ Z t
y( t) = h(ξ)x( t − ξ) d ξ = h( t − η)x(η) d η
0 −∞

If the system has constant parameters, then the


impulse response h( t) is independent of the time at which
a unit impulse is applied. The system is stable if bounded
inputs produce only bounded outputs. A system is linear
if, when input x1 produces output y1 , and x2 produces
y2 , the input C1 x1 + C2 x2 produces C1 y1 + C2 y2 for any
constants C1 and C2 .
An ideal linear system is one that: (1) is physically re-
alizable, (2) has constant (time-independent) parameters,
and (3) is stable and linear.



Transformations of Random Processes 75

Expected Values of a Linear System

Input/output relations for a linear system with random


input are presented on the previous page. The expected
value of the random output y( t) of a linear system is
Z ∞
〈y( t)〉 = h(ξ)〈x( t − ξ)〉 d ξ
−∞

If x( t) is stationary, then 〈x( t)〉 = m (constant) and

〈y( t)〉 = mH (0)

where H (0) is the linear system transfer function


evaluated at zero (dc).
The second moment of the output y( t) leads to
〈y²(t)〉 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(ξ) h(η) Rx(ξ − η) dξ dη

where Rx (τ) is the autocorrelation function (or correlation


function) of the input x( t).
The correlation function of the output y( t) is
Z ∞ Z ∞
R y (τ) = h(ξ) h(η)R x (τ + ξ − η) d ξ d η
−∞ −∞

and the cross-correlation function satisfies


Z ∞
R xy (τ) = h(ξ)R x (τ − ξ) d ξ = R yx (−τ)
−∞

Finally, the relations between input and output PSDs are


given by

S y (ω) = S x (ω) H (ω) H (−ω) = S x (ω)| H (ω)|2


S xy (ω) = S x (ω) H (ω)
S yx (ω) = S x (ω) H (−ω)



76 Transformations of Random Processes

Example: White Noise

If x( t) denotes a white-noise input to a linear system,


its spectral density is simply Sx (ω) = S0 , where S0 is
constant. Find the correlation function Ry (τ) for the output
y( t) of the linear system and the cross-correlation function
R xy (τ).

Solution: Because the Fourier transform of a constant is


the delta function, i.e., Rx (τ) = S0 δ(τ), it follows that a
white-noise process is uncorrelated at distinct times (τ 6=
0). Thus, the correlation function of the output takes the
form
Ry(τ) = S0 ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(ξ) h(η) δ(τ + ξ − η) dξ dη = S0 ∫_{−∞}^{∞} h(η − τ) h(η) dη

Hence, the correlation function of the output is propor-


tional to the time correlation function of the impulse re-
sponse function.
For the cross-correlation function, the result is
Rxy(τ) = S0 ∫_{−∞}^{∞} h(ξ) δ(τ − ξ) dξ = S0 h(τ)

Therefore, for a white-noise input to a linear system, the cross-correlation function is directly proportional to the impulse-response function of the system.
The result Rxy (τ) = S0 h(τ) above provides us with a useful
scheme for measuring the impulse response of any linear
system. Rather than rely on the output of a unit impulse
applied directly to the linear system to determine h( t),
which has certain inherent difficulties associated with it,
an alternative is to measure the cross-correlation function
between a white noise input and the corresponding output.
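A discrete-time version of this identification scheme is sketched below (Python/NumPy; the first-order low-pass filter playing the role of the unknown system is an illustrative assumption):

```python
import numpy as np

# Sketch: estimate the impulse response of a linear system by cross-correlating
# a white-noise input with the output, per R_xy(tau) = S0 * h(tau).
# The 'unknown' first-order low-pass filter below is an illustrative assumption.
rng = np.random.default_rng(7)
n = 200_000
x = rng.normal(size=n)                    # white noise with unit variance (S0 = 1)
h_true = 0.3 * 0.7 ** np.arange(20)       # h[k] = 0.3 * 0.7**k

y = np.convolve(x, h_true)[:n]            # system output

lags = np.arange(20)
Rxy = np.array([np.mean(x[: n - k] * y[k:]) for k in lags])
print("estimated h:", np.round(Rxy[:8], 3))
print("true h     :", np.round(h_true[:8], 3))
```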



Transformations of Random Processes 77

Detection Devices

Communication systems can be broadly classified in terms


of linear operations (amplification and filtering) and
nonlinear operations (modulation and detection).
Common detection devices include the following three
categories, where it is assumed that 〈x( t)〉 = 0.
1. Square-law detector: y( t) = x2 ( t)
The PDFs for the input and output are

fx(x) = [1/√(2πRx(0))] exp[−x²/(2Rx(0))],   Rx(0) = σx²

fy(y) = [1/√(2πRx(0) y)] exp[−y/(2Rx(0))] U(y)

where U (·) is the unit step function. Expected values


include the following:

〈y( t)〉 = 〈x2 ( t)〉 = R x (0)


〈y2 ( t)〉 = R y (0) = 3R x2 (0)
σ2y = 2Rx2 (0)
R y (τ) = R x2 (0) + 2R x2 (τ)

2. Linear detector (full wave): z( t) = |x( t)|


The output PDF and mean are, respectively,

fz(z) = 2 fx(z)U(z),   〈z(t)〉 = √(2Rx(0)/π)

3. Linear detector (half-wave): w( t) = x( t)U [x( t)]


The output PDF and mean are, respectively,

fw(w) = (1/2)δ(w) + fx(w)U(w),   〈w(t)〉 = √(Rx(0)/2π)



78 Transformations of Random Processes

Zero-Crossing Problem

In some applications, one is interested in the frequency of positive and negative crossings through zero (or some nonzero threshold value x0) of a random signal. For a Gaussian random process x(t), the derivative x′(t) and all higher-order derivatives are Gaussian.

The correlation function of the derivative x′(t) is related to the correlation function of x(t) by

Rx′(τ) = −Rx″(τ)

and the PSD function is

Sx′(ω) = ω² Sx(ω)

Also, it follows that

〈x′(t)〉 = 0,   〈[x′(t)]²〉 = −Rx″(0)

The cross-correlation function for x(t) and x′(t) is

Rxx′(τ) = R′x(τ),   Rxx′(0) = R′x(0) = 0

The Gaussian random process x(t) and its derivative x′(t) are statistically independent random processes with joint density function

fxx′(x, x′) = [1/(σ√2π)] exp[−(x − m)²/(2σ²)] · [1/(b√2π)] exp[−x′²/(2b²)]

where m = 〈x(t)〉, σ² = Var[x(t)], and b² = −Rx″(0). The expected number of positive and negative crossings of level x0 per second (mean frequency of surges and mean frequency of fades) is defined by

〈n(x0)〉 = (1/2) ∫_{−∞}^{∞} |x′| fxx′(x0, x′) dx′ = ν0 exp[−(x0 − m)²/(2σ²)]

where ν0 = √(−Rx″(0))/(2πσ) represents the expected number of fades or surges through the mean value m.



79

Random Data Analysis


Procedures for analyzing random data are strongly
dependent on certain basic characteristics that the data
do or do not exhibit. Three basic characteristics of
random data that are important in the analysis are: (1)
stationarity of the data, (2) periodicities within the
data, and (3) normality of the data. Most of the material
presented in the previous three chapters is based on the
assumption that the random data is stationary. When the
data is not stationary, procedures for analyzing the data
are generally more complicated. Identifying periodicities
in the data, when they exist, can also be important.
If the statistics are Gaussian (normal), then certain
simplifications in the analysis take place.
In addition to discussing certain tests for stationarity,
periodicity, and normality, some standard methods for
analyzing nonstationary data are briefly introduced.



80 Random Data Analysis

Tests for Stationarity, Periodicity, and Normality

In some cases, the physics of the phenomenon producing


the random data gives a clue to the stationarity of
the data. However, in many practical cases this is
not possible—the stationarity of the data cannot be
determined from physical considerations alone and must
be evaluated through studies of available time history
records.
Test for stationarity: The stationarity of random data
can be tested from a single time record x( t) as follows:
1. Divide the sample record into N equal time
increments, where each time interval is considered
independent.
2. Compute a mean-square value (or mean and
variance) for each time interval x21 , x22 , . . . , x2N .
3. Test the sequence x21 , x22 , . . . , x2N for the presence of
underlying trends:
• sampling distributions
• hypothesis tests
• run test and reverse arrangement test.

Test for periodicity: Any periodic or almost-periodic


components that are present in otherwise random data
can usually be identified by the appearance of delta
functions (sharp peaks) in the PSD.
Test for normality: The most direct method to test
samples of stationary random data for normality is to
measure the PDF of the data and compare it to a
theoretical normal distribution with the same mean and
variance.



Random Data Analysis 81

Nonstationary Data Analysis for Mean

Random data collected in practice are usually consid-


ered nonstationary when viewed as a whole. The gen-
eral probability structure for analyzing nonstationary pro-
cesses is presented in the section Sums of N Complex Ran-
dom Variables. In some cases, however, ensemble aver-
aging of sample time records can produce useful results.
Nonstationary mean values: For the collection of
sample time records xk ( t), 0 ≤ t ≤ T, k = 1, 2, . . . , N taken
from a nonstationary process x( t), the mean value at any
time t is estimated by
N
1 X
µ̂x ( t) = x k ( t)
N k=1

The quantity µ̂x ( t) is an unbiased estimator because


N
1 X
〈µ̂x ( t)〉 = 〈 x k ( t )〉 = m x ( t )
N k=1

where m x ( t) is the true mean of x( t).


Variance of estimator: If the N sample functions used to
calculate the estimate µ̂x ( t) are statistically independent,
then the sample variance of the estimator satisfies
σ2x
Var[µ̂x ( t)] =
N
where σ2x is the variance of x( t). Hence, µ̂x ( t) is considered
a consistent estimator of m x ( t) for all t. If the N
sample functions used to calculate the estimate µ̂x ( t) are
correlated, then
Var[µ̂x(t)] = σx²/N + (2/N²) Σ_{k=1}^{N−1} (N − k)[Rx(k, t) − mx²(t)]

where Rx(k, t) is the cross-correlation function between pairs of time records separated by k, i.e., xj(t) and xj+k(t).



82 Random Data Analysis

Analysis for Single Time Record

In practice, it is common to have only one sample record


of data available for a given nonstationary process. In this
case, the mean value of the nonstationary process must be
estimated from the single sample record.

Consider the nonstationary random process

x( t) = m( t) + u( t)

where m( t) = 〈x( t)〉 is a deterministic function of time, and


u( t) is a stationary random process with zero mean. It is
assumed that variations of m( t) are slow compared to the
lowest frequency of u( t). Thus, the time function m( t) can
be separated from the random process u( t) through low-
pass filtering.

The mean value estimate of x( t) from a single time


record is obtained from the short-time average over T ,
given by

µ̂x(t) = (1/T) ∫_{t−T/2}^{t+T/2} x(ξ) dξ

However, in this case µ̂x ( t) is a biased estimator, because


its average value leads to

〈µ̂x(t)〉 = (1/T) ∫_{t−T/2}^{t+T/2} 〈x(ξ)〉 dξ = (1/T) ∫_{t−T/2}^{t+T/2} m(ξ) dξ ≠ m(t)

A first-order approximation to the bias error in µ̂x ( t) is


given by the expression T 2 m00 ( t)/24, where m00 ( t) is the
second time derivative of m( t).
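The bias can be observed in a small numerical sketch (Python/NumPy; the quadratic mean function, noise level, and averaging time T are illustrative assumptions):

```python
import numpy as np

# Sketch: short-time-average estimate of a slowly varying mean and its bias.
# The quadratic m(t), the noise level, and the averaging time T are assumptions.
rng = np.random.default_rng(8)
dt, T = 0.001, 2.0
t = np.arange(0.0, 20.0, dt)
m = 5.0 + 0.25 * t**2                       # slowly varying mean, m'' = 0.5
x = m + rng.normal(scale=0.2, size=t.size)  # x(t) = m(t) + u(t)

w = int(T / dt)
mu_hat = np.convolve(x, np.ones(w) / w, mode="same")   # running average over T

i = t.size // 2                             # examine a point far from the record ends
print("true mean    :", m[i])
print("estimate     :", round(mu_hat[i], 3))
print("bias (sample):", round(mu_hat[i] - m[i], 3))
print("T^2 m''/24   :", round(T**2 * 0.5 / 24, 4))      # first-order bias prediction
```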



Random Data Analysis 83

Runs Test for Stationarity

A procedure called the runs test can be used to


identify a range of segment lengths of data, for which
a nonstationary random process can be considered
stationary.

The runs-test procedure makes no assumption as to the


probability distribution of the data being analyzed but
is based on two assumptions: (1) the sample record is
long enough to reflect the nonstationary character of the
random process, and (2) the sample record is very long
compared to the lowest-frequency component in the data.

The procedure starts with a statistic such as the mean,


variance, or median value. If the mean is chosen, a run
is a sequence of adjacent segments whose means are
above (positive +) or below (negative −) the median of the
entire dataset. Progressing segment by segment, the run
ends when the next segment does not match the previous
segment. For example, the sequence of six runs

++++−−−+++−−−+++++−−−

has three positive runs and three negative runs.

The runs-test algorithm for computing the stationary


segment length from a set of nonstationary data is as
follows.

1. Divide the entire set of sampled data into individual


segments of equal length.

2. Compute the mean for each segment of data.

3. Count the total number of runs of segment means above and below the median over the entire set of sampled data. This number of runs is a RV r.

4. Compare the number of runs found to known


probabilities of runs for random data.



84 Random Data Analysis

Runs Test for Stationarity (cont.)

For example, let n+ and n− denote the numbers of segments whose means lie above and below the median, respectively. If either n+ > 20 or n− > 20, the sampling distribution of r approximates a normal distribution with mean and variance

m = 2n+n−/(n+ + n−) + 1,   σ² = 2n+n−(2n+n− − n+ − n−)/[(n+ + n−)²(n+ + n− − 1)]
In this case, the run variable r can be transformed to the normalized variable

Z = (r − m)/σ
If the data are random with no underlying trend, then the observed number of runs yields a value of Z satisfying

|Z| ≤ 1.96

with probability 0.95; a value of |Z| > 1.96 therefore indicates a trend (nonstationarity) at the 5% significance level.
These results do not require the “+” and “−” outcomes to have equal probability, only that the segment values be generated independently and be identically distributed. The algorithm can be repeated for different segment lengths to determine the range of stationarity within the data set.
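A minimal implementation of the procedure is sketched below (Python/NumPy; the stationary test record and segment length are illustrative assumptions):

```python
import numpy as np

# Sketch: runs test applied to segment means of a sampled record.
# The test record and segment length are illustrative assumptions.
rng = np.random.default_rng(9)
x = rng.normal(size=20_000)                  # stationary test record
seg_means = x.reshape(200, 100).mean(axis=1)

signs = seg_means > np.median(seg_means)     # '+' above / '-' below the median
r = 1 + np.count_nonzero(signs[1:] != signs[:-1])   # total number of runs
n_pos, n_neg = signs.sum(), (~signs).sum()

m = 2 * n_pos * n_neg / (n_pos + n_neg) + 1
var = (2 * n_pos * n_neg * (2 * n_pos * n_neg - n_pos - n_neg)
       / ((n_pos + n_neg) ** 2 * (n_pos + n_neg - 1)))
Z = (r - m) / np.sqrt(var)
print("runs:", r, " Z =", round(Z, 2), " stationary at 5% level:", abs(Z) <= 1.96)
```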
Last, we point out there are several alternative
formulations of the runs test in the literature. For
example, a series of coin tosses would record a series of
heads and tails. A run of length r is r consecutive heads
or r consecutive tails. One could code a sequence of, say,
N = 10 coin tosses HHHHTTHTHH as

1234323234

where a head is coded as an increasing value and a tail is


coded as a decreasing value.



85

Equation Summary

Cumulative distribution function:

Fx ( x) = Pr(x ≤ x), −∞ < x < ∞


Pr(a < x ≤ b) = Fx (b) − Fx (a); Pr(x > x) = 1 − Fx ( x)

Probability density function:

fx(x) = dFx(x)/dx;   Fx(x) = ∫_{−∞}^{x} fx(u) du

∫_{−∞}^{∞} fx(x) dx = 1;   Pr(a < x ≤ b) = Fx(b) − Fx(a) = ∫_a^b fx(u) du

fx(x) = Σ_{k=1}^{∞} Pr(x = xk) δ(x − xk),   Fx(x) = Σ_{k=1}^{∞} Pr(x = xk) U(x − xk)

Expected value:

E[g(x)] = ∫_{−∞}^{∞} g(x) fx(x) dx

E[(x − m)^n] = µn = ∫_{−∞}^{∞} (x − m)^n fx(x) dx,   n = 2, 3, 4, . . .

Characteristic function:

Φx(s) = E[e^{isx}] = ∫_{−∞}^{∞} e^{isx} fx(x) dx;   Φx(s) = Σ_{k=1}^{∞} e^{isx_k} Pr(x = xk)

fx(x) = (1/2π) ∫_{−∞}^{∞} e^{−isx} Φx(s) ds

Gaussian distribution:

fx(x) = [1/(σ√2π)] exp[−(x − m)²/(2σ²)],   −∞ < x < ∞

Fx(x) = (1/2){1 + erf[(x − m)/(σ√2)]},   erf(x) = (2/√π) ∫_0^x e^{−t²} dt

Φx(s) = exp(ims − σ²s²/2)



86

Equation Summary

Conditional probability:

Pr(A ∩ B) = Pr(B | A) Pr(A);   Pr(B | A) = Pr(A ∩ B)/Pr(A),   Pr(A) ≠ 0

Bayes' theorem:

Pr(A | B) = Pr(B | A) Pr(A)/Pr(B),   Pr(B) ≠ 0

Bayes' theorem:

fy(y | x) = fx(x | y) fy(y)/fx(x)

Conditional distribution function:

Fx(x | A) = Pr(x ≤ x | A) = Pr[(x ≤ x) ∩ A]/Pr(A)

Conditional density function:

fx(x | A) = dFx(x | A)/dx

Joint distribution function:

Fxy(x, y) = Pr(x ≤ x, y ≤ y)

Joint density function:

fxy(x, y) = ∂²Fxy(x, y)/∂x∂y;   Fxy(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fxy(ξ, η) dξ dη

Marginal density function:

fx(x) = ∫_{−∞}^{∞} fxy(x, η) dη;   fy(y) = ∫_{−∞}^{∞} fxy(ξ, y) dξ

Conditional distribution and density functions:

Fx(x | y ≤ y) = Fxy(x, y)/Fy(y);   fx(x | y ≤ y) = [1/Fy(y)] ∫_{−∞}^{y} fxy(x, η) dη



87

Equation Summary

Statistically independent random variables:

Fxy(x, y) = Fx(x) Fy(y);   fxy(x, y) = fx(x) fy(y)

fx(x | y) = fx(x);   fy(y | x) = fy(y)

Expected values of joint random variables:

E[g(x, y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fxy(x, y) dx dy

mjk = E[x^j y^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k fxy(x, y) dx dy,   j, k = 1, 2, 3, . . .

Covariance of joint random variables:

Cov(x, y) = E[xy] − E[x]E[y]

Correlation coefficient of joint random variables:

ρ = Cov(x, y)/(σx σy);   σx² = Var(x),  σy² = Var(y)

Bivariate Gaussian density function:

fxy(x, y) = [1/(2πσxσy√(1 − ρ²))] exp{−[(x − x̄)²/σx² − 2ρ(x − x̄)(y − ȳ)/(σxσy) + (y − ȳ)²/σy²]/[2(1 − ρ²)]}

Conditional expectation:

E[g(x, y) | A] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fxy(x, y | A) dx dy



88

Equation Summary

Autocorrelation function (random process):

Rx(t1, t2) ≡ Rx(τ) = 〈x(t1)x(t2)〉 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fx(x1, x2; τ) dx1 dx2

Autocovariance function (random process):

Cx(t1, t2) ≡ Cx(τ) = 〈[x(t1) − m][x(t2) − m]〉;   Cx(τ) = Rx(τ) − m²

Long-time mean value:

x̄(t) = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} x(t) dt

Long-time-average correlation function:

R̄x(τ) = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} x(t) x(t + τ) dt

Structure function (random process):

Dx(t1, t2) = 〈[x(t1) − x(t2)]²〉

Cross-correlation function (random process):

Rxy(t1, t2) = 〈x(t1)y(t2)〉 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 y2 fxy(x1, t1; y2, t2) dx1 dy2

Cross-covariance function (random process):

Cxy(t1, t2) = Rxy(t1, t2) − 〈x(t1)〉〈y(t2)〉

Power spectral density function (random process):

Sx(ω) = ∫_{−∞}^{∞} e^{−iωτ} Cx(τ) dτ;   Cx(τ) = (1/2π) ∫_{−∞}^{∞} e^{iωτ} Sx(ω) dω



89

Equation Summary

Mean value (random field):


­ ®
u(R) = m(R)

Autocovariance function (random field):


­ ®
Bu (R1 , R2 ) = [u(R1 ) − m(R1 )][u ∗ (R2 ) − m ∗ (R2 )]

One-dimensional spatial power spectrum:


Z Z
1 ∞ 1 ∞
Vu (κ) = e− iκR Bu (R ) dR = cos(κR )Bu (R )dR
2π π 0
Z−∞
∞ Z ∞
Bu (R ) = e iκR Vu (κ) d κ = 2 cos(κR )Vu (κ)d κ
−∞ 0

Two-dimensional spatial power spectrum:

$$F_u(\kappa_x, \kappa_y, 0; z) = \int_{-\infty}^{\infty} S_u(\kappa_x, \kappa_y, \kappa_z)\cos(z\kappa_z)\,d\kappa_z$$

$$S_u(\kappa_x, \kappa_y, \kappa_z) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F_u(\kappa_x, \kappa_y, 0; z)\cos(z\kappa_z)\,dz$$

Three-dimensional spatial power spectrum:

$$S_u(\mathbf{K}) = \left(\frac{1}{2\pi}\right)^3 \iiint_{-\infty}^{\infty} e^{-i\mathbf{K}\cdot\mathbf{R}}\,B_u(\mathbf{R})\,d^3R$$

$$B_u(\mathbf{R}) = \iiint_{-\infty}^{\infty} e^{i\mathbf{K}\cdot\mathbf{R}}\,S_u(\mathbf{K})\,d^3\kappa$$

Structure function (locally homogeneous and isotropic random field):

$$D_u(\mathbf{R}) = 2\iiint_{-\infty}^{\infty} S_u(\mathbf{K})\,[1 - \cos(\mathbf{K}\cdot\mathbf{R})]\,d^3\kappa$$

$$S_u(\kappa) = \frac{1}{4\pi^2\kappa^2}\int_{0}^{\infty} \frac{\sin(\kappa R)}{\kappa R}\,
\frac{d}{dR}\!\left[R^2\,\frac{d D_u(R)}{dR}\right] dR$$

Bibliography

Andrews, L. C. and R. L. Phillips, Mathematical Techniques for Engineers and Scientists, SPIE Press, Bellingham, WA (2003) [doi:10.1117/3.467443].
Andrews, L. C. and R. L. Phillips, Laser Beam Propagation through Random Media, 2nd ed., SPIE Press, Bellingham, WA (2005) [doi:10.1117/3.626196].
Beckmann, P., Probability in Communication Engineering, Harcourt, Brace & World, New York (1967).
Bendat, J. S. and A. G. Piersol, Random Data: Analysis and Measurement Procedures, 2nd ed., John Wiley & Sons, New York (1986).
Cramér, H., Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ (1946).
Davenport, Jr., W. B., Probability and Random Processes: An Introduction for Applied Scientists and Engineers, McGraw-Hill, New York (1970).
Dougherty, E., Random Processes for Image and Signal Processing, SPIE Press, Bellingham, WA (1998) [doi:10.1117/3.268105].
Frieden, B. R., Probability, Statistical Optics, and Data Testing, 2nd ed., Springer-Verlag, New York (1991).
Helstrom, C. W., Probability and Stochastic Processes for Engineers, 2nd ed., Macmillan, New York (1991).
Papoulis, A., Probability, Random Variables, and Stochastic Processes, 2nd ed., McGraw-Hill, New York (1984).
Yaglom, A. M., An Introduction to the Theory of Stationary Random Functions, Prentice-Hall, Englewood Cliffs, NJ (1962).

Index

1D spatial power spectrum, 68
2D spatial power spectrum, 69
3D spatial power spectrum, 68
amplification, 77
associated conditional PDF, 20
autocorrelation function, 52
autocovariance function, 52
average power, 64
average power in the ac component, 64
axiomatic approach, 2
axioms of probability, 2
Bayes' theorem, 19, 26
beta distribution, 10
bias error, 82
biased estimator, 82
binomial distribution, 12
bivariate distribution, 61
bivariate Gaussian density function, 32
bivariate Gaussian process, 61
bivariate PDF, 25
Cauchy distribution, 10
causal system, 74
central limit theorem, 39
central moments, 5
characteristic function, 8
Chebyshev inequality, 13
conditional central moments, 21
conditional distribution function, 20
conditional expectations, 37
conditional expected value, 21
conditional independence, 29
conditional PDF, 50
conditional probability, 18
conditional variance, 21
consistent, 81
constant parameters, 74
continuous random process, 49
continuous RV, 3
correlation, 30
correlation coefficient, 30, 32
correlation functions, 48
covariance, 30
covariance functions, 67
cross-correlation functions, 48, 57, 78
cross-covariance function, 57
cross-power spectral densities, 48
cumulative distribution function (CDF), 3
dc component, 64
detection, 77
discrete random process, 49
discrete RV, 3
distribution function, 50
ensemble, 49, 66
ensemble average, 52, 67
ensemble averaging, 81
ergodic process, 55
even PSD, 64
event, 2
expected value, 5, 30, 52, 75
fades, mean frequency, 78
filtering, 77
finite-time average, 55
first-order PDF, 50
gamma distribution, 9
gamma–gamma distribution, 11
Gaussian (normal) distribution, 8
Gaussian process, 61
Gaussian processes, sums of, 61
Gaussian random process, 78
ideal linear system, 74
independent events, 19
joint distribution function, 24
joint moments, 30
joint PDF, 48
jointly stationary, 57
K distribution, 10
law of large numbers, 14
linear detector (full wave), 77
linear operations, 77
linear system, 74
locally homogeneous, 70
long-time average, 55
marginal density function, 25
marginal distributions, 24
maximum-likelihood estimate (MLE), 28
mean-square estimation error, 37
mean-square values, 48
mean value, 5
mean value estimate, 82
memoryless, 73
modulation, 77
moment-generating function, 7
multivariate Gaussian distribution, 62
negative exponential PDF, 10
nonlinear operations, 77
nonstationary, 51, 81
nonstationary mean values, 81
nonstationary random process, 83
normality, 79
normalized covariance function, 53
not uniformly distributed, 42
orthogonal, 30
periodicities, 79
physically realizable system, 74
Poisson distribution, 12
posterior density function, 28
power spectral density, 48
power spectral density (PSD), 58
principle of maximum likelihood, 28
probability, 2
probability density function (PDF), 4
probability distribution, 3
product, 36
PSD function, 78
quotient, 36
random experiment, 2
random field, 65
random field, complex, 67
random phasor, 41
random process, 48
random variable (RV), 3
Rayleigh distribution, 9
real function, 64
real PSD, 64
real stationary process, 64
realization, 49, 66
regression curve, 37
relative frequency, 2
relative frequency approach, 2
root-mean-square (RMS) value of the ac component, 64
runs test, 83
runs-test algorithm, 83
sample function, 49
sample space, 2
second moment, 64, 75
second-order PDF, 50
spatial autocovariance function, 67
square-law detector, 77
stable system, 74
standard conditional moments, 21
standard deviation, 5
standard statistical moments, 5
stationarity, 79
stationary, 48
stationary increments, 56
stationary random process, 51
statistically homogeneous, 67
statistically independent, 29
statistically isotropic, 67
stochastic process, 49
strict stationary process, 53
structure function, 56, 70
sum, 35
surges, mean frequency of, 78
test for normality, 80
test for periodicity, 80
test for stationarity, 80
time invariant, 73
total probability, 26
transfer function, 74
transformation, 72, 73
trial, 2
unbiased estimator, 81
uncorrelated variables, 30
uniform distribution, 9
unit impulse response function, 74
universal set, 2
variance, 5
variance of estimator, 81
wide-sense stationary, 53
Wiener–Khinchin theorem, 58



Larry C. Andrews is Professor Emeritus
of Mathematics at the University of
Central Florida and an associate member
of the College of Optics/CREOL. He is
also an associate member of the Florida
Space Institute (FSI). Previously, he held
a faculty position at Tri-State University
and was a staff mathematician with the Magnavox Company's antisubmarine warfare (ASW) operation. He received a
doctoral degree in theoretical mechanics in 1970 from
Michigan State University. Dr. Andrews is a Fellow of SPIE and has been an active researcher in optical wave propagation through random media for more than 30 years. He is the author or coauthor of twelve textbooks on differential equations, boundary value problems,
special functions, integral transforms, wave propagation
through random media, and mathematical techniques for
engineers. Along with wave propagation through random
media, his research interests include special functions,
random variables, atmospheric turbulence, and signal
processing.
Ronald L. Phillips is Professor Emeritus at the University of Central Florida. He holds appointments in the Departments of Electrical Engineering and Mathematics and in the Townes Laser Institute in the College of Optics/CREOL. He has held positions on the faculties at Arizona State University and the University of California. He received a doctoral degree in electrical engineering in 1970 from Arizona State University. Dr. Phillips has been an active researcher in the area of wave propagation through random media for more than 38 years. He was awarded a Senior NATO Postdoctoral Fellowship in 1977 and the American Society for Engineering Education 1983 Medal for Outstanding Contributions in Research. Dr. Phillips is a Fellow of SPIE, OSA, and AIAA. He is the coauthor of three books on wave propagation through random media and a textbook on mathematical techniques for engineers. He has taught industry short courses on the topic of propagation through turbulence and its effects on free-space optical systems for the last 15 years. In addition to optical wave propagation, his research interests include free-space optical communications, active imaging, and laser radar.
Probability, Random Processes, and Random Data Analysis
Larry C. Andrews and Ronald L. Phillips
Mathematical theory in engineering and science usually
involves deterministic phenomena. Such is the case in
solving a differential equation that describes some linear
system where both the input and output are deterministic
quantities. In practice, however, the input to a linear system
may contain a “random” quantity that yields uncertainty
about the output. Such systems must be treated by
probabilistic methods rather than deterministic methods.
For this reason, probability theory and random process
theory have become indispensable tools in the mathematical
analysis of these kinds of engineering systems. This book
covers basic probability theory, random processes, random
fields, and random data analysis.

SPIE Field Guides


The aim of each SPIE Field Guide is to distill a major field of
optical science or technology into a handy desk or briefcase
reference that provides basic, essential information about
optical principles, techniques, or phenomena.
Written for you—the practicing engineer or scientist—
each field guide includes the key definitions, equations,
illustrations, application examples, design considerations,
methods, and tips that you need in the lab and in the field.
John E. Greivenkamp
Series Editor

P.O. Box 10
Bellingham, WA 98227-0010
ISBN: 9780819487018
SPIE Vol. No.: FG22

www.spie.org/press/fieldguides
