
Chapter 1
Probability Review

1.1  Events and Probabilities

The central notion of probability is a random experiment which may produce different results each
time it is performed. Such experiments have three natural components:
1) The experiment has a set of possible outcomes.
2) Outcomes are observed through measurements. An event is a subset of outcomes that yield
the same measurement.
3) Each event has a certain probability. An intuitive definition of the probability of an event is its relative frequency: the fraction of trials that result in that event when the experiment is repeated a great many times. (A small simulation sketch of this idea follows the list.)
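
As a rough illustration of the relative-frequency idea, here is a minimal sketch (not part of the original notes; it assumes Python with NumPy is available, and the die experiment and event are my own choice). It repeats a fair-die roll many times and tracks the fraction of trials in which the event "the outcome is even" occurs.

    import numpy as np

    rng = np.random.default_rng(0)
    n_trials = 100_000
    outcomes = rng.integers(1, 7, size=n_trials)   # rolls of a fair six-sided die

    # Event E = {2, 4, 6}: the outcome is even.
    event_occurred = (outcomes % 2 == 0)

    # Relative frequency after the first n trials, for a few values of n.
    for n in (100, 1_000, 10_000, 100_000):
        print(n, event_occurred[:n].mean())        # approaches P(E) = 1/2
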
The theory of probability captures these components via the triplet (S, F, P), which is called a probability system. Here S is the sample space and denotes the set of possible outcomes, F is the set of possible events, which is a collection of subsets of the sample space S, and finally the probability P : F → [0, 1] is a function whose argument is an event and whose value lies between 0 and 1. Hence for each event E ∈ F, the number 0 ≤ P(E) ≤ 1 is called the probability of event E, that is, the probability that the outcome belongs to E.
The events F and the probability function P should satisfy certain minimal conditions in order to be consistent with our intuitive understanding of probability. Namely,
a1) S ∈ F
a2) If A ∈ F then A^c ∈ F
a3) The event set F is closed under set operations,
and
b1) P(S) = 1
b2) If events A1, A2, ... are mutually exclusive then P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ... .

(Two events are called mutually exclusive if they cannot occur simultaneously, i.e. A ∩ B = ∅.) All other relations involving probabilities are derived from the above axioms. For example,

A ⊆ B  implies  P(A) ≤ P(B),
P(A^c) = 1 − P(A),
P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).
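
These derived relations can be checked mechanically on a finite sample space. The toy model below is an illustrative sketch only (the die model, the weight dictionary and the helper function P are my own, not from the notes): it represents S as a Python set, events as subsets, and P as a sum of outcome weights, so monotonicity, the complement rule and the union bound can be verified by enumeration.

    # A fair six-sided die: S = {1, ..., 6}, each outcome has weight 1/6.
    S = {1, 2, 3, 4, 5, 6}
    weight = {w: 1 / 6 for w in S}

    def P(event):
        """Probability of an event, i.e. of a subset of S."""
        return sum(weight[w] for w in event)

    A = {2, 4, 6}          # "the outcome is even"
    B = {2, 3, 4, 5, 6}    # "the outcome is at least 2"; note A is a subset of B

    assert abs(P(S) - 1) < 1e-12                  # axiom b1: P(S) = 1
    assert abs(P(S - A) - (1 - P(A))) < 1e-12     # P(A^c) = 1 - P(A)
    assert P(A) <= P(B) + 1e-12                   # monotonicity, since A is a subset of B
    assert P(A | B) <= P(A) + P(B) + 1e-12        # union bound (A | B is set union)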

We frequently seek the probability that the outcome has a certain property (i.e. a certain event
occurs) when we already know that it has another property (i.e. another event has occurred). The
conditional probability of event A given that event B has occurred is defined as
P(A|B) = P(A ∩ B) / P(B)

whenever P(B) > 0.

If P(B) = 0 then P(A|B) can be defined arbitrarily, since probabilities conditioned on an impossible event are not relevant. Note that P(A|B) is the probability of all outcomes common to both events, normalized by dividing by the probability of the given event.
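
As a sketch (not from the notes; it assumes NumPy, and the dice events are my own choice), P(A|B) can be estimated by restricting attention to the trials in which B occurred, mirroring the ratio P(A ∩ B)/P(B).

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    die = rng.integers(1, 7, size=n)

    A = (die >= 5)        # event A: the outcome is 5 or 6
    B = (die % 2 == 0)    # event B: the outcome is even

    # Empirical version of P(A|B) = P(A ∩ B) / P(B).
    p_A_given_B = np.mean(A & B) / np.mean(B)
    print(p_A_given_B)    # close to (1/6) / (1/2) = 1/3
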
Two events A, B are said to be statistically mutually independent if
P(A ∩ B) = P(A)P(B).
If P(B) > 0 this translates to P(A|B) = P(A) and leads to the intuitive interpretation that the knowledge of whether event B has occurred or not does not change our perception of event A. (Note that if P(B) = 0 then B is mutually independent of any other event. Show that the same conclusion holds if P(B) = 1.) Three events A, B, C are said to be mutually independent if any pair within these events are mutually independent as defined above, and in addition
P(A ∩ B ∩ C) = P(A)P(B)P(C).
Independence of four or more events is defined recursively in the same fashion. Events A, B are called conditionally mutually independent given event C if
P(A ∩ B|C) = P(A|C)P(B|C).
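
The extra product condition for three events is not redundant: events can be pairwise independent without being mutually independent. The standard two-coin example below is my own illustration (not from the notes) and checks this by exact enumeration.

    from fractions import Fraction

    # Two fair coin flips; each of the four outcomes has probability 1/4.
    S = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
    P = lambda event: Fraction(sum(1 for w in S if w in event), len(S))

    A = {w for w in S if w[0] == "H"}      # first flip is heads
    B = {w for w in S if w[1] == "H"}      # second flip is heads
    C = {w for w in S if w[0] == w[1]}     # the two flips agree

    # Each pair satisfies the product rule ...
    assert P(A & B) == P(A) * P(B)
    assert P(A & C) == P(A) * P(C)
    assert P(B & C) == P(B) * P(C)
    # ... but the triple product rule fails: P(A ∩ B ∩ C) = 1/4, not 1/8.
    assert P(A & B & C) != P(A) * P(B) * P(C)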

1.2  Random Variables

A random variable (rv) X is a real-valued function defined on the sample space (that is, X : S → R), so that
P(X ≤ x) = P({w ∈ S : X(w) ≤ x})   for each x ∈ R.
That is, P(X ≤ x) is the probability of all outcomes for which X is no more than the value x. In applications we are usually interested in the probability distribution function (PDF) of a rv, rather than the explicit function that specifies the rv. The PDF of a real-valued rv X is defined as
F_X(x) = P(X ≤ x).
The PDF is a non-negative and non-decreasing function; furthermore
lim_{x→∞} F_X(x) = 1   and   lim_{x→−∞} F_X(x) = 0.
Note that if P(X = x) > 0 for some x then F_X(·) is discontinuous at the point x, and the size of the discontinuity is exactly P(X = x).
While the PDF is a useful statistical description of the random variable X, it is sometimes more convenient to work with the probability density function (pdf), defined by

f_X(x) = (d/dx) F_X(x),

provided that F_X(·) is differentiable. Such random variables are called continuous, and by definition

P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx

for any interval [a, b]. Since the probability of each event lies between 0 and 1, the above relation implies that

f_X(x) ≥ 0   and   ∫_{−∞}^{∞} f_X(x) dx = 1.

Discrete random variables take values from a discrete set {x1, x2, ...}, which may be finite or countably infinite. A discrete random variable X is usually specified by its probability mass function (pmf) (p_k : k = 1, 2, ...) where p_k = P(X = x_k). Analogous to the continuous case,

p_k ≥ 0   and   Σ_k p_k = 1.

The expectation (or the mean) of a real-valued random variable X, E[X], is defined as¹

E[X] = ∫ x f_X(x) dx          for a continuous rv,
E[X] = Σ_k x_k P(X = x_k)     for a discrete rv.
For any real-valued function g : R → R, g(X) is also a random variable. The expectation of g(X) is given by

E[g(X)] = ∫ g(x) f_X(x) dx.
For example, the variance of X, σ_X², is the expectation of the squared displacement of X with respect to its mean:

σ_X² = E[(X − E[X])²] = ∫ (x − E[X])² f_X(x) dx.

An expectation integral need not be finite. For example, verify that the expectation integral of a Cauchy rv, whose pdf is f_X(x) = 1/(π(1 + x²)), x ∈ R, does not converge.
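
A quick numerical experiment suggests what this non-convergence looks like in practice (an illustrative sketch only, assuming NumPy): the running sample mean of Cauchy draws keeps jumping around instead of settling, unlike for a rv with a finite mean.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.standard_cauchy(size=1_000_000)

    # Running sample mean: for a rv with finite mean this stabilizes,
    # but for the Cauchy distribution it never does.
    running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, running_mean[n - 1])
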
Some fundamental discrete random variables are as follows.
Example 1.2.1 (Uniform rv) A discrete rv X is uniformly distributed on the set {0, 1, ..., n} if P(X = k) = 1/(n + 1) for k = 0, 1, 2, ..., n (so p_k = 0 for other values of k). Check that E[X] = n/2 and σ_X² = n(n + 2)/12.
¹ For practical purposes, it is usually accepted that the two expressions yield the same result if f_X is allowed to include impulses located at the discontinuities of F_X. This perspective is useful for rvs that display both continuous and discrete characteristics.

Example 1.2.2 (Bernoulli rv) X has a Bernoulli distribution with parameter 0 ≤ p ≤ 1 if P(X = 1) = p = 1 − P(X = 0). Note that E[X] = p and σ_X² = p(1 − p).
Example 1.2.3 (Geometric rv) X has a geometric distribution with parameter 0 < p ≤ 1 if P(X = k) = p(1 − p)^(k−1) for each k = 1, 2, 3, .... A geometric random variable is discrete as it takes on integer values only. Its expectation is computed via the sum

E[X] = Σ_{k=1}^∞ k p(1 − p)^(k−1) = 1/p.

A similar computation yields σ_X² = (1 − p)/p². Verify that the geometric distribution also has the memoryless property.
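
The memoryless property states that P(X > m + n | X > m) = P(X > n) for non-negative integers m, n. A small exact check (my sketch, not part of the notes; the value of p is arbitrary) confirms this using the tail formula P(X > n) = (1 − p)^n.

    p = 0.3

    def tail(n):
        """P(X > n) for a geometric rv: the first n trials all fail."""
        return (1 - p) ** n

    for m in range(5):
        for n in range(5):
            lhs = tail(m + n) / tail(m)     # P(X > m + n | X > m)
            rhs = tail(n)                   # P(X > n)
            assert abs(lhs - rhs) < 1e-12
    print("memoryless property verified for small m, n")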

Example 1.2.4 (Binomial rv) X has a Binomial distribution (with parameters n, p) if

P(X = k) = C(n, k) p^k (1 − p)^(n−k)   for k = 0, 1, ..., n,
P(X = k) = 0                           otherwise,

where C(n, k) = n! / (k! (n − k)!) is the binomial coefficient. It can be computed that E[X] = np and σ_X² = np(1 − p).

Example 1.2.5 (Poisson rv) A Poisson rv X is a discrete rv with pmf P(X = k) = e^(−λ) λ^k / k!, k = 0, 1, 2, ..., for some λ > 0. Verify that the mean and the variance of X are both equal to λ.
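
As a sanity check (an illustrative sketch, not from the notes; the value of λ and the truncation point are my own choices), the Poisson mean and variance can be compared against λ by summing the pmf up to a point beyond which the terms are negligible.

    import math

    lam = 2.5
    K = 60    # truncation point; terms beyond this are negligibly small

    # Build the pmf recursively: p_0 = e^{-lam}, p_k = p_{k-1} * lam / k.
    pmf = [math.exp(-lam)]
    for k in range(1, K):
        pmf.append(pmf[-1] * lam / k)

    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    print(mean, var)   # both approximately lam = 2.5
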
Some fundamental continuous random variables are:
Example 1.2.6 (Uniform rv) Continuous rv X is uniformly distributed on the interval [a, b] if
f_X(x) = 1/(b − a)   for a ≤ x ≤ b

(and f_X(x) = 0 for other values of x). Check that E[X] = (a + b)/2 and σ_X² = (b − a)²/12.

Example 1.2.7 (Exponential rv) A rv X is said to have an exponential distribution with parameter λ > 0 if it has a pdf of the form f_X(x) = λ e^(−λx) for x ≥ 0. It is verified via integration by parts that E[X] = 1/λ and σ_X² = 1/λ².
Example 1.2.8 (Erlang or Gamma rv) X has an Erlang distribution (with parameters n, λ) if

f_X(x) = λ^n x^(n−1) e^(−λx) / (n − 1)!   for x ≥ 0,
f_X(x) = 0                                otherwise.

In that case E[X] = n/λ and σ_X² = n/λ².

Example 1.2.9 (Gaussian or Normal rv) X is Gaussian (with parameters μ, σ) if

f_X(x) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²)),   −∞ < x < ∞.

Here E[X] = μ and σ_X² = σ², so the parameters of a Gaussian rv readily specify its mean and variance.
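
The stated means and variances of the continuous examples can be confirmed by numerical integration. The sketch below (assuming SciPy is available; it is not part of the notes, and the value of λ is arbitrary) checks the exponential case; the uniform, Erlang and Gaussian densities can be checked the same way.

    import numpy as np
    from scipy.integrate import quad

    lam = 1.5
    pdf = lambda x: lam * np.exp(-lam * x)            # exponential pdf on [0, inf)

    total, _ = quad(pdf, 0, np.inf)                   # should be 1
    mean, _ = quad(lambda x: x * pdf(x), 0, np.inf)   # should be 1/lam
    second, _ = quad(lambda x: x**2 * pdf(x), 0, np.inf)
    var = second - mean**2                            # should be 1/lam**2

    print(total, mean, var)   # approximately 1.0, 0.666..., 0.444...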

The moment generating function (mgf) of X, Φ_X(s), is defined as

Φ_X(s) = E[e^(sX)].

If the mgf is known then the nth moment of X can be obtained via

E[X^n] = (d^n/ds^n) Φ_X(s) |_(s=0),

provided that the derivative exists.


Example 1.2.10 If X is an exponential random variable with parameter λ > 0 then Φ_X(s) = λ/(λ − s) for s < λ.
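
As a sketch (assuming SymPy is installed; not part of the notes), the moment formula can be applied symbolically to Φ_X(s) = λ/(λ − s): the nth derivative at s = 0 should equal E[X^n] = n!/λ^n, the nth moment of the exponential distribution.

    import sympy as sp

    s, lam = sp.symbols("s lam", positive=True)
    mgf = lam / (lam - s)

    for n in range(1, 5):
        moment = sp.simplify(sp.diff(mgf, s, n).subs(s, 0))   # nth derivative at s = 0
        print(n, moment)                                      # prints n!/lam**n
        assert sp.simplify(moment - sp.factorial(n) / lam**n) == 0
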
A k-dimensional random vector is a random variable that takes values in R^k. The PDF and the pdf of a random vector X = (X1, ..., Xk) are respectively defined as

F_{X1,...,Xk}(x1, ..., xk) = P(X1 ≤ x1, ..., Xk ≤ xk),
f_{X1,...,Xk}(x1, ..., xk) = ∂^k F_{X1,...,Xk}(x1, ..., xk) / (∂x1 ∂x2 ⋯ ∂xk).
The entries of a random vector are indeed random variables defined on the same probability space. Two random variables X, Y are called mutually independent if

F_{X,Y}(x, y) = F_X(x) F_Y(y),

and therefore f_{X,Y}(x, y) = f_X(x) f_Y(y), for all real numbers x, y. Here F_X (respectively f_X) is called the marginal PDF (respectively the marginal pdf) of the random variable X. Note the two different definitions of mutual independence for events and random variables. The random variables X, Y, Z are mutually independent if

F_{X,Y,Z}(x, y, z) = F_X(x) F_Y(y) F_Z(z)

for all x, y, z. Mutual independence of four or more random variables is defined similarly. If X1, ..., Xk are mutually independent then

E[X1 X2 ⋯ Xk] = E[X1] ⋯ E[Xk];

however, the converse of this statement is not true in general.
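
One standard counterexample (my own illustration, not from the notes) shows why the converse fails: take X uniform on {−1, 0, 1} and Y = X². Then E[XY] = E[X]E[Y], yet Y is completely determined by X, so the pair cannot be independent. The enumeration below makes this concrete.

    from fractions import Fraction

    # X uniform on {-1, 0, 1}, Y = X^2. Each value of X has probability 1/3.
    support = [-1, 0, 1]
    p = Fraction(1, 3)

    E_X = sum(p * x for x in support)            # 0
    E_Y = sum(p * x**2 for x in support)         # 2/3
    E_XY = sum(p * x * x**2 for x in support)    # 0

    assert E_XY == E_X * E_Y                     # the product rule for expectations holds

    # But X and Y are not independent: e.g. P(X = 1, Y = 0) = 0,
    # while P(X = 1) * P(Y = 0) = (1/3) * (1/3) = 1/9.
    P_X1_Y0 = Fraction(0)
    P_X1 = p
    P_Y0 = p                                     # Y = 0 only when X = 0
    assert P_X1_Y0 != P_X1 * P_Y0
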
The fundamental random variables mentioned in the previous paragraphs have appealing interpretations in terms of independent random variables. These interpretations offer substantial help in identifying their expectations and variances (and other statistics as well) without resorting to tedious calculations; a simulation sketch checking two of them follows the list.
A Binomial (n, p) rv is the sum of n independent Bernoulli (p) rvs.
An Erlang (n, λ) rv is the sum of n independent Exponential (λ) rvs.
A Geometric (p) rv is the number of independent Bernoulli trials until the first success (i.e. the first occurrence of 1).
A Poisson (λ) rv is the limit of a Binomial (n, λ/n) rv as n → ∞.
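
A quick simulation (a sketch assuming NumPy; the parameter choices are arbitrary and mine) compares the sample mean and variance of sums built this way against the closed-form values quoted in the examples above.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p, lam, reps = 10, 0.3, 2.0, 200_000

    # Binomial(n, p) as a sum of n independent Bernoulli(p) rvs.
    binom = rng.binomial(1, p, size=(reps, n)).sum(axis=1)
    print(binom.mean(), binom.var())     # close to n*p = 3.0 and n*p*(1-p) = 2.1

    # Erlang(n, lam) as a sum of n independent Exponential(lam) rvs.
    erlang = rng.exponential(1 / lam, size=(reps, n)).sum(axis=1)
    print(erlang.mean(), erlang.var())   # close to n/lam = 5.0 and n/lam**2 = 2.5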

1.3  Random Processes

A random process is a collection of random variables (X_t : t ∈ T) indexed by a set T. A generic realization, or outcome, (x_t : t ∈ T) of the random process is called a sample path. In most cases of interest the index t denotes the time. For example, T is a subset of the integers (e.g. T = {0, 1, 2, ...}) for discrete-time random processes, and a subset of the real numbers (e.g. T = [0, ∞)) for continuous-time random processes.
In either case T has infinitely many members, which poses a technical difficulty in expressing the distribution of the process (X_t : t ∈ T) in terms of random vectors. The distribution of a random process (X_t : t ∈ T) is identified by specifying all finite-dimensional distributions, i.e. distributions of the random vectors

(X_{t1}, X_{t2}, ..., X_{tn})

for every positive integer n and all instants t1, t2, ..., tn ∈ T.

Although a complete account of all such distributions requires an enormous effort in general, interesting processes such as Markov processes have tractable specifications.
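
As a purely illustrative sketch (assuming NumPy; the process and parameters are my own choice, not from the notes), the snippet below generates sample paths of a simple discrete-time process, a symmetric random walk, and examines the empirical behaviour of the random vector (X_2, X_5) across many realizations, i.e. one of its finite-dimensional distributions.

    import numpy as np

    rng = np.random.default_rng(4)
    T, paths = 10, 5_000

    # Symmetric random walk: X_0 = 0, X_t = X_{t-1} + S_t with S_t = +1 or -1 equally likely.
    steps = rng.choice([-1, 1], size=(paths, T))
    X = np.concatenate([np.zeros((paths, 1), dtype=int), steps.cumsum(axis=1)], axis=1)

    print(X[0])                                   # one sample path (x_0, x_1, ..., x_10)

    # Empirical statistics of the finite-dimensional vector (X_2, X_5).
    print(np.mean(X[:, 2]), np.mean(X[:, 5]))     # both close to 0
    print(np.cov(X[:, 2], X[:, 5])[0, 1])         # close to Var(X_2) = 2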

1.4  Homework 1

Question 1.1 Either prove correctness of, or provide a counterexample for, each of the following claims (var[X] denotes the variance of random variable X):
a) If X1 and X2 are independent then var[X1 + X2] = var[X1] + var[X2].
b) If X1, X2, ..., Xn are independent then var[Σ_{i=1}^n Xi] = Σ_{i=1}^n var[Xi].

Question 1.2 Let X1 and X2 be independent exponential random variables with respective parameters λ1, λ2. What is the probability density function of Y = min{X1, X2}?
Question 1.3 Let X1 and X2 be independent Poisson random variables with respective means λ1, λ2 > 0.
a) Find P (X1 + X2 = k) for non-negative integer k.
b) Find P (X1 = k|X1 + X2 = n) for non-negative integers k, n.

Question 1.4 (Required for ELE571; optional for ELE471) k balls are drawn at random from an urn that contains n blue balls and m red balls (k ≤ m + n). Let X denote the number of red balls in that random selection. Compute the mean and the variance of X. (Hint: X = Σ_{i=1}^k Xi where Xi = 1 if the ith ball is red, and Xi = 0 otherwise. Each Xi is a Bernoulli rv, but are they independent?)
