Probability Review
1.1 Probability Systems
The central notion of probability is a random experiment which may produce different results each
time it is performed. Such experiments have three natural components:
1) The experiment has a set of possible outcomes.
2) Outcomes are observed through measurements. An event is a subset of outcomes that yield
the same measurement.
3) Each event has a certain probability. An intuitive definition of the probability of an event is
its relative frequency, which is the fraction of experiments which result in that event when the
experiment is repeated a great many times.
The theory of probability captures these components via the triplet (S, F, P), which is called a
probability system. Here S is the sample space and denotes the set of possible outcomes; F is the set
of possible events, which is a collection of subsets of the sample space S; and finally the probability
P : F → [0, 1] is a function whose argument is an event and whose value lies between 0 and 1. Hence
for each event E ∈ F, the number P(E), with 0 ≤ P(E) ≤ 1, is called the probability of event E,
that is, the probability that the outcome belongs to E.
The events F and the probability function P should satisfy certain minimal conditions in order
to be consistent with our intuitive understanding of probability. Namely,
a1) S ∈ F
a2) If A ∈ F then A^c ∈ F
a3) The event set F is closed under countable set operations,
and
b1) P(S) = 1
b2) If events A1, A2, … are mutually exclusive then P(A1 ∪ A2 ∪ ⋯) = P(A1) + P(A2) + ⋯.
(Two events A, B are called mutually exclusive if they cannot occur simultaneously, i.e. A ∩ B = ∅.) All
other relations involving probabilities are derived from the above axioms. For example,
A ⊂ B implies P(A) ≤ P(B),
P(A^c) = 1 − P(A),
P(⋃_{i=1}^∞ Ai) ≤ Σ_{i=1}^∞ P(Ai).
We frequently seek the probability that the outcome has a certain property (i.e. a certain event
occurs) when we already know that it has another property (i.e. another event has occurred). The
conditional probability of event A given that event B has occurred is defined as
P(A|B) = P(A ∩ B) / P(B).
If P(B) = 0 then P(A|B) can be defined arbitrarily, since probabilities that are conditioned on an
impossible event are not relevant. Note that P(A|B) is the probability of all outcomes common to
both events, normalized by dividing by the probability of the given event.
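As a concrete check of the definition, conditional probabilities can be computed by direct counting on a small sample space. The sketch below uses two fair dice, with events A and B chosen purely for illustration (they are not from the text), and exact fractions for the arithmetic:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (an illustrative choice).
S = list(product(range(1, 7), repeat=2))

def prob(event):
    # P(E) = |E| / |S| when all outcomes are equally likely.
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w[0] + w[1] == 8   # event A: the sum of the dice is 8
B = lambda w: w[0] % 2 == 0      # event B: the first die is even

# P(A|B) = P(A ∩ B) / P(B)
p_A_given_B = prob(lambda w: A(w) and B(w)) / prob(B)
print(p_A_given_B)  # 1/6, versus the unconditional P(A) = 5/36
```

Here knowing that the first die is even raises the probability of the sum being 8, since all three outcomes (2,6), (4,4), (6,2) in A ∩ B have an even first die.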
Two events A, B are said to be statistically mutually independent if
P(A ∩ B) = P(A)P(B).
If P (B) > 0 this translates to P (A|B) = P (A) and leads to the intuitive interpretation that the
knowledge of whether event B has occurred or not does not change our perception of event A. (Note
that if P (B) = 0 then B is mutually independent of any other event. Show that the same conclusion
holds if P (B) = 1.) Three events A, B, C are said to be mutually independent if any pair within
these events are mutually independent as defined above, and in addition
P(A ∩ B ∩ C) = P(A)P(B)P(C).
Independence of four or more events is defined recursively in the same fashion. Events A, B are
called conditionally mutually independent given event C if
P(A ∩ B|C) = P(A|C)P(B|C).
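The triple-product condition for three events is not implied by pairwise independence. A standard counterexample uses two fair coin tosses with C the event that the tosses agree; the sketch below (an illustration, not from the text) verifies it by enumeration with exact fractions:

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))  # two fair coin tosses

def prob(event):
    return Fraction(sum(1 for w in S if event(w)), len(S))

A = lambda w: w[0] == "H"     # first toss is heads
B = lambda w: w[1] == "H"     # second toss is heads
C = lambda w: w[0] == w[1]    # the two tosses agree

# Every pair satisfies P(X ∩ Y) = P(X)P(Y) ...
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y)

# ... yet P(A ∩ B ∩ C) = 1/4 while P(A)P(B)P(C) = 1/8.
assert prob(lambda w: A(w) and B(w) and C(w)) != prob(A) * prob(B) * prob(C)
```

So A, B, C are pairwise independent but not mutually independent, which is why the definition requires the extra condition.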
1.2 Random Variables
A random variable (rv) X is a real-valued function defined on the sample space (that is, X : S → R),
so that
P(X ≤ x) = P({ω ∈ S : X(ω) ≤ x}) for each x ∈ R.
That is, P(X ≤ x) is the probability of all outcomes for which X is no more than the value x. In
applications we are usually interested in the probability distribution function (PDF) of a rv, rather
than the explicit function that specifies the rv. The PDF of a real-valued rv X is defined as
FX(x) = P(X ≤ x).
The PDF is a non-negative and non-decreasing function; furthermore
lim_{x→∞} FX(x) = 1 and lim_{x→−∞} FX(x) = 0.
Note that if P(X = x) > 0 for some x then FX(·) is discontinuous at the point x, and the size of the
discontinuity is exactly P(X = x).
While the PDF is a useful statistical description of the random variable X, it is sometimes more
convenient to work with the probability density function (pdf), defined by
fX(x) = (d/dx) FX(x),
provided that FX(·) is differentiable. Such random variables are called continuous, and by definition
P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
for any interval [a, b]. Since the probability of each event lies between 0 and 1, the above relation
implies that
fX(x) ≥ 0 and ∫_{−∞}^{∞} fX(x) dx = 1.
Discrete random variables take values from a discrete set {x1, x2, …}, which may be finite or
countably infinite. A discrete random variable X is usually specified by its probability mass function
(pmf) (pk : k = 1, 2, …) where pk = P(X = xk). Analogous to the continuous case,
pk ≥ 0 and Σ_k pk = 1.
The expectation (or the mean) of a real-valued random variable X, E[X], is defined as¹
E[X] = ∫_{−∞}^{∞} x fX(x) dx for a continuous rv, and E[X] = Σ_k x_k P(X = x_k) for a discrete rv.
For any real-valued function g : R → R, g(X) is also a random variable. The
expectation of g(X) is given by
E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx.
An expectation integral need not be finite. For example, verify that the expectation integral of a
Cauchy rv, whose pdf is fX(x) = 1/(π(1 + x²)), x ∈ R, does not converge.
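The divergence can also be seen numerically: the truncated integral ∫_{−T}^{T} |x| fX(x) dx equals log(1 + T²)/π, which keeps growing with T instead of settling to a limit. A rough Riemann-sum sketch (the step count and the cutoffs below are arbitrary choices):

```python
import math

def truncated_abs_mean(T, steps=200000):
    # 2 ∫_0^T x / (π (1 + x²)) dx via a left Riemann sum
    dx = T / steps
    return 2 * sum((i * dx) / (math.pi * (1 + (i * dx) ** 2)) * dx
                   for i in range(steps))

for T in (10, 100, 1000):
    print(T, round(truncated_abs_mean(T), 3))
# Each tenfold increase in T adds roughly log(100)/π ≈ 1.47 to the value,
# so the integral has no finite limit.
```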
Some fundamental random variables, discrete and continuous, are as follows.
Example 1.2.1 (Uniform rv) Discrete rv X is uniformly distributed on the interval [0, n] if
P(X = k) = 1/(n + 1) for k = 0, 1, 2, …, n (so pk = 0 for other values of k). Check that E[X] = n/2 and
σ_X² = n(n + 2)/12.
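The two formulas for the discrete uniform rv can be checked exactly for a specific n (n = 10 below is an arbitrary choice), using the definitions E[X] = Σ_k k p_k and σ_X² = E[X²] − E[X]²:

```python
from fractions import Fraction

n = 10                                   # arbitrary instance for the check
pk = Fraction(1, n + 1)                  # P(X = k) for k = 0, ..., n
mean = sum(k * pk for k in range(n + 1))
var = sum(k * k * pk for k in range(n + 1)) - mean ** 2

assert mean == Fraction(n, 2)            # E[X] = n/2
assert var == Fraction(n * (n + 2), 12)  # σ² = n(n+2)/12
```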
¹For practical purposes, it is usually accepted that the two expressions yield the same result if fX is allowed to
include impulses located at the discontinuities of FX. This perspective is useful for rvs that display both continuous
and discrete characteristics.
For a geometric rv X with parameter p ∈ (0, 1), P(X = k) = p(1 − p)^{k−1} for k = 1, 2, …, and
E[X] = Σ_{k=1}^∞ k p(1 − p)^{k−1} = 1/p.
A similar computation yields σ_X² = (1 − p)/p². Verify that the geometric distribution also has the
memoryless property.
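The memoryless property, P(X > m + n | X > m) = P(X > n), follows from the tail formula P(X > k) = (1 − p)^k, which is a geometric series sum. A quick exact check (p = 1/3 below is an arbitrary choice):

```python
from fractions import Fraction

p = Fraction(1, 3)              # arbitrary parameter for the check

def tail(k):
    # P(X > k) = Σ_{j > k} p (1 - p)^{j-1} = (1 - p)^k  (geometric series)
    return (1 - p) ** k

# The closed form agrees with a direct partial sum of the pmf:
assert 1 - tail(20) == sum(p * (1 - p) ** (j - 1) for j in range(1, 21))

# Memorylessness: P(X > m + n | X > m) = P(X > n)
for m, n in [(1, 2), (4, 7), (10, 3)]:
    assert tail(m + n) / tail(m) == tail(n)
```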
A binomial rv X with parameters (n, p) has pmf
P(X = k) = (n choose k) p^k (1 − p)^{n−k} for k = 0, 1, …, n, and P(X = k) = 0 otherwise.
It can be computed that E[X] = np and σ_X² = np(1 − p).
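For a concrete instance (n = 8, p = 1/4, both picked arbitrarily), the binomial mean and variance formulas can be verified exactly from the pmf using math.comb:

```python
from fractions import Fraction
from math import comb

n, p = 8, Fraction(1, 4)                 # arbitrary instance for the check
pmf = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

assert sum(pmf.values()) == 1            # the pmf sums to 1
mean = sum(k * q for k, q in pmf.items())
var = sum(k * k * q for k, q in pmf.items()) - mean ** 2
assert mean == n * p                     # E[X] = np
assert var == n * p * (1 - p)            # σ² = np(1-p)
```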
A rv X is uniformly distributed on the interval [a, b] if it has the pdf
fX(x) = 1/(b − a) for a ≤ x ≤ b
(and so fX(x) = 0 for other values of x). Check that E[X] = (a + b)/2 and σ_X² = (b − a)²/12.
Example 1.2.7 (Exponential rv) A rv X is said to have an exponential distribution with parameter λ > 0 if it has a pdf of the form fX(x) = λe^{−λx} for x ≥ 0. It is verified via integration by parts that
E[X] = 1/λ and σ_X² = 1/λ².
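The integration-by-parts results can also be checked numerically. The Riemann-sum sketch below (λ = 2, and the step size and truncation point are arbitrary choices) approximates the first two moments of the exponential density:

```python
import math

lam = 2.0                        # arbitrary rate parameter
dx, cutoff = 1e-4, 40.0          # integration step and truncation point

m1 = m2 = 0.0
for i in range(int(cutoff / dx)):
    x = i * dx
    w = lam * math.exp(-lam * x) * dx   # f(x) dx
    m1 += x * w                          # accumulates E[X]
    m2 += x * x * w                      # accumulates E[X²]

var = m2 - m1 ** 2
print(m1, var)   # ≈ 1/λ = 0.5 and 1/λ² = 0.25
```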
Example 1.2.8 (Erlang or Gamma rv) X has an Erlang distribution (with parameters n, λ) if
fX(x) = λ^n x^{n−1} e^{−λx} / (n − 1)! for x ≥ 0, and fX(x) = 0 otherwise.
In that case E[X] = n/λ and σ_X² = n/λ².
1.3 Random Processes
A random process is a collection of random variables indexed by time, and its statistical description
involves the joint distributions of every finite collection of these variables. Although a complete account of all such distributions requires an enormous effort in general, interesting processes such as Markov processes have tractable specifications.
1.4 Homework 1
Question 1.1 Either prove correctness of, or provide a counterexample for, each of the following
claims (var[X] denotes the variance of random variable X):
a) If X1 and X2 are independent then var[X1 + X2 ] = var[X1 ] + var[X2 ].
b) If X1, X2, …, Xn are independent then var[Σ_{i=1}^n Xi] = Σ_{i=1}^n var[Xi].
Question 1.2 Let X1 and X2 be independent exponential random variables with respective parameters λ1, λ2. What is the probability density function of Y = min{X1, X2}?
Question 1.3 Let X1 and X2 be independent Poisson random variables with respective means
λ1, λ2 > 0.
a) Find P (X1 + X2 = k) for non-negative integer k.
b) Find P (X1 = k|X1 + X2 = n) for non-negative integers k, n.
Question 1.4 (Required for ELE571; optional for ELE471) k balls are drawn at random
from an urn that contains n blue balls and m red balls (k ≤ m + n). Let X denote the number
of red balls in that random selection. Compute the mean and the variance of X. (Hint: X = Σ_{i=1}^k Xi
where Xi = 1 if the ith ball is red, and Xi = 0 otherwise. Each Xi is a Bernoulli rv, but are they
independent?)