S = {(t, t, t), (t, t, h), (t, h, t), (h, t, t), (t, h, h), (h, t, h), (h, h, t), (h, h, h)}.
The random variable X counts the number of heads. Thus if w = (t, t, h) occurs then X(t, t, h) = 1.
Probabilities are assigned to the values of a random variable by connecting the probabilities associated with the outcomes of the experiment to the values of the random variable.
Example: (continued) Assume all elements in S are equally likely. Then X = 1 corresponds to the set {(t, t, h), (t, h, t), (h, t, t)}, thus P(X = 1) = 3/8.
Let B = {2, 3}. By X ∈ B we mean the subset A = {(t, h, h), (h, t, h), (h, h, t), (h, h, h)} of S such that X(w) ∈ B for all w ∈ A, so
P(X ∈ B) = P(A) = 4/8 = 0.5.
By X = j we mean the set A = {w ∈ S : X(w) = j}. So X = 0 is the set A = {(t, t, t)} and P (X = 0) =
P (A) = 1/8, etc.
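The counting above can be checked mechanically. A small Python sketch (the names S, X, and pmf are illustrative, not from the text) enumerates the sample space and tallies P(X = j) for each j:

```python
from itertools import product
from fractions import Fraction

# Enumerate the sample space S of three fair coin tosses: 8 equally likely outcomes.
S = list(product("th", repeat=3))

def X(w):
    """The random variable X: count the heads in outcome w."""
    return w.count("h")

# P(X = j) = |{w in S : X(w) = j}| / |S| since outcomes are equally likely.
pmf = {j: Fraction(sum(1 for w in S if X(w) == j), len(S)) for j in range(4)}

for j in range(4):
    print(j, pmf[j])  # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```

Using exact fractions avoids any floating-point noise in the probabilities.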
Definition: A function X : S → R mapping elements of the sample space into the real numbers will be called a random variable.
Remark: A random variable is a deterministic function. What is random is the selection of w ∈ S.
Example: Suppose you select a point at random in the unit circle {(x, y) : x^2 + y^2 ≤ 1}. Let D = √(x^2 + y^2) be the distance of the selected point to the origin. Then D is a random variable.
Definition: For a discrete random variable X taking values x_1, x_2, ..., the function
p(x_i) = P(X = x_i) = P(w ∈ S : X(w) = x_i)
is called the probability mass function. Notice that p(x_i) ≥ 0 and that
Σ_{i=1}^{∞} p(x_i) = Σ_i P(w ∈ S : X(w) = x_i) = P(S) = 1.
Convenient notation (not used in book): p_X(a) = P(X = a). This notation is useful when you want to emphasize that you are referring to the random variable X. When there is no danger of confusion we drop the subscript.
Example: (continued) p(0) = 1/8, p(1) = 3/8, p(2) = 3/8, p(3) = 1/8.
The probability mass function summarizes all probability information associated with the random vari-
able.
Example: (continued) P (X ≤ 1) = p(0) + p(1) = 0.5. P (1 ≤ X ≤ 2) = p(1) + p(2) = 3/4, P (X > 1) =
p(2) + p(3) = 0.5.
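Since the pmf carries all the probability information, any such event probability is just a sum of pmf values. A short sketch (the helper name prob is illustrative):

```python
from fractions import Fraction

# pmf of the number of heads in three fair coin tosses (from the running example).
p = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def prob(event):
    """P(X in event) for a discrete random variable: sum the pmf over the event."""
    return sum(p[x] for x in event if x in p)

print(prob([0, 1]))  # P(X <= 1)      = 1/2
print(prob([1, 2]))  # P(1 <= X <= 2) = 3/4
print(prob([2, 3]))  # P(X > 1)       = 1/2
```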
Although the probability mass function contains all information w.r.t. a discrete random variable, the
cumulative distribution function is frequently used.
Definition: The cumulative distribution function (cdf) F of a random variable X is given by
FX (a) = P (X ≤ a).
When it is clear that we are talking about the random variable X we may simply write F(a). The notation X ∼ F signifies that F is the distribution of the random variable X.
If X is a discrete random variable we can write
F(a) = Σ_{t ≤ a} P(X = t) = Σ_{t ≤ a} p(t).
Properties of cdf:
1. F (−∞) = P (X ≤ −∞) = 0,
2. F (∞) = P (X ≤ ∞) = 1,
3. If a < b then F (a) ≤ F (b).
Any function satisfying the above three properties is the cdf of a random variable.
Facts:
1. P (a < X ≤ b) = F (b) − F (a).
2. P (a ≤ X ≤ b) = F (b) − F (a) + p(a).
3. P (a < X < b) = F (b) − F (a) − p(b).
4. P (a ≤ X < b) = F (b) − F (a) − p(b) + p(a).
Example: (continued) P (1 ≤ X ≤ 3) = F (3) − F (1) + p(1) = 1 − 4/8 + 3/8 = 7/8.
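The four interval facts can be verified directly on the coin-toss pmf. A minimal sketch (F is built by summing the pmf, as in the displayed formula):

```python
from fractions import Fraction

# pmf of the number of heads in three fair coin tosses.
p = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def F(a):
    """Discrete cdf: F(a) = sum of p(t) over t <= a."""
    return sum(q for t, q in p.items() if t <= a)

print(F(1))                # 1/2
print(F(3) - F(1) + p[1])  # Fact 2: P(1 <= X <= 3) = 7/8, matching the example
```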
Example: Before exploring the properties of the cdf of continuous random variables, let us work out the
cdf of the distance to the origin of a point selected at random within the unit circle. Clearly FD (d) = 1 for
all d > 1 and FD (d) = 0 for all d < 0. What about values of d ∈ [0, 1]?
F_D(d) = Pr(√(x^2 + y^2) ≤ d) = Pr(x^2 + y^2 ≤ d^2) = πd^2/π = d^2.
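The formula F_D(d) = d^2 can be sanity-checked by simulation. A sketch using rejection sampling from the enclosing square (the function name sample_distance and the test value d = 0.6 are illustrative choices, not from the text):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def sample_distance():
    """Sample a point uniformly in the unit circle by rejection from [-1,1]^2,
    and return its distance to the origin."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return (x * x + y * y) ** 0.5

n = 100_000
d = 0.6
hits = sum(sample_distance() <= d for _ in range(n))
print(hits / n)  # close to F_D(0.6) = 0.36
```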
Consequently
P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx.
A probability density function f satisfies:
• f(x) ≥ 0, and
• ∫_{−∞}^{∞} f(x) dx = 1.
Since, for small δ,
P(x − δ/2 < X ≤ x + δ/2) ≈ f(x)δ,
the value of f(x) is related to the probability that X takes values close to x. For example, if f(x) = 2f(y) it is roughly twice as likely for X to fall in a small neighborhood of x as in one of y.
Example: Notice that for the random variable D, f(1/4) = 1/2 while f(3/4) = 3/2, so it is three times more likely to end up at a distance near 3/4 than at a distance near 1/4.
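Since f is the derivative of F, the values f(1/4) = 1/2 and f(3/4) = 3/2 can be recovered numerically from F_D(d) = d^2. A sketch using a central-difference approximation (the step size h is an illustrative choice):

```python
def F(d):
    """cdf of the distance D to the origin, for d in [0, 1]."""
    return d * d

def f(d, h=1e-6):
    """Central-difference approximation of the density f = F'."""
    return (F(d + h) - F(d - h)) / (2 * h)

print(round(f(0.25), 6), round(f(0.75), 6))  # 0.5 and 1.5
print(round(f(0.75) / f(0.25), 6))           # the ratio 3.0 claimed above
```

For this quadratic cdf the central difference is exact up to floating-point rounding, since ((d + h)^2 − (d − h)^2)/(2h) = 2d.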
To summarize, for continuous random variables there is a probability density function f(x) ≥ 0 such that
P(X ∈ A) = ∫_A f(u) du.
Given the cdf F (x) we can obtain the density function f (x) by differentiation. Conversely, given the
density function f (x) we can obtain the cdf F (x) by integration.
Properties of cdf: As with the case of discrete random variables:
1. F (−∞) = P (X ≤ −∞) = 0,
2. F (∞) = P (X ≤ ∞) = 1,
3. If a < b then F (a) ≤ F (b).
However, the calculation of probabilities over intervals is simpler:
1. P (a < X ≤ b) = F (b) − F (a).
2. P (a ≤ X ≤ b) = F (b) − F (a).
3. P (a < X < b) = F (b) − F (a).
4. P (a ≤ X < b) = F (b) − F (a).
Example: f (t) = 0 on t < 0, f (t) = t/2 on 0 < t ≤ 1, f (t) = 0.75 on 1 < t ≤ 2 and f (t) = 0 on t > 2.
Then F (a) = a2 /4 on 0 < a ≤ 1, F (a) = .25 + .75(a − 1) on 1 < a ≤ 2, and F (a) = 1 on a > 2.
The cdf is convenient for probability calculations:
P (.1 < X ≤ 1.2) = F (1.2) − F (.1) = .25 + .15 − .01/4 = .40 − .0025 = .3975.
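The piecewise cdf of this example is straightforward to code, and the interval probability falls out of a single subtraction. A minimal sketch:

```python
def F(a):
    """cdf for the piecewise density f: f(t) = t/2 on (0, 1], f(t) = 0.75 on (1, 2]."""
    if a <= 0:
        return 0.0
    if a <= 1:
        return a * a / 4
    if a <= 2:
        return 0.25 + 0.75 * (a - 1)
    return 1.0

# P(.1 < X <= 1.2) = F(1.2) - F(.1)
print(round(F(1.2) - F(0.1), 4))  # 0.3975
```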
2.1 Joint Distributions of Discrete Random Variables
Example: Toss three fair coins and let X be the number of heads in the first two tosses and Y the number of heads in all three tosses. Give an explicit mapping to obtain p(0, 0) = p(0, 1) = p(2, 2) = p(2, 3) = 1/8 and p(1, 1) = p(1, 2) = 1/4. Let A = {(0, 0), (1, 1), (2, 2)}; then p(A) = 0.5.
What is P(X = 1)? Well, X = 1 is equivalent to the set {(1, 1), (1, 2)} of (x, y) pairs, so P(X = 1) = p(1, 1) + p(1, 2) = 0.5.
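The explicit mapping asked for above can be built by enumeration; summing the joint pmf over pairs with x = 1 then gives the marginal P(X = 1). A sketch (the names XY and joint are illustrative):

```python
from itertools import product
from fractions import Fraction

# Sample space of three fair coin tosses.
S = list(product("th", repeat=3))

def XY(w):
    """X = heads in the first two tosses, Y = heads in all three tosses."""
    return (w[:2].count("h"), w.count("h"))

# Build the joint pmf by pushing each outcome's probability 1/8 onto its (x, y) pair.
joint = {}
for w in S:
    key = XY(w)
    joint[key] = joint.get(key, 0) + Fraction(1, 8)

print(joint[(1, 1)])                                     # 1/4
print(sum(q for (x, y), q in joint.items() if x == 1))   # marginal P(X = 1) = 1/2
```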
A joint probability mass function satisfies
p(x, y) ≥ 0   and   Σ_x Σ_y p(x, y) = 1,
and for any set A of pairs,
P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} p(x, y).
The conditional probability mass function of X given Y = y is
p_{X|Y}(x|y) = P(X = x|Y = y) = p(x, y)/p_Y(y).
2.2 Joint Distribution of Continuous Random Variables
If X and Y are continuous random variables, there exists a joint density function f(x, y) with the following three properties:
f(x, y) ≥ 0,
∫∫ f(x, y) dx dy = 1,
and P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy for any set A.
Example: f(x, y) = 1 on the unit square. What is the probability that (X, Y) is in the set [0, .5] × [.5, .8]? Since the density is constant, it equals the area of the rectangle: .5 × .3 = .15.
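For this uniform density the rectangle probability is its area, .5 × .3 = .15, which a quick Monte Carlo sketch can confirm (the sample size is an illustrative choice):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# (X, Y) uniform on the unit square: f(x, y) = 1, so probabilities are areas.
n = 100_000
hits = 0
for _ in range(n):
    x, y = random.random(), random.random()
    if x <= 0.5 and 0.5 <= y <= 0.8:
        hits += 1

print(hits / n)  # close to .5 * .3 = .15
```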
Now,
P(X ≤ x) = F(x, ∞) = ∫_{−∞}^{x} [∫_{−∞}^{∞} f(u, v) dv] du.
So it follows that
f_X(x) = ∫_{−∞}^{∞} f(x, v) dv.
Consequently, the density of X is obtained by integrating out the second variable. Similarly, the density of Y is obtained by integrating out the first variable:
f_Y(y) = ∫_{−∞}^{∞} f(u, y) du.
Example: f(x, y) = (12/7)(x^2 + xy) on the unit square. Compute f_X and f_Y.
Integrating we obtain f_X(x) = (12/7)(x^2 + x/2) and f_Y(y) = (4 + 6y)/7.
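These marginals can be checked by numerical integration of the joint density. A sketch using a simple midpoint rule (the helper name integrate and the evaluation points x = y = 0.5 are illustrative choices):

```python
def f(x, y):
    """Joint density (12/7)(x^2 + xy) on the unit square."""
    return 12 / 7 * (x * x + x * y)

def integrate(g, a, b, n=10_000):
    """Midpoint-rule numerical integration of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

x = 0.5
fX = integrate(lambda v: f(x, v), 0, 1)   # integrate out the second variable
print(round(fX, 6), round(12 / 7 * (x * x + x / 2), 6))  # should agree

y = 0.5
fY = integrate(lambda u: f(u, y), 0, 1)   # integrate out the first variable
print(round(fY, 6), round((4 + 6 * y) / 7, 6))           # should agree
```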
Q. How can we compute P (x1 < X < x2 , y1 < Y < y2 ) from F (·, ·)?
A.
F (x2 , y2 ) − F (x1 , y2 ) − F (x2 , y1 ) + F (x1 , y1 ).
2.2.2 Conditional Distribution: Continuous Case
In the continuous case the conditional density of X given Y = y is defined as
f_{X|Y}(x|y) = f(x, y)/f_Y(y).
For example, if the joint density satisfies f(x, y) = f_Y(y)/y for 0 ≤ x ≤ y, then
f_{X|Y}(x|y) = f(x, y)/f_Y(y) = 1/y
on 0 ≤ x ≤ y, i.e., given Y = y, X is uniformly distributed on [0, y].
Notice that f (x, y) = fX|Y (x|y)fY (y) and that integrating over y we can obtain fX (x).
Notice that this definition is valid for both discrete and continuous random variables.
If X and Y are continuous and independent then