
Course 003: Basic Econometrics, 2012-2013

Topic 2: Random Variables and Probability Distributions

Rohini Somanathan
Course 003, 2014-2015


Sample spaces and random variables

The outcomes of some experiments inherently take the form of real numbers:
- crop yields with the application of a new type of fertiliser
- students' scores on an exam
- miles per litre of an automobile
Other experiments have a sample space that is not inherently a subset of Euclidean space:
- outcomes from a series of coin tosses
- the character of a politician
- the modes of transport taken by a city's population
- the degree of satisfaction respondents report for a service provider: patients in a
  hospital may be asked whether they are very satisfied, satisfied or dissatisfied with the
  quality of treatment. Our sample space would consist of arrays of the form
  (VS, S, S, DS, ...)
- the caste composition of elected politicians
- the gender composition of children attending school
A random variable is a function that assigns a real number to each possible outcome $s \in S$.


Random variables
Definition: Let $(S, \mathcal{S}, P)$ be a probability space. If $X : S \to \mathbb{R}$ is a real-valued function having as
its domain the elements of $S$, then $X$ is called a random variable.
A random variable is therefore a real-valued function defined on the space $S$. Typically $x$ is
used to denote this image value, i.e. $x = X(s)$.
If the outcomes of an experiment are inherently real numbers, they are directly
interpretable as values of a random variable, and we can think of $X$ as the identity function,
so $X(s) = s$.
We choose random variables based on what we are interested in getting out of the
experiment. For example, we may be interested in the number of students passing an exam,
and not the identities of those who pass. A random variable would assign each element in
the sample space a number corresponding to the number of passes associated with that
outcome.
We therefore begin with a probability space $(S, \mathcal{S}, P)$ and arrive at an induced probability
space $(R(X), \mathcal{B}, P_X)$.
How exactly do we arrive at the function $P_X(\cdot)$? As long as every set $A \subset R(X)$ is associated
with an event in our original sample space $S$, $P_X(A)$ is just the probability assigned to that
event by $P$.


Random variables..examples
1. Tossing a coin ten times.
The sample space consists of the $2^{10}$ possible sequences of heads and tails.
There are many different random variables that could be associated with this
experiment: $X_1$ could be the number of heads, $X_2$ the longest run of heads divided by
the longest run of tails, $X_3$ the number of times we get two heads immediately before a
tail, etc.
For $s = HTTHHHHTTH$, what are the values of these random variables?
2. Choosing a point in a rectangle within a plane.
An experiment involves choosing a point $s = (x, y)$ at random from the rectangle
$$S = \{(x, y) : 0 \le x \le 2,\ 0 \le y \le 1/2\}$$
The random variable $X$ could be the x-coordinate of the point, and an event is $X$ taking
values in $[1, 2]$.
Another random variable $Z$ would be the distance of the point from the origin,
$Z(s) = \sqrt{x^2 + y^2}$.
3. Heights, weights, distances, temperature, scores, incomes... In these cases, we can have
$X(s) = s$ since these are already expressed as real numbers.
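The question posed in example 1 can be checked mechanically. The sketch below is only an illustration: the helper names `longest_run` and `count_pattern` are made up for this note, and $X_3$ is read as the number of (possibly overlapping) occurrences of the pattern HHT.

```python
from itertools import groupby

def longest_run(seq, symbol):
    """Length of the longest unbroken run of `symbol` in `seq` (0 if absent)."""
    return max((len(list(g)) for k, g in groupby(seq) if k == symbol), default=0)

def count_pattern(seq, pattern):
    """Number of (possibly overlapping) occurrences of `pattern` in `seq`."""
    width = len(pattern)
    return sum(seq[i:i + width] == pattern for i in range(len(seq) - width + 1))

s = "HTTHHHHTTH"                                   # the outcome given in example 1
x1 = s.count("H")                                  # X1: number of heads
x2 = longest_run(s, "H") / longest_run(s, "T")     # X2: ratio of the longest runs
x3 = count_pattern(s, "HHT")                       # X3: two heads immediately before a tail

print(x1, x2, x3)
```

Note that $X_2$ is undefined for a sequence with no tails, so this version would raise a division error on the all-heads outcome.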


Induced probability spaces..examples

Let's look at some examples of how we arrive at our probability measure $P_X(A)$.
A coin is tossed once and we're interested in the number of heads, $X$. The probability
assigned to the set $A = \{1\}$ in our new space is just the probability associated with one head
in our original space. So $\Pr(X = x) = \frac{1}{2}$, $x \in \{0, 1\}$.
With two tosses, the probability attached to the set $A = \{1\}$ is the sum of the probabilities
associated with the disjoint events $\{(H, T)\}$ and $\{(T, H)\}$ whose union forms this event. In this case
$$\Pr(X = x) = \binom{2}{x} \left(\frac{1}{2}\right)^2, \quad x \in \{0, 1, 2\}$$
Now consider a sequence of flips of an unbiased coin, where our random variable $X$ is the
number of flips needed for the first head. We now have
$$\Pr(X = x) = f(x) = \left(\frac{1}{2}\right)^{x-1} \frac{1}{2} = \left(\frac{1}{2}\right)^{x}, \quad x = 1, 2, 3, \ldots$$
Is this a valid probability measure?

How is the nature of the sample space in the first two coin-flipping examples different
from that in the third?
In all these cases we have a discrete random variable.
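One way to explore whether the first-head probabilities form a valid probability function is to look at their partial sums. The following is a minimal sketch using exact rational arithmetic; the function names are chosen here for illustration only.

```python
from fractions import Fraction

def f(x):
    """Probability that the first head arrives on flip x, for a fair coin."""
    return Fraction(1, 2) ** x

def partial_sum(n):
    """Exact value of f(1) + f(2) + ... + f(n)."""
    return sum(f(x) for x in range(1, n + 1))

for n in (1, 5, 20):
    print(n, partial_sum(n), float(partial_sum(n)))
```

Each partial sum equals $1 - (1/2)^n$, so the probabilities are positive and their sum converges to 1 as $n$ grows.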


The distribution function

Once we've assigned real numbers to all the subsets of our sample space $S$ that are of
interest, we can restrict our attention to the probabilities associated with the occurrence of
sets of real numbers.
Consider the set $A = (-\infty, x]$.
Now $P(A) = \Pr(X \le x)$.
$F(x)$ is used to denote the probability $\Pr(X \le x)$ and is called the distribution function of $X$.
Definition: The distribution function $F$ of a random variable $X$ is a function defined for each
real number $x$ as follows:
$$F(x) = P(X \le x) \quad \text{for } -\infty < x < \infty$$
If there are a finite number of elements $w$ in $A$, this probability can be computed as
$$F(x) = \sum_{w \le x} f(w)$$
In this case, the distribution function will be a step function, jumping at all points $x$ in
$R(X)$ which are assigned positive probability.
Consider the experiment of tossing two fair coins. Describe the probability space induced
by the random variable $X$, the number of heads, and derive the distribution function of $X$.
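The two-coin exercise can be worked through by enumerating the original sample space. This is a sketch under the assumption that all four outcomes are equally likely; `pmf` and `F` are names introduced here, not notation from the slides.

```python
from fractions import Fraction
from itertools import product

# Original sample space: all ordered pairs of H/T, each with probability 1/4.
outcomes = list(product("HT", repeat=2))

# Induced probability function of X = number of heads.
pmf = {}
for s in outcomes:
    x = s.count("H")
    pmf[x] = pmf.get(x, 0) + Fraction(1, len(outcomes))

def F(x):
    """Distribution function: total probability of values w with w <= x."""
    return sum(p for w, p in pmf.items() if w <= x)

for x in (-1, 0, 0.5, 1, 2):
    print(x, F(x))
```

The printed values show the step shape: $F$ is flat between the points 0, 1 and 2 and jumps by $f(w)$ at each of them.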


Discrete distributions
Definition: A random variable $X$ has a discrete distribution if $X$ can take only a finite number $k$
of different values $x_1, x_2, \ldots, x_k$ or an infinite sequence of different values $x_1, x_2, \ldots$.
The function $f(x) = P(X = x)$ is the probability function of $X$. It gives $P(X = x)$ for all
values $x$ in $R(X)$ and is zero elsewhere.
If $X$ has a discrete distribution, the probability of any subset $A$ of the real line is given by
$$P(X \in A) = \sum_{x_i \in A} f(x_i).$$

Examples:
1. The discrete uniform distribution: picking one of the first $k$ positive integers at
random
$$f(x) = \begin{cases} \frac{1}{k} & \text{for } x = 1, 2, \ldots, k \\ 0 & \text{otherwise} \end{cases}$$
2. The binomial distribution: the probability of $x$ successes in $n$ independent trials, each with
success probability $p$, where $q = 1 - p$.
$$f(x) = \begin{cases} \binom{n}{x} p^x q^{n-x} & \text{for } x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$
Derive the distribution functions for each of these.
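As a numerical companion to this exercise, the distribution functions can be built by summing the probability functions up to $x$. The sketch below assumes the discrete uniform is on $1, \ldots, k$ and the binomial has success probability $p$; the function names are illustrative.

```python
from math import comb, floor

def uniform_df(x, k):
    """F(x) for the discrete uniform on 1, ..., k: the share of values <= x."""
    return min(max(floor(x), 0), k) / k

def binom_pf(x, n, p):
    """Binomial probability function; zero outside 0, 1, ..., n."""
    if x != int(x) or not 0 <= x <= n:
        return 0.0
    x = int(x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_df(x, n, p):
    """F(x) for the binomial: sum of f(j) over integers j <= x."""
    return sum(binom_pf(j, n, p) for j in range(0, min(n, floor(x)) + 1))

print(uniform_df(2.5, 4))
print(binom_df(1, 2, 0.5))
```

Both functions are step functions: they only change at the integer points that carry positive probability.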


Continuous distributions
The sample space associated with our random variable often has an infinite number of points.

Example: A point is randomly selected inside a circle of unit radius with origin $(0, 0)$, where the probability
assigned to being in a set $A \subset S$ is $P(A) = \frac{\text{area of } A}{\pi}$, and $X$ is the distance of the selected point from the
origin. In this case $F(x) = \Pr(X \le x) = \frac{\text{area of circle with radius } x}{\pi}$, so the distribution function of $X$ is given by
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x^2 & \text{for } 0 \le x < 1 \\ 1 & \text{for } 1 \le x \end{cases}$$

Definition: A random variable $X$ has a continuous distribution if there exists a nonnegative
function $f$ defined on the real line, such that for any interval $A$,
$$P(X \in A) = \int_A f(x)\,dx$$

The function $f$ is called the probability density function or p.d.f. of $X$ and must satisfy the
conditions below:
1. $f(x) \ge 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

What is $f(x)$ for the above example? How can you use this to compute $P(\frac{1}{4} < X \le \frac{1}{2})$? How would
you use $F(x)$ instead?
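A worked answer to these questions, taking $F(x) = x^2$ on $0 \le x < 1$ from the example above, might run as follows. Differentiating $F$ gives the density, and the probability can then be obtained either by integrating $f$ or by differencing $F$:

```latex
f(x) = F'(x) =
\begin{cases}
2x & \text{for } 0 < x < 1 \\
0  & \text{otherwise}
\end{cases}
\qquad
P\left(\tfrac{1}{4} < X \le \tfrac{1}{2}\right)
  = \int_{1/4}^{1/2} 2x \, dx
  = \tfrac{1}{4} - \tfrac{1}{16}
  = \tfrac{3}{16}
  = F\left(\tfrac{1}{2}\right) - F\left(\tfrac{1}{4}\right)
```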


Continuous distributions..examples
1. The uniform distribution on an interval: Suppose $a$ and $b$ are two real numbers with $a < b$.
A point $x$ is selected from the interval $S = \{x : a \le x \le b\}$ and the probability that it
belongs to any subinterval of $S$ is proportional to the length of that subinterval. It follows
that the p.d.f. must be constant on $S$ and zero outside it:
$$f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
Notice that the value of the p.d.f. is the reciprocal of the length of the interval, that these values
can be greater than one, and that the assignment of probabilities does not depend on whether
the distribution is defined over the closed interval $[a, b]$ or the open interval $(a, b)$.
2. Unbounded random variables: It is sometimes convenient to define a p.d.f. over unbounded
sets, because such functions may be easier to work with and may approximate the actual
distribution of a random variable quite well. An example is:
$$f(x) = \begin{cases} 0 & \text{for } x \le 0 \\ \frac{1}{(1+x)^2} & \text{for } x > 0 \end{cases}$$

3. Unbounded densities: The following function is unbounded around zero but still represents
a valid density.
$$f(x) = \begin{cases} \frac{2}{3} x^{-1/3} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$


Mixed distributions
Often the process of collecting or recording data leads to censoring, and instead of obtaining
a sample from a continuous distribution, we obtain one from a mixed distribution.
Examples:
- The weight of an object is a continuous random variable, but our weighing scale only
  records weights up to a certain level.
- Households with very high incomes often underreport their income. For incomes above a
  certain level (say \$250,000), surveys often group all households together; this variable is
  therefore top-censored.
In each of these examples, we can derive the probability distribution for the new random
variable, given the distribution for the continuous variable. In the example we've just
considered:
$$f(x) = \begin{cases} 0 & \text{for } x \le 0 \\ \frac{1}{(1+x)^2} & \text{for } x > 0 \end{cases}$$
suppose we record $Y = 3$ for all values of $X \ge 3$. The distribution of our new random variable $Y$ is
given by the same p.d.f. for values less than 3 and by a probability mass of $\frac{1}{4}$ at $Y = 3$.
Some variables, such as the number of hours worked per week, have a mixed distribution in
the population, with mass points at 0 and 40.
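The size of the mass point in the censoring example follows from the distribution function of the uncensored variable, $F_X(x) = 1 - \frac{1}{1+x}$ for $x > 0$ (obtained by integrating the density). A small sketch, with `F_X`, `F_Y` and `cap` as names introduced here:

```python
def F_X(x):
    """Distribution function of the uncensored variable: integral of 1/(1+t)^2 on (0, x]."""
    return 0.0 if x <= 0 else 1.0 - 1.0 / (1.0 + x)

def F_Y(y, cap=3.0):
    """Distribution function of the censored variable Y = min(X, cap)."""
    return 1.0 if y >= cap else F_X(y)

mass_at_cap = 1.0 - F_X(3.0)        # P(Y = 3) = P(X >= 3)
jump = F_Y(3.0) - F_Y(3.0 - 1e-9)   # size of the jump of F_Y at the cap

print(mass_at_cap, round(jump, 6))
```

Below the cap $F_Y$ coincides with $F_X$ and is continuous; at the cap it jumps by the censored probability.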


Properties of the distribution function

Recall that the distribution function or cumulative distribution function (c.d.f.) for a random
variable $X$ is defined as
$$F(x) = P(X \le x) \quad \text{for } -\infty < x < \infty.$$
It follows that for any random variable (discrete, continuous or mixed), the domain of $F$ is the
real line and the values of $F(x)$ must lie in $[0, 1]$. We can also establish that all distribution
functions have the following three properties:
1. $F(x)$ is a nondecreasing function of $x$, i.e. if $x_1 < x_2$ then $F(x_1) \le F(x_2)$.
(The occurrence of the event $\{X \le x_1\}$ implies the occurrence of $\{X \le x_2\}$, so $P(X \le x_1) \le P(X \le x_2)$.)

2. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
($\{x : x \le \infty\}$ is the entire sample space and $\{x : x \le -\infty\}$ is the null set.)

3. $F(x)$ is right-continuous, i.e. $F(x) = F(x^+)$ at every point $x$, where $F(x^+)$ is the right-hand
limit of $F(x)$.
(For discrete random variables, there will be a jump at values that are taken with positive probability.)


Computing probabilities using the distribution function

RESULT 1: For any given value of $x$, $P(X > x) = 1 - F(x)$
RESULT 2: For any values $x_1$ and $x_2$ where $x_1 < x_2$, $P(x_1 < X \le x_2) = F(x_2) - F(x_1)$
Proof: Let $A$ be the event $X \le x_1$ and $B$ be the event $X \le x_2$. $B$ can be written as the union of two disjoint events,
$B = (A \cap B) \cup (A^c \cap B)$. Since $A \subset B$, $P(A \cap B) = P(A)$. The event we're interested in is $A^c \cap B$, whose probability
is given by $P(B) - P(A)$, or $P(x_1 < X \le x_2) = P(X \le x_2) - P(X \le x_1)$. Now apply the definition of a d.f.

RESULT 3: For any given value $x$,
$$P(X < x) = F(x^-)$$
RESULT 4: For any given value $x$,
$$P(X = x) = F(x^+) - F(x^-)$$
The distribution function of a continuous random variable will be continuous, and since
$$F(x) = \int_{-\infty}^{x} f(t)\,dt,$$
we have $F'(x) = f(x)$ at every point where $f$ is continuous.
For discrete and mixed discrete-continuous random variables, $F(x)$ will exhibit a countable number
of discontinuities at jump points, reflecting the assignment of positive probabilities to a countable
number of events.


Examples of distribution functions

Consider the experiment of rolling a die or tossing a fair coin, with X in the first case being
the number of dots and in the second case the number of heads. Graph the distribution
function of X in each of these cases.
What about the experiment of picking a point in the unit interval [0, 1] with X as the
distance from the origin?
What type of probability function corresponds to the following distribution function?

[Figure: a distribution function F(x) plotted against x, with the levels z0, z1, z2, z3 and 1 marked on the vertical axis and the points x1, x2, x3, x4 on the horizontal axis.]


The quantile function

The distribution function $F$ of $X$ gives us the probability that $X \le x$ for all real numbers $x$.
Suppose we are given a probability $p$ and want to know the value of $x$ corresponding to this
value of the distribution function.
If $F$ is a one-to-one function, then it has an inverse and the value we are looking for is given
by $F^{-1}(p)$.
Example: median income would be found by $F^{-1}(\frac{1}{2})$, where $F$ is the distribution function of
income.
Definition: When the distribution function of a random variable $X$ is continuous and one-to-one
over the whole set of possible values of $X$, we call the function $F^{-1}$ the quantile function of $X$. The
value of $F^{-1}(p)$ is called the $p$th quantile of $X$ or the $100p$th percentile of $X$ for each $0 < p < 1$.
Example: If $X$ has a uniform distribution over the interval $[a, b]$, then $F(x) = \frac{x-a}{b-a}$ over this interval, 0
for $x \le a$ and 1 for $x > b$. Given a value $p$, we simply solve for the $p$th quantile:
$x = pb + (1 - p)a$. Compute this for $p = .5, .25, .9, \ldots$
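The uniform quantile formula is easy to tabulate. The sketch below uses an arbitrary illustrative interval $[a, b] = [2, 6]$, which is an assumption made here and not a value from the slides, and checks that applying $F$ to each quantile returns $p$.

```python
def uniform_df(x, a, b):
    """Distribution function of the uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_quantile(p, a, b):
    """p-th quantile of the uniform distribution on [a, b], for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return p * b + (1 - p) * a

a, b = 2.0, 6.0   # illustrative endpoints
for p in (0.5, 0.25, 0.9):
    x = uniform_quantile(p, a, b)
    print(p, x, uniform_df(x, a, b))
```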


Examples: computing quantiles, etc.

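1. The p.d.f. of a random variable is given by:
$$f(x) = \begin{cases} \frac{1}{8} x & \text{for } 0 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$$
Find the value of $t$ such that
(a) $P(X \le t) = \frac{1}{4}$
(b) $P(X \ge t) = \frac{1}{2}$

2. The p.d.f. of a random variable is given by:
$$f(x) = \begin{cases} c x^2 & \text{for } 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Find the value of the constant $c$ and $\Pr(X > \frac{3}{2})$.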
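A possible way to check answers to these two exercises is sketched below. It rests on two readings that are assumptions, since the inequality signs are not legible in the source: (a) is taken as $P(X \le t) = 1/4$ and (b) as $P(X \ge t) = 1/2$. For the first density, integrating $x/8$ gives $F(t) = t^2/16$ on $[0, 4]$; for the second, $\int_1^2 c x^2 dx = 1$ pins down $c$.

```python
from fractions import Fraction
from math import sqrt

# Exercise 1: F(t) = t^2 / 16 on [0, 4], so F(t) = p gives t = sqrt(16 p).
t_a = sqrt(16 * 0.25)         # assumed reading: P(X <= t) = 1/4
t_b = sqrt(16 * (1 - 0.5))    # assumed reading: P(X >= t) = 1/2, i.e. F(t) = 1/2

# Exercise 2: the integral of x^2 over [lo, hi] is (hi^3 - lo^3) / 3.
def integral_x2(lo, hi):
    return (Fraction(hi) ** 3 - Fraction(lo) ** 3) / 3

c = 1 / integral_x2(1, 2)                       # normalising constant
prob = c * integral_x2(Fraction(3, 2), 2)       # Pr(X > 3/2)

print(t_a, t_b, c, prob)
```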


Bivariate distributions
Social scientists are typically interested in the manner in which multiple attributes of
people and the societies they live in are related. The object of interest is a multivariate probability
distribution (examples: education and earnings, days ill per month and age, sex-ratios and
areas under rice cultivation).
This involves dealing with the joint distribution of two or more random variables. Bivariate
distributions attach probabilities to events that are defined by values taken by two random
variables (say $X$ and $Y$).
Values taken by these random variables are now ordered pairs $(x_i, y_i)$, and an event $A$ is a
set of such values.
If both $X$ and $Y$ are discrete random variables, the probability function is
$$f(x, y) = P(X = x \text{ and } Y = y) \quad \text{and} \quad P((X, Y) \in A) = \sum_{(x_i, y_i) \in A} f(x_i, y_i)$$


Representing a discrete bivariate distribution

If both X and Y are discrete, this function takes only a finite or countably infinite number of values.
If there are only a small number of these values, they can be usefully presented in a table.
The table below could represent the probabilities of receiving different levels of education.
X is the highest level of education and Y is gender:

                         gender
education            male    female
none                 .05     .2
primary              .25     .1
middle               .15     .04
high                 .1      .03
senior secondary     .03     .02
graduate             .02     .01

What are some features of a table like this one? In particular, how would we obtain
probabilities associated with the following events:
- receiving no education
- becoming a female graduate
- completing primary school
What else do you learn from the table about the population of interest?
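The questions about the table can be explored by storing the joint probabilities and summing over rows or columns. In this sketch the label "graduate" for the last row is an assumption (the row label is not legible in the source, but the questions refer to graduates), and the variable names are illustrative.

```python
from fractions import Fraction as Fr

levels = ["none", "primary", "middle", "high", "senior secondary", "graduate"]
male   = [Fr(s) for s in ("0.05", "0.25", "0.15", "0.10", "0.03", "0.02")]
female = [Fr(s) for s in ("0.20", "0.10", "0.04", "0.03", "0.02", "0.01")]

# A joint probability function must sum to one over all cells.
total = sum(male) + sum(female)

# Summing across a row gives the probability of an education level;
# summing down a column gives the probability of a gender.
p_education = {lv: m + f for lv, m, f in zip(levels, male, female)}
p_gender = {"male": sum(male), "female": sum(female)}

p_no_education = p_education["none"]                    # event: receiving no education
p_female_graduate = female[levels.index("graduate")]    # a single cell of the table

print(total, p_no_education, p_female_graduate, p_gender)
```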


Continuous bivariate distributions

We can extend our definition of a continuous univariate distribution to the bivariate case:
Definition: Two random variables $X$ and $Y$ have a continuous joint distribution if there exists a
nonnegative function $f$ defined over the xy-plane such that for any subset $A$ of the plane
$$P[(X, Y) \in A] = \iint_A f(x, y)\,dx\,dy$$

$f$ is now called the joint probability density function and must satisfy
1. $f(x, y) \ge 0$ for $-\infty < x < \infty$ and $-\infty < y < \infty$
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$

Example 1: Given the following joint density function on $X$ and $Y$, we'll calculate $P(X \ge Y)$:
$$f(x, y) = \begin{cases} c x^2 y & \text{for } x^2 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
First find $c$ to make this a valid joint density (notice the limits of integration here: $y \in (x^2, 1)$ and $x \in (-1, 1)$); it will turn out to be $21/4$.
Then integrate the density over $y \in (x^2, x)$ and $x \in (0, 1)$, the region where $x \ge y$. Using this density, $P(X \ge Y) = \frac{3}{20}$.
Example 2: A point $(X, Y)$ is selected at random from inside the circle $x^2 + y^2 \le 9$. Determine the joint density
function, $f(x, y)$.

Bivariate distribution functions

Definition: The joint distribution function of two random variables $X$ and $Y$ is defined as the
function $F$ such that for all values of $x$ and $y$ ($-\infty < x < \infty$ and $-\infty < y < \infty$)
$$F(x, y) = P(X \le x \text{ and } Y \le y)$$
The probability that $(X, Y)$ will lie in a specified rectangle in the xy-plane is given by
$$\Pr(a < X \le b \text{ and } c < Y \le d) = F(b, d) - F(a, d) - F(b, c) + F(a, c)$$
Note: The distinction between weak and strict inequalities is important when points on the boundary of the
rectangle occur with positive probability.

The distribution functions of $X$ and $Y$ can be derived as:
$$\Pr(X \le x) = F_1(x) = \lim_{y \to \infty} F(x, y) \quad \text{and} \quad \Pr(Y \le y) = F_2(y) = \lim_{x \to \infty} F(x, y)$$

If $F(x, y)$ is continuously differentiable in both its arguments, the joint density is derived as:
$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\, \partial y}$$
and given the density, we can integrate w.r.t. $x$ and $y$ over the appropriate limits to get the
distribution function.

Example: Suppose that, for $x$ and $y \in [0, 2]$, we have $F(x, y) = \frac{1}{16} xy(x + y)$. Derive the distribution functions of
$X$ and $Y$ and their joint density. Notice the $(x, y)$ range over which $F(x, y)$ is strictly increasing.
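One way to work through this example: since both variables take values in $[0, 2]$, letting $y \to \infty$ amounts to setting $y = 2$ (and likewise for $x$), and the density comes from the cross-partial derivative. The steps below are a sketch for $0 \le x \le 2$ and $0 \le y \le 2$:

```latex
F_1(x) = F(x, 2) = \frac{1}{16} \cdot 2x (x + 2) = \frac{x (x + 2)}{8},
\qquad
F_2(y) = F(2, y) = \frac{y (y + 2)}{8}

f(x, y) = \frac{\partial^2}{\partial x \, \partial y} \left[ \frac{x^2 y + x y^2}{16} \right]
        = \frac{2x + 2y}{16}
        = \frac{x + y}{8},
\qquad
\int_0^2 \int_0^2 \frac{x + y}{8} \, dx \, dy = 1
```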


Marginal distributions
A distribution of $X$ derived from the joint distribution of $X$ and $Y$ is known as the marginal
distribution of $X$. For a discrete random variable:
$$f_1(x) = P(X = x) = \sum_{y} P(X = x \text{ and } Y = y) = \sum_{y} f(x, y)$$
and analogously
$$f_2(y) = P(Y = y) = \sum_{x} P(X = x \text{ and } Y = y) = \sum_{x} f(x, y)$$
For a continuous joint density $f(x, y)$, the marginal densities for $X$ and $Y$ are given by:
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

Go back to our table representing the joint distribution of gender and education and find
the marginal distribution of education.
Can one construct the joint distribution from one of the marginal distributions?


Independent random variables

Definition: The two random variables $X$ and $Y$ are independent if, for any two sets $A$ and $B$ of
real numbers,
$$P(X \in A \text{ and } Y \in B) = P(X \in A)\,P(Y \in B)$$
In other words, if $A$ is an event whose occurrence depends only on values taken by $X$ and $B$'s
occurrence depends only on values taken by $Y$, then the random variables $X$ and $Y$ are
independent only if the events $A$ and $B$ are independent, for all such events $A$ and $B$.
The condition for independence can be alternatively stated in terms of the joint and
marginal distribution functions of $X$ and $Y$ by letting the sets $A$ and $B$ be the intervals
$(-\infty, x]$ and $(-\infty, y]$ respectively:
$$F(x, y) = F_1(x)\,F_2(y)$$
For discrete distributions, we simply define the sets $A$ and $B$ as the points $x$ and $y$ and
require $f(x, y) = f_1(x)\,f_2(y)$.
In terms of the density functions, we say that $X$ and $Y$ are independent if it is possible to
choose functions $f_1$ and $f_2$ such that the following factorization holds for
$-\infty < x < \infty$ and $-\infty < y < \infty$:
$$f(x, y) = f_1(x)\,f_2(y)$$


Independent random variables..examples

There are two independent measurements $X$ and $Y$ of rainfall at a certain location, each with the p.d.f.
$$g(x) = \begin{cases} 2x & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Find the probability that $X + Y \le 1$.
The joint density $4xy$ is obtained by multiplying the marginal densities because these variables
are independent. The required probability of $\frac{1}{6}$ is then obtained by integrating over
$y \in (0, 1 - x)$ and $x \in (0, 1)$.
How might we use a table of probabilities to determine whether two random variables are
independent?
Given the following density, can we tell whether the variables $X$ and $Y$ are independent?
$$f(x, y) = \begin{cases} k e^{-(x + 2y)} & \text{for } x \ge 0 \text{ and } y \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Notice that we can factorize the joint density as the product of $k_1 e^{-x}$ and $k_2 e^{-2y}$ where
$k_1 k_2 = k$. To obtain the marginal densities of $X$ and $Y$, we multiply these functions by
appropriate constants which make them integrate to unity. This gives us
$$f_1(x) = e^{-x} \text{ for } x \ge 0 \quad \text{and} \quad f_2(y) = 2e^{-2y} \text{ for } y \ge 0$$
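For the rainfall example above, the stated probability can be checked numerically. The sketch below uses a plain midpoint rule (a choice made here for simplicity, not a method from the slides) to integrate the product of the two marginal densities over the triangle $0 < y < 1 - x$, $0 < x < 1$.

```python
def g(x):
    """Marginal density of each rainfall measurement: 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def prob_sum_at_most_one(n=400):
    """Midpoint-rule approximation of P(X + Y <= 1) under independence."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        hy = (1.0 - x) / n                       # y ranges over (0, 1 - x)
        inner = sum(g((j + 0.5) * hy) for j in range(n)) * hy
        total += g(x) * inner * h                # joint density is g(x) * g(y)
    return total

p = prob_sum_at_most_one()
print(round(p, 4))
```

The approximation should agree with $\frac{1}{6} \approx 0.1667$ to several decimal places.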


Dependent random variables..examples

Given the following densities, let's see why the variables $X$ and $Y$ are dependent:
1.
$$f(x, y) = \begin{cases} x + y & \text{for } 0 < x < 1 \text{ and } 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
Notice that we cannot factorize the joint density as the product of a non-negative function
of $x$ and another non-negative function of $y$. Computing the marginals gives us
$$f_1(x) = x + \frac{1}{2} \text{ for } 0 < x < 1 \quad \text{and} \quad f_2(y) = y + \frac{1}{2} \text{ for } 0 < y < 1$$
so the product of the marginals is not equal to the joint density.

2. Suppose we have
$$f(x, y) = \begin{cases} k x^2 y^2 & \text{for } x^2 + y^2 \le 1 \\ 0 & \text{otherwise} \end{cases}$$
In this case the possible values $X$ can take depend on $Y$ and therefore, even though the joint
density can be factorized, the same factorization cannot work for all values of $(x, y)$.
More generally, whenever the space of positive probability density of $X$ and $Y$ is bounded by a
curve, rather than a rectangle, the two random variables are dependent.


Dependent random variables..a result

Whenever the space of positive probability density of X and Y is bounded by a curve, rather
than a rectangle, the two random variables are dependent. If, on the other hand, the support of
f(x, y) is a rectangle and the joint density is of the form f(x, y) = kg(x)h(y), then X and Y are
independent.
Proof: For the latter part of the result, suppose the support of $f(x, y)$ is given by the rectangle $abcd$ where
$a < b$ and $c < d$ and $a \le x \le b$ and $c \le y \le d$. Now the joint density $f(x, y)$ can be written as
$k_1 g(x)\, k_2 h(y)$ where
$$k_1 = \frac{1}{\int_a^b g(x)\,dx} \quad \text{and} \quad k_2 = \frac{1}{\int_c^d h(y)\,dy}.$$
The marginal densities are $f_1(x) = k_1 g(x) \int_c^d k_2 h(y)\,dy = k_1 g(x)$ and
$f_2(y) = k_2 h(y) \int_a^b k_1 g(x)\,dx = k_2 h(y)$, whose product gives us the joint
density.

Now to show that if the support is not a rectangle, the variables are dependent: Start with a point $(x, y)$ outside
the domain where $f(x, y) > 0$. If $X$ and $Y$ are independent, we have $f(x, y) = f_1(x) f_2(y)$, so one of these must be zero.
Now as we move due south and enter the set where $f(x, y) > 0$, our value of $x$ has not changed, so it could not be
that $f_1(x)$ was zero at the original point. Similarly, if we move west, $y$ is unchanged so it could not be that $f_2(y)$
was zero at the original point. So we have a contradiction.


Conditional distributions
Definition: Consider two discrete random variables $X$ and $Y$ with a joint probability function
$f(x, y)$ and marginal probability functions $f_1(x)$ and $f_2(y)$. After the value $Y = y$ has been
observed, we can write the probability that $X = x$ using our definition of conditional
probability:
$$P(X = x \mid Y = y) = \frac{P(X = x \text{ and } Y = y)}{\Pr(Y = y)} = \frac{f(x, y)}{f_2(y)}$$
$g_1(x|y) = \frac{f(x, y)}{f_2(y)}$ is called the conditional probability function of $X$ given that $Y = y$. Notice that:

1. for each fixed value of $y$, $g_1(x|y)$ is a probability function over all possible values of $X$
because it is non-negative and
$$\sum_x g_1(x|y) = \frac{1}{f_2(y)} \sum_x f(x, y) = \frac{1}{f_2(y)} f_2(y) = 1$$

2. conditional probabilities are proportional to joint probabilities because they just divide
these by a constant.
We cannot use the definition of conditional probability to derive the conditional density for
continuous random variables because the probability that $Y$ takes any particular value $y$ is zero.
We simply define the conditional probability density function of $X$ given $Y = y$ as
$$g_1(x|y) = \frac{f(x, y)}{f_2(y)} \quad \text{for } -\infty < x < \infty \text{ and } -\infty < y < \infty$$


Conditional versus joint densities

The numerator in $g_1(x|y) = \frac{f(x, y)}{f_2(y)}$ is a section of the surface representing the joint density and
the denominator is the constant by which we need to divide the numerator to get a valid density
(which integrates to unity).


Deriving conditional distributions... the discrete case

For the education-gender example, we can find the distribution of educational achievement
conditional on being male, the distribution of gender conditional on completing college, or any
other conditional distribution we are interested in:

                            gender
education               male    female    f(education|gender=male)
none                    .05     .2        .08
primary                 .25     .1        .42
middle                  .15     .04       .25
high                    .1      .03       .17
senior secondary        .03     .02       .05
graduate                .02     .01       .03
f(gender|graduate)      .67     .33
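The conditional entries in the table can be reproduced by dividing each joint probability by the relevant marginal. In the sketch below, the row label "graduate" and the reading of the final .67/.33 row as the distribution of gender among graduates are assumptions, since those labels are not legible in the source; the arithmetic does match the printed figures.

```python
from fractions import Fraction as Fr

levels = ["none", "primary", "middle", "high", "senior secondary", "graduate"]
male   = [Fr(s) for s in ("0.05", "0.25", "0.15", "0.10", "0.03", "0.02")]
female = [Fr(s) for s in ("0.20", "0.10", "0.04", "0.03", "0.02", "0.01")]

# f(education | gender = male): divide the male column by P(male).
p_male = sum(male)
edu_given_male = {lv: m / p_male for lv, m in zip(levels, male)}

# f(gender | education = graduate): divide the graduate row by P(graduate).
i = levels.index("graduate")
p_graduate = male[i] + female[i]
gender_given_graduate = {"male": male[i] / p_graduate, "female": female[i] / p_graduate}

rounded = [round(float(p), 2) for p in edu_given_male.values()]
print(rounded)
print({k: round(float(v), 2) for k, v in gender_given_graduate.items()})
```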


Deriving conditional distributions... the continuous case

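For the continuous joint distribution we've looked at before (with $c = 21/4$)
$$f(x, y) = \begin{cases} c x^2 y & \text{for } x^2 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
the marginal distribution of $X$ is given by
$$f_1(x) = \int_{x^2}^{1} \frac{21}{4} x^2 y\,dy = \frac{21}{8} x^2 (1 - x^4) \quad \text{for } -1 \le x \le 1$$
and the conditional distribution is given by $g_2(y|x) = \frac{f(x, y)}{f_1(x)}$:
$$g_2(y|x) = \begin{cases} \frac{2y}{1 - x^4} & \text{for } x^2 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
If $X = \frac{1}{2}$, we can compute $P(Y \ge \frac{1}{4} \mid X = \frac{1}{2}) = 1$ and
$$P(Y \ge \tfrac{3}{4} \mid X = \tfrac{1}{2}) = \int_{3/4}^{1} g_2(y \mid \tfrac{1}{2})\,dy = \frac{7}{15}$$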
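The two conditional probabilities can be verified exactly. Integrating the conditional density $2y/(1 - x^4)$ from a lower limit $t$ up to 1 gives $(1 - t^2)/(1 - x^4)$, provided $t$ is no smaller than $x^2$. The sketch below encodes that; the function name is illustrative.

```python
from fractions import Fraction

def cond_tail_prob(t, x):
    """P(Y >= t | X = x) when g2(y|x) = 2y / (1 - x^4) on x^2 <= y <= 1, with |x| < 1."""
    lower = max(t, x ** 2)            # the conditional density is zero below x^2
    if lower >= 1:
        return Fraction(0)
    return (1 - lower ** 2) / (1 - x ** 4)

x = Fraction(1, 2)
print(cond_tail_prob(Fraction(1, 4), x))
print(cond_tail_prob(Fraction(3, 4), x))
```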


Construction of the joint distribution

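We can use conditional and marginal distributions to arrive at a joint distribution:
$$f(x, y) = g_1(x|y) f_2(y) = g_2(y|x) f_1(x) \qquad (1)$$

Notice that the conditional distribution is not defined for a value $y_0$ at which $f_2(y_0) = 0$, but this is irrelevant
because at any such value $f(x, y_0) = 0$.
Example: $X$ is first chosen from a uniform distribution on $(0, 1)$ and then $Y$ is chosen from a uniform distribution
on $(x, 1)$. The marginal distribution of $X$ is straightforward:
$$f_1(x) = \begin{cases} 1 & \text{for } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
The conditional density of $Y$ given $X = x$ and the joint density are then
$$g_2(y|x) = \begin{cases} \frac{1}{1-x} & \text{for } x < y < 1 \\ 0 & \text{otherwise} \end{cases}
\qquad
f(x, y) = \begin{cases} \frac{1}{1-x} & \text{for } 0 < x < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
and the marginal distribution for $Y$ can now be derived as:
$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^y \frac{1}{1-x}\,dx = -\log(1 - y) \quad \text{for } 0 < y < 1$$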


Multivariate distributions
Our definitions of joint, conditional and marginal distributions can be easily extended to an
arbitrary finite number of random variables. Such a distribution is now called a multivariate
distribution.
The joint distribution function is defined as the function $F$ whose value at any point
$(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ is given by:
$$F(x_1, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$$
For a discrete joint distribution, the probability function at any point $(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ is given
by:
$$f(x_1, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) \qquad (2)$$
and the random variables $X_1, \ldots, X_n$ have a continuous joint distribution if there is a nonnegative
function $f$ defined on $\mathbb{R}^n$ such that for any subset $A \subset \mathbb{R}^n$,
$$P[(X_1, \ldots, X_n) \in A] = \int \cdots \int_A f(x_1, \ldots, x_n)\,dx_1 \ldots dx_n \qquad (3)$$

The marginal distribution of any single random variable $X_i$ can now be derived by integrating
over the other variables, for example
$$f_1(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_2 \ldots dx_n \qquad (4)$$

and the conditional probability density function of $X_1$ given values of the other variables is:
$$g_1(x_1 | x_2, \ldots, x_n) = \frac{f(x_1, \ldots, x_n)}{f_0(x_2, \ldots, x_n)} \qquad (5)$$
where $f_0$ is the marginal joint density of $X_2, \ldots, X_n$.


Independence for the multivariate case

Independence: The $n$ random variables $X_1, \ldots, X_n$ are independent if for any $n$ sets
$A_1, A_2, \ldots, A_n$ of real numbers,
$$P(X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n) = P(X_1 \in A_1) P(X_2 \in A_2) \cdots P(X_n \in A_n)$$
If the joint distribution function of $X_1, \ldots, X_n$ is given by $F$ and the marginal d.f. for $X_i$ by
$F_i$, it follows that $X_1, \ldots, X_n$ will be independent if and only if, for all points $(x_1, \ldots, x_n) \in \mathbb{R}^n$,
$$F(x_1, \ldots, x_n) = F_1(x_1) F_2(x_2) \cdots F_n(x_n)$$
and, if these random variables have a continuous joint distribution with joint density
$f(x_1, \ldots, x_n)$:
$$f(x_1, \ldots, x_n) = f_1(x_1) f_2(x_2) \cdots f_n(x_n)$$
In the case of a discrete joint distribution the above equality holds for the probability
function $f$.
Random samples: The $n$ random variables $X_1, \ldots, X_n$ form a random sample if these
variables are independent and the marginal p.f. or p.d.f. of each of them is $f$. It follows that
for all points $(x_1, \ldots, x_n)$, their joint p.f. or p.d.f. is given by
$$g(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n)$$
The variables that form a random sample are said to be independent and identically
distributed (i.i.d.) and $n$ is the sample size.


Multivariate distributions..example
Suppose we start with the following density function for a variable $X_1$:
$$f_1(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
and are told that for any given value of $X_1 = x_1$, two other random variables $X_2$ and $X_3$ are
independently and identically distributed with the following conditional p.d.f.:
$$g(t | x_1) = \begin{cases} x_1 e^{-x_1 t} & \text{for } t > 0 \\ 0 & \text{otherwise} \end{cases}$$
The conditional joint p.d.f. is now given by $g_{23}(x_2, x_3 | x_1) = x_1^2 e^{-x_1 (x_2 + x_3)}$ for non-negative values of
$x_2, x_3$ (and zero otherwise), and the joint p.d.f. of the three random variables is given by:
$$f(x_1, x_2, x_3) = f_1(x_1)\, g_{23}(x_2, x_3 | x_1) = x_1^2 e^{-x_1 (1 + x_2 + x_3)}$$
for non-negative values of each of these variables. We can now obtain the marginal joint p.d.f. of
$X_2$ and $X_3$ by integrating over $X_1$.
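The final integration can be carried out with the standard result $\int_0^\infty t^2 e^{-at}\,dt = 2/a^3$ for $a > 0$. A sketch of that step, writing $f_{23}$ (a label introduced here) for the marginal joint p.d.f. of $X_2$ and $X_3$ and taking $a = 1 + x_2 + x_3$:

```latex
f_{23}(x_2, x_3)
  = \int_0^{\infty} x_1^2 \, e^{-x_1 (1 + x_2 + x_3)} \, dx_1
  = \frac{2}{(1 + x_2 + x_3)^3}
  \quad \text{for } x_2 > 0,\ x_3 > 0
```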


Distributions of functions of random variables

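We'd like to derive the distribution of $Y = X^2$, knowing that $X$ has a uniform distribution on $(-1, 1)$.
The density $f(x)$ of $X$ over this interval is $\frac{1}{2}$.
We know further that $Y$ takes values in $[0, 1)$.
The distribution function of $Y$ is therefore given by
$$G(y) = P(Y \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \int_{-\sqrt{y}}^{\sqrt{y}} f(x)\,dx = \sqrt{y} \quad \text{for } 0 < y < 1$$
and differentiating gives the density
$$g(y) = \begin{cases} \frac{1}{2\sqrt{y}} & \text{for } 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$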


The Probability Integral Transformation

RESULT: Let $X$ be a continuous random variable with the distribution function $F$ and let
$Y = F(X)$. Then $Y$ must be uniformly distributed on $[0, 1]$. The transformation from $X$ to $Y$ is
called the probability integral transformation.
We know that the distribution function must take values between 0 and 1. If we pick any of
these values, $y$, the $y$th quantile of the distribution of $X$ will be given by some number $x$ and
$$\Pr(Y \le y) = \Pr(X \le x) = F(x) = y$$
which is the distribution function of a uniform random variable.
This result helps us generate random numbers from various distributions, because it allows
us to transform a sample from a uniform distribution into a sample from some other
distribution provided we can find $F^{-1}$.
Example: Suppose we want a sample from an exponential distribution. The density is $e^{-x}$
defined over all $x > 0$ and the distribution function is $1 - e^{-x}$. If we pick from a uniform
between 0 and 1, and get (say) .3, we can invert the distribution function to get
$x = \log(10/7) = .36$ as an observation of an exponential random variable.
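The exponential example translates directly into a small sampler: draw a uniform number $u$ and return $F^{-1}(u) = -\log(1 - u)$. The sketch below uses Python's standard `random` module; the function names and the seed are arbitrary choices made here.

```python
import math
import random

def exponential_quantile(u):
    """Invert F(x) = 1 - exp(-x): the x with F(x) = u, for 0 <= u < 1."""
    return -math.log(1.0 - u)

def sample_exponential(n, seed=0):
    """Draw n exponential variates by transforming uniform draws on [0, 1)."""
    rng = random.Random(seed)
    return [exponential_quantile(rng.random()) for _ in range(n)]

print(round(exponential_quantile(0.3), 2))   # the slide's draw of .3

draws = sample_exponential(100_000)
print(round(sum(draws) / len(draws), 2))     # sample mean, close to the true mean of 1
```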


Random number generators

Historically, tables of random digits were used to generate a sample from a uniform
distribution. For example, consider the following series of digits
553617280595580771997955130480651347088612
If we want 10 numbers between 1 and 9, we start at a random digit in the table, and pick
the next 10 numbers. What about numbers between 1 and 100?
Today we would instead use a statistical package to generate these. In Stata, for
example:
runiform() returns uniformly distributed random variates on the interval [0,1).
Many packages also allow us to draw directly from the distribution we are interested in:
rnormal(m, s) returns normal(m, s) random variates, where m is the mean and s is the
standard deviation.
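For readers working outside Stata, roughly equivalent draws are available in Python's standard `random` module. This is offered only as a parallel illustration, not as part of the course material; the seed value is arbitrary.

```python
import random

rng = random.Random(12345)      # seeded so that the draws are reproducible

u = rng.random()                # uniform variate on [0.0, 1.0), like runiform()
z = rng.gauss(0.0, 1.0)         # normal variate with mean 0 and standard deviation 1
digits = [rng.randint(1, 9) for _ in range(10)]   # ten integers between 1 and 9 inclusive

print(u, z, digits)
```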
