Rohini Somanathan
Random variables
Definition: Let (S, 𝒮, P) be a probability space. If X : S → ℝ is a real-valued function having as its domain the elements of S, then X is called a random variable. A random variable is therefore a real-valued function defined on the space S. Typically x is used to denote the image value, i.e. x = X(s). If the outcomes of an experiment are inherently real numbers, they are directly interpretable as values of a random variable, and we can think of X as the identity function, X(s) = s.

Note that just as there are many ways of defining a sample space, depending on what we want to get out of the experiment, there are also many random variables that can be constructed from the same experiment. We choose random variables based on what we are interested in getting out of the experiment. For example, we may be interested in the number of students passing an exam, and not the identities of those who pass. A random variable would assign each element in the sample space a number corresponding to the number of passes associated with that outcome.

We therefore begin with a probability space (S, 𝒮, P) and arrive at an induced probability space (R(X), ℬ, P_X). How exactly do we arrive at the function P_X(·)? As long as every set A ⊂ R(X) is associated with an event in our original sample space S, P_X(A) is just the probability assigned to that event by P.
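To make the induced probability P_X concrete, here is a small sketch (my own illustration, not from the notes) using three tosses of a fair coin and the random variable X(s) = number of heads in the outcome s:

```python
from itertools import product
from fractions import Fraction

# Sample space of three fair coin tosses: 8 equally likely outcomes.
S = list(product("HT", repeat=3))
P = {s: Fraction(1, len(S)) for s in S}   # probability measure on S

def X(s):
    return s.count("H")                   # the random variable: number of heads

# Induced probability P_X: P_X({x}) is the probability P assigns to
# the event {s : X(s) = x} in the original sample space.
P_X = {}
for s in S:
    P_X[X(s)] = P_X.get(X(s), Fraction(0)) + P[s]

print(P_X)   # masses 1/8, 3/8, 3/8, 1/8 on the values 3, 2, 1, 0
```

The same construction works for any discrete sample space: every subset of R(X) inherits its probability from the event it corresponds to in S.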
Random variables: examples
1. Tossing a coin ten times. The sample space consists of the 2^10 possible sequences of heads and tails. There are many different random variables that could be associated with this experiment: X1 could be the number of heads, X2 the longest run of heads divided by the longest run of tails, X3 the number of times we get two heads immediately before a tail, etc. For s = HTTHHHHTTH, what are the values of these random variables?

2. Choosing a point in a plane. Each outcome in the sample space is a point of the form s = (x, y). The random variable X could be the x-coordinate of the point. Another random variable Z could be the distance of the point from the origin, Z(s) = √(x² + y²).

3. Heights, weights, distances, temperatures, scores, incomes... In these cases we can take X(s) = s, since the outcomes are already expressed as real numbers.
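A quick way to answer the question in example 1 (my own helper code, not part of the notes) for the outcome s = HTTHHHHTTH:

```python
import re

s = "HTTHHHHTTH"

def longest_run(seq, ch):
    # length of the longest consecutive run of character ch in seq
    return max(len(run) for run in re.findall(ch + "+", seq))

X1 = s.count("H")                                   # number of heads
X2 = longest_run(s, "H") / longest_run(s, "T")      # longest H-run / longest T-run
X3 = sum(1 for i in range(len(s) - 2) if s[i:i + 3] == "HHT")  # HH just before T
print(X1, X2, X3)  # 6 2.0 1
```

So for this outcome X1 = 6, X2 = 4/2 = 2 and X3 = 1 (the single HHT pattern).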
[Figure: the probability function of a discrete random variable.] In this case, the distribution function will be a step function, jumping at all points x in R(X) that are assigned positive probability.
Discrete distributions
Definition: A random variable X has a discrete distribution if X can take only a finite number k of different values x1, x2, ..., xk or an infinite sequence of different values x1, x2, .... The function f(x) = P(X = x) is the probability function of X. We define it to be f(x) for all values x in R(X) and zero elsewhere. If X has a discrete distribution, the probability of any subset A of the real line is given by

P(X ∈ A) = Σ_{xi ∈ A} f(xi).
In each of the three examples we considered above, we have a clean expression for this probability function. Sometimes we may want to use these types of functions if they closely approximate probabilities defined by messier functions. Examples:

1. The discrete uniform distribution: picking one of the first k non-negative integers at random,

f(x) = 1/k for x = 0, 1, 2, ..., k − 1, and 0 otherwise.

2. The binomial distribution: the probability of x successes in n trials,

f(x) = (n choose x) p^x q^(n−x) for x = 0, 1, 2, ..., n, and 0 otherwise.
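A short sketch of the binomial probability function (the parameter values n = 10, p = 0.5 are my own illustrative choices, not from the notes):

```python
from math import comb

def binomial_pf(x, n, p):
    # f(x) = (n choose x) p^x q^(n-x) for x = 0, 1, ..., n; zero elsewhere
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
probs = [binomial_pf(x, n, p) for x in range(n + 1)]
print(sum(probs))   # the masses sum to 1, as a probability function must
```

Summing f over any subset A of {0, 1, ..., n} then gives P(X ∈ A), exactly as in the definition above.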
Continuous distributions
The sample space associated with our random variable often has an infinite number of points.
Example: A point is selected at random from inside a circle of unit radius centred at the origin (0, 0), where the probability assigned to a set A ⊂ S is P(A) = (area of A)/(area of the circle), and X is the distance of the selected point from the origin. In this case F(x) = Pr(X ≤ x) = (area of the circle with radius x)/(area of the circle with radius 1) = x², so the distribution function of X is given by

F(x) = 0 for x < 0, x² for 0 ≤ x < 1, and 1 for 1 ≤ x.
Definition: A random variable X has a continuous distribution if there exists a nonnegative function f defined on the real line such that for any interval A,

P(X ∈ A) = ∫_A f(x) dx.

The function f is called the probability density function or p.d.f. of X and must satisfy the conditions below:

1. f(x) ≥ 0

2. ∫_{−∞}^{∞} f(x) dx = 1
What is f(x) for the above example? How can you use this to compute P(1/4 < X ≤ 1/2)?
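For the circle example, F(x) = x² on [0, 1] gives f(x) = 2x, and P(1/4 < X ≤ 1/2) = F(1/2) − F(1/4) = 1/4 − 1/16 = 3/16. A Monte Carlo sanity check (my own sketch, not part of the notes):

```python
import random

random.seed(0)
accepted, hits = 0, 0
while accepted < 100_000:
    # rejection-sample a uniform point in the unit circle
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1:
        accepted += 1
        d = (x * x + y * y) ** 0.5     # X = distance from the origin
        if 0.25 < d <= 0.5:
            hits += 1

estimate = hits / accepted
print(estimate)   # close to 3/16 = 0.1875
```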
Continuous distributions: examples
1. The uniform distribution on an interval: Suppose a and b are two real numbers with a < b. A point x is selected from the interval S = {x : a ≤ x ≤ b}, and the probability that it belongs to any subinterval of S is proportional to the length of that subinterval. We say that a point is chosen at random from the interval (a, b). It follows that the p.d.f. must be constant on S and zero outside it:

f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.

Notice that the value of the p.d.f. is the reciprocal of the length of the interval, that these values can be greater than one, and that the assignment of probabilities does not depend on whether the distribution is defined over the closed interval [a, b] or the open interval (a, b).

2. Unbounded random variables: It is sometimes convenient to define a p.d.f. over an unbounded set, because such functions may be easier to work with and may approximate the actual distribution of a random variable quite well. An example is:

f(x) = 1/(1 + x)² for x > 0, and 0 for x ≤ 0.
3. Unbounded densities: The following function is unbounded near zero but still represents a valid density:

f(x) = (2/3) x^(−1/3) for 0 < x < 1, and 0 otherwise.
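A numerical sanity check (my own sketch, not part of the notes) that each of these densities integrates to 1, using a simple midpoint rule; the improper integrals are truncated at a large but finite limit, and the uniform interval (a, b) = (2, 5) is an illustrative choice:

```python
def midpoint(f, a, b, n=200_000):
    # midpoint-rule approximation of the integral of f over (a, b)
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# 1. uniform on (2, 5): f(x) = 1/3
area_uniform = midpoint(lambda x: 1 / 3, 2, 5)
# 2. f(x) = 1/(1+x)^2 on (0, infinity), truncated at 10,000
area_tail = midpoint(lambda x: (1 + x) ** -2, 0, 10_000)
# 3. f(x) = (2/3) x^(-1/3) on (0, 1): unbounded near 0 but integrable
area_cube = midpoint(lambda x: (2 / 3) * x ** (-1 / 3), 0, 1)

print(area_uniform, area_tail, area_cube)   # all close to 1
```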
Mixed distributions
Often the process of collecting or recording data leads to censoring, and instead of obtaining a sample from a continuous distribution, we obtain one from a mixed distribution. Examples:

The weight of an object is a continuous random variable, but our weighing scale only records weights up to a certain level.

Households with very high incomes often underreport their income, and for incomes above a certain level (say $250,000) surveys often club all households together; this variable is therefore top-censored.

In each of these examples, we can derive the probability distribution for the new random variable, given the distribution for the continuous variable. In the example we've just considered,

f(x) = 1/(1 + x)² for x > 0, and 0 for x ≤ 0,

suppose we record X = 3 for all values of X ≥ 3. The p.f. for our new random variable Y is given by the same density for values less than 3 and by P(Y = 3) = 1/4.

Some variables, such as the number of hours worked per week, have a mixed distribution in the population, with mass points at 0 and 40.
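The mass at the censoring point can be checked directly (my own sketch): P(Y = 3) = P(X ≥ 3) = ∫₃^∞ (1 + x)^(−2) dx = [−1/(1 + x)]₃^∞ = 1/4. Numerically, truncating the improper integral:

```python
def midpoint(f, a, b, n=100_000):
    # midpoint-rule approximation of the integral of f over (a, b)
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# mass that the censored variable Y piles up at the point 3
mass_at_3 = midpoint(lambda x: (1 + x) ** -2, 3, 10_000)
print(mass_at_3)   # close to 1/4
```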
3. F(x) is right-continuous, i.e. F(x) = F(x⁺) at every point x, where F(x⁺) is the right-hand limit of F at x.

(For discrete random variables, there will be a jump at values that are taken with positive probability.)
RESULT 3: For any given value x, P(X < x) = F(x⁻).

RESULT 4: For any given value x, P(X = x) = F(x⁺) − F(x⁻).

The distribution function of a continuous random variable is continuous, since F(x) = ∫_{−∞}^{x} f(t) dt, so P(X = x) = 0 at every point x. For discrete and mixed discrete-continuous random variables, F(x) will exhibit a countable number of discontinuities at jump points, reflecting the assignment of positive probabilities to a countable number of events.
[Figure: example of a distribution function F(x), with jumps at the points x1, x2, x3 and x4.]
Similarly, the fact that Pr(X ≤ x) approaches 1 as x → ∞ follows from Section 1.10.
Definition: When the distribution function of a random variable X is continuous and one-to-one over the whole set of possible values of X, we call the function F⁻¹ the quantile function of X. The value F⁻¹(p) is called the p quantile of X, or the 100p percentile of X, for each 0 < p < 1.

Example: If X has a uniform distribution over the interval [a, b], then F(x) = 0 for x < a, (x − a)/(b − a) for a ≤ x ≤ b, and 1 for x > b. Given a value p, we simply solve F(x) = p for the pth quantile: x = pb + (1 − p)a. Compute this for p = .5, .25, .9, . . .
Bivariate distributions
Social scientists are typically interested in the manner in which multiple attributes of people and the societies they live in are related. The object of interest is then a multivariate probability distribution (examples: education and earnings, days ill per month and age, sex ratios and areas under rice cultivation). This involves dealing with the joint distribution of two or more random variables.

Bivariate distributions attach probabilities to events that are defined by values taken by two random variables (say X and Y). Values taken by these random variables are now ordered pairs (xi, yi), and an event A is a set of such values. If both X and Y are discrete random variables, the probability function is f(x, y) = P(X = x and Y = y) and

P((X, Y) ∈ A) = Σ_{(xi, yi) ∈ A} f(xi, yi).
What are some features of a table like this one? In particular, how would we obtain the probabilities associated with the following events: receiving no education; becoming a female graduate; completing primary school? What else do you learn from the table about the population of interest?
If X and Y have a continuous joint distribution, there is a function f such that for any set A,

P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.

f is now called the joint probability density function and must satisfy

1. f(x, y) ≥ 0 for −∞ < x < ∞ and −∞ < y < ∞

2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
Example: Given the following joint density on X and Y, we'll calculate P(X ≥ Y) (textbook section 3.4):

f(x, y) = c x² y for x² ≤ y ≤ 1, and 0 otherwise.

First find c to make this a valid joint density (notice the limits of integration here); it will turn out to be 21/4. Then integrate the density over y ∈ (x², x) and x ∈ (0, 1). You could alternatively integrate over x ∈ (y, √y) and y ∈ (0, 1).
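A numerical check of this example (my own sketch, not part of the notes): with c = 21/4 the density integrates to 1 over the region x² ≤ y ≤ 1, and P(X ≥ Y) works out to 3/20. A crude two-dimensional midpoint rule over the square (−1, 1) × (0, 1) is enough to confirm both:

```python
def double_midpoint(f, n=1_000):
    # midpoint rule on the square (-1, 1) x (0, 1); f is zero off the support
    hx, hy = 2 / n, 1 / n
    total = 0.0
    for i in range(n):
        x = -1 + (i + 0.5) * hx
        for j in range(n):
            y = (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

f = lambda x, y: (21 / 4) * x**2 * y if x**2 <= y <= 1 else 0.0

total_mass = double_midpoint(f)                                   # should be ~1
p_x_ge_y = double_midpoint(lambda x, y: f(x, y) if x >= y else 0.0)
print(total_mass, p_x_ge_y)   # about 1.0 and 0.15 = 3/20
```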
If F(x, y) is continuously differentiable in both its arguments, the joint density can be derived from it as

f(x, y) = ∂²F(x, y)/∂x∂y

and given the density, we can integrate with respect to x and y over the appropriate limits to get the distribution function.
Marginal distributions
We've seen that given the joint distribution of two random variables X and Y, we can derive the distribution of either one of them. A distribution of X derived from the joint distribution of X and Y is known as the marginal distribution of X. We can also derive marginal density or probability mass functions given the joint density. For a discrete random variable,

f1(x) = P(X = x) = Σ_y P(X = x and Y = y) = Σ_y f(x, y)

and similarly f2(y) = P(Y = y) = Σ_x P(X = x and Y = y) = Σ_x f(x, y).

For a continuous joint density f(x, y), the marginal densities for X and Y are given by:

f1(x) = ∫_{−∞}^{∞} f(x, y) dy and f2(y) = ∫_{−∞}^{∞} f(x, y) dx.
Go back to our tabular representation of the joint discrete distribution and see if you can find the marginal distribution of education. Can one construct the joint distribution from one of the marginal distributions?
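The table from the slides is not reproduced here, so this sketch uses a made-up 2×3 joint probability table (my own numbers, purely illustrative) to show how marginals fall out of a joint distribution by summing rows and columns:

```python
from fractions import Fraction

# Hypothetical joint table for discrete X (values 0, 1) and Y (values 0, 1, 2).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 8),
}

# marginal of X: f1(x) = sum over y of f(x, y); similarly f2 for Y
f1, f2 = {}, {}
for (x, y), p in joint.items():
    f1[x] = f1.get(x, Fraction(0)) + p
    f2[y] = f2.get(y, Fraction(0)) + p

print(f1)   # X = 0 and X = 1 each carry probability 1/2
print(f2)
```

Many different joint tables share the same marginals, which is why the joint distribution cannot in general be recovered from a marginal alone.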
Notice that we cannot factorize the joint density as the product of a non-negative function of x and another non-negative function of y. Computing the marginals gives us

f1(x) = x + 1/2 for 0 < x < 1 and f2(y) = y + 1/2 for 0 < y < 1,

so the product of the marginals is not equal to the joint density.

2. Suppose we have

f(x, y) = k x² y² for x² + y² ≤ 1, and 0 otherwise.

In this case the possible values X can take depend on Y, and therefore, even though the joint density can be factorized, the same factorization cannot work for all values of (x, y). The probability that X² ≤ 1, for example, will be obtained by integrating over the entire (−1, 1) interval if Y = 0, but if Y = 1/2, X² is constrained to be less than 3/4. More generally, whenever the space of positive probability density of X and Y is bounded by a curve that is neither a horizontal nor a vertical line, the two random variables are dependent.
Conditional distributions
Definition: Consider two discrete random variables X and Y with a joint probability function f(x, y) and marginal probability functions f1(x) and f2(y). After the value Y = y has been observed, we can write the probability that X = x using our definition of conditional probability:

g1(x|y) = P(X = x | Y = y) = P(X = x and Y = y)/Pr(Y = y) = f(x, y)/f2(y)

1. For each fixed value of y, g1(x|y) is a probability function over all possible values of X, because it is non-negative and

Σ_x g1(x|y) = (1/f2(y)) Σ_x f(x, y) = f2(y)/f2(y) = 1.

2. Conditional probabilities are proportional to joint probabilities, because they just divide these by a constant.

We cannot use the definition of conditional probability to derive continuous conditional distributions, because the probability that Y takes any particular value y is zero. For continuous random variables, we simply define the conditional probability density function of X given Y = y as

g1(x|y) = f(x, y)/f2(y) for −∞ < x < ∞ and −∞ < y < ∞,

which satisfies ∫_{−∞}^{∞} g1(x|y) dx = 1.
Returning to the joint density f(x, y) = (21/4) x² y on x² ≤ y ≤ 1, the marginal density of X is

f1(x) = ∫_{x²}^{1} (21/4) x² y dy = (21/8) x² (1 − x⁴),

so the conditional density of Y given X = x is g2(y|x) = f(x, y)/f1(x):

g2(y|x) = 2y/(1 − x⁴) for x² ≤ y ≤ 1, and 0 otherwise.

For example, P(Y ≥ 3/4 | X = 1/2) = ∫_{3/4}^{1} g2(y | 1/2) dy = 7/15.
Multivariate distributions
Our definitions of joint, conditional and marginal distributions can be easily extended to an arbitrary finite number of random variables. Such a distribution is now called a multivariate distribution.

The joint distribution function is defined as the function F whose value at any point (x1, x2, ..., xn) ∈ ℝⁿ is given by:

F(x1, ..., xn) = P(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn)

For a discrete joint distribution, the probability function at any point (x1, x2, ..., xn) is given by:

f(x1, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn)    (2)

The random variables X1, ..., Xn have a continuous joint distribution if there is a nonnegative function f defined on ℝⁿ such that for any subset A ⊂ ℝⁿ,

P[(X1, ..., Xn) ∈ A] = ∫...∫_A f(x1, ..., xn) dx1 ... dxn    (3)

The marginal distribution of any single random variable Xi can now be derived by integrating over the other variables; for X1,

f1(x1) = ∫...∫ f(x1, ..., xn) dx2 ... dxn    (4)

and the conditional probability density function of X1 given values of the other variables is:

g1(x1 | x2, ..., xn) = f(x1, ..., xn)/f0(x2, ..., xn)    (5)

where f0 is the marginal joint density of X2, ..., Xn.
Multivariate distributions: example
Suppose we start with the following density function for a variable X1:

f1(x) = e^(−x) for x > 0, and 0 otherwise,

and are told that for any given value X1 = x1, two other random variables X2 and X3 are independently and identically distributed with the conditional p.d.f.

g(t|x1) = x1 e^(−x1 t) for t > 0, and 0 otherwise.

The conditional joint p.d.f. of (X2, X3) is then g23(x2, x3 | x1) = x1² e^(−x1 (x2 + x3)) for non-negative values of x2, x3 (and zero otherwise), and the joint p.d.f. of the three random variables is given by:

f(x1, x2, x3) = f1(x1) g23(x2, x3 | x1) = x1² e^(−x1 (1 + x2 + x3))

for non-negative values of each of these variables. We can now obtain the marginal joint p.d.f. of X2 and X3 by integrating over X1.
We know further that Y takes values in [0, 1). The distribution function of Y is therefore given by

F(y) = P(Y ≤ y) = ∫_{{x : r(x) ≤ y}} f(x) dx for 0 ≤ y < 1,

with F(y) = 0 for y < 0 and F(y) = 1 for y ≥ 1.
Note: a function of a continuous random variable need not itself be continuous; consider the example Y = r(X) = c for any continuous random variable X, which puts probability one on the single value c.
If Y = r(X), where r is one-to-one and differentiable over the set where f(x) > 0, and s denotes the inverse function, so that X = s(Y), then the p.d.f. of Y is

g(y) = f(s(y)) |ds(y)/dy|

for y in the image of that set, and 0 otherwise.

2. Suppose the density function for X is given by f(x) = 3x² for x ∈ (0, 1) and Y = 1 − X². In this case s(y) = √(1 − y), so g(y) = (3/2)√(1 − y) for 0 < y < 1.