Chapter 4

Variances and covariances
<4.1> Definition. The variance of a random variable X with expected value μX = EX is

    var(X) = E(X − μX)².

The covariance between random variables Y and Z, with expected values μY and μZ, is

    cov(Y, Z) = E((Y − μY)(Z − μZ)),

and the correlation between Y and Z is

    corr(Y, Z) = cov(Y, Z)/√(var(Y) var(Z)).

The square root of the variance of a random variable is called its standard deviation.
As with expectations, variances and covariances can also be calculated conditionally on
various pieces of information.
Try not to confuse properties of expected values with properties of variances. For example, if a given piece of information implies that a random variable X must take the constant value C, then E(X | information) = C, but var(X | information) = 0. More generally, if the information implies that X must equal a constant, then cov(X, Y | information) = 0 for every random variable Y. (You should check these assertions; they follow directly from the Definition.)
Notice that cov(X, X) = var(X): results about covariances contain results about variances as special cases.
A few facts about variances and covariances
Write μY for EY, μZ for EZ, and so on, as above.
(i) cov(Y, Z) = E(YZ) − (EY)(EZ) and, in particular, var(X) = E(X²) − (EX)²:

    cov(Y, Z) = E(YZ − Y μZ − Z μY + μY μZ)
              = E(YZ) − μY EZ − μZ EY + μY μZ
              = E(YZ) − μY μZ.
(ii) For constants a, b, c, d, and random variables U, V, Y, Z,

    cov(aU + bV, cY + dZ) = ac cov(U, Y) + bc cov(V, Y) + ad cov(U, Z) + bd cov(V, Z).
It is easier to see the pattern if we work with the centered random variables U′ = U − μU, . . . , Z′ = Z − μZ. For then the left-hand side of the asserted equality expands to

    E((aU′ + bV′)(cY′ + dZ′)) = ac E(U′Y′) + bc E(V′Y′) + ad E(U′Z′) + bd E(V′Z′),

which equals the right-hand side because, for example, E(U′Y′) = cov(U, Y).

(iii) For constants c and d, var(c + dZ) = d² var(Z): by (ii), every covariance involving the constant c is zero, so

    var(c + dZ) = cov(c + dZ, c + dZ) = d² cov(Z, Z) = d² var(Z).

Notice how the constant c disappeared, and the d turned into d². Many students confuse the formula for var(c + dZ) with the formula for E(c + dZ). Again, when in doubt, rederive. (A small numerical check of facts (i) and (iii) follows.)
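Facts (i) and (iii) are easy to check on simulated data. Here is a minimal Python sketch (assuming NumPy is available; the distributions are arbitrary choices):

```python
# Sanity check of facts (i) and (iii) on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
Y = rng.exponential(scale=2.0, size=n)      # any distribution will do
Z = Y + rng.normal(size=n)                  # deliberately correlated with Y

# Fact (i): cov(Y, Z) = E(YZ) - (EY)(EZ), here for sample moments.
lhs = np.mean((Y - Y.mean()) * (Z - Z.mean()))
rhs = np.mean(Y * Z) - Y.mean() * Z.mean()
print(lhs, rhs)                             # equal, up to float rounding

# Fact (iii): var(c + d*Z) = d^2 var(Z); the constant c drops out.
c, d = 5.0, -3.0
print(np.var(c + d * Z), d**2 * np.var(Z))  # identical values
```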
<4.2> Example. Suppose a random variable X has a discrete distribution. The expected value E(XY) can then be rewritten as a weighted sum of conditional expectations:

    E(XY) = Σ_x P{X = x} E(XY | X = x)    by rule E4 for expectations.
If Y is independent of X, the information X = x does not help with the calculation of the conditional expectation:

    E(Y | X = x) = E(Y)    if Y is independent of X.

In that case E(XY | X = x) = x E(Y | X = x) = x EY, so that

    E(XY) = Σ_x P{X = x} x EY = (EX)(EY),

and fact (i) then gives cov(X, Y) = E(XY) − (EX)(EY) = 0: independent random variables are uncorrelated.
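The calculation can be carried out exactly for a small discrete case. A minimal Python sketch (the two fair dice here are an illustrative choice); exact rational arithmetic, via fractions.Fraction, leaves no rounding doubts:

```python
# Exact check that E(XY) = (EX)(EY), hence cov(X, Y) = 0, for
# X, Y independent fair dice, via the rule-E4 decomposition over x.
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)            # P{X = x, Y = y} under independence

E_XY = sum(p * x * y for x, y in product(faces, faces))
E_X = Fraction(sum(faces), 6)  # 7/2
print(E_XY, E_X * E_X)         # both 49/4, so the covariance is zero
```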
<4.3> Example. For two independent rolls of a fair die, let X denote the value rolled the first time and Y denote the value rolled the second time. The random variables X and Y are independent, and they have the same distribution. Consequently cov(X, Y) = 0, and var(X) = var(Y).
The two random variables X + Y and X − Y are uncorrelated:

    cov(X + Y, X − Y) = cov(X, X) − cov(X, Y) + cov(Y, X) − cov(Y, Y)
                      = var(X) − cov(X, Y) + cov(Y, X) − var(Y)
                      = 0.
The two random variables X + Y and X − Y are not independent:

    P{X + Y = 12} = P{X = 6}P{Y = 6} = 1/36,

but

    P{X + Y = 12 | X − Y = 5} = P{X + Y = 12 | X = 6, Y = 1} = 0,

because the event {X − Y = 5} forces X = 6 and Y = 1, and hence X + Y = 7.
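Both claims can be confirmed by enumerating the 36 equally likely outcomes. A minimal Python sketch:

```python
# X + Y and X - Y for two fair dice: zero covariance, yet not independent.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs
p = Fraction(1, 36)

def E(g):
    """Expected value of g(X, Y) under the uniform distribution."""
    return sum(p * g(x, y) for x, y in outcomes)

S = lambda x, y: x + y
D = lambda x, y: x - y
cov = E(lambda x, y: S(x, y) * D(x, y)) - E(S) * E(D)
print(cov)                                        # 0: uncorrelated

# Conditioning on {X - Y = 5} changes the distribution of X + Y:
given = [(x, y) for x, y in outcomes if x - y == 5]          # only (6, 1)
p_cond = Fraction(sum(1 for x, y in given if x + y == 12), len(given))
print(Fraction(1, 36), p_cond)                    # 1/36 versus 0
```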
From (ii), var(Y + Z) = var(Y) + 2 cov(Y, Z) + var(Z). If Y and Z are uncorrelated, the covariance term drops out from the expression for the variance of their sum, leaving

    var(Y + Z) = var(Y) + var(Z).
Similarly, if X1, . . . , Xn are random variables for which cov(Xi, Xj) = 0 for each i ≠ j, then

    var(X1 + . . . + Xn) = var(X1) + . . . + var(Xn).

You should check the last assertion by expanding out the quadratic in the variables Xi − EXi, observing how all the cross-product terms disappear because of the zero covariances.
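A quick simulated check of this additivity, as a Python sketch (assuming NumPy; independent dice are in particular pairwise uncorrelated):

```python
# var(X1 + ... + Xn) = var(X1) + ... + var(Xn) for uncorrelated Xi.
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(1, 7, size=(1_000_000, 5))  # five independent fair dice
print(np.var(X.sum(axis=1)))                 # close to 5 * 35/12 = 14.58...
print(5 * 35 / 12)                           # 35/12 is the variance of one die
```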
Probability bounds
Tchebychev's inequality asserts that, for each ε > 0,

    P{|X − EX| > ε} ≤ var(X)/ε².

For example, suppose X̄ denotes the average of n independent random variables X1, . . . , Xn, each with expected value μ and variance σ². Then EX̄ = μ and

    var(X̄) = (1/n²) var(Σ_{i≤n} Xi) = (1/n²) Σ_{i≤n} var(Xi) = σ²/n.

Tchebychev's inequality then gives

    P{|X̄ − μ| > kσ/√n} ≤ (σ²/n)/(kσ/√n)² = 1/k²    for each k > 0.
For example, there is at most a 1% chance that X̄ lies more than 10σ/√n away from μ. (A normal approximation will give a much tighter bound.) Note well the dependence on n.
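A small simulation makes the looseness of the bound vivid. A Python sketch (assuming NumPy; the exponential distribution is an arbitrary choice with μ = σ = 1):

```python
# Tchebychev bound (1/k^2) versus the simulated deviation frequency.
import numpy as np

rng = np.random.default_rng(2)
n, k, reps = 100, 10, 100_000
mu = sigma = 1.0                          # exponential(1) has mu = sigma = 1
Xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
freq = np.mean(np.abs(Xbar - mu) > k * sigma / np.sqrt(n))
print(freq, 1 / k**2)   # observed frequency (essentially 0) vs the 1% bound
```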
Variance as a measure of concentration in sampling theory
<4.4> Example. Suppose a population consists of N individuals, with a numerical value f(α) attached to the individual labeled α, for α = 1, 2, . . . , N. Write f̄ for the population average (1/N) Σ_α f(α), and σ² for the population variance (1/N) Σ_α (f(α) − f̄)².

A random sample of size n taken with replacement corresponds to independent random variables X1, . . . , Xn, each uniformly distributed on {1, 2, . . . , N}. Write Yi for f(Xi), the value sampled at the ith draw. Each Yi has expected value

    EYi = Σ_α f(α) P{Xi = α} = (1/N) Σ_α f(α) = f̄

and variance

    var(Yi) = Σ_α (f(α) − f̄)² P{Xi = α} = (1/N) Σ_α (f(α) − f̄)² = σ².

The sample average

    Ȳ = (1/n)(Y1 + . . . + Yn)

has expected value

    EȲ = (1/n)(EY1 + . . . + EYn) = f̄

and variance

    var(Ȳ) = (1/n²) var(Y1 + . . . + Yn)
            = (1/n²) (var(Y1) + . . . + var(Yn))    by independence
            = σ²/n.

The sample average Ȳ concentrates around f̄ with a standard deviation σ/√n that tends to zero as n gets larger.
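A simulated check on a toy population, as a Python sketch (assuming NumPy; the population values are arbitrary):

```python
# With-replacement sampling: var(Ybar) should be close to sigma^2 / n.
import numpy as np

rng = np.random.default_rng(3)
f = np.arange(1.0, 101.0)              # toy population of N = 100 values
sigma2 = np.var(f)                     # population variance (divide by N)
n, reps = 10, 200_000

Ybar = rng.choice(f, size=(reps, n), replace=True).mean(axis=1)
print(np.var(Ybar), sigma2 / n)        # the two numbers should nearly agree
```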
Now consider a sample X1, . . . , Xn taken without replacement. By symmetry, each Xi has the same distribution as before, and

    EYi = f̄    and    var(Yi) = σ².
This time the dependence between the Xi has an important effect on the variance of Ȳ. By symmetry, for each pair i ≠ j, the pair (Xi, Xj) takes each of the N(N − 1) values (α, β), for 1 ≤ α ≠ β ≤ N, with probability 1/(N(N − 1)). Consequently, for i ≠ j,

    cov(Yi, Yj) = (1/(N(N − 1))) Σ_{α≠β} (f(α) − f̄)(f(β) − f̄).

If we added the diagonal terms (f(α) − f̄)² to the sum we would have the expansion for the product

    (Σ_α (f(α) − f̄)) (Σ_β (f(β) − f̄)),

which equals 0² because Σ_α (f(α) − f̄) = 0. Thus

    cov(Yi, Yj) = (1/(N(N − 1))) (0² − Σ_α (f(α) − f̄)²) = −σ²/(N − 1),

because Σ_α (f(α) − f̄)² = Nσ².
If N is large, the covariance between any Yi and Yj, for i ≠ j, is small; but there are a lot of them:
    var(Ȳ) = (1/n²) var(Y1 + . . . + Yn)
            = (1/n²) E((Y1 − f̄) + . . . + (Yn − f̄))².

If you expand out the last quadratic you will get n terms of the form (Yi − f̄)² and n(n − 1) terms of the form (Yi − f̄)(Yj − f̄) with i ≠ j. Take expectations:

    var(Ȳ) = (1/n²) (n var(Y1) + n(n − 1) cov(Y1, Y2))
            = (1/n²) (nσ² − n(n − 1)σ²/(N − 1))
            = (σ²/n) (1 − (n − 1)/(N − 1))
            = ((N − n)/(N − 1)) · (σ²/n).
Compare with the σ²/n for var(Ȳ) under sampling with replacement. The correction factor (N − n)/(N − 1) is close to 1 if the sample size n is small compared with the population size N, but it can decrease the variance of Ȳ appreciably if n/N is not small. For example, if n ≈ N/6 (as with the Census long form) the correction factor is approximately 5/6. If n = N, the correction factor is zero. That is, var(Ȳ) = 0 if the whole population is sampled. Why? What value must Ȳ take if n = N? (As an exercise, see if you can use the fact that var(Ȳ) = 0 when n = N to rederive the expression for cov(Y1, Y2).)
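The correction factor shows up clearly in simulation too. A Python sketch (assuming NumPy; the same style of toy population as before):

```python
# Without-replacement sampling and the (N - n)/(N - 1) correction factor.
import numpy as np

rng = np.random.default_rng(4)
f = np.arange(1.0, 101.0)                   # toy population, N = 100
N, n, reps = len(f), 50, 50_000
sigma2 = np.var(f)

Ybar = np.array([rng.choice(f, size=n, replace=False).mean()
                 for _ in range(reps)])
print(np.var(Ybar))                         # empirical variance of Ybar
print((N - n) / (N - 1) * sigma2 / n)       # theory: roughly half sigma2/n
```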
You could safely skip the rest of this Chapter, at least for the moment.
zzzzzzzzzzzzzzzz
Variances via conditioning
Suppose F = {F1, F2, . . .} is a partition of the sample space into disjoint events. Let X be a random variable. Write yi for E(X | Fi) and vi for

    var(X | Fi) = E(X² | Fi) − (E(X | Fi))².
Define two new random variables by the functions
    Y(s) = yi    for s ∈ Fi,
    V(s) = vi    for s ∈ Fi.
Notice that E(Y | Fi ) = yi , because Y must take the value yi if the event Fi occurs. By rule
E4 for expectations,
    EY = Σ_i E(Y | Fi) PFi
       = Σ_i yi PFi
       = Σ_i E(X | Fi) PFi
       = EX    by E4 again.
Similarly,
    var(Y) = EY² − (EY)²
           = Σ_i E(Y² | Fi) PFi − (EY)²
           = Σ_i yi² PFi − (EX)²,

because Y² must take the value yi² on Fi, and EY = EX. Also, by E4,

    EV = Σ_i vi PFi = Σ_i (E(X² | Fi) − yi²) PFi = EX² − Σ_i yi² PFi.

Add the last two equalities: the Σ_i yi² PFi terms cancel, leaving

    var(Y) + EV = EX² − (EX)² = var(X).

That is, the variance of X equals the variance of the conditional expectation plus the expected value of the conditional variance.
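The decomposition can be checked exactly on a small example. A Python sketch (the two-dice setup and the partition by the first die are illustrative choices):

```python
# Check var(X) = var(Y) + EV exactly, where Y(s) = E(X | Fi) and
# V(s) = var(X | Fi) for s in Fi.  Here X = total of two fair dice,
# and the partition is by the value of the first die.
from fractions import Fraction as Fr

outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
X = {w: w[0] + w[1] for w in outcomes}
p = Fr(1, 36)

EX = sum(p * X[w] for w in outcomes)
EX2 = sum(p * X[w] ** 2 for w in outcomes)
varX = EX2 - EX ** 2

EY = varY = EV = Fr(0)
for i in range(1, 7):
    F = [w for w in outcomes if w[0] == i]        # the event F_i
    PF = Fr(len(F), 36)
    y = sum(X[w] for w in F) * Fr(1, len(F))      # y_i = E(X | F_i)
    v = sum(X[w] ** 2 for w in F) * Fr(1, len(F)) - y ** 2
    EY += PF * y
    varY += PF * y ** 2                           # accumulates E(Y^2)
    EV += PF * v
varY -= EY ** 2

print(varX, varY + EV)                            # both equal 35/6
```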