Econ 210
$$
\begin{aligned}
&= E\Big[\big(\hat\theta - E[\hat\theta]\big)^2\Big] + 2\,\underbrace{E\big[\hat\theta - E[\hat\theta]\big]}_{=0}\big(E[\hat\theta] - \theta\big) + \big(E[\hat\theta] - \theta\big)^2 \\
&= \underbrace{E\Big[\big(\hat\theta - E[\hat\theta]\big)^2\Big]}_{Var(\hat\theta)} + \underbrace{\big(E[\hat\theta] - \theta\big)^2}_{Bias(\hat\theta)^2} \\
&= Var(\hat\theta) + Bias(\hat\theta)^2.
\end{aligned}
$$

(b) Let $Bias(\hat\theta) = 0$. Then

$$ MSE(\hat\theta) = Var(\hat\theta) + Bias(\hat\theta)^2 = Var(\hat\theta). $$
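As a numerical sanity check of the decomposition above, the following sketch (our own Monte Carlo example; the estimator and parameter values are illustrative, not from the problem set) verifies that the simulated MSE of a deliberately biased estimator equals its variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 3.0                 # true parameter (illustrative value)
reps, N = 200_000, 20

# A deliberately biased estimator: the sample mean shrunk toward zero.
samples = rng.normal(loc=theta, scale=1.0, size=(reps, N))
est = 0.9 * samples.mean(axis=1)

mse = np.mean((est - theta) ** 2)
var = np.var(est)
bias = est.mean() - theta

# The decomposition MSE = Var + Bias^2 holds exactly for the empirical
# distribution of the simulated estimates, not just in expectation.
print(mse, var + bias**2)
```

Because the identity is algebraic, it holds for the empirical moments up to floating-point rounding, not just approximately.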
(c) Indeed, we might prefer a biased estimator over an unbiased one if both are consistent and the biased estimator has a smaller mean squared error. Imagine one estimator is slightly biased but has a very small sampling variance, while another is unbiased but has an enormous variance in the same-sized sample. Since the goal of inference is to learn about the true parameter, we would prefer the low-MSE estimator in that situation.

Next, we show that $\bar X_N$ converges to the true mean in mean square, which is equivalent to

$$ \lim_{N\to\infty} MSE(\bar X_N) = 0. $$

We have $Var(\bar X_N) = \frac{1}{N}Var(X)$ and $Bias(\bar X_N) = 0$; hence

$$ MSE(\bar X_N) = \frac{1}{N}\,Var(X). $$

Taking the limit as $N \to \infty$ yields the result:

$$ \lim_{N\to\infty} MSE(\bar X_N) = \lim_{N\to\infty} \frac{1}{N}\,Var(X) = 0 $$

(as long as X has finite variance).
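The $Var(X)/N$ rate can also be seen in simulation; a minimal sketch with made-up distribution parameters (mean 1, variance 4):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 4.0        # illustrative mean and variance of X
reps = 50_000

def mse_of_mean(N):
    """Monte Carlo MSE of the sample mean over many samples of size N."""
    xbar = rng.normal(mu, np.sqrt(sigma2), size=(reps, N)).mean(axis=1)
    return np.mean((xbar - mu) ** 2)

# The MSE should track sigma2 / N: about 0.4 at N = 10 and 0.04 at N = 100.
mse_10, mse_100 = mse_of_mean(10), mse_of_mean(100)
print(mse_10, mse_100)
```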
2. We start by showing that (a) and (c) are equal:
$$
\begin{aligned}
\frac{\sum_{n=1}^{N} x_n y_n - N\bar X\bar Y}{\sum_{n=1}^{N} x_n^2 - N\bar X^2}
&= \frac{\sum_{n=1}^{N} x_n y_n - N\left(\frac{1}{N}\sum_{n=1}^{N} x_n\right)\left(\frac{1}{N}\sum_{n=1}^{N} y_n\right)}{\sum_{n=1}^{N} x_n^2 - N\left(\frac{1}{N}\sum_{n=1}^{N} x_n\right)^2} \\
&= \frac{\sum_{n=1}^{N} x_n y_n - \frac{1}{N}\sum_{n=1}^{N} x_n \sum_{n=1}^{N} y_n}{\sum_{n=1}^{N} x_n^2 - \frac{1}{N}\left(\sum_{n=1}^{N} x_n\right)^2} \\
&= \frac{N\sum_{n=1}^{N} x_n y_n - \sum_{n=1}^{N} x_n \sum_{n=1}^{N} y_n}{N\sum_{n=1}^{N} x_n^2 - \left(\sum_{n=1}^{N} x_n\right)^2},
\end{aligned}
$$

where the last step multiplies numerator and denominator by $N$.
Next, we show that (b) and (d) are equal. Since the denominators are the same, it is enough
to show that the numerators are equal:
$$
\begin{aligned}
\sum_{n=1}^{N}(x_n - \bar X)(y_n - \bar Y)
&= \sum_{n=1}^{N}\big(x_n y_n - x_n\bar Y - \bar X y_n + \bar X\bar Y\big) \\
&= \sum_{n=1}^{N}(x_n - \bar X)y_n + \sum_{n=1}^{N}\big(\bar X\bar Y - \bar Y x_n\big) \\
&= \sum_{n=1}^{N}(x_n - \bar X)y_n + N\bar X\bar Y - N\bar Y\,\frac{1}{N}\sum_{n=1}^{N} x_n \\
&= \sum_{n=1}^{N}(x_n - \bar X)y_n + N\bar X\bar Y - N\bar X\bar Y \\
&= \sum_{n=1}^{N}(x_n - \bar X)y_n.
\end{aligned}
$$

Finally, (a) and (b) are equal as well:

$$
\frac{\sum_{n=1}^{N} x_n y_n - N\bar X\bar Y}{\sum_{n=1}^{N} x_n^2 - N\bar X^2}
= \frac{\sum_{n=1}^{N}\big(x_n y_n - x_n\bar Y - \bar X y_n + \bar X\bar Y\big)}{\sum_{n=1}^{N}(x_n - \bar X + \bar X)^2 - N\bar X^2}
= \frac{\sum_{n=1}^{N}(x_n - \bar X)(y_n - \bar Y)}{\sum_{n=1}^{N}(x_n - \bar X)^2}.
$$
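The four algebraically equivalent slope formulas can be checked on random data; a quick sketch (the labels (a)–(d) follow our reading of the problem, and the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(size=N)
xbar, ybar, Sx, Sy = x.mean(), y.mean(), x.sum(), y.sum()

# (a) uncentered sums minus N * (mean products)
slope_a = (np.sum(x * y) - N * xbar * ybar) / (np.sum(x**2) - N * xbar**2)
# (b) fully centered form
slope_b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
# (c) "computational" form with raw sums
slope_c = (N * np.sum(x * y) - Sx * Sy) / (N * np.sum(x**2) - Sx**2)
# (d) x centered, y uncentered in the numerator
slope_d = np.sum((x - xbar) * y) / np.sum((x - xbar) ** 2)

# All four expressions give the same number up to floating-point rounding.
print(slope_a, slope_b, slope_c, slope_d)
```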
3. (a) We can use any of the expressions from the previous problem for $\hat b$. Using 2(d), for example, we have

$$ \hat b = \frac{\sum_{n=1}^{6}(x_n - \bar X)(z_n - \bar Z)}{\sum_{n=1}^{6}(x_n - \bar X)^2} = 1. $$

Then

$$ \hat a = \bar Z - \hat b\,\bar X = 26 - 40 = -14. $$
(b)

$$ R^2 = \frac{ESS}{TSS} = \frac{\sum_n (\hat z_n - \bar Z)^2}{\sum_n (z_n - \bar Z)^2} \approx 0.73, $$

where $\hat z_n = \hat a + \hat b x_n$.
(c) We can estimate the variance of $U$ with the sample variance of the residuals:

$$ S_u^2 = \frac{1}{6-2}\sum_{n=1}^{6}\hat u_n^2 = 7.5, $$

where $\hat u_n = z_n - \hat a - \hat b x_n$. Recall that the degrees-of-freedom correction of 2 in the denominator stems from the fact that both $\hat a$ and $\hat b$ are estimated from the data.
(d) Our R2 of 0.73 means that 73 percent of the variability in water consumption is explained
by variation in temperature. Our simple model has a good deal of explanatory power,
since most of the variation of water consumption around its mean is driven by predictable
variation in z.
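The six $(x_n, z_n)$ observations themselves are not reproduced in this excerpt, but all of the quantities in parts (a)–(c) come from the same short computation; the helper below is a sketch (the function name and any data shown are ours, not the problem's):

```python
import numpy as np

def simple_ols(x, z):
    """Return (a_hat, b_hat, R^2, S_u^2) for the regression of z on x,
    using the centered slope formula and an n - 2 degrees-of-freedom
    correction for the residual variance."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    n = len(x)
    b = np.sum((x - x.mean()) * (z - z.mean())) / np.sum((x - x.mean()) ** 2)
    a = z.mean() - b * x.mean()
    zhat = a + b * x                      # fitted values
    r2 = np.sum((zhat - z.mean()) ** 2) / np.sum((z - z.mean()) ** 2)
    s2 = np.sum((z - zhat) ** 2) / (n - 2)
    return a, b, r2, s2
```

On a noiseless made-up line $z = x - 14$ the function recovers $\hat b = 1$, $\hat a = -14$, $R^2 = 1$, and $S_u^2 = 0$, as it should.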
$$[\hat\beta_0]: \qquad \frac{\partial}{\partial\beta_0}\left[-\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(y_n - \beta_0 - \beta_1 x_n)^2\right] = \frac{1}{\sigma^2}\sum_{n=1}^{N}(y_n - \beta_0 - \beta_1 x_n) = 0, $$

which gives us

$$ \hat\beta_0^{ML} = \bar y - \hat\beta_1^{ML}\,\bar x, $$

and

$$[\hat\beta_1]: \qquad \frac{1}{\sigma^2}\sum_{n=1}^{N}(y_n - \beta_0 - \beta_1 x_n)\,x_n = 0, $$

which gives us

$$ \sum_{n=1}^{N} y_n x_n - \overbrace{(\bar y - \hat\beta_1\bar x)}^{\hat\beta_0}\sum_{n=1}^{N} x_n - \hat\beta_1\sum_{n=1}^{N} x_n^2 = 0 $$

and hence

$$ \hat\beta_1^{ML} = \frac{\sum_{n=1}^{N} y_n x_n - N\bar y\,\bar x}{\sum_{n=1}^{N} x_n^2 - N\bar x^2}. $$

(b) Recall the least squares estimator:

$$ \hat\beta_1^{LS} = \frac{\sum_{n=1}^{N} x_n(y_n - \bar Y)}{\sum_{n=1}^{N} x_n(x_n - \bar X)} = \frac{\sum_{n=1}^{N} y_n x_n - N\bar y\,\bar x}{\sum_{n=1}^{N} x_n^2 - N\bar x^2} = \hat\beta_1^{ML}. $$
We could now simply refer to the result in problem 2, conclude that $\hat\beta_1^{LS} = \hat\beta_1^{ML} = \hat\beta_1^{MOM}$, and move on. However, before we do that, note that the FOC for $\beta_1$ found in the previous part of this problem is exactly the same as in the least squares case! This implies that the estimators have to be equal as well. Hence, we didn't actually have to compute the MLE to know that it was equivalent to the least squares estimator.¹
(c) We want to show that the ML sample regression line passes through the point $(\bar X, \bar Y)$. The ML regression line is

$$ \hat Y = \hat\beta_0^{ML} + \hat\beta_1^{ML} X. $$

At $X = \bar X$, we have

$$ \hat Y = \hat\beta_0^{ML} + \hat\beta_1^{ML}\bar X = \bar Y - \hat\beta_1^{ML}\bar X + \hat\beta_1^{ML}\bar X = \bar Y. $$
(d) We now assume $Var[X] \in (0, \infty)$ and $E[XU] = E[U] = 0$. The objective is to show that the ML estimators are consistent. We have

$$ \beta_0 = E[Y] - \beta_1 E[X] $$

and

$$ \beta_1 = \frac{Cov[X, Y]}{Var[X]}. $$

Note that

$$
\begin{aligned}
\hat\beta_1^{ML}
&= \frac{\sum_{n=1}^{N} y_n x_n - N\bar y\,\bar x}{\sum_{n=1}^{N} x_n^2 - N\bar x^2}
 = \frac{\sum_{n=1}^{N}(x_n - \bar x)(y_n - \bar y)}{\sum_{n=1}^{N}(x_n - \bar x)^2} \\
&= \frac{\sum_{n=1}^{N}(x_n - \bar x)\Big(\beta_0 + \beta_1 x_n + u_n - \frac{1}{N}\sum_{m=1}^{N}(\beta_0 + \beta_1 x_m + u_m)\Big)}{\sum_{n=1}^{N}(x_n - \bar x)^2} \\
&= \frac{\sum_{n=1}^{N}(x_n - \bar x)\big(\beta_1(x_n - \bar x) + u_n - \bar u\big)}{\sum_{n=1}^{N}(x_n - \bar x)^2} \\
&= \beta_1 + \frac{\sum_{n=1}^{N}(x_n - \bar x)(u_n - \bar u)}{\sum_{n=1}^{N}(x_n - \bar x)^2}.
\end{aligned}
$$
Hence, we want to show that

$$ \frac{\sum_{n=1}^{N}(x_n - \bar x)(u_n - \bar u)}{\sum_{n=1}^{N}(x_n - \bar x)^2} \overset{p}{\longrightarrow} 0. $$

Note that

$$ E[(x_n - \bar x)(u_n - \bar u)] = 0 $$

(since $E[XU] = E[U] = 0$) and

$$ E[(x_n - \bar x)^2] = Var[X]. $$

¹This is a result of the assumption that the errors are normally distributed.
By the weak law of large numbers (which we can apply since the observations are i.i.d.), we have

$$ \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar x)(u_n - \bar u) \overset{p}{\longrightarrow} 0 $$

and

$$ \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar x)^2 \overset{p}{\longrightarrow} Var[X]. $$

Hence, by Slutsky's theorem (which we can apply since $Var[X] \in (0, \infty)$), we have

$$ \frac{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar x)(u_n - \bar u)}{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar x)^2} \overset{p}{\longrightarrow} \frac{0}{Var[X]} = 0. $$
For the MLE of $\beta_0$, we want to show that

$$ \hat\beta_0 = \bar y - \hat\beta_1\bar x \overset{p}{\longrightarrow} \beta_0. $$

Note that

$$ \hat\beta_0 = \frac{1}{N}\sum_{n=1}^{N}(\beta_0 + \beta_1 x_n + u_n) - \hat\beta_1\bar x = \beta_0 + \bar x(\beta_1 - \hat\beta_1) + \bar u. $$

Here, $\bar u \overset{p}{\to} E[U] = 0$ by the WLLN, and $\bar x(\beta_1 - \hat\beta_1) \overset{p}{\to} 0$ by Slutsky's theorem and the result that we just showed. Finally, applying Slutsky's theorem one more time gives us that

$$ \beta_0 + \bar x(\beta_1 - \hat\beta_1) + \bar u \overset{p}{\longrightarrow} \beta_0. $$
(e) We showed in part (b) that the MLE is equivalent to the least squares estimator, which
is consistent under the given assumptions. Hence, the formula we derived in part (a)
will still be a consistent estimator of the population slope coefficient.
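Consistency of the slope and intercept estimators can also be illustrated by simulation; a sketch with made-up true parameters ($\beta_0 = 2$, $\beta_1 = 0.5$) and errors drawn independently of $X$, so that $E[U] = 0$ and $E[XU] = 0$ hold:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 2.0, 0.5      # illustrative true parameters

def estimates(N):
    """ML/LS estimates from one simulated sample of size N."""
    x = rng.normal(loc=1.0, scale=2.0, size=N)
    u = rng.normal(size=N)   # independent of x: E[U] = 0 and E[XU] = 0
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# The estimates should drift toward (2.0, 0.5) as N grows.
for N in (100, 10_000, 1_000_000):
    b0_hat, b1_hat = estimates(N)
    print(N, b0_hat, b1_hat)
```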
5. The model is

$$ Y = \beta_0 + \beta_1 X + U, $$

where

$$ E[U] = 0 $$

but

$$ Cov[X, U] = \gamma \neq 0. $$

(a) We want to derive the population slope coefficient. In order to do so, we use the two conditions $E[U] = 0$ and $Cov[X, U] = \gamma$. From the first,

$$ E[\underbrace{Y - \beta_0 - \beta_1 X}_{=U}] = 0 \quad\Longrightarrow\quad \beta_0 = E[Y] - \beta_1 E[X]. $$

From the second,

$$ \gamma = Cov[X, U] = E\big[(X - E[X])(U - \underbrace{E[U]}_{=0})\big] = E[XU - E[X]U] = E[XU], $$

so

$$ E[X(Y - \beta_0 - \beta_1 X)] = \gamma. $$

Expanding and substituting $\beta_0 = E[Y] - \beta_1 E[X]$ gives

$$ Cov[X, Y] - \beta_1 Var[X] = \gamma \quad\Longrightarrow\quad \beta_1 = \frac{Cov[X, Y] - \gamma}{Var[X]}. $$
(b) The model is NOT identified, because we don't get to observe any data on $U$. Therefore, it is impossible to estimate $\gamma = Cov(X, U)$, which in turn means that there is no way of estimating $\beta_1$ using only data on $X$ and $Y$.
(c) Recall that we know

$$ \hat b_1^{LS} = \frac{S_{XY}}{S_X^2}. $$

As we saw in class, the continuous mapping theorem and the weak law of large numbers imply that $S_X^2 \overset{p}{\to} Var(X)$ and $S_{XY} \overset{p}{\to} Cov(X, Y)$. Therefore, since $0 < Var(X) < \infty$, the continuous mapping theorem also implies that $\hat b_1^{LS} = S_{XY}/S_X^2$ converges in probability to

$$ \frac{Cov(X, Y)}{Var(X)} = \beta_1 + \frac{\gamma}{Var[X]} \neq \beta_1. $$

Since the probability limit is not $\beta_1$, that means $\hat b_1^{LS}$ is both biased in finite samples and inconsistent as well.
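The probability limit $\beta_1 + \gamma/Var(X)$ can be illustrated by simulation; the numbers below ($\beta_1 = 2$, $\gamma = 0.5$, $Var(X) = 1$) are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
beta0, beta1 = 1.0, 2.0
gamma, var_x = 0.5, 1.0      # Cov(X, U) = gamma > 0: upward bias

x = rng.normal(size=N)                         # Var(X) = 1
u = (gamma / var_x) * x + rng.normal(size=N)   # E[U] = 0, Cov(X, U) = gamma
y = beta0 + beta1 * x + u

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# b1 settles near beta1 + gamma / var_x = 2.5 rather than beta1 = 2.0.
print(b1)
```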
(d) In light of the previous part, the OLS estimate does NOT meaningfully reflect the causal link between $X$ and $Y$. Moreover, since we have no way of knowing what $\gamma = Cov(X, U)$ is, there is also no way of telling how far off $\hat b_1^{LS}$ is from the actual population slope $\beta_1$. NOT GOOD!
(e) If $\gamma > 0$ then our estimate will be biased upwards. The effect of schooling will appear bigger than it really is, since some of the increase in wages comes from the higher ability of the people selecting into schooling. This would make it harder for us to defend the position that schooling leads to higher returns in the labor market.
(f) If $\gamma < 0$ then there is a downward bias. Since this goes against our hypothesis that schooling drives wages upward, we can interpret our least squares estimate $\hat b_1$ as a lower bound on the causal effect. If $\hat b_1$ happens to be positive anyway, then this still provides evidence in favor of a positive relationship between schooling and wage. In this case we may still be able to argue that schooling actually causes wages to rise, even if we still can't say what the exact magnitude of the effect is. Of course, if our point estimate $\hat b_1$ comes out negative, the estimated lower bound is not terribly informative, given that we already suspected a positive effect to begin with.
In the real world, we would expect the covariance between schooling and unobserved
productivity to be positive. One reason for this is that schooling is costly to acquire, in terms of both the effort involved and the monetary investment. Therefore, the
more motivated someone is, the easier it will be to stay in school longer and learn
new concepts. Moreover, if people have to pay money to get a college degree after
high school, then only workers who expect the highest labor market returns will be
willing to incur that investment cost. These tend to be workers with higher unobserved
productivity characteristics (like responsibility and leadership) who will get promoted
more and make more money to pay off their student loans more quickly. Workers with
very low productivity characteristics will not expect to make enough money on the
labor market to pay off their student loans quickly enough, so they will either drop out
of college or never enroll to begin with.
6. (a) We can use the command
. count
to find that there are 420 observations in the data set.
(b) The variable avginc is the average district income in 1000s of dollars. We define a new
variable income by
. gen income = ( avginc * 1000 )
i. As avginc measures average income in 1000s, income measures average income in
actual dollars.
ii. We use
. tabstat avginc, statistics( mean sd )
to find that the mean of avginc is 15.3166 and that the standard deviation is 7.22589.
(There are of course other ways to do this. You can for example use the summarize
command.)
iii. In the same way as for avginc, we find the mean and standard deviation of income.
We see that the variable has a mean of 15316.6 and a standard deviation of 7225.89.
Not surprisingly, the mean and standard deviation are multiplied by a factor of 1000.
(c) Our hypotheses are

$$ H_0 : E[\text{math\_scr} \mid \text{class\_size} \le 20] = E[\text{math\_scr} \mid \text{class\_size} > 20] $$
$$ H_1 : E[\text{math\_scr} \mid \text{class\_size} \le 20] \neq E[\text{math\_scr} \mid \text{class\_size} > 20] $$
In other words, our null hypothesis is that the conditional expectations are the same
in each type of class. In order to perform this test in Stata, create a dummy variable
that assumes the value 1 for each district that has an average class size less than 20:
. gen less20 = 0
. replace less20 = 1 if class_size <= 20
Run a t-test of whether the math score differs between the groups:
. ttest math_scr, by(less20) level(90) unequal
Ha: diff != 0
Pr(|T| > |t|) = 0.0019
Note that the p-value for the two-sided test of the null hypothesis of equal conditional expectations is 0.0019. Also note that the 90 percent confidence interval for the difference in means lies strictly below zero. We are thus able to reject the null hypothesis at the 10 percent level.
vi. The command
. correlate avginc math_scr, cov
gives us that the covariance between avginc and math_scr is 94.7795. Using the same command for income and math_scr gives us a covariance of 94779.5. The covariances differ since Cov(aX, Y) = aCov(X, Y), where a is a constant, equal to 1000 in this case.
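The scaling rule Cov(aX, Y) = aCov(X, Y) is easy to confirm on synthetic data (this sketch does not use the actual district file):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000)            # stand-in for avginc
y = 0.5 * x + rng.normal(size=1_000)  # stand-in for math_scr

cov_xy = np.cov(x, y)[0, 1]
cov_scaled = np.cov(1000 * x, y)[0, 1]
# Multiplying x by a = 1000 multiplies the covariance by exactly 1000.
print(cov_xy, cov_scaled)
```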
vii. The command
. correlate avginc math_scr
gives us that the correlation between avginc and math_scr is 0.6994. Using the same command for income and math_scr gives us the same correlation of 0.6994, since the correlation coefficient is invariant to rescaling a variable by a positive constant.