Econ 210
$$
\begin{aligned}
&= E\Big[\big(\hat\theta - E[\hat\theta]\big)^2\Big] + 2\,\underbrace{E\big[\hat\theta - E[\hat\theta]\big]}_{=0}\big(E[\hat\theta] - \theta\big) + \big(E[\hat\theta] - \theta\big)^2 \\
&= \underbrace{E\Big[\big(\hat\theta - E[\hat\theta]\big)^2\Big]}_{Var(\hat\theta)} + \underbrace{\big(E[\hat\theta] - \theta\big)^2}_{Bias(\hat\theta)^2} \\
&= Var(\hat\theta) + Bias(\hat\theta)^2.
\end{aligned}
$$

(b) Let $Bias(\hat\theta) = 0$. Then

$$ MSE(\hat\theta) = Var(\hat\theta) + Bias(\hat\theta)^2 = Var(\hat\theta). $$
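As a numerical sanity check of the decomposition above, the following sketch (our own Monte Carlo example; the estimator and parameter values are illustrative, not from the problem set) verifies that the simulated MSE of a deliberately biased estimator equals its variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 3.0                 # true parameter (illustrative value)
reps, N = 200_000, 20

# A deliberately biased estimator: the sample mean shrunk toward zero.
samples = rng.normal(loc=theta, scale=1.0, size=(reps, N))
est = 0.9 * samples.mean(axis=1)

mse = np.mean((est - theta) ** 2)
var = np.var(est)
bias = est.mean() - theta

# The decomposition MSE = Var + Bias^2 holds exactly for the empirical
# distribution of the simulated estimates, not just in expectation.
print(mse, var + bias**2)
```

Because the identity is algebraic, it holds for the empirical moments up to floating-point rounding, not just approximately.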
(c) Indeed, we might prefer a biased estimator over an unbiased one if both are consistent and the biased estimator has a smaller mean squared error. Imagine one estimator is slightly biased but has a very small sampling variance, while another is unbiased but has an enormous variance in the same-sized sample. Since the goal of inference is to learn about the true parameter, we would prefer the low-MSE estimator in that situation.

Next, we show that $\bar X_N$ converges to the true mean in mean square, which is equivalent to

$$ \lim_{N\to\infty} MSE(\bar X_N) = 0. $$

We have $Var(\bar X_N) = \frac{1}{N}Var(X)$ and $Bias(\bar X_N) = 0$; hence

$$ MSE(\bar X_N) = \frac{1}{N}\,Var(X). $$

Taking the limit as $N \to \infty$ yields the result:

$$ \lim_{N\to\infty} MSE(\bar X_N) = \lim_{N\to\infty} \frac{1}{N}\,Var(X) = 0 $$

(as long as X has finite variance).
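The $Var(X)/N$ rate can also be seen in simulation; a minimal sketch with made-up distribution parameters (mean 1, variance 4):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 4.0        # illustrative mean and variance of X
reps = 50_000

def mse_of_mean(N):
    """Monte Carlo MSE of the sample mean over many samples of size N."""
    xbar = rng.normal(mu, np.sqrt(sigma2), size=(reps, N)).mean(axis=1)
    return np.mean((xbar - mu) ** 2)

# The MSE should track sigma2 / N: about 0.4 at N = 10 and 0.04 at N = 100.
mse_10, mse_100 = mse_of_mean(10), mse_of_mean(100)
print(mse_10, mse_100)
```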
2. We start by showing that (a) and (c) are equal:
$$
\begin{aligned}
\frac{\sum_{n=1}^{N} x_n y_n - N\bar X\bar Y}{\sum_{n=1}^{N} x_n^2 - N\bar X^2}
&= \frac{\sum_{n=1}^{N} x_n y_n - N\left(\frac{1}{N}\sum_{n=1}^{N} x_n\right)\left(\frac{1}{N}\sum_{n=1}^{N} y_n\right)}{\sum_{n=1}^{N} x_n^2 - N\left(\frac{1}{N}\sum_{n=1}^{N} x_n\right)^2} \\
&= \frac{\sum_{n=1}^{N} x_n y_n - \frac{1}{N}\sum_{n=1}^{N} x_n \sum_{n=1}^{N} y_n}{\sum_{n=1}^{N} x_n^2 - \frac{1}{N}\left(\sum_{n=1}^{N} x_n\right)^2} \\
&= \frac{N\sum_{n=1}^{N} x_n y_n - \sum_{n=1}^{N} x_n \sum_{n=1}^{N} y_n}{N\sum_{n=1}^{N} x_n^2 - \left(\sum_{n=1}^{N} x_n\right)^2},
\end{aligned}
$$

where the last step multiplies numerator and denominator by $N$.
Next, we show that (b) and (d) are equal. Since the denominators are the same, it is enough
to show that the numerators are equal:
$$
\begin{aligned}
\sum_{n=1}^{N}(x_n - \bar X)(y_n - \bar Y)
&= \sum_{n=1}^{N}\big(x_n y_n - x_n\bar Y - \bar X y_n + \bar X\bar Y\big) \\
&= \sum_{n=1}^{N}(x_n - \bar X)y_n + \sum_{n=1}^{N}\big(\bar X\bar Y - \bar Y x_n\big) \\
&= \sum_{n=1}^{N}(x_n - \bar X)y_n + N\bar X\bar Y - N\bar Y\,\frac{1}{N}\sum_{n=1}^{N} x_n \\
&= \sum_{n=1}^{N}(x_n - \bar X)y_n + N\bar X\bar Y - N\bar X\bar Y \\
&= \sum_{n=1}^{N}(x_n - \bar X)y_n.
\end{aligned}
$$

Finally, (a) and (b) are equal as well:

$$
\frac{\sum_{n=1}^{N} x_n y_n - N\bar X\bar Y}{\sum_{n=1}^{N} x_n^2 - N\bar X^2}
= \frac{\sum_{n=1}^{N}\big(x_n y_n - x_n\bar Y - \bar X y_n + \bar X\bar Y\big)}{\sum_{n=1}^{N}(x_n - \bar X + \bar X)^2 - N\bar X^2}
= \frac{\sum_{n=1}^{N}(x_n - \bar X)(y_n - \bar Y)}{\sum_{n=1}^{N}(x_n - \bar X)^2}.
$$
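The four algebraically equivalent slope formulas can be checked on random data; a quick sketch (the labels (a)–(d) follow our reading of the problem, and the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(size=N)
xbar, ybar, Sx, Sy = x.mean(), y.mean(), x.sum(), y.sum()

# (a) uncentered sums minus N * (mean products)
slope_a = (np.sum(x * y) - N * xbar * ybar) / (np.sum(x**2) - N * xbar**2)
# (b) fully centered form
slope_b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
# (c) "computational" form with raw sums
slope_c = (N * np.sum(x * y) - Sx * Sy) / (N * np.sum(x**2) - Sx**2)
# (d) x centered, y uncentered in the numerator
slope_d = np.sum((x - xbar) * y) / np.sum((x - xbar) ** 2)

# All four expressions give the same number up to floating-point rounding.
print(slope_a, slope_b, slope_c, slope_d)
```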
3. (a) We can use any of the expressions from the previous problem for $\hat b$. Using 2(d), for example, we have

$$ \hat b = \frac{\sum_{n=1}^{6}(x_n - \bar X)(z_n - \bar Z)}{\sum_{n=1}^{6}(x_n - \bar X)^2} = 1. $$

Then

$$ \hat a = \bar Z - \hat b\,\bar X = 26 - 40 = -14. $$
(b)

$$ R^2 = \frac{ESS}{TSS} = \frac{\sum_n (\hat z_n - \bar Z)^2}{\sum_n (z_n - \bar Z)^2} \approx 0.73, $$

where $\hat z_n = \hat a + \hat b x_n$.
(c) We can estimate the variance of $U$ with the sample variance of the residuals:

$$ S_u^2 = \frac{1}{6-2}\sum_{n=1}^{6}\hat u_n^2 = 7.5, $$

where $\hat u_n = z_n - \hat a - \hat b x_n$. Recall that the degrees-of-freedom correction of 2 in the denominator stems from the fact that both $\hat a$ and $\hat b$ are estimated from the data.
(d) Our R2 of 0.73 means that 73 percent of the variability in water consumption is explained
by variation in temperature. Our simple model has a good deal of explanatory power,
since most of the variation of water consumption around its mean is driven by predictable
variation in z.
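The six $(x_n, z_n)$ observations themselves are not reproduced in this excerpt, but all of the quantities in parts (a)–(c) come from the same short computation; the helper below is a sketch (the function name and any data shown are ours, not the problem's):

```python
import numpy as np

def simple_ols(x, z):
    """Return (a_hat, b_hat, R^2, S_u^2) for the regression of z on x,
    using the centered slope formula and an n - 2 degrees-of-freedom
    correction for the residual variance."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    n = len(x)
    b = np.sum((x - x.mean()) * (z - z.mean())) / np.sum((x - x.mean()) ** 2)
    a = z.mean() - b * x.mean()
    zhat = a + b * x                      # fitted values
    r2 = np.sum((zhat - z.mean()) ** 2) / np.sum((z - z.mean()) ** 2)
    s2 = np.sum((z - zhat) ** 2) / (n - 2)
    return a, b, r2, s2
```

On a noiseless made-up line $z = x - 14$ the function recovers $\hat b = 1$, $\hat a = -14$, $R^2 = 1$, and $S_u^2 = 0$, as it should.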
$$[\hat\beta_0]: \qquad \frac{\partial}{\partial\beta_0}\left[-\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(y_n - \beta_0 - \beta_1 x_n)^2\right] = \frac{1}{\sigma^2}\sum_{n=1}^{N}(y_n - \beta_0 - \beta_1 x_n) = 0, $$

which gives us

$$ \hat\beta_0^{ML} = \bar y - \hat\beta_1^{ML}\,\bar x, $$

and

$$[\hat\beta_1]: \qquad \frac{1}{\sigma^2}\sum_{n=1}^{N}(y_n - \beta_0 - \beta_1 x_n)\,x_n = 0, $$

which gives us

$$ \sum_{n=1}^{N} y_n x_n - \overbrace{(\bar y - \hat\beta_1\bar x)}^{\hat\beta_0}\sum_{n=1}^{N} x_n - \hat\beta_1\sum_{n=1}^{N} x_n^2 = 0 $$

and hence

$$ \hat\beta_1^{ML} = \frac{\sum_{n=1}^{N} y_n x_n - N\bar y\,\bar x}{\sum_{n=1}^{N} x_n^2 - N\bar x^2}. $$

(b) Recall the least squares estimator:

$$ \hat\beta_1^{LS} = \frac{\sum_{n=1}^{N} x_n(y_n - \bar Y)}{\sum_{n=1}^{N} x_n(x_n - \bar X)} = \frac{\sum_{n=1}^{N} y_n x_n - N\bar y\,\bar x}{\sum_{n=1}^{N} x_n^2 - N\bar x^2} = \hat\beta_1^{ML}. $$
We could now simply refer to the result in problem 2, conclude that $\hat\beta_1^{LS} = \hat\beta_1^{ML} = \hat\beta_1^{MOM}$, and move on. However, before we do that, note that the FOC for $\beta_1$ found in the previous part of this problem is exactly the same as in the least squares case! This implies that the estimators have to be equal as well. Hence, we didn't actually have to compute the MLE to know that it was equivalent to the least squares estimator.¹
(c) We want to show that the ML sample regression line passes through the point $(\bar X, \bar Y)$. The ML regression line is

$$ \hat Y = \hat\beta_0^{ML} + \hat\beta_1^{ML} X. $$

At $X = \bar X$, we have

$$ \hat Y = \hat\beta_0^{ML} + \hat\beta_1^{ML}\bar X = \bar Y - \hat\beta_1^{ML}\bar X + \hat\beta_1^{ML}\bar X = \bar Y. $$
(d) We now assume $Var[X] \in (0, \infty)$ and $E[XU] = E[U] = 0$. The objective is to show that the ML estimators are consistent. We have

$$ \beta_0 = E[Y] - \beta_1 E[X] $$

and

$$ \beta_1 = \frac{Cov[X, Y]}{Var[X]}. $$

Note that

$$
\begin{aligned}
\hat\beta_1^{ML}
&= \frac{\sum_{n=1}^{N} y_n x_n - N\bar y\,\bar x}{\sum_{n=1}^{N} x_n^2 - N\bar x^2}
 = \frac{\sum_{n=1}^{N}(x_n - \bar x)(y_n - \bar y)}{\sum_{n=1}^{N}(x_n - \bar x)^2} \\
&= \frac{\sum_{n=1}^{N}(x_n - \bar x)\Big(\beta_0 + \beta_1 x_n + u_n - \frac{1}{N}\sum_{m=1}^{N}(\beta_0 + \beta_1 x_m + u_m)\Big)}{\sum_{n=1}^{N}(x_n - \bar x)^2} \\
&= \frac{\sum_{n=1}^{N}(x_n - \bar x)\big(\beta_1(x_n - \bar x) + u_n - \bar u\big)}{\sum_{n=1}^{N}(x_n - \bar x)^2} \\
&= \beta_1 + \frac{\sum_{n=1}^{N}(x_n - \bar x)(u_n - \bar u)}{\sum_{n=1}^{N}(x_n - \bar x)^2}.
\end{aligned}
$$
Hence, we want to show that

$$ \frac{\sum_{n=1}^{N}(x_n - \bar x)(u_n - \bar u)}{\sum_{n=1}^{N}(x_n - \bar x)^2} \overset{p}{\longrightarrow} 0. $$

Note that

$$ E[(x_n - \bar x)(u_n - \bar u)] = 0 $$

(since $E[XU] = E[U] = 0$) and

$$ E[(x_n - \bar x)^2] = Var[X]. $$

¹This is a result of the assumption that the errors are normally distributed.
By the weak law of large numbers (which we can apply since the observations are i.i.d.), we have

$$ \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar x)(u_n - \bar u) \overset{p}{\longrightarrow} 0 $$

and

$$ \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar x)^2 \overset{p}{\longrightarrow} Var[X]. $$

Hence, by Slutsky's theorem (which we can apply since $Var[X] \in (0, \infty)$), we have

$$ \frac{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar x)(u_n - \bar u)}{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar x)^2} \overset{p}{\longrightarrow} \frac{0}{Var[X]} = 0. $$
For the MLE of $\beta_0$, we want to show that

$$ \hat\beta_0 = \bar y - \hat\beta_1\bar x \overset{p}{\longrightarrow} \beta_0. $$

Note that

$$ \hat\beta_0 = \frac{1}{N}\sum_{n=1}^{N}(\beta_0 + \beta_1 x_n + u_n) - \hat\beta_1\bar x = \beta_0 + \bar x(\beta_1 - \hat\beta_1) + \bar u. $$

Here, $\bar u \overset{p}{\to} E[U] = 0$ by the WLLN, and $\bar x(\beta_1 - \hat\beta_1) \overset{p}{\to} 0$ by Slutsky's theorem and the result that we just showed. Finally, applying Slutsky's theorem one more time gives us that

$$ \beta_0 + \bar x(\beta_1 - \hat\beta_1) + \bar u \overset{p}{\longrightarrow} \beta_0. $$
(e) We showed in part (b) that the MLE is equivalent to the least squares estimator, which
is consistent under the given assumptions. Hence, the formula we derived in part (a)
will still be a consistent estimator of the population slope coefficient.
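Consistency of the slope and intercept estimators can also be illustrated by simulation; a sketch with made-up true parameters ($\beta_0 = 2$, $\beta_1 = 0.5$) and errors drawn independently of $X$, so that $E[U] = 0$ and $E[XU] = 0$ hold:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 2.0, 0.5      # illustrative true parameters

def estimates(N):
    """ML/LS estimates from one simulated sample of size N."""
    x = rng.normal(loc=1.0, scale=2.0, size=N)
    u = rng.normal(size=N)   # independent of x: E[U] = 0 and E[XU] = 0
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# The estimates should drift toward (2.0, 0.5) as N grows.
for N in (100, 10_000, 1_000_000):
    b0_hat, b1_hat = estimates(N)
    print(N, b0_hat, b1_hat)
```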
5. The model is

$$ Y = \beta_0 + \beta_1 X + U, $$

where

$$ E[U] = 0 $$

but

$$ Cov[X, U] = \gamma \neq 0. $$

(a) We want to derive the population slope coefficient. In order to do so, we use the two conditions $E[U] = 0$ and $Cov[X, U] = \gamma$. From the first,

$$ E[\underbrace{Y - \beta_0 - \beta_1 X}_{=U}] = 0 \quad\Longrightarrow\quad \beta_0 = E[Y] - \beta_1 E[X]. $$

From the second,

$$ \gamma = Cov[X, U] = E\big[(X - E[X])(U - \underbrace{E[U]}_{=0})\big] = E[XU - E[X]U] = E[XU], $$

so

$$ E[X(Y - \beta_0 - \beta_1 X)] = \gamma. $$

Expanding and substituting $\beta_0 = E[Y] - \beta_1 E[X]$ gives

$$ Cov[X, Y] - \beta_1 Var[X] = \gamma \quad\Longrightarrow\quad \beta_1 = \frac{Cov[X, Y] - \gamma}{Var[X]}. $$
(b) The model is NOT identified, because we don't get to observe any data on $U$. Therefore, it is impossible to estimate $\gamma = Cov(X, U)$, which in turn means that there is no way of estimating $\beta_1$ using only data on $X$ and $Y$.
(c) Recall that we know

$$ \hat b_1^{LS} = \frac{S_{XY}}{S_X^2}. $$

As we saw in class, the continuous mapping theorem and the weak law of large numbers imply that $S_X^2 \overset{p}{\to} Var(X)$ and $S_{XY} \overset{p}{\to} Cov(X, Y)$. Therefore, since $0 < Var(X) < \infty$, the continuous mapping theorem also implies that $\hat b_1^{LS} = S_{XY}/S_X^2$ converges in probability to

$$ \frac{Cov(X, Y)}{Var(X)} = \beta_1 + \frac{\gamma}{Var[X]} \neq \beta_1. $$

Since the probability limit is not $\beta_1$, that means $\hat b_1^{LS}$ is both biased in finite samples and inconsistent as well.
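The probability limit $\beta_1 + \gamma/Var(X)$ can be illustrated by simulation; the numbers below ($\beta_1 = 2$, $\gamma = 0.5$, $Var(X) = 1$) are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
beta0, beta1 = 1.0, 2.0
gamma, var_x = 0.5, 1.0      # Cov(X, U) = gamma > 0: upward bias

x = rng.normal(size=N)                         # Var(X) = 1
u = (gamma / var_x) * x + rng.normal(size=N)   # E[U] = 0, Cov(X, U) = gamma
y = beta0 + beta1 * x + u

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# b1 settles near beta1 + gamma / var_x = 2.5 rather than beta1 = 2.0.
print(b1)
```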
(d) In light of the previous part, the OLS estimate does NOT meaningfully reflect the causal link between $X$ and $Y$. Moreover, since we have no way of knowing what $\gamma = Cov(X, U)$ is, there is also no way of telling how far off $\hat b_1^{LS}$ is from the actual population slope $\beta_1$. NOT GOOD!
(e) If $\gamma > 0$ then our estimate will be biased upwards. The effect of schooling will appear bigger than it really is, since some of the increase in wages comes from the higher ability of the people selecting into schooling. This would make it harder for us to defend the position that schooling leads to higher returns in the labor market.
(f) If $\gamma < 0$ then there is a downward bias. Since this goes against our hypothesis that schooling drives wages upward, we can interpret our least squares estimate $\hat b_1$ as a lower bound on the causal effect. If $\hat b_1$ happens to be positive anyway, then this still provides evidence in favor of a positive relationship between schooling and wage. In this case we may still be able to argue that schooling actually causes wages to rise, even if we still can't say what the exact magnitude of the effect is. Of course, if our point estimate $\hat b_1$ comes out negative, the estimated lower bound is not terribly informative, given that we already suspected a positive effect to begin with.
In the real world, we would expect the covariance between schooling and unobserved
productivity to be positive. One reason for this is that schooling is costly to acquire, in terms of both the effort involved and the monetary investment. Therefore, the
more motivated someone is, the easier it will be to stay in school longer and learn
new concepts. Moreover, if people have to pay money to get a college degree after
high school, then only workers who expect the highest labor market returns will be
willing to incur that investment cost. These tend to be workers with higher unobserved
productivity characteristics (like responsibility and leadership) who will get promoted
more and make more money to pay off their student loans more quickly. Workers with
very low productivity characteristics will not expect to make enough money on the
labor market to pay off their student loans quickly enough, so they will either drop out
of college or never enroll to begin with.
6. (a) We can use the command
. count
to find that there are 420 observations in the data set.
(b) The variable avginc is the average district income in 1000s of dollars. We define a new
variable income by
. gen income = ( avginc * 1000 )
i. As avginc measures average income in 1000s, income measures average income in
actual dollars.
ii. We use
. tabstat avginc, statistics( mean sd )
to find that the mean of avginc is 15.3166 and that the standard deviation is 7.22589.
(There are of course other ways to do this. You can for example use the summarize
command.)
iii. In the same way as for avginc, we find the mean and standard deviation of income.
We see that the variable has a mean of 15316.6 and a standard deviation of 7225.89.
Not surprisingly, the mean and standard deviation are multiplied by a factor of 1000.
(c) Our hypotheses are

$$ H_0 : E[\text{math\_scr} \mid \text{class\_size} \le 20] = E[\text{math\_scr} \mid \text{class\_size} > 20] $$
$$ H_1 : E[\text{math\_scr} \mid \text{class\_size} \le 20] \neq E[\text{math\_scr} \mid \text{class\_size} > 20] $$
In other words, our null hypothesis is that the conditional expectations are the same
in each type of class. In order to perform this test in Stata, create a dummy variable
that assumes the value 1 for each district that has an average class size less than 20:
. gen less20 = 0
. replace less20 = 1 if class_size <= 20
Run a t-test of whether the math score differs between the groups:
. ttest math_scr, by(less20) level(90) unequal
Ha: diff != 0
Pr(|T| > |t|) = 0.0019
Note that the p-value for the two-sided test of the null hypothesis of equal conditional expectations is 0.0019. Also note that the 90 percent confidence interval for the difference in means lies strictly below zero. We are thus able to reject the null hypothesis at the 10 percent level.
vi. The command
. correlate avginc math_scr, cov
gives us that the covariance between avginc and math_scr is 94.7795. Using the same command for income and math_scr gives us a covariance of 94779.5. The covariances differ since Cov(aX, Y) = aCov(X, Y), where a is a constant, equal to 1000 in this case.
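The scaling rule Cov(aX, Y) = aCov(X, Y) is easy to confirm on synthetic data (this sketch does not use the actual district file):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000)            # stand-in for avginc
y = 0.5 * x + rng.normal(size=1_000)  # stand-in for math_scr

cov_xy = np.cov(x, y)[0, 1]
cov_scaled = np.cov(1000 * x, y)[0, 1]
# Multiplying x by a = 1000 multiplies the covariance by exactly 1000.
print(cov_xy, cov_scaled)
```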
vii. The command
. correlate avginc math_scr
gives us that the correlation between avginc and math_scr is 0.6994. Using the same command for income and math_scr gives us the same correlation of 0.6994, since the correlation coefficient is invariant to rescaling a variable by a positive constant.