FACULTY OF ECONOMICS
M.Phil. in Economics
M.Phil. in Economic Research
Subject M300 Econometric Methods
Exercise Sheet 1
1. Consider the following joint probability density:
$$f_{X,Y}(x,y) = \begin{cases} (x+xy+y)/4 & \text{if } x \in [0,1],\ y \in [0,2] \\ 0 & \text{otherwise.} \end{cases}$$
The marginal density of $Y$ is
$$f_Y(y) = \int_0^1 \frac{x+xy+y}{4}\,dx = \left[ (1+y)\frac{x^2}{8} + \frac{yx}{4} \right]_{x=0}^{x=1} = \frac{1+3y}{8}.$$
Similarly, the marginal density of $X$ is
$$f_X(x) = \int_0^2 \frac{x+xy+y}{4}\,dy = \left[ (x+1)\frac{y^2}{8} + \frac{xy}{4} \right]_{y=0}^{y=2} = \frac{1+2x}{2}.$$
The conditional density of $Y$ given $X$ is
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{x+xy+y}{2+4x}.$$
The marginal expectation of $Y$ is
$$E(Y) = \int_0^2 \frac{y+3y^2}{8}\,dy = \frac{4}{16} + \frac{8}{8} = \frac{5}{4}.$$
The marginal expectation of $X$ is
$$E(X) = \int_0^1 \frac{x+2x^2}{2}\,dx = \frac{1}{4} + \frac{2}{6} = \frac{7}{12}.$$
The conditional expectation of $Y$ given $X$ is
$$E(Y|X) = \int_0^2 y\,\frac{X+Xy+y}{2+4X}\,dy = \frac{2X + \frac{8}{3}(X+1)}{2+4X} = \frac{7X+4}{3+6X}.$$
(b) What is the BLP of $Y$ given $X$? Graph the results from (a) and (b).
Solution: The slope of the BLP of $Y$ given $X$ is
$$\beta_1 = \frac{Cov(Y,X)}{Var(X)}.$$
To compute $Cov(Y,X)$:
$$E(XY) = E[X\,E(Y|X)] = E\left[ \frac{4X+7X^2}{3+6X} \right] = \int_0^1 \frac{4x+7x^2}{3+6x}\cdot\frac{1+2x}{2}\,dx = \int_0^1 \frac{4x+7x^2}{6}\,dx = \frac{2}{6} + \frac{7}{18} = \frac{13}{18} \approx 0.7222,$$
where the second-to-last equality uses $3+6x = 3(1+2x)$.
Therefore,
$$Cov(Y,X) = E(XY) - E(X)E(Y) = \frac{13}{18} - \frac{7}{12}\cdot\frac{5}{4} = -\frac{1}{144} \approx -0.0069.$$
To compute $Var(X)$:
$$E(X^2) = \int_0^1 \frac{x^2+2x^3}{2}\,dx = \frac{1}{6} + \frac{1}{4} = \frac{5}{12},$$
so that
$$Var(X) = E(X^2) - [E(X)]^2 = \frac{5}{12} - \frac{7^2}{12^2} = \frac{11}{144} \approx 0.0764.$$
Therefore,
$$\beta_1 = \frac{-1/144}{11/144} = -\frac{1}{11} \approx -0.0909$$
and
$$\beta_0 = E(Y) - \beta_1 E(X) = \frac{5}{4} + \frac{1}{11}\cdot\frac{7}{12} = \frac{43}{33} \approx 1.3030,$$
so the BLP of $Y$ given $X$ is approximately $1.3030 - 0.0909\,X$.
Note that the OLS estimates
$$\hat\beta_1^{OLS} = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2} = \frac{\widehat{Cov}(X,Y)}{\widehat{Var}(X)}, \qquad \hat\beta_0^{OLS} = \bar{y} - \hat\beta_1^{OLS}\,\bar{x}.$$
[Figure: $E(Y|X)$ and the BLP of $Y$ given $X$ plotted over $x \in [0,1]$; vertical axis roughly 1.2 to 1.34.]
These equations are the sample analogs of the equations defining the coefficients of the BLP. OLS of $Y$ on $X$ (no constant) estimates the BLP of $Y$ when the intercept is restricted to be zero. Note that the mean squared error of such a constrained linear predictor is
$$MSE = E(Y - bX)^2,$$
with first-order condition
$$E[(Y - bX)X] = 0.$$
Hence,
$$b = \frac{E(XY)}{E(X^2)} = \frac{13/18}{5/12} = \frac{26}{15} \approx 1.7333.$$
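The moments above can be cross-checked by numerical integration of the joint density over its support (a sketch, not part of the original sheet; the exact values are $E(X)=7/12$, $E(Y)=5/4$, $E(XY)=13/18$, $Var(X)=11/144$, slope $-1/11$):

```python
import numpy as np

# Grid integration over the support x in [0, 1], y in [0, 2].
# np.trapz was renamed np.trapezoid in NumPy 2.0; handle both.
trapz = getattr(np, "trapezoid", getattr(np, "trapz", None))

x = np.linspace(0.0, 1.0, 2001)
y = np.linspace(0.0, 2.0, 2001)
X, Y = np.meshgrid(x, y, indexing="ij")
f = (X + X * Y + Y) / 4.0  # joint density f_{X,Y}(x, y)

def E(g):
    # E[g(X, Y)] = double integral of g * f over the support
    return trapz(trapz(g * f, y, axis=1), x)

EX, EY = E(X), E(Y)            # 7/12 and 5/4
EXY, EX2 = E(X * Y), E(X**2)   # 13/18 and 5/12
beta1 = (EXY - EX * EY) / (EX2 - EX**2)   # BLP slope, -1/11
beta0 = EY - beta1 * EX                   # BLP intercept, 43/33
b_nocon = EXY / EX2                       # zero-intercept BLP slope, 26/15
```

The same grid could also be used to plot $E(Y|X)$ against the BLP, reproducing the figure.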
2. Suppose that $Y$ and $X$ are $n \times 1$ vectors of data, and the following conditions hold: (1) $Y = X\beta + e$; (2) $\mathrm{rank}(X) = 1$; (3) $E(e|X) = 0$; and (4) $Var(e|X) = \sigma^2 I_n$. Let $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i$.
(a) Consider the estimator $\tilde\beta = \bar{Y}/\bar{X}$. Show that $\tilde\beta$ is linear and conditionally unbiased. Calculate its conditional variance and compare it to the conditional variance of the OLS estimator.
Solution: Write
$$\tilde\beta = \frac{\bar{Y}}{\bar{X}} = \frac{1'Y}{1'X},$$
where $1$ is the $n \times 1$ vector of ones. Therefore $\tilde\beta$ is linear in $Y$. Further,
$$E(\tilde\beta|X) = \frac{1'E(Y|X)}{1'X} = \frac{1'X\beta}{1'X} = \beta.$$
Hence, $\tilde\beta$ is conditionally (and unconditionally) unbiased. For the conditional variance, we have
$$Var(\tilde\beta|X) = \frac{1'\,Var(Y|X)\,1}{(1'X)^2} = \frac{\sigma^2 n}{(1'X)^2}.$$
Let $\hat\beta^{OLS} = \frac{X'Y}{X'X}$. Then
$$Var(\hat\beta^{OLS}|X) = \frac{X'\,Var(Y|X)\,X}{(X'X)^2} = \frac{\sigma^2}{X'X}.$$
Since
$$X'X = \sum_{i=1}^n x_i^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n\bar{x}^2 \geq n\bar{x}^2 = \frac{(1'X)^2}{n},$$
we have
$$\frac{\sigma^2}{X'X} \leq \frac{\sigma^2 n}{(1'X)^2},$$
and therefore
$$Var(\hat\beta^{OLS}|X) \leq Var(\tilde\beta|X).$$
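The two conditional variance formulas can be compared directly on a hypothetical draw of regressor values (a sketch, assuming $\sigma^2 = 1$; the inequality is guaranteed by Cauchy-Schwarz):

```python
import numpy as np

# Compare Var(beta_tilde | X) = sigma^2 * n / (1'X)^2 with
# Var(beta_hat_OLS | X) = sigma^2 / (X'X) for hypothetical data.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=50)   # hypothetical regressor values
sigma2 = 1.0
var_tilde = sigma2 * x.size / x.sum() ** 2
var_ols = sigma2 / (x @ x)
# X'X = sum (x_i - xbar)^2 + n * xbar^2 >= (1'X)^2 / n
assert var_ols <= var_tilde
```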
(b) For the estimator $\hat\beta_m = \frac{X_m'Y_m}{X_m'X_m}$ based on the subsample $(X_m, Y_m)$ of the first $m \leq n$ observations, we have
$$E(\hat\beta_m|X) = \frac{X_m'E(Y_m|X)}{X_m'X_m} = \frac{X_m'X_m\beta}{X_m'X_m} = \beta.$$
Hence, $\hat\beta_m$ is conditionally unbiased. Further,
$$Var(\hat\beta_m|X) = \frac{X_m'\,Var(Y_m|X)\,X_m}{(X_m'X_m)^2} = \frac{\sigma^2}{X_m'X_m}.$$
Since $X'X = \sum_{i=1}^n x_i^2 \geq \sum_{i=1}^m x_i^2 = X_m'X_m$, we have $\frac{\sigma^2}{X'X} \leq \frac{\sigma^2}{X_m'X_m}$, and therefore
$$Var(\hat\beta^{OLS}|X) \leq Var(\hat\beta_m|X).$$
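A quick numeric sketch of the subsample comparison (hypothetical regressor draw; $m = 10$ of $n = 30$ observations):

```python
import numpy as np

# Dropping observations weakly increases the OLS conditional variance:
# sigma^2 / sum_{i<=n} x_i^2 <= sigma^2 / sum_{i<=m} x_i^2 for m <= n.
rng = np.random.default_rng(1)
x = rng.normal(size=30)      # hypothetical regressors
sigma2 = 2.0
m = 10
var_full = sigma2 / (x @ x)
var_sub = sigma2 / (x[:m] @ x[:m])
assert var_full <= var_sub
```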
3. Solution: The F statistic is
$$F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n-k)},$$
where $q$ is the number of constraints, $n$ is the number of observations, and $k$ is the number of variables in the unconstrained regression. Let us count the number of constraints. In our data, there are 14 different values of education (only 13 dummies are included to avoid multicollinearity). The constrained regression $E(\text{lwage}|\text{educ},\text{age}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{age} + \beta_3\,\text{age}^2$ "sets" all coefficients on the 13 dummies equal to the same unspecified value, which introduces 12 constraints. Next, there are two different values of age (only one dummy is included to avoid multicollinearity). In the constrained regression, age is omitted to avoid multicollinearity with $\text{age}^2$. All in all, there is only one coefficient corresponding to the age dummy in the saturated regression, and there is only one coefficient corresponding to $\text{age}^2$ in the restricted regression. Hence, no additional restriction results. Finally, there are 11 included interaction dummies in the saturated regression. No interaction effects are specified in the constrained model. Effectively, this imposes 11 constraints (the coefficients on the interaction dummies are set to zero). Hence, the total number of constraints is $q = 12 + 11 = 23$. (We could have computed this directly by subtracting the "model degrees of freedom" (given in the ANOVA output of the regression command) of the constrained regression from that of the unconstrained one.) Further, $n = 392$ and $k = 26$. Hence,
$$F = \frac{(221.16 - 210.37)/23}{210.37/(392-26)} \approx 0.82.$$
The 95% critical value of $F(23, 366)$ is 1.5588. We do not reject the null.
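The F statistic quoted above can be reproduced directly from the reported sums of squared residuals:

```python
# F statistic for the restriction test, using the SSR values
# reported in the text.
ssr_r, ssr_u = 221.16, 210.37
q, n, k = 23, 392, 26
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(round(F, 2))  # 0.82, below the 5% critical value 1.5588 of F(23, 366)
```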
4. Consider the regression model
$$y_i = x_i'\beta + e_i, \quad i = 1, \dots, n,$$
where (1) $(y_i, x_i)$ are i.i.d., (2) $E(x_ix_i')$ is non-degenerate, (3) $E(e_i|x_i) = 0$, and (4) $Var(e_i|x_i) = \sigma^2$. Assume that $x_i$ does not contain a constant term. The corresponding uncentered coefficient of determination is defined by
$$R^2 = \frac{\sum_{i=1}^n \hat{y}_i^2}{\sum_{i=1}^n y_i^2}.$$
Solution: Note that
$$\sum_{i=1}^n \hat{y}_i^2/n = \sum_{i=1}^n (x_i'\hat\beta)^2/n = \sum_{i=1}^n (\hat\beta'x_i)(x_i'\hat\beta)/n = \hat\beta'\Big( \sum_{i=1}^n x_ix_i'/n \Big)\hat\beta.$$
ii. $\sum_{i=1}^n \hat{y}_i^2/n \to_p \beta'E[x_ix_i']\beta$.
Solution: By the Law of Large Numbers,
$$\sum_{i=1}^n x_ix_i'/n \to_p E[x_ix_i'].$$
Since OLS is consistent, $\hat\beta \to_p \beta$. By the Continuous Mapping Theorem,
$$\sum_{i=1}^n \hat{y}_i^2/n = \hat\beta'\Big( \sum_{i=1}^n x_ix_i'/n \Big)\hat\beta \to_p \beta'E[x_ix_i']\beta.$$
iii. $\sum_{i=1}^n y_i^2/n \to_p \sigma^2 + \beta'E[x_ix_i']\beta$.
Solution: By the Law of Large Numbers,
$$\sum_{i=1}^n y_i^2/n \to_p E[y_i^2] = E[(x_i'\beta + e_i)^2] = \beta'E[x_ix_i']\beta + 2E[x_i'e_i]\beta + E[e_i^2] = \beta'E[x_ix_i']\beta + \sigma^2,$$
since $E(e_i|x_i) = 0$ implies $E[x_i'e_i] = 0$ and, together with $Var(e_i|x_i) = \sigma^2$, gives $E[e_i^2] = \sigma^2$.
iv.
$$R^2 \to_p \frac{\beta'E[x_ix_i']\beta}{\sigma^2 + \beta'E[x_ix_i']\beta}.$$
Solution: This convergence follows from (ii), (iii) and the Continuous Mapping Theorem (actually, it is sufficient to use Slutsky's theorem).
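The limit in (iv) can be illustrated by simulation; the parameter values here are hypothetical ($\beta = 1$, $\sigma^2 = 1$, scalar $x \sim N(0,1)$), for which the probability limit is $1/(1+1) = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)                  # E[x^2] = 1, no constant term
beta, sigma = 1.0, 1.0
y = beta * x + sigma * rng.normal(size=n)
b = (x @ y) / (x @ x)                   # no-intercept OLS
yhat = b * x
R2 = (yhat @ yhat) / (y @ y)            # uncentered R^2
# plim R^2 = beta^2 E[x^2] / (sigma^2 + beta^2 E[x^2]) = 0.5
```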
5. Consider the regression
$$y_{ig} = \beta_0 + \beta_1 x_g + e_{ig}, \qquad (1)$$
where $Var(e_{ig}) = \sigma_e^2$. In matrix form, $Y = X\beta + e$ with
$$y = (y_{11}, y_{21}, \dots, y_{n1}, y_{12}, \dots, y_{n2}, \dots, y_{1G}, \dots, y_{nG})',$$
$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_G \\ \vdots & \vdots \\ 1 & x_G \end{pmatrix} \quad \text{(each pair of rows repeated $n$ times)}, \qquad \beta = (\beta_0, \beta_1)',$$
and
$$e = (e_{11}, e_{21}, \dots, e_{n1}, e_{12}, \dots, e_{n2}, \dots, e_{1G}, \dots, e_{nG})'.$$
(b) Show that
$$E(ee'|X) = \sigma_e^2\,\mathrm{diag}(\underbrace{1_n1_n', \dots, 1_n1_n'}_{G \text{ times}}),$$
where $1_n$ is the $n \times 1$ vector of ones.
Solution: The elements of $E(ee'|X)$ outside the $n \times n$ diagonal blocks are zero, assuming $e_{ig_1}$ and $e_{jg_2}$ are uncorrelated for $g_1 \neq g_2$. The elements in the $g$-th $n \times n$ block are equal to $E(e_{ig}e_{jg}) = \sigma_e^2$. Hence, the block looks as follows:
$$g\text{-th block} = \begin{pmatrix} \sigma_e^2 & \sigma_e^2 & \cdots & \sigma_e^2 \\ \sigma_e^2 & \sigma_e^2 & \cdots & \sigma_e^2 \\ \cdots & \cdots & \cdots & \cdots \\ \sigma_e^2 & \sigma_e^2 & \cdots & \sigma_e^2 \end{pmatrix} = \sigma_e^2\, 1_n 1_n'.$$
(c) Show that $E(X'ee'X|X) = \sigma_e^2\, n\, X'X$.
Solution:
$$E(X'ee'X|X) = X'\,\sigma_e^2\,\mathrm{diag}(1_n1_n', \dots, 1_n1_n')\,X,$$
where
$$X' = \begin{pmatrix} 1_n' & \cdots & 1_n' \\ x_1 1_n' & \cdots & x_G 1_n' \end{pmatrix} \quad (G \text{ column blocks}).$$
Hence,
$$E(X'ee'X|X) = \sigma_e^2 \begin{pmatrix} \sum_{g=1}^G 1_n'1_n1_n'1_n & \sum_{g=1}^G x_g\, 1_n'1_n1_n'1_n \\ \sum_{g=1}^G x_g\, 1_n'1_n1_n'1_n & \sum_{g=1}^G x_g^2\, 1_n'1_n1_n'1_n \end{pmatrix} = \sigma_e^2\, n^2 \begin{pmatrix} G & \sum_{g=1}^G x_g \\ \sum_{g=1}^G x_g & \sum_{g=1}^G x_g^2 \end{pmatrix} = \sigma_e^2\, n\, X'X,$$
since $1_n'1_n = n$ and
$$X'X = n \begin{pmatrix} G & \sum_{g=1}^G x_g \\ \sum_{g=1}^G x_g & \sum_{g=1}^G x_g^2 \end{pmatrix}.$$
(d) Show that $Var(\hat\beta_1|X) = n\, Var_{hom}(\hat\beta_1|X)$, where $Var_{hom}$ denotes the variance computed under the homoskedasticity assumption $E(ee'|X) = \sigma_e^2 I$.
Solution: Since $\hat\beta - \beta = (X'X)^{-1}X'e$,
$$Var(\hat\beta|X) = (X'X)^{-1} E(X'ee'X|X) (X'X)^{-1} = (X'X)^{-1}\,\sigma_e^2\, n\, X'X\,(X'X)^{-1} = \sigma_e^2\, n\,(X'X)^{-1}.$$
Now, under homoskedasticity,
$$Var_{hom}(\hat\beta|X) = \sigma_e^2\,(X'X)^{-1}.$$
Therefore,
$$Var(\hat\beta_1|X)\,/\,Var_{hom}(\hat\beta_1|X) = n.$$
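The identities in (c) and (d) can be verified mechanically for a small hypothetical design ($n = 5$ observations in each of $G = 4$ groups; $x_g$ and $\sigma_e^2$ arbitrary):

```python
import numpy as np

# Build Omega = sigma_e^2 * diag(1_n 1_n', ..., 1_n 1_n') and check
# X' Omega X = sigma_e^2 * n * X'X, hence Var = n * Var_hom.
rng = np.random.default_rng(3)
n, G = 5, 4
xg = rng.normal(size=G)                          # hypothetical group regressors
X = np.column_stack([np.ones(n * G), np.repeat(xg, n)])
sigma2_e = 1.7
Omega = sigma2_e * np.kron(np.eye(G), np.ones((n, n)))
lhs = X.T @ Omega @ X
rhs = sigma2_e * n * (X.T @ X)
assert np.allclose(lhs, rhs)                     # part (c)
XtX_inv = np.linalg.inv(X.T @ X)
var_true = XtX_inv @ lhs @ XtX_inv               # sigma_e^2 * n * (X'X)^{-1}
var_hom = sigma2_e * XtX_inv
assert np.allclose(var_true, n * var_hom)        # part (d)
```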
6. Consider a poor elderly population that uses emergency rooms for primary care. Let $y_i$ measure the health of the $i$-th randomly selected person, and let $d_i$ be the dummy that equals 1 if the person is admitted to the hospital. Let the potential outcomes $y_{1i}$ and $y_{0i}$ be defined by
$$y_i = \begin{cases} y_{1i} & \text{if } d_i = 1 \\ y_{0i} & \text{if } d_i = 0. \end{cases}$$
(a) Explain in words the meaning of $E(y_{0i}|d_i = 1)$ and of $E(y_{1i}|d_i = 0)$.
Solution: $E(y_{0i}|d_i = 1)$ is the average of what the health of all those admitted to the hospital would have been, had they not been admitted. Similarly, $E(y_{1i}|d_i = 0)$ is the average of what the health of all those not admitted to the hospital would have been, had they been admitted.
(b) Why might it be the case that $E(y_{0i}|d_i = 1) \neq E(y_{0i}|d_i = 0)$?
7. (a) Solution: We have $E(y_i|s_i = s+1) = \beta_1 + \beta_2(s+1)$ and $E(y_i|s_i = s) = \beta_1 + \beta_2 s$. Therefore,
$$\beta_2 = E(y_i|s_i = s+1) - E(y_i|s_i = s)$$
$$= E(y_{s+1,i}|s_i = s+1) - E(y_{s,i}|s_i = s)$$
$$= E(y_{s+1,i}|s_i = s+1) - E(y_{s,i}|s_i = s+1) + \left[ E(y_{s,i}|s_i = s+1) - E(y_{s,i}|s_i = s) \right].$$
Since schooling is randomly assigned (by assumption), it is independent of the potential outcomes. Therefore,
$$E(y_{s,i}|s_i = s+1) - E(y_{s,i}|s_i = s) = 0,$$
and $\beta_2$ equals the causal effect $E(y_{s+1,i}|s_i = s+1) - E(y_{s,i}|s_i = s+1)$ of an additional year of schooling.
(b) In addition to the assumptions made in (a), assume that $E(y_i|s_i, w_i) = \beta_1 + \beta_2 s_i + \beta_3 w_i$. Show that $\beta_2$ equals the causal effect $E(y_{s+1,i}|s_i = s+1, w_i) - E(y_{s,i}|s_i = s+1, w_i)$ plus the selection bias term $E(y_{s,i}|s_i = s+1, w_i) - E(y_{s,i}|s_i = s, w_i)$.
Solution: We have
$$E(y_i|s_i = s+1, w_i) = \beta_1 + \beta_2(s+1) + \beta_3 w_i$$
and
$$E(y_i|s_i = s, w_i) = \beta_1 + \beta_2 s + \beta_3 w_i.$$
Therefore,
$$\beta_2 = E(y_i|s_i = s+1, w_i) - E(y_i|s_i = s, w_i)$$
$$= E(y_{s+1,i}|s_i = s+1, w_i) - E(y_{s,i}|s_i = s, w_i)$$
$$= E(y_{s+1,i}|s_i = s+1, w_i) - E(y_{s,i}|s_i = s+1, w_i) + E(y_{s,i}|s_i = s+1, w_i) - E(y_{s,i}|s_i = s, w_i).$$
(c) Since $s_i$ is independent of $y_{si}$, we have $E(y_{si}|s_i) = E(y_{si})$. Explain in words how it is possible that, despite the latter equality, $E(y_{si}|s_i, w_i) \neq E(y_{si}|w_i)$ in general, so that the selection bias $E(y_{si}|s_i = s+1, w_i) - E(y_{si}|s_i = s, w_i)$ is not equal to zero.
Solution: Even though $s_i$ is independent of $y_{si}$, it is statistically dependent on $w_i$ (because schooling affects the future choice of occupation). The occupational dummy, in turn, may depend on the potential outcomes. Hence, even though $E(y_{si}|s_i) = E(y_{si})$, we may have $E(y_{si}|s_i, w_i) \neq E(y_{si}|w_i)$.
(d) Based on your analysis in (c), explain which variables can be called "bad controls" in an experimental setting.
Solution: Those variables that are affected by the treatment, and so cannot be thought of as fixed at the time of the treatment.
8. In the setting of problem (7), let $g_i$ be a variable that equals 1 if the $i$-th randomly chosen person is a woman, and $-1$ if the person is a man. Suppose that $E(y_i|s_i, g_i) = \beta_1 + \beta_2 s_i + \beta_3 g_i$.
(a) Argue that, if $s_i$ is randomly assigned, then $E(y_{si}|s_i = s+1, g_i) - E(y_{si}|s_i = s, g_i) = 0$, so that there is no selection bias, and $\beta_2$ equals the causal effect of an additional year of schooling.
(b) Suppose that $\sum_i g_i = 0$ and $\sum_i s_i g_i = 0$. The OLS estimator in the "short regression" $y_i = \beta_1 + \beta_2 s_i + e_i$ is
$$\hat\beta_{OLS} = \begin{pmatrix} n & \sum s_i \\ \sum s_i & \sum s_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y_i \\ \sum s_i y_i \end{pmatrix}.$$
In the "long regression" $y_i = \beta_1 + \beta_2 s_i + \beta_3 g_i + u_i$, we have
$$(X'X)^{-1}X'Y = \begin{pmatrix} n & \sum s_i & \sum g_i \\ \sum s_i & \sum s_i^2 & \sum s_i g_i \\ \sum g_i & \sum s_i g_i & \sum g_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y_i \\ \sum s_i y_i \\ \sum g_i y_i \end{pmatrix} = \begin{pmatrix} n & \sum s_i & 0 \\ \sum s_i & \sum s_i^2 & 0 \\ 0 & 0 & \sum g_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y_i \\ \sum s_i y_i \\ \sum g_i y_i \end{pmatrix} = \begin{pmatrix} \hat\beta_{OLS} \\ \left(\sum g_i^2\right)^{-1} \sum g_i y_i \end{pmatrix},$$
so the OLS coefficients on the constant and on $s_i$ coincide with those from the short regression.
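This coincidence can be checked on simulated data engineered so that $\sum_i g_i = 0$ and $\sum_i s_i g_i = 0$ (the coefficient values and the schooling distribution are hypothetical):

```python
import numpy as np

# With sum(g_i) = 0 and sum(s_i g_i) = 0, the coefficients on (1, s_i)
# coincide in the short and long regressions.
rng = np.random.default_rng(4)
s_half = rng.integers(8, 16, size=100).astype(float)
s = np.concatenate([s_half, s_half])      # same schooling values in both groups
g = np.concatenate([np.ones(100), -np.ones(100)])
assert g.sum() == 0 and (s * g).sum() == 0
y = 1.0 + 0.5 * s + 0.3 * g + rng.normal(size=200)
X_short = np.column_stack([np.ones(200), s])
X_long = np.column_stack([X_short, g])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)
assert np.allclose(b_short, b_long[:2])
```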
(c) Suppose that the adjusted $R^2$ from the "long regression" is $\bar{R}^2_{long} = 0.2$ and that from the short regression is $\bar{R}^2_{short} = 0.1$. Prove that, under the assumptions made in (b), the homoskedastic standard error estimate for the OLS coefficient on $s_i$ from the "long regression" equals $\sqrt{8/9}$ times the homoskedastic standard error estimate for the OLS coefficient on $s_i$ from the "short regression". Hence, the precision of the estimate in the "long regression" is higher than that in the "short regression".
Solution: The squared homoskedastic standard error estimate for the coefficient on $s_i$ in the short regression equals
$$\frac{SSR_{short}}{n - k_{short}}\left(X_{short}'X_{short}\right)^{-1}_{22},$$
where $\left(X_{short}'X_{short}\right)^{-1}_{22}$ is the element in the second row and second column of the matrix $\left(X_{short}'X_{short}\right)^{-1}$. Similarly,
$$\widehat{SE}\left(\hat\beta_{2,OLS}^{long}\right) = \sqrt{\frac{SSR_{long}}{n - k_{long}}\left(X_{long}'X_{long}\right)^{-1}_{22}}.$$
By the block-diagonality established in (b), $\left(X_{short}'X_{short}\right)^{-1}_{22} = \left(X_{long}'X_{long}\right)^{-1}_{22}$. Hence,
$$\frac{\widehat{SE}\left(\hat\beta_{2,OLS}^{long}\right)}{\widehat{SE}\left(\hat\beta_{2,OLS}^{short}\right)} = \sqrt{\frac{SSR_{long}}{n - k_{long}} \cdot \frac{n - k_{short}}{SSR_{short}}}.$$
Recall that the adjusted $R^2$ is
$$\bar{R}^2 = 1 - \frac{n-1}{n-k} \cdot \frac{SSR}{TSS}.$$
Therefore,
$$\frac{SSR_{long}}{n - k_{long}} = \frac{TSS_{long}}{n-1}\left(1 - \bar{R}^2_{long}\right) = \frac{TSS_{long}}{n-1} \cdot 0.8$$
and
$$\frac{SSR_{short}}{n - k_{short}} = \frac{TSS_{short}}{n-1}\left(1 - \bar{R}^2_{short}\right) = \frac{TSS_{short}}{n-1} \cdot 0.9.$$
On the other hand, $TSS_{long} = TSS_{short}$ because the dependent variable in both regressions is the same. Hence,
$$\sqrt{\frac{SSR_{long}}{n - k_{long}} \cdot \frac{n - k_{short}}{SSR_{short}}} = \sqrt{\frac{0.8}{0.9}} = \sqrt{\frac{8}{9}}.$$
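Given the two adjusted $R^2$ values, the standard error ratio follows directly from the formula just derived (the TSS terms cancel):

```python
import math

# se_long / se_short = sqrt( (1 - R2bar_long) / (1 - R2bar_short) ),
# since SSR / (n - k) = (TSS / (n - 1)) * (1 - R2bar) and TSS cancels.
r2bar_long, r2bar_short = 0.2, 0.1
ratio = math.sqrt((1 - r2bar_long) / (1 - r2bar_short))
print(ratio)  # sqrt(8/9), about 0.9428
```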
(d) In light of your answers to (a), (b), and (c), explain in words why $g_i$ is a "good control".
Solution: Including $g_i$ in the regression does not introduce selection bias because $g_i$ is fixed at the time of the treatment assignment. On the other hand, including $g_i$ in the regression reduces uncertainty and leads to lower standard errors of the OLS estimates.