
University of Cape Town

School of Economics

Eco4016F
Honours Econometrics

Midterm Examination
April 2012

Time: 3 hours

Marks: 100

Instructions:
The examination consists of 4 questions.
Answer all 4 questions.
Full marks are only awarded for questions where all working is shown.
Provide enough mathematical detail (assumptions, calculations etc.) so that the
logical progression of your answer is clear. If you get stuck on the algebra, part
marks will be awarded if you can demonstrate that you know how to approach the
problem.
Only non-programmable scientific calculators are allowed.
Total number of pages (including cover page): 13

1. Consider the simple regression model, $y = \beta_0 + \beta_1 x^* + u$, where we have $m$ measures
on $x^*$. Write these as $z_h = x^* + e_h$ for all $h = 1, \dots, m$. Make the following
assumptions:
$\mathrm{Cov}(x^*, u) = 0$ (i.e., $x^*$ would be exogenous if it could be observed)
$\mathrm{Cov}(x^*, e_h) = 0$ (i.e., the CEV assumption holds)
the errors are pairwise uncorrelated.
$\mathrm{Var}(e_1) = \mathrm{Var}(e_2) = \dots = \mathrm{Var}(e_m) = \sigma_e^2$.
Let $w = (z_1 + \dots + z_m)/m$ be the average of the measures on $x^*$, so that for each
observation $i$, $w_i = (z_{i1} + \dots + z_{im})/m$ is the average of the $m$ measures. Let $\hat\beta_1$ be
the OLS estimator from the simple regression of $y_i$ on $1, w_i$, for $i = 1, \dots, n$, using a
random sample of data.
(a) Show that
$$\operatorname{plim}(\hat\beta_1) = \beta_1 \left\{ \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + (\sigma_e^2/m)} \right\}$$
Hint: the plim of $\hat\beta_1$ is $\mathrm{Cov}(w, y)/\mathrm{Var}(w)$.

(b) With reference to your answer to question 1a, explain why using the average of
all $m$ measures of $x^*$ is better than using any single measure $z_h$.
(20)
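The hint in question 1(a) can be unpacked as follows. This is only a sketch of one possible route, not a model answer. It writes $\bar e = (e_1 + \dots + e_m)/m$ (a symbol introduced here for convenience) and assumes that "the errors are pairwise uncorrelated" covers $u$ as well as $e_1, \dots, e_m$.

```latex
\begin{align*}
w &= x^* + \bar e, \qquad \bar e = \frac{1}{m}\sum_{h=1}^{m} e_h \\
\mathrm{Cov}(w, y) &= \mathrm{Cov}(x^* + \bar e,\; \beta_0 + \beta_1 x^* + u)
   = \beta_1 \sigma_{x^*}^2 \\
\mathrm{Var}(w) &= \sigma_{x^*}^2 + \mathrm{Var}(\bar e)
   = \sigma_{x^*}^2 + \frac{\sigma_e^2}{m} \\
\operatorname{plim}(\hat\beta_1) &= \frac{\mathrm{Cov}(w, y)}{\mathrm{Var}(w)}
   = \beta_1 \left\{ \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + (\sigma_e^2/m)} \right\}
\end{align*}
```

Setting $m = 1$ gives the attenuation factor for a single measure $z_h$, which is the comparison question 1(b) asks about.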
2. Consider a simple regression model of the return to schooling
$\ln(\text{wage}) = \beta_0 + \beta_1\,\text{education} + u$.
(a) Suppose that you know the birth dates of the individuals in your sample. Suppose you also knew that children have to stay in school till the age of 16,
and cannot begin school till the age of 7. Explain how you might construct a
plausible binary instrumental variable with this information.
(b) Now consider the following regression output, where y = log(wage), x =
education, and z is your binary instrument.
. su y x z

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |      3010    6.261832    .4437976    4.60517   7.784889
           x |      3010    13.26346    2.676913          1         18
           z |      3010    .6820598    .4657535          0          1

. ivregress 2sls y (x = z)

Instrumental variables (2SLS) regression          Number of obs   =       3010
                                                  Wald chi2(1)    =      51.20
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =          .
                                                  Root MSE        =     .55667

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1880626   .0262826     7.16   0.000     .1365497    .2395756
       _cons |   3.767472   .3487458    10.80   0.000     3.083942    4.451001
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z

. bysort z: su y x

------------------------------------------------------------------------------
-> z = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       957    6.155494    .4328417    4.60517   7.474772
           x |       957    12.69801    2.791523          1         18

------------------------------------------------------------------------------
-> z = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |      2053    6.311401    .4402214    4.60517   7.784889
           x |      2053    13.52703    2.580455          2         18

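(i) Verify that the IV estimate $\hat\beta_1 = 0.1880626$ can be written as
$$\hat\beta_1 = \frac{\bar y_1 - \bar y_0}{\bar x_1 - \bar x_0},$$
where $\bar y_0$ and $\bar x_0$ are the sample averages of y and x over the part of the
sample with z = 0, and $\bar y_1$ and $\bar x_1$ are the sample averages of y and x over
the part of the sample with z = 1.
(ii) Now prove this result mathematically; i.e., show that
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^{n} (z_i - \bar z)(x_i - \bar x)} = \frac{\bar y_1 - \bar y_0}{\bar x_1 - \bar x_0}$$
Hint: you might find the following results useful:
$$\sum_{i=1}^{n} (y_i - \bar y) = 0; \qquad \bar z = \sum_{i=1}^{n} z_i / n; \qquad n \bar z = \sum_{i=1}^{n} z_i$$
$$n = n_0 + n_1$$
$$\bar y = \left(\frac{n_0}{n}\right) \bar y_0 + \left(\frac{n_1}{n}\right) \bar y_1$$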
(30)
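The numerical check asked for in question 2(b)(i) can be sketched in a few lines of Python. The group means are copied from the `bysort z: su y x` output above; the function name `wald_estimate` is a hypothetical helper introduced only for this illustration, and the result can only match the reported coefficient up to the rounding of the printed means.

```python
# Group means copied from the `bysort z: su y x` output.
y_bar_0, x_bar_0 = 6.155494, 12.69801   # subsample with z = 0
y_bar_1, x_bar_1 = 6.311401, 13.52703   # subsample with z = 1

def wald_estimate(y1, y0, x1, x0):
    """Ratio of the difference in mean outcomes to the difference in mean regressors."""
    return (y1 - y0) / (x1 - x0)

beta_1_wald = wald_estimate(y_bar_1, y_bar_0, x_bar_1, x_bar_0)
beta_1_reported = 0.1880626             # coefficient on x from `ivregress 2sls y (x = z)`

print(f"Wald ratio: {beta_1_wald:.6f}  reported IV estimate: {beta_1_reported:.6f}")
```

This only illustrates the arithmetic; the algebraic proof in 2(b)(ii) still has to be written out.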

3. You and Ms. Analyst work for the Treasury of the Government of South Africa.
Ms. Analyst has been tasked by her Boss, Mr. Bigshot, to analyze data from a
skills training experiment targeted to a sample of young people. Ms. Analyst has
invited you (an intern) to shadow her during the project. Participants were randomly
assigned to a treatment group and a control group. The control group did not get
any training. Those assigned to the treatment group did receive training, and could
enter the programme from 1 January 2010. Some people in the treatment group only
entered in mid-2011. The programme ended on 31 December 2011. Ms. Analyst has
been asked to investigate whether participation in the experiment had any effect on
the participants' unemployment probability in 2012. The variables she has in her
dataset are as follows:
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
train           byte   %9.0g                  =1 if assigned to treatment group
age             byte   %9.0g                  age in 2011
educ            byte   %9.0g                  years of education
black           byte   %9.0g                  =1 if black
coloured        byte   %9.0g                  =1 if Coloured
married         byte   %9.0g                  =1 if married
nodegree        byte   %9.0g                  no tertiary qualification
mosinex         byte   %9.0g                  months prior to 1/2012 in experiment
unem08          byte   %9.0g                  =1 if unemployed all of 2008
unem09          byte   %9.0g                  =1 if unemployed all of 2009
unem12          byte   %9.0g                  =1 if unemployed all of 2012
lwage08         float  %9.0g                  Log of real wage in 2008; zero if wage is 0
lwage09         float  %9.0g                  Log of real wage in 2009; zero if wage is 0
lwage12         float  %9.0g                  Log of real wage in 2012; zero if wage is 0
agesq           int    %9.0g                  age^2
mostrn          byte   %9.0g                  months in training
-------------------------------------------------------------------------------

The descriptive statistics for these variables are as follows:


. su

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       train |       445    .4157303    .4934022          0          1
         age |       445    25.37079    7.100282         17         55
        educ |       445    10.19551    1.792119          3         16
       black |       445    .8337079    .3727617          0          1
    coloured |       445    .0876404    .2830895          0          1
-------------+--------------------------------------------------------
     married |       445    .1685393    .3747658          0          1
    nodegree |       445    .7820225    .4133367          0          1
     mosinex |       445     18.1236    5.311937          5         24
      unem08 |       445    .7325843    .4431092          0          1
      unem09 |       445    .6494382    .4776829          0          1
-------------+--------------------------------------------------------
      unem12 |       445    .3078652      .46213          0          1
     lwage08 |       445    .4198245    .8862537   -.809299   3.678089
     lwage09 |       445    .2771078    .7967834  -2.599059   3.224548
     lwage12 |       445    1.135802    1.136259  -3.106541   4.099463
       agesq |       445    693.9775    429.7818        289       3025
-------------+--------------------------------------------------------
      mostrn |       445     7.68764    9.656205          0         24

For this question, note that when G specializes to the logistic distribution, we have
$$G(x) = \Lambda(x) = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{1 + e^{x}}$$
The associated density function for the logistic CDF is
$$g(x) \equiv \Lambda'(x) = \frac{e^{x}}{(1 + e^{x})^{2}}$$

(a) Ms. Analyst starts by asking some basic questions: how many young people
participated in the job training programme? Is there any reason to suspect,
just by looking at the descriptive statistics, that the programme had an effect
on unemployment? Help her answer these questions.
(b) She then runs the regression below. What do you think she hopes to find out
by running such a regression? Do you think that after looking at the results,
her hopes would be dashed? Why/why not?
. global x unem08 unem09 age educ black coloured married
. logit train $x

Iteration 0:   log likelihood =     -302.1
Iteration 1:   log likelihood = -297.04498
Iteration 2:   log likelihood = -297.03096
Iteration 3:   log likelihood = -297.03096

Logistic regression                               Number of obs   =        445
                                                  LR chi2(7)      =      10.14
                                                  Prob > chi2     =     0.1809
Log likelihood = -297.03096                       Pseudo R2       =     0.0168

------------------------------------------------------------------------------
       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      unem08 |   .0818541   .3193627     0.26   0.798    -.5440852    .7077934
      unem09 |  -.3939164   .2966978    -1.33   0.184    -.9754334    .1876006
         age |   .0134344   .0141347     0.95   0.342    -.0142691    .0411379
        educ |   .0514515   .0560396     0.92   0.359    -.0583842    .1612872
       black |  -.3318542    .359652    -0.92   0.356    -1.036759    .3730508
    coloured |   -.868989   .5029685    -1.73   0.084    -1.854789    .1168112
     married |   .1536043   .2656568     0.58   0.563    -.3670735    .6742822
       _cons |  -.6917486    .788755    -0.88   0.380     -2.23768    .8541828
------------------------------------------------------------------------------

(c) She then proceeds to the main business at hand: investigating the effects of the
programme on the probability of unemployment. She runs the two regressions
shown in the abbreviated STATA output below. On the basis of these results,
she claims that the training program reduces the probability of being unemployed in 2012 to approximately 0.24 and this holds whether one estimates a
linear probability model or a Probit model. Is she correct? Why/why not?
(Hint: start by transforming the logit coefficients so that they are comparable
to probit coefficients.)

. reg unem12 train

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  1,   443) =    6.26
       Model |  1.32226401     1  1.32226401           Prob > F      =  0.0127
    Residual |  93.5002079   443  .211061417           R-squared     =  0.0139
-------------+------------------------------           Adj R-squared =  0.0117
       Total |  94.8224719   444  .213564126           Root MSE      =  .45941

------------------------------------------------------------------------------
      unem12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.1106029   .0441888    -2.50   0.013    -.1974486   -.0237572
       _cons |   .3538462   .0284917    12.42   0.000     .2978505    .4098418
------------------------------------------------------------------------------

. logit unem12 train

Logistic regression                               Number of obs   =        445
                                                  LR chi2(1)      =       6.30
                                                  Prob > chi2     =     0.0120
Log likelihood =  -271.5828                       Pseudo R2       =     0.0115

------------------------------------------------------------------------------
      unem12 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.5328045   .2149117    -2.48   0.013    -.9540237   -.1115854
       _cons |  -.6021754   .1296994    -4.64   0.000    -.8563816   -.3479692
------------------------------------------------------------------------------

(d) She then runs the following Logit model where she controls for other factors.
Verify that the marginal effect of educ = -.0003291 that STATA computes is
indeed correct.
. logit unem12 train $x

Logistic regression                               Number of obs   =        445
                                                  LR chi2(8)      =      22.63
                                                  Prob > chi2     =     0.0039
Log likelihood = -263.42168                       Pseudo R2       =     0.0412

------------------------------------------------------------------------------
      unem12 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.5531597   .2198136    -2.52   0.012    -.9839865    -.122333
      unem08 |   .1958456   .3522676     0.56   0.578    -.4945862    .8862775
      unem09 |   .0826864   .3245389     0.25   0.799    -.5533982     .718771
         age |   .0003619   .0151608     0.02   0.981    -.0293527    .0300766
        educ |  -.0015829   .0605203    -0.03   0.979    -.1202005    .1170346
       black |   1.102427   .5009618     2.20   0.028     .1205603    2.084294
    coloured |  -.2436418   .6937825    -0.35   0.725     -1.60343    1.116147
     married |  -.1358157     .29638    -0.46   0.647    -.7167097    .4450783
       _cons |  -1.707333   .9105037    -1.88   0.061    -3.491887    .0772213
------------------------------------------------------------------------------

. predict xbhat3, index

. su xbhat3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      xbhat3 |       445   -.8722221    .5529723  -2.650983  -.3112156

. mfx

Marginal effects after logit
      y  = Pr(unem12) (predict)
         =  .29479214
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
   train*|   -.112445      .04331   -2.60   0.009  -.197337 -.027552    .41573
  unem08*|   .0399288      .07035    0.57   0.570  -.097947  .177805   .732584
  unem09*|    .017101      .06676    0.26   0.798  -.113751  .147953   .649438
     age |   .0000752      .00315    0.02   0.981  -.006102  .006253   25.3708
    educ |  -.0003291      .01258   -0.03   0.979  -.024988   .02433   10.1955
   black*|    .191368      .06889    2.78   0.005   .056337  .326399   .833708
coloured*|  -.0484808      .13163   -0.37   0.713  -.306464  .209503    .08764
 married*|  -.0277014      .05925   -0.47   0.640  -.143828  .088425   .168539
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

(e) How should Ms. Analyst interpret the coefficient on unem08?


(f) On the basis of this evidence, Ms. Analyst believes that skills training could
be the magic bullet in combating unemployment in South Africa, and that the
take-home message of the experiment is to offer the training to all young people
in South Africa (i.e., to scale up the training programme). Mr. Bigshot, however,
is not convinced because he believes that the programme has a stronger
benefit to people who don't have a long history of being unemployed. Is he
correct? Does it follow that the programme should not be scaled up? Why/why
not?
(20)
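Several of the numerical checks in question 3 reduce to short arithmetic on the reported output. The Python sketch below is one hedged way to organise them, not a model answer: all numbers are copied from the Stata output above, the variable names and the helper `logistic_cdf` are hypothetical, and the marginal effect of educ is assumed to be evaluated at the sample means, which is what the `X` column of the `mfx` table suggests.

```python
import math

def logistic_cdf(index):
    """Logistic CDF, Lambda(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-index))

# 3(a): number assigned to training = sample size times the mean of `train`.
n_obs = 445
share_treated = 0.4157303
n_treated = round(n_obs * share_treated)

# 3(c): predicted unemployment probability for the treated under each model.
lpm_cons, lpm_train = 0.3538462, -0.1106029        # from `reg unem12 train`
logit_cons, logit_train = -0.6021754, -0.5328045   # from `logit unem12 train`
p_lpm_treated = lpm_cons + lpm_train
p_logit_treated = logistic_cdf(logit_cons + logit_train)

# 3(d): logit marginal effect of a continuous regressor, beta * p * (1 - p),
# using the predicted probability reported in the `mfx` header.
p_at_means = 0.29479214
beta_educ = -0.0015829
me_educ = beta_educ * p_at_means * (1.0 - p_at_means)

print(n_treated, round(p_lpm_treated, 4), round(p_logit_treated, 4), me_educ)
```

With a single binary regressor both models are saturated, so each reproduces the group means, which is one reason the two treated-group predictions land so close together.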
4. Consider the following labour supply model
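$$\text{whrs} = \beta_0 + \beta_1\,\text{kl6} + \beta_2\,\text{k618} + \beta_3\,\text{nwifeinc} + \beta_4\,\text{wa} + \beta_5\,\text{wa}^2 + \beta_6\,\text{we} + \beta_7\,\text{we}^2 + \beta_8\,(\text{wa} \times \text{we}) + \beta_9\,\text{lww2} + u$$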
The STATA output given below shows the variable definitions, as well as the regression output where the given labour supply model has been estimated using the
Tobit approach. Study the output and then answer the questions that follow.

-------------------------------------------------------------------------------
              storage  display
variable name   type   format      variable label
-------------------------------------------------------------------------------
lfp             float  %9.0g       A dummy variable = 1 if woman worked in 2011, else 0
whrs            float  %9.0g       Number of hours the woman worked in 2011
kl6             float  %9.0g       Number of children less than 6 years old in household
k618            float  %9.0g       Number of children between ages 6 and 18 in household
wa              float  %9.0g       Woman's age
we              float  %9.0g       Woman's educational attainment, in years
lww             float  %9.0g       Log of woman's hourly earnings (defined only for lfp = 1)
lww2            float  %9.0g       Log of woman's hourly earnings (imputed when lfp = 0)
ax              float  %9.0g       Actual years of woman's previous labor market experience
prin            float  %9.0g       Woman's Property Income in rands
nwifeinc        float  %9.0g       Prin/1000
we2             float  %9.0g       Square of Education
wa2             float  %9.0g       Square of Age
wawe            float  %9.0g       Age times Education
-------------------------------------------------------------------------------

. tobit whrs $W $C $I lww2, ll(0) robust

Tobit regression                                  Number of obs   =        753
                                                  F(   6,    747) =      20.78
                                                  Prob > F        =     0.0000
Log pseudolikelihood = -3891.0413                 Pseudo R2       =     0.0161

------------------------------------------------------------------------------
             |               Robust
        whrs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          wa |  -36.49461   7.613938    -4.79   0.000    -51.44187   -21.54735
          we |   104.5203   26.88408     3.89   0.000     51.74297    157.2977
         kl6 |  -1044.926     133.85    -7.81   0.000    -1307.693   -782.1587
        k618 |  -100.1299   41.95978    -2.39   0.017     -182.503   -17.75679
    nwifeinc |  -22.20082   5.068873    -4.38   0.000    -32.15175   -12.24988
        lww2 |   202.4707   113.6298     1.78   0.075    -20.60111    425.5425
       _cons |   1172.027   474.3755     2.47   0.014     240.7594    2103.295
-------------+----------------------------------------------------------------
      /sigma |   1258.636   48.07551                      1164.257    1353.015
------------------------------------------------------------------------------
  Obs. summary:        325  left-censored observations at whrs<=0
                       428     uncensored observations
                         0 right-censored observations

. predict xb1, xb

. summarize xb1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         xb1 |       753     295.009    621.3692  -3268.431   2008.768

. mfx compute, predict(ystar(0,.))

Marginal effects after tobit
      y  = E(whrs*|whrs>0) (predict, ystar(0,.))
         =  663.35746
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
      wa |  -21.62883     4.47256   -4.84   0.000  -30.3949 -12.8628   42.5378
      we |   61.94481      15.632    3.96   0.000   31.3063  92.5833   12.2869
     kl6 |  -619.2837      78.297   -7.91   0.000  -772.744 -465.824   .237716
    k618 |   -59.3428      24.951   -2.38   0.017  -108.245 -10.4402   1.35325
nwifeinc |  -13.15749     2.96863   -4.43   0.000  -18.9759 -7.33908    20.129
    lww2 |   119.9959      67.414    1.78   0.075  -12.1337  252.125   1.09613
------------------------------------------------------------------------------

(a) Verify that the marginal effect
$$\frac{\partial E(\text{whrs} \mid x)}{\partial\, \text{lww2}} = 119.9959$$
that STATA computes is indeed correct. Is this partial effect of any economic
importance? Why/why not?
(b) Study carefully the marginal effects for the conditional mean function
E(whrs|whrs > 0, x) given in the STATA output below. Then answer the
questions that follow.
. mfx compute, predict(e(0,.))

Marginal effects after tobit
      y  = E(whrs|whrs>0) (predict, e(0,.))
         =   1119.292
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
      wa |  -15.24023     3.15081   -4.84   0.000  -21.4157 -9.06477   42.5378
      we |   43.64793      11.029    3.96   0.000   22.0316  65.2643   12.2869
     kl6 |  -436.3634      55.076   -7.92   0.000   -544.31 -328.417   .237716
    k618 |  -41.81448       17.57   -2.38   0.017  -76.2506 -7.37833   1.35325
nwifeinc |  -9.271113      2.0924   -4.43   0.000  -13.3721 -5.17009    20.129
    lww2 |   84.55224      47.489    1.78   0.075  -8.52464  177.629   1.09613
------------------------------------------------------------------------------

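(i) Letting $z \equiv x\beta/\sigma$, show that
$$\frac{\partial E(\text{whrs} \mid \text{whrs} > 0, x)}{\partial\, \text{lww2}} = 202.4707 \left\{ 1 - z\,\frac{\phi(z)}{\Phi(z)} - \frac{\phi(z)^2}{\Phi(z)^2} \right\}$$
In answering this question, the following results might come in handy:
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}; \qquad E(\text{whrs} \mid \text{whrs} > 0, x) = x\beta + \sigma\,\frac{\phi(z)}{\Phi(z)}$$
$$\frac{\partial \phi(z)}{\partial z} = -z\,\phi(z); \qquad \frac{\partial \Phi(z)}{\partial z} = \phi(z)$$
(ii) Verify that the marginal effect
$$\frac{\partial E(\text{whrs} \mid \text{whrs} > 0, x)}{\partial\, \text{lww2}} = 84.55224$$
that STATA computes is indeed correct.
(iii) Do the signs on lww2 and nwifeinc make economic sense? Explain.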
(iv) With reference to your answer to questions 4a and 4(b)ii, what do you
make of the fact that the marginal effect for the sample of working women is
smaller than for the combined sample of working and non-working women?

(c) Calculate the elasticity of labour supply with respect to the hourly wage rate,
for women that choose whrs > 0. Hint: the log wage elasticity formula when
dealing with the conditional mean function E(whrs|whrs > 0, x) is
$$\frac{\partial E(\text{whrs} \mid \text{whrs} > 0, x)}{\partial\, \text{lww2}} \cdot \frac{\text{lww2}}{E(\text{whrs} \mid \text{whrs} > 0, x)}$$
Note also that the header to the marginal effects reported by STATA in question
4b tells you that E(whrs|whrs > 0, x) = 1119.292
(d) Calculate the elasticity of labour supply with respect to the wife's property
income variable nwifeinc.
(e) In what follows, wherever the word income is used, take it to mean income as
defined in this data set; i.e., property income. You may interpret property
income as any income earned from the sale of property that would carry a tax
obligation known as a capital gains tax.
A policy maker is contemplating dropping the capital gains tax rate. He asks
his research office to estimate the income elasticity of labour supply. The staff
at the research office have the same dataset you have been working with in this
question. They produce the following STATA output.
. su lww2 nwifeinc whrs if lfp == 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        lww2 |       428    1.190173    .7231978  -2.054164   3.218876
    nwifeinc |       428    18.93748    10.59135  -.0290575         91
        whrs |       428     1302.93    776.2744         12       4950

. reg whrs $W $C $I lww2, robust

Linear regression                                      Number of obs =     753
                                                       F(  6,   746) =   19.77
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1226
                                                       Root MSE      =  819.43

------------------------------------------------------------------------------
             |               Robust
        whrs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          wa |   -20.3053   4.687209    -4.33   0.000    -29.50699   -11.10361
          we |   43.42601   17.05139     2.55   0.011     9.951603    76.90043
         kl6 |  -497.8102   62.25698    -8.00   0.000    -620.0299   -375.5905
        k618 |  -79.02626   23.68683    -3.34   0.001     -125.527   -32.52548
    nwifeinc |  -10.47326   2.383235    -4.39   0.000    -15.15191    -5.79462
        lww2 |   112.8796   90.96162     1.24   0.215    -65.69165    291.4508
       _cons |   1383.116   293.9374     4.71   0.000     806.0734    1960.159
------------------------------------------------------------------------------

(i) On the basis of these results, the staff at the research office argue to the
policy maker that at higher levels of work hours, labour supply is much
less income elastic. Are they correct?

(ii) The policy maker, accepting the finding of question 4(e)i, remarks that
this implies that a 1% increase in income is much more likely to lead to
a negative employment effect than a negative earnings effect (i.e., the
policy shift is more likely to induce a worker to switch from positive to 0
work hours than it is likely to induce a worker to work fewer hours). What
is the underlying economic rationale behind this remark?
(iii) Based on the findings of questions 4(e)i and 4(e)ii above, the policy maker
proposes to the Treasury that the proposed capital gains tax cut should
not be implemented because the resulting disincentive to enter into employment would exceed the resulting disincentive to work longer hours.
Assess this claim in the context of the Tobit model. Hint: there are two
issues to consider here: (i) is the negative income elasticity large enough
to warrant concern when the Tobit model is used, and (ii) is it necessarily the
case that the employment effect claimed in question 4(e)ii follows?
(30)
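The two Tobit marginal effects that questions 4(a) and 4(b)(ii) ask to be verified can be checked numerically. The Python sketch below is one possible way to do so, not a model answer. It assumes the effects are evaluated at the sample means of the regressors, so that the index equals the mean of `xb1` reported above, and it uses the standard Tobit formulas $\beta\,\Phi(z)$ and $\beta\{1 - \lambda(z)[z + \lambda(z)]\}$ with $\lambda = \phi/\Phi$. The helper and variable names are hypothetical.

```python
import math

def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values copied from the Tobit output and from `summarize xb1`.
beta_lww2 = 202.4707     # Tobit coefficient on lww2
sigma = 1258.636         # estimated /sigma
xb_at_means = 295.009    # mean of xb1 (index at the regressor means, since it is linear)

z = xb_at_means / sigma
inv_mills = std_normal_pdf(z) / std_normal_cdf(z)   # lambda(z) = phi(z) / Phi(z)

# 4(a): effect on E(whrs | x), the censored mean.
me_unconditional = beta_lww2 * std_normal_cdf(z)

# 4(b)(ii): effect on E(whrs | whrs > 0, x), the truncated mean.
me_conditional = beta_lww2 * (1.0 - inv_mills * (z + inv_mills))

# Conditional mean itself, for comparison with the mfx header.
cond_mean = xb_at_means + sigma * inv_mills

print(round(me_unconditional, 2), round(me_conditional, 2), round(cond_mean, 1))
```

Small discrepancies from the Stata figures are to be expected because the inputs are rounded to the digits printed in the output.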
