
EC3303 Econometrics I Homework 2

(Note: Please type your answers and hand in only one copy per group. Write the names, matric numbers
and tutorial groups of all group members on your submission. Deposit your submission in Kelvin
Seah’s Mailbox (mailbox 59) at the Economics Dept, Level 6 of AS2, before 2359 hrs on April 19, 2018.)

1.
(a) You are interested in the relationship between earnings and gender. To control for
omitted variable bias, you add experience as a control variable. You initially estimate two
specifications:

$\widehat{Earn} = \underset{(21.18)}{323.70} + \underset{(0.55)}{5.15} \times Experience - \underset{(13.06)}{169.78} \times Female, \quad \bar{R}^2 = 0.13, \quad SER = 274.75$

$\widehat{\ln(Earn)} = \underset{(0.08)}{5.44} + \underset{(0.002)}{0.015} \times Experience - \underset{(0.036)}{0.421} \times Female, \quad \bar{R}^2 = 0.17, \quad SER = 0.75$

where Earn is weekly earnings in dollars, Experience is work experience in years, and Female
is a binary variable that equals one if the individual is female and zero otherwise.

i) Holding experience constant, how much less do females earn on average? Answer
using each regression.

ii) Should you choose the second specification because it has the higher $\bar{R}^2$?
Why or why not?
(3 Marks)
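For (i), a minimal Python sketch of the arithmetic, using only the Female coefficients reported above. It assumes the second regression has ln(Earn) as its dependent variable, so 100 × the coefficient gives an approximate percentage gap and 100 × (e^coef - 1) gives the exact one. The function name `log_coef_to_percent` is illustrative.

```python
import math

# Female coefficients as reported in the two regressions above
FEMALE_LEVEL = -169.78  # dependent variable: Earn (dollars per week)
FEMALE_LOG = -0.421     # dependent variable: ln(Earn)


def log_coef_to_percent(coef: float) -> tuple[float, float]:
    """Return (approximate, exact) percentage effects of a dummy in a log regression."""
    approx = 100 * coef                 # first-order approximation
    exact = 100 * (math.exp(coef) - 1)  # exact percentage change
    return approx, exact


if __name__ == "__main__":
    print(f"Levels regression: females earn ${-FEMALE_LEVEL:.2f} less per week")
    approx, exact = log_coef_to_percent(FEMALE_LOG)
    print(f"Log regression: about {-approx:.1f}% less (exact: {-exact:.1f}% less)")
```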

(b) Now consider only the estimated log-linear specification:

$\widehat{\ln(Earn)} = \underset{(0.08)}{5.44} + \underset{(0.002)}{0.015} \times Experience - \underset{(0.036)}{0.421} \times Female, \quad \bar{R}^2 = 0.17, \quad SER = 0.75$

By how much are earnings estimated to change when Experience increases by 2 years, from
20 to 22?
(1 Mark)
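A minimal sketch of the calculation for (b), assuming the log-linear form above: a change of Δ in Experience changes predicted ln(Earn) by 0.015 × Δ, and because this specification is linear in Experience the answer is the same for any two-year increase. `pct_change_log_linear` is a hypothetical helper name.

```python
import math

BETA_EXPERIENCE = 0.015  # coefficient on Experience in the log-linear regression


def pct_change_log_linear(beta: float, delta_x: float) -> tuple[float, float]:
    """(approximate, exact) % change in Y when X changes by delta_x in ln(Y) = ... + beta*X."""
    change_in_log = beta * delta_x
    return 100 * change_in_log, 100 * (math.exp(change_in_log) - 1)


if __name__ == "__main__":
    approx, exact = pct_change_log_linear(BETA_EXPERIENCE, 22 - 20)
    print(f"approx {approx:.2f}%, exact {exact:.2f}%")
```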

(c) Your peer points out that experience-earnings profiles typically have an inverted
U-shape. To test this idea, you add the square of experience to your log-linear regression:

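$\widehat{\ln(Earn)} = \underset{(0.18)}{3.04} + \underset{(0.009)}{0.147} \times Experience - \underset{(0.033)}{0.421} \times Female - \underset{(0.0001)}{0.0016} \times Experience^2, \quad \bar{R}^2 = 0.28, \quad SER = 0.68$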

Are there reasons to believe that this specification is superior to the specification
described in (b)? Why?
(1 Mark)
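A hedged sketch of two quantities that bear on (c), computed only from the reported estimates: the t-statistic on the squared term, and the experience level at which the fitted profile peaks, found by setting the derivative 0.147 - 2 × 0.0016 × Experience to zero. Because (b) and (c) share the dependent variable ln(Earn), their $\bar{R}^2$ values (0.17 and 0.28) can also be compared directly. The helper names are illustrative.

```python
# Estimates (standard errors) from the quadratic log-linear regression in (c)
B_EXP = 0.147     # coefficient on Experience
B_EXP2 = -0.0016  # coefficient on Experience^2
SE_EXP2 = 0.0001  # standard error of the Experience^2 coefficient


def t_stat(coef: float, se: float) -> float:
    """t-statistic for H0: coefficient = 0."""
    return coef / se


def turning_point(b1: float, b2: float) -> float:
    """Experience at which b1 + 2*b2*Experience = 0 (a maximum when b2 < 0)."""
    return -b1 / (2 * b2)


if __name__ == "__main__":
    print(f"t-statistic on Experience^2: {t_stat(B_EXP2, SE_EXP2):.1f}")
    print(f"Fitted profile peaks at about {turning_point(B_EXP, B_EXP2):.1f} years")
```

A |t| well above 1.96 rejects, at the 5% level, the null that the squared term is zero, and a negative coefficient on the squared term is what an inverted U implies.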

(d) What other variables would you consider including in the regression if you had a choice?
Name any two. Why?
(2 Marks)

2. You postulate a linear relationship between two variables, X and Y, of the form:

$Y_i = \beta_0 + \beta_1 X_i + u_i$

where $u_i$ denotes all factors other than $X_i$ that affect the value of Y for observation i.

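Suppose LSA #1 (the first least squares assumption) is satisfied, so that $E(u_i \mid X_i) = 0$, and

$\mathrm{Var}(\hat{\beta}_1) = \frac{1}{n} \times \frac{\mathrm{Var}[(X_i - \mu_X)u_i]}{[\mathrm{Var}(X_i)]^2}$.

Show that if $\mathrm{Var}(u_i \mid X_i = x) = \sigma_u^2$, where $\sigma_u^2$ is a constant, then the expression for
$\mathrm{Var}(\hat{\beta}_1)$ simplifies to $\frac{\sigma_u^2}{n\sigma_X^2}$, where $\sigma_X^2 = \mathrm{Var}(X_i)$. Show all steps clearly.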

(4 Marks)
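One way to lay out the steps, sketched using the law of iterated expectations and writing $\sigma_X^2 = \mathrm{Var}(X_i) > 0$:

```latex
\begin{aligned}
E[(X_i-\mu_X)u_i] &= E\big[(X_i-\mu_X)\,E(u_i \mid X_i)\big] = 0
  && \text{(LSA \#1)} \\
E(u_i^2 \mid X_i) &= \mathrm{Var}(u_i \mid X_i) + [E(u_i \mid X_i)]^2 = \sigma_u^2 + 0 = \sigma_u^2 \\
\mathrm{Var}[(X_i-\mu_X)u_i] &= E[(X_i-\mu_X)^2 u_i^2] - \big(E[(X_i-\mu_X)u_i]\big)^2 \\
  &= E\big[(X_i-\mu_X)^2\,E(u_i^2 \mid X_i)\big] - 0
   = \sigma_u^2\,E[(X_i-\mu_X)^2] = \sigma_u^2\sigma_X^2 \\
\mathrm{Var}(\hat{\beta}_1) &= \frac{1}{n} \times \frac{\sigma_u^2\sigma_X^2}{(\sigma_X^2)^2}
  = \frac{\sigma_u^2}{n\sigma_X^2}
\end{aligned}
```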

3. Suppose a researcher collects data on houses that have been sold in a particular
neighbourhood over the past year, and obtains the regression results in the table shown
below:

Note: variable definitions are given below the table.

(a) Using the results in column (1), what is the expected change in price from building a
1,500-square-foot addition to a house?
(1 Mark)

(b) Using the results in column (2), what is the effect on price of a 7% change in the size
of a house?

(1 Mark)

(c) Using column (2), what is the estimated effect of view on price?

(1 Mark)

(d) (i) Find the effect of adding a view on the price of a house with a pool, as well as on the
price of a house without a pool.
(ii) Is the interaction term between pool and view statistically significant in column (5)?
What can you conclude based on this?

(3 Marks)

4. Using data collected from 420 school districts in California, you ran a regression in STATA of
district test scores on a number of regressors. Here testscr denotes the district test score; str
denotes the student-to-teacher ratio in the district; el_pct denotes the percentage of English
learners in the district; meal_pct denotes the percentage of students in the district who are on a
free meal programme; avginc denotes average income in the district (in thousands of US$); avginc2
denotes the square of average income in the district; and avginc3 denotes the cube of average
income in the district. Your estimated regression is:

. regress testscr str el_pct meal_pct avginc avginc2 avginc3, robust

Linear regression                               Number of obs =        420
                                                F(  6,   413) =     402.02
                                                Prob > F      =     0.0000
                                                R-squared     =     0.8090
                                                Root MSE      =     8.3869

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.5076202   .2543474    -2.00   0.047    -1.007597   -.0076433
      el_pct |  -.2047791   .0336964    -6.08   0.000     -.271017   -.1385412
    meal_pct |  -.4098048   .0334496   -12.25   0.000    -.4755575   -.3440522
      avginc |  -1.063349   .5679633    -1.87   0.062    -2.179808    .0531105
     avginc2 |   .0732655   .0218204     3.36   0.001     .0303727    .1161584
     avginc3 |  -.0008873     .00025    -3.55   0.000    -.0013787   -.0003959
       _cons |   687.0091   7.244621    94.83   0.000     672.7682    701.2501
------------------------------------------------------------------------------

You then performed a test using the command “test avginc2 avginc3” and obtained the
following result:
. test avginc2 avginc3

 ( 1)  avginc2 = 0
 ( 2)  avginc3 = 0

       F(  2,   413) =    6.69
            Prob > F =    0.0014

Answer the following questions with reference to the STATA output given above:

(a) What would the estimated coefficients on el_pct and meal_pct be if both variables were
measured as fractions instead of percentages (i.e. if their values ranged between 0 and 1
instead of between 0 and 100)? Round your answers to 3 significant figures.
(2 Marks)
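A minimal sketch of the rescaling for (a): if a regressor measured in percent is replaced by the same quantity as a fraction (X_frac = X_pct / 100), then β × X_pct = 100β × X_frac, so the coefficient is multiplied by 100. The helper names below are illustrative.

```python
import math

# Coefficients on the percentage regressors, from the first STATA output
COEF_PCT = {"el_pct": -0.2047791, "meal_pct": -0.4098048}


def round_sig(x: float, sig: int = 3) -> float:
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))


def coef_for_fraction(coef_per_pct: float) -> float:
    """One unit of a fraction equals 100 percentage points, so scale the coefficient by 100."""
    return 100 * coef_per_pct


if __name__ == "__main__":
    for name, coef in COEF_PCT.items():
        print(name, round_sig(coef_for_fraction(coef)))
```

The same rescaling multiplies the standard errors by 100, leaving t-statistics and p-values unchanged.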

(b) A friend views your results and claims that the model you have specified (which includes
avginc2 and avginc3) is more appropriate than a model with only avginc (that is, a model
that excludes avginc2 and avginc3). Do you agree with him? Explain.
(1 Mark)
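As a hedged check on the STATA output for (b), the p-value of the joint test can be reproduced by hand: with 2 numerator degrees of freedom, the F distribution has the closed-form tail probability (1 + 2F/d2)^(-d2/2), where d2 = 413 here. The function name is hypothetical.

```python
def f_pvalue_two_num_df(f_stat: float, df2: int) -> float:
    """Exact upper-tail probability P(F > f_stat) for an F(2, df2) distribution.

    With 2 numerator degrees of freedom the survival function reduces to
    (1 + 2*f/df2) ** (-df2/2), so no statistics library is needed.
    """
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


if __name__ == "__main__":
    p = f_pvalue_two_num_df(6.69, 413)
    print(f"p-value = {p:.4f}")  # compare with Prob > F in the STATA output
```

A p-value of about 0.0014 rejects, at the 1% level, the joint null that the coefficients on avginc2 and avginc3 are both zero.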

(c) When meal_pct is eliminated from your regression, the estimated regression
becomes:

. regress testscr str el_pct avginc avginc2 avginc3, robust

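Linear regression                               Number of obs =        420
                                                F(  5,   414) =     301.59
                                                Prob > F      =     0.0000
                                                R-squared     =     0.7244
                                                Root MSE      =     10.063

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.2257894    .295839    -0.76   0.446    -.8073233    .3557444
      el_pct |  -.4645906   .0308103   -15.08   0.000    -.5251548   -.4040265
      avginc |    1.59157   .6400257     2.49   0.013     .3334649    2.849675
     avginc2 |   .0235483   .0255119     0.92   0.357    -.0266008    .0736973
     avginc3 |  -.0006129   .0002945    -2.08   0.038    -.0011919    -.000034
       _cons |   638.9685   6.572051    97.23   0.000     626.0497    651.8872
------------------------------------------------------------------------------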

Does $\hat{\beta}_{str}$ suffer from a positive, a negative, or no bias when meal_pct is omitted from the
regression? What does this suggest about how str and the error term u are correlated in the
regression that omits meal_pct? Explain.
(2 Marks)
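A minimal sketch comparing the two STATA outputs for (c), treating the regression that includes meal_pct as the benchmark. It also uses the in-sample OLS omitted-variable identity (short-regression coefficient = long-regression coefficient + coefficient on the omitted variable × coefficient on str from an auxiliary regression of meal_pct on the included regressors) to back out that auxiliary coefficient. The auxiliary regression is not reported above, so that number is implied by the identity, not read from any output.

```python
# Coefficients taken from the two STATA outputs above (same 420 districts)
B_STR_LONG = -0.5076202   # str, meal_pct included
B_STR_SHORT = -0.2257894  # str, meal_pct omitted
B_MEAL_LONG = -0.4098048  # meal_pct, in the regression that includes it

# Shift in the str coefficient caused by omitting meal_pct
shift = B_STR_SHORT - B_STR_LONG

# Identity: B_STR_SHORT = B_STR_LONG + B_MEAL_LONG * delta, where delta is the
# coefficient on str when meal_pct is regressed on str, el_pct, avginc, avginc2, avginc3
delta = shift / B_MEAL_LONG

if __name__ == "__main__":
    direction = "upward (positive)" if shift > 0 else "downward (negative)"
    print(f"Omitting meal_pct shifts the str coefficient by {shift:+.4f}: {direction}")
    print(f"Implied auxiliary coefficient on str: {delta:.3f}")
```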

5. The data file CPS12 contains data on full-time, full-year workers in the United States, aged
25-35, whose highest qualification is either a high school diploma (i.e. a secondary
qualification) or a Bachelor’s degree. A detailed description of the dataset is given in
CPS12_Description. Note that the dataset is in Excel format.

(a) Convert the dataset to STATA format (.dta) before using it to answer the questions
below.

(b) (i) Run a regression of average hourly earnings (AHE) on age (Age), gender (Female), and
education (Bachelor). Attach the STATA output in your answer.
(ii) If Age increases from 25 to 26, how are earnings expected to change?
(iii) If Age increases from 33 to 35, how are earnings expected to change?

(3 Marks)

(c) Run a regression of the logarithm of average hourly earnings, Ln(AHE), on Age, Female, and
Bachelor. Attach the STATA output in your answer.

(i) If Age increases from 25 to 26, how are earnings expected to change?
(ii) If Age increases from 33 to 35, how are earnings expected to change?

(2 Marks)

(d) Run a regression of the logarithm of average hourly earnings, Ln(AHE), on Ln(Age), Female,
and Bachelor. Attach the STATA output in your answer.

(i) If Age increases from 25 to 26, how are earnings expected to change?
(ii) If Age increases from 33 to 35, how are earnings expected to change?

(2 Marks)
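For (d), a sketch of the log-log arithmetic: with Ln(AHE) regressed on Ln(Age), a move from age0 to age1 changes predicted Ln(AHE) by β × ln(age1/age0), so the percentage change is roughly 100β × ln(age1/age0), or exactly 100 × ((age1/age0)^β - 1). The coefficient in the usage example is a hypothetical placeholder, not an estimate; substitute the one from your own STATA output.

```python
import math


def pct_change_log_log(beta: float, age0: float, age1: float) -> tuple[float, float]:
    """(approximate, exact) % change in AHE when Age moves from age0 to age1
    in ln(AHE) = ... + beta * ln(Age)."""
    approx = 100 * beta * math.log(age1 / age0)
    exact = 100 * ((age1 / age0) ** beta - 1)
    return approx, exact


if __name__ == "__main__":
    beta = 1.0  # HYPOTHETICAL placeholder: replace with your Ln(Age) estimate
    for age0, age1 in [(25, 26), (33, 35)]:
        approx, exact = pct_change_log_log(beta, age0, age1)
        print(f"{age0} -> {age1}: about {approx:.2f}% (exact {exact:.2f}%)")
```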

(e) Run a regression of the logarithm of average hourly earnings, Ln(AHE), on Age, $Age^2$,
Female, and Bachelor. Attach your STATA output in your answer.

(i) If Age increases from 25 to 26, how are earnings expected to change?
(ii) If Age increases from 33 to 35, how are earnings expected to change?

(2 Marks)
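For (e), a sketch of the quadratic case: the change in predicted Ln(AHE) is β1 × (age1 - age0) + β2 × (age1² - age0²), so unlike the specification in (c) the answer depends on the starting age. The coefficient values below are hypothetical placeholders; substitute your own STATA estimates.

```python
def log_change_quadratic(b_age: float, b_age2: float, age0: float, age1: float) -> float:
    """Change in predicted ln(AHE) when Age moves from age0 to age1
    in ln(AHE) = ... + b_age * Age + b_age2 * Age^2."""
    return b_age * (age1 - age0) + b_age2 * (age1**2 - age0**2)


if __name__ == "__main__":
    # HYPOTHETICAL placeholders: replace with the Age and Age^2 estimates from STATA
    b_age, b_age2 = 0.1, -0.001
    for age0, age1 in [(25, 26), (33, 35)]:
        change = log_change_quadratic(b_age, b_age2, age0, age1)
        print(f"{age0} -> {age1}: change in ln(AHE) = {change:.4f}, about {100 * change:.2f}%")
```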
