Econometrics 206-3

Exam III: 11.50 AM -1.20 PM, 24 April 2017

In answering the questions below, paste the Stata output only when it is asked for. When
pasting output, use the copy as picture option. When testing a hypothesis, be sure
to mention the distribution of the test statistic, its degrees of freedom, the level of
significance and the associated critical value. DO NOT USE THE STATA test
COMMAND.

It would be easiest if you inserted your answer between the questions below and
returned this document. Rename the document as "your name.docx" and upload it
on LMS.

You have to do this exam by yourself. You are allowed to consult the textbook and
your notes. You are NOT allowed to consult anybody whether by speaking, by text
messages or email or any other means. Violations will attract penalties as per
Ashoka policy.

1. (a) Regress log of wages on a constant and the female dummy. Paste output
here.

. regress lwage female

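      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    114.01
       Model |  90.8307098         1  90.8307098   Prob > F        =    0.0000
    Residual |  795.116327       998  .796709746   R-squared       =    0.1025
-------------+----------------------------------   Adj R-squared   =    0.1016
       Total |  885.947037       999   .88683387   Root MSE        =    .89259

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.6905735   .0646761   -10.68   0.000    -.8174902   -.5636568
       _cons |    4.08833   .0327238   124.93   0.000     4.024115    4.152545
------------------------------------------------------------------------------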

(b) Interpret the coefficient on the female dummy.

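The coefficient on the female dummy is -0.69. Because the dependent variable is the log of wages, it measures the difference in average log daily wages between women and men. No other regressors are included, so this is the raw gap. It is not a gap holding everything else constant. Reading -0.69 as "69% less" relies on the approximation $100\beta$, which is only accurate for small coefficients. The exact percentage difference is $100\,(e^{-0.6906} - 1) \approx -49.9\%$. So, on average, a woman earns roughly 50% less than a man on a daily basis.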
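
The gap between the "69%" reading and the exact figure can be checked with a few lines of Python. This is a minimal sketch, and the helper name `log_points_to_percent` is illustrative rather than taken from the course materials:

```python
import math


def log_points_to_percent(beta: float) -> float:
    """Exact percentage difference implied by a dummy coefficient in a log-wage regression."""
    return 100.0 * (math.exp(beta) - 1.0)


beta_female = -0.6905735  # coefficient on female from the regression in 1(a)

approx = 100.0 * beta_female                # small-coefficient approximation
exact = log_points_to_percent(beta_female)  # 100 * (exp(beta) - 1)

print(f"approximation: {approx:.1f}%")  # -69.1%
print(f"exact:         {exact:.1f}%")   # -49.9%
```

The approximation and the exact figure diverge because $e^{\beta} - 1 \approx \beta$ holds only when $\beta$ is close to zero.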

(c) Test the null hypothesis that the coefficient on female dummy is -0.5 against
the alternative that the coefficient on female dummy is less than -0.5. Show your
workings. [5+5+10]

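Under the null, the test statistic follows a t-distribution.

Null hypothesis: $H_0: \beta_1 = -0.5$, where $\beta_1$ is the population coefficient on the female dummy
Alternative hypothesis: $H_1: \beta_1 < -0.5$ (one-sided, left tail)
Test statistic: $t = (\hat{\beta}_1 - (-0.5))/\mathrm{se}(\hat{\beta}_1) = (-0.6905735 + 0.5)/0.0646761 \approx -2.95$
Degrees of freedom: 1000 - 2 = 998 (large enough that the standard normal table can be used)
Level of significance: 5%
Critical value: -1.645, so the rejection region is $t < -1.645$.

Since $-2.95 < -1.645$, the test statistic lies in the rejection region. Therefore we reject the null in favour of the alternative that the coefficient on the female dummy is less than -0.5. Equivalently, on the scale of the estimate itself, the rejection region is $\hat{\beta}_1 < -0.5 - 1.645 \times 0.0646761 \approx -0.606$, and $\hat{\beta}_1 = -0.691$ lies in it.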
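
The one-sided test in 1(c) can be reproduced with the Python standard library. This is a sketch that uses `statistics.NormalDist` for the critical value in place of the exact t(998) value, which differs only in the third decimal place:

```python
from statistics import NormalDist

beta_hat = -0.6905735  # coefficient on female, regression 1(a)
se = 0.0646761         # its standard error
beta_null = -0.5       # value of the coefficient under H0
alpha = 0.05           # one-sided level of significance

t_stat = (beta_hat - beta_null) / se
# Left-tail critical value from N(0, 1); with 998 df, t(998) is very close to normal
crit = NormalDist().inv_cdf(alpha)
# The same rejection rule expressed on the scale of the estimate itself
beta_cutoff = beta_null + crit * se

print(f"t = {t_stat:.3f}, critical value = {crit:.3f}, reject if beta_hat < {beta_cutoff:.3f}")
print("reject H0" if t_stat < crit else "do not reject H0")
```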

2. (a) Regress log of wages on a constant, the female dummy, age of the
individual and the square of age. Paste your output here.

. regress lwage female age age2

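      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(3, 996)       =     63.03
       Model |  141.361825         3  47.1206082   Prob > F        =    0.0000
    Residual |  744.585212       996  .747575514   R-squared       =    0.1596
-------------+----------------------------------   Adj R-squared   =    0.1570
       Total |  885.947037       999   .88683387   Root MSE        =    .86462

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   -.692959   .0626515   -11.06   0.000     -.815903    -.570015
         age |    .079388   .0117978     6.73   0.000     .0562365    .1025394
        age2 |  -.0008853   .0001545    -5.73   0.000    -.0011885   -.0005822
       _cons |   2.527584   .2101014    12.03   0.000     2.115292    2.939876
------------------------------------------------------------------------------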

(b) Controlling for age and the square of age does not seem to substantially
change the coefficient of the female dummy. Why is that so? [5+5]

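The coefficient on the female dummy is -0.693, almost the same as the -0.691 in question 1. Controlling for age and its square has therefore not changed it substantially. By the omitted-variable-bias formula, leaving the age terms out shifts the female coefficient by (the effect of the age terms on log wages) times (the difference in the age terms between women and men). Age clearly matters for wages: both age terms are highly significant, and R-squared rises from 0.1025 to 0.1596. So the small change most likely comes from the second factor. In this sample, age appears to be almost unrelated to the sex of the wage earner, so women and men have a similar age profile. Age affects wages through factors like experience, efficiency and health. However, it does not differ systematically between women and men, so adding age and age2 leaves the female coefficient almost unchanged. It does reduce the unexplained variation, which is why the standard error on female falls slightly, from 0.0647 to 0.0627.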
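
The argument can be illustrated on simulated data. This is a hypothetical sketch, not the exam data set. It uses a single age term instead of age and age², and made-up parameters (log wage = 3.0 - 0.7·female + 0.02·age + noise). The coefficient on female in the "long" regression is computed by the Frisch-Waugh-Lovell route: residualise female on a constant and age, then regress log wages on that residual.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def short_coef(y, f):
    """Slope on f when y is regressed on a constant and f only."""
    mf, my = mean(f), mean(y)
    num = sum((fi - mf) * (yi - my) for fi, yi in zip(f, y))
    den = sum((fi - mf) ** 2 for fi in f)
    return num / den


def long_coef(y, f, age):
    """Coefficient on f when y is regressed on a constant, f and age (Frisch-Waugh-Lovell)."""
    ma, mf = mean(age), mean(f)
    # Step 1: residualise f on a constant and age
    slope = sum((a - ma) * (fi - mf) for a, fi in zip(age, f)) / sum((a - ma) ** 2 for a in age)
    r = [(fi - mf) - slope * (a - ma) for fi, a in zip(f, age)]
    # Step 2: regress y on the residual (r has mean zero, so no constant is needed)
    return sum(ri * yi for ri, yi in zip(r, y)) / sum(ri * ri for ri in r)


rng = random.Random(0)


def simulate(age, f):
    # Hypothetical model: log wage = 3.0 - 0.7*female + 0.02*age + noise
    return [3.0 - 0.7 * fi + 0.02 * a + rng.gauss(0.0, 0.3) for a, fi in zip(age, f)]


ages = list(range(20, 60))

# Case 1: one man and one woman at every age, so female is uncorrelated with age
age1 = ages + ages
fem1 = [0] * len(ages) + [1] * len(ages)
y1 = simulate(age1, fem1)
b_short1, b_long1 = short_coef(y1, fem1), long_coef(y1, fem1, age1)

# Case 2: men aged 20-59, women only aged 20-39, so female is correlated with age
age2 = ages + ages[:20] * 2
fem2 = [0] * len(ages) + [1] * 40
y2 = simulate(age2, fem2)
b_short2, b_long2 = short_coef(y2, fem2), long_coef(y2, fem2, age2)

print(f"uncorrelated: short {b_short1:.3f}, long {b_long1:.3f}")
print(f"correlated:   short {b_short2:.3f}, long {b_long2:.3f}")
```

In the first case, adding age leaves the female coefficient unchanged, as in question 2. In the second case, the short regression wrongly attributes part of the age effect to sex.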

3. (a) Regress log of wages on a constant, the female dummy, age of the
individual, the square of age and the social group dummies for scheduled caste,
for scheduled tribe and for other backward caste. Note the omitted category is
the general castes (or forward castes). Paste your output here.
. regress lwage female age age2 scd std obc

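      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(6, 993)       =     36.14
       Model |  158.795189         6  26.4658649   Prob > F        =    0.0000
    Residual |  727.151847       993  .732277792   R-squared       =    0.1792
-------------+----------------------------------   Adj R-squared   =    0.1743
       Total |  885.947037       999   .88683387   Root MSE        =    .85573

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.6808852   .0622516   -10.94   0.000     -.803045   -.5587253
         age |    .076767   .0116935     6.56   0.000     .0538201    .0997138
        age2 |  -.0008541   .0001531    -5.58   0.000    -.0011545   -.0005538
         scd |  -.3656677    .076494    -4.78   0.000    -.5157761   -.2155594
         std |  -.1625248   .0896568    -1.81   0.070    -.3384634    .0134138
         obc |  -.2425269    .073789    -3.29   0.001    -.3873272   -.0977267
       _cons |   2.783519   .2155573    12.91   0.000     2.360519    3.206519
------------------------------------------------------------------------------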

(b) Test the null hypothesis that none of the social group dummies matter, i.e.,
controlling for sex, age and square of age, the average of log wages is the same
for all categories: scheduled castes, scheduled tribes, other backward castes and
the general (forward) castes. Do NOT use the Stata test command.

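Under the null, the test statistic follows an F-distribution. The unrestricted model is the regression in 3(a). Imposing the null (dropping scd, std and obc) gives exactly the regression in 2(a), which is therefore the restricted model.

$H_0: \beta_{scd} = \beta_{std} = \beta_{obc} = 0$

$H_1$: at least one of $\beta_{scd}, \beta_{std}, \beta_{obc}$ is not 0

$F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n-k-1)} = \frac{(744.585212 - 727.151847)/3}{727.151847/993} \approx 7.94$

Degrees of freedom: q = 3 in the numerator and n - k - 1 = 1000 - 6 - 1 = 993 in the denominator.
At the 5% level of significance, the critical value of F(3, 993) is approximately 2.61.

Since 7.94 > 2.61, we reject the null. Controlling for sex, age and the square of age, the social group dummies are jointly significant. Average log wages therefore differ across scheduled castes, scheduled tribes, other backward castes and the general castes.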
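
The F-statistic in 3(b) can be recomputed from the two residual sums of squares already printed in 2(a) and 3(a). This is a minimal sketch. The function name `f_stat` is illustrative, and the 5% critical value is hard-coded from standard F tables (approximate), because the Python standard library has no F distribution. If SciPy is available, `scipy.stats.f.ppf(0.95, 3, 993)` would compute it instead.

```python
def f_stat(rss_restricted: float, rss_unrestricted: float, q: int, df_resid: int) -> float:
    """F statistic for q linear restrictions, from restricted and unrestricted RSS."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)


rss_r = 744.585212  # restricted model: regression 2(a), no social group dummies
rss_u = 727.151847  # unrestricted model: regression 3(a)
q = 3               # restrictions: scd = std = obc = 0
df_resid = 993      # 1000 observations - 7 estimated coefficients

F = f_stat(rss_r, rss_u, q, df_resid)
crit_5pct = 2.61    # approximate 5% critical value of F(3, 993), from F tables
print(f"F = {F:.2f}; reject H0 at 5%: {F > crit_5pct}")
```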

(c) Test the null hypothesis that relative to the general (forward) castes,
scheduled castes and other backward castes suffer the same extent of
discrimination. If this requires new regressions, paste the output in your
answer. [5+15+15]

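Under the null, the test statistic follows an F-distribution. With a single restriction, this is equivalent to a two-sided t-test.

$H_0: \beta_{scd} = \beta_{obc}$

$H_1: \beta_{scd} \neq \beta_{obc}$

We need to compare the unrestricted regression in 3(a) with a restricted regression that imposes the null. Under the null, SC and OBC workers share one coefficient. To impose this, generate a combined dummy (. generate scd_obc = scd + obc) and run . regress lwage female age age2 scd_obc std. Then compute

$F = \frac{(RSS_R - RSS_U)/1}{RSS_U/993}$, with $RSS_U = 727.151847$ from 3(a).

Degrees of freedom: 1 in the numerator and 993 in the denominator. At the 5% level of significance, the critical value of F(1, 993) is approximately 3.85. Reject the null if the computed F exceeds it.
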
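
For 3(c), the only missing number is the residual sum of squares from the restricted regression. The sketch below shows how the decision would be made once that number is available. The function names are hypothetical, and no restricted RSS is assumed here. With one restriction, F(1, df) is the square of a t(df) variable, so the 5% critical value is the square of the two-sided 5% t critical value. The sketch approximates that value with the normal distribution.

```python
from statistics import NormalDist


def f_stat(rss_restricted, rss_unrestricted, q, df_resid):
    """F statistic for q linear restrictions, from restricted and unrestricted RSS."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)


RSS_U = 727.151847  # unrestricted model: regression 3(a)
DF_RESID = 993

# Square of the two-sided 5% critical value (normal approximation to t(993))
CRIT_5PCT = NormalDist().inv_cdf(0.975) ** 2  # about 3.84; exact F(1, 993) is about 3.85


def scd_obc_equality_test(rss_restricted):
    """Return (F, reject H0?) for H0: beta_scd = beta_obc, given the restricted-model RSS."""
    F = f_stat(rss_restricted, RSS_U, 1, DF_RESID)
    return F, F > CRIT_5PCT
```

Pass in the RSS reported by the restricted regression that uses `scd_obc`.
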
4. (a) Regress log of wages on a constant, the female dummy, age of the
individual, the square of age, the social group dummies for scheduled caste, for
scheduled tribe and for other backward caste, and the education dummies for
illiterate, literate, primary, secondary, and higher secondary. Paste the output
here.
. regress lwage female age age2 scd std obc illiterate literate primary secondary higher_secondary

          Source |       SS           df       MS      Number of obs   =     1,000
-----------------+----------------------------------   F(11, 988)      =     58.25
           Model |  348.513806        11  31.6830732   Prob > F        =    0.0000
        Residual |  537.433231       988   .54396076   R-squared       =    0.3934
-----------------+----------------------------------   Adj R-squared   =    0.3866
           Total |  885.947037       999   .88683387   Root MSE        =    .73754

----------------------------------------------------------------------------------
           lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
          female |  -.4798672    .056069    -8.56   0.000    -.5898951   -.3698392
             age |   .0575825   .0101922     5.65   0.000     .0375817    .0775832
            age2 |  -.0005525   .0001333    -4.15   0.000    -.0008141    -.000291
             scd |  -.1330371   .0671176    -1.98   0.048    -.2647465   -.0013278
             std |  -.0407981   .0776793    -0.53   0.600    -.1932335    .1116372
             obc |  -.1392578   .0638728    -2.18   0.029    -.2645997    -.013916
      illiterate |  -1.528027   .0950408   -16.08   0.000    -1.714532   -1.341522
        literate |  -1.217191   .1094939   -11.12   0.000    -1.432059   -1.002324
         primary |  -1.139141   .1052712   -10.82   0.000    -1.345722   -.9325605
       secondary |  -.8414607   .0984195    -8.55   0.000    -1.034596   -.6483255
higher_secondary |  -.3257726   .1343302    -2.43   0.015     -.589378   -.0621673
           _cons |   3.965373   .2109827    18.79   0.000     3.551347    4.379398
----------------------------------------------------------------------------------

(b) Compare the above regression with the regression in question 3 (without the
education dummies). Does the inclusion of education dummies alter the
discrimination against women, scheduled castes, scheduled tribes and other
backward castes? Why? [5+15]

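Yes. Including the education dummies substantially reduces the estimated wage gaps for women, scheduled castes, scheduled tribes and other backward castes. Comparing 4(a) with 3(a):

- female moves from -0.681 to -0.480.
- scd moves from -0.366 to -0.133.
- std moves from -0.163 to -0.041 and is no longer significant (t = -0.53).
- obc moves from -0.243 to -0.139.

The gaps do not disappear. Women, SCs and OBCs still earn significantly less than otherwise similar men and general-caste workers, but the inequality is smaller. For example, the average woman earned about 49% less than a man of the same age and social group ($e^{-0.681} - 1 \approx -0.49$). Once education is also held constant, she earns about 38% less ($e^{-0.480} - 1 \approx -0.38$). The gaps for SCs, STs and OBCs narrow in the same way.

This is an omitted-variable effect. Education raises wages, since a more educated person is likely to have skills that the job market rewards. All the education dummies are negative and significant relative to the omitted education category, and they shrink steadily as education rises. The group coefficients move towards zero when education is added. This implies that women and SC, ST and OBC workers tend to have less education in this sample. Without the education dummies, the female and social group dummies were picking up part of the wage return to education. With them, the coefficients compare people with the same level of education, so they come closer to measuring unequal treatment in the labour market itself. Part of the earlier gap therefore reflects differences in education, which may in turn reflect unequal access to schooling. Note that the model has no interactions between education and sex or social group. It therefore cannot show whether employers discriminate less against more educated workers, only that part of the gap is associated with education.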
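
The comparison in 4(b) can be summarised by converting each coefficient to an exact percentage gap and computing the share of the log-wage gap that disappears once education is controlled for. This is a descriptive sketch of the coefficient changes between 3(a) and 4(a), not a causal decomposition. The names `coefs`, `pct` and `shares` are illustrative.

```python
import math

# (coefficient without education, coefficient with education): regressions 3(a) and 4(a)
coefs = {
    "female": (-0.6808852, -0.4798672),
    "scd": (-0.3656677, -0.1330371),
    "std": (-0.1625248, -0.0407981),
    "obc": (-0.2425269, -0.1392578),
}


def pct(beta):
    """Exact percentage wage difference implied by a log-wage dummy coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)


shares = {}
for name, (before, after) in coefs.items():
    # Fraction of the log-wage gap that disappears once education is held constant
    shares[name] = (before - after) / before
    print(f"{name:6s} gap {pct(before):6.1f}% -> {pct(after):6.1f}%, "
          f"share associated with education: {100 * shares[name]:.0f}%")
```

On these numbers, roughly 30% of the female gap and about 64% of the SC gap are associated with education.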

5. (a) To the explanatory variables in the regression in Qn 4(a), add land owned
(LandO) and land possessed (LandP) and re-run the regression. DO NOT paste
the output.

(b) Is either of the land variables individually significant at the 5 or 10% level?

The regression has 13 regressors plus a constant, so each t-statistic has 1000 - 14 = 986 degrees of freedom, and the standard normal critical values can be used. For each land variable, the null is that its coefficient is 0, against a two-sided alternative.
At the 5% level, the critical region is |t| > 1.96. LandP has a t-statistic of 0.97. Since this value does not lie in the critical region, it is not significant at the 5% level. At the 10% level, the critical region is |t| > 1.645, and 0.97 still does not lie in it. So we fail to reject the null that the coefficient on LandP is zero at both levels.
For LandO, the t-statistic is -0.22. This lies outside both critical regions (at the 5% and 10% levels), so again we fail to reject the null.
So neither variable is individually significant at the 5% or 10% level.

(c) Now drop land owned (LandO) and re-run the regression. Is the included
land variable significant at the 5 or 10% level?

The regression now has 12 regressors plus a constant, so df = 1000 - 13 = 987, and the standard normal critical values again apply. At the 5% level, the critical region is |t| > 1.96. LandP has a t-statistic of 1.91. Since this value does not lie in the critical region, it is not significant at the 5% level. At the 10% level, the critical region is |t| > 1.645. As 1.91 > 1.645, it lies in the critical region, so LandP is significant at the 10% level.
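
The decisions in 5(b) and 5(c) can be checked mechanically. This sketch takes the t-statistics exactly as reported in the answers above and uses normal critical values from `statistics.NormalDist`. With about 986 degrees of freedom, these are very close to the exact t values. The dictionary labels are illustrative.

```python
from statistics import NormalDist


def two_sided_crit(alpha):
    """Two-sided critical value from N(0, 1); with about 986 df, t is very close to normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# t-statistics as reported in the answers to 5(b) and 5(c)
t_stats = {"LandP, 5(b)": 0.97, "LandO, 5(b)": -0.22, "LandP, 5(c)": 1.91}

results = {}
for name, t in t_stats.items():
    results[name] = {alpha: abs(t) > two_sided_crit(alpha) for alpha in (0.05, 0.10)}
    print(f"{name}: significant at 5%? {results[name][0.05]}; at 10%? {results[name][0.10]}")
```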

(d) Explain the pattern of results observed in (b) and (c). [4+4+7]

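In (b), when both land owned and land possessed are included, neither variable is individually significant. This is a symptom of multicollinearity. The two variables are closely interlinked: land possessed is typically land owned adjusted for land leased in or out, so for most households the two are nearly the same. As a result, the regression cannot separate their individual effects. High correlation between two regressors inflates the standard error of each coefficient. This pulls the individual t-statistics towards zero even if land as a whole matters for wages. When LandO is removed in part (c), LandP picks up the effect the two variables share, and its standard error is no longer inflated by LandO. Its t-statistic therefore rises from 0.97 to 1.91, and it becomes significant at the 10% level. A joint F-test of the two land coefficients in (b) would be the right way to check whether land matters at all.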
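
How strongly correlation inflates standard errors can be illustrated with the variance inflation factor, $1/(1 - R_j^2)$. This sketch assumes, hypothetically, that each land variable is correlated only with the other land variable and not with the remaining regressors, so $R_j^2 = r^2$. The correlations used are illustrative, since the exam output does not report the actual correlation between LandO and LandP.

```python
import math


def se_inflation(r):
    """Factor by which a coefficient's standard error grows when its regressor has
    correlation r with one other regressor (simplest case: no other correlations)."""
    vif = 1.0 / (1.0 - r ** 2)  # variance inflation factor
    return math.sqrt(vif)


# Hypothetical correlations between land owned and land possessed
for r in (0.0, 0.5, 0.9, 0.95, 0.99):
    print(f"correlation {r:.2f}: standard error multiplied by {se_inflation(r):.2f}")
```

At a correlation of 0.95, the standard error is multiplied by about 3.2, which is enough to turn a clearly significant coefficient into an insignificant one.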
