
BU5025: Quantitative Methods

Assignment 6
Winter 2014

Assume that you are interested in estimating the determinants of prices for detached houses in
the US. You have 85 observations on housing prices and data on the number of bedrooms
(all houses have at least 2 bedrooms), the lot size of the house in square meters (that is, the
total amount of land) and whether the house is a ‘Colonial’ style house. The variables are
defined as:

Variable Name            Definition

LOG(PRICE)               Logged house price in thousands of dollars
TWOORTHREEBEDROOMS       Dummy variable =1 if house has two or three bedrooms, =0 otherwise
FOURBEDROOMS             Dummy variable =1 if house has four bedrooms, =0 otherwise
FIVEORMOREBEDROOMS       Dummy variable =1 if house has five or more bedrooms, =0 otherwise
COLONIAL                 Dummy variable =1 if house is a ‘colonial’ style house, =0 otherwise
LOTSIZESQMETERS          Lot size in square meters
C                        Constant term

Q1. One thing that we did not talk about in class was interacting variables. Interacting two
variables allows the effect of one independent variable to vary with the value of another.
To give an example, say you were interested in whether the impact of an increase in
the log of the lot size differs between colonial and non-colonial houses. You would then
interact the ‘COLONIAL’ and ‘LOG(LOTSIZESQMETERS)’ variables. Running
this regression in EViews gives:

Dependent Variable: LOG(PRICE)
Method: Least Squares
Included observations: 85

Variable                        Coefficient   Std. Error   t-Statistic   Prob.

C                                  5.290974     0.443651     11.92598    0.0000
FOURBEDROOMS                       0.002963     0.050117      0.059119   0.9530
FIVEORMOREBEDROOMS                 0.363717     0.090423      4.022408   0.0001
COLONIAL                          -1.593772     0.585228     -2.723335   0.0080
LOG(LOTSIZESQMETERS)               0.030912     0.068634      0.450395   0.6537
COLONIAL*LOG(LOTSIZESQMETERS)      0.262647     0.090366      2.906488   0.0047

R-squared             0.452880    Mean dependent var    5.608765
Adjusted R-squared    0.418252    S.D. dependent var    0.275546
S.E. of regression    0.210165    F-statistic           13.07848
Sum squared resid     3.489392    Prob(F-statistic)     0.000000

a) How do you interpret these coefficients now? (Hint – think carefully. It may help to
remember that this is ‘just’ a function and the coefficients estimate the effect of a
change in the given independent variable.)

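The estimated model in Q1 is:

\[ \log(\text{price}) = \beta_0 + \beta_1\,\text{fourbedrooms} + \beta_2\,\text{fiveormorebedrooms} + \beta_3\,\text{colonial} + \beta_4 \log(\text{lotsize}) + \beta_5\,\text{colonial} \times \log(\text{lotsize}) + u \]

The only thing affected by the inclusion of the interaction term $\text{colonial} \times \log(\text{lotsize})$ is the
slope of $\log(\text{lotsize})$. To see why, let’s see what $\log(\text{price})$ would be when colonial = 1 and
colonial = 0.

When colonial = 1, we have

\[ \log(\text{price}) = \beta_0 + \beta_3 + (\beta_4 + \beta_5) \log(\text{lotsize}) + \cdots \]

When colonial = 0, we have

\[ \log(\text{price}) = \beta_0 + \beta_4 \log(\text{lotsize}) + \cdots \]

where the estimates are $\hat{\beta}_0 = 5.29$, $\hat{\beta}_3 = -1.59$, $\hat{\beta}_4 = 0.03$ and $\hat{\beta}_5 = 0.26$.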

What does this mean? Refer to the diagram presented during the tutorial. The intercept for a
colonial house is below that for a non-colonial house, but the slope of log(lot size) is greater for
a colonial house.

This means that a colonial style house costs less than a non-colonial house at small lot
sizes, but the gap narrows as lot size increases. Beyond some lot size, a colonial house costs more
than a non-colonial house with the same lot size.
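The lot size at which the two price lines cross can be worked out from the point estimates. The sketch below is only an illustration: it uses the COLONIAL and interaction coefficients from the EViews output, ignores sampling uncertainty, and the function and variable names are made up for this example.

```python
import math

# Point estimates copied from the EViews output in Q1.
b_colonial = -1.593772     # coefficient on COLONIAL
b_interaction = 0.262647   # coefficient on COLONIAL*LOG(LOTSIZESQMETERS)


def colonial_gap(log_lot_size):
    """Estimated difference in log(price), colonial minus non-colonial."""
    return b_colonial + b_interaction * log_lot_size


# The gap is zero where b_colonial + b_interaction * log(lot size) = 0.
crossover_log_lot = -b_colonial / b_interaction
crossover_lot = math.exp(crossover_log_lot)

print(f"gap is zero at log(lot size) = {crossover_log_lot:.2f}")
print(f"that is a lot size of about {crossover_lot:.0f} square meters")
```

On these estimates the crossover sits at a log lot size of about 6.07, or roughly 430 square meters. Below that, the estimated colonial price is lower; above it, higher.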

Now suppose you are worried that the errors may be heteroscedastic and have conducted a test
for heteroscedasticity. Results from the White test are reported below.

 White test

Heteroskedasticity Test: White

F-statistic    3.890933    Prob. F(2,82)    0.0243

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Included observations: 85

Variable       Coefficient   Std. Error   t-Statistic   Prob.

C                14.69301     5.774212      2.544591    0.0128
LOGPRICEF        -5.086348    1.984600     -2.562908    0.0122
LOGPRICEF2        0.440622    0.170200      2.588846    0.0114

b) Comment on the validity of the assumption that the errors in the house price
regression are homoscedastic based on the White test. Clearly state the significance
level used for the hypothesis testing and the null hypothesis you are testing for.

The White test

To test for a violation of the homoscedasticity assumption, the White test examines whether
the squared residual $\hat{u}^2$ is related to the fitted value $\hat{y}$ and its square:

\[ \hat{u}^2 = \alpha_0 + \alpha_1 \hat{y} + \alpha_2 \hat{y}^2 + v \]

Here $\hat{y}$ is the fitted value of log(price), which appears as LOGPRICEF in the test equation.

The null hypothesis of homoscedasticity is

\[ H_0: \alpha_1 = \alpha_2 = 0 \]

Let the significance level be 5%. Conduct an F test of the null hypothesis. Under the null, the test
statistic has an F distribution with (2, 85 − 2 − 1) = (2, 82) degrees of freedom. The 5% critical value
(obtained from the F-table) is roughly equal to 3.11.

Since the test statistic reported in the White test output, 3.89, is greater than
the critical value 3.11, we reject the null of homoscedasticity at the 5% level. Equivalently, the
reported p-value of 0.0243 is below 0.05.
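The critical value and p-value used above can be checked without an F-table. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here: P(F > f) = (1 + 2f/d)^(−d/2) for an F(2, d) variable. The function names are made up for this example.

```python
def f_upper_tail_num_df2(f_stat, den_df):
    """P(F > f_stat) for an F(2, den_df) variable (numerator df = 2 only)."""
    return (1.0 + 2.0 * f_stat / den_df) ** (-den_df / 2.0)


def f_critical_num_df2(alpha, den_df):
    """Value c with P(F > c) = alpha for an F(2, den_df) variable."""
    return (den_df / 2.0) * (alpha ** (-2.0 / den_df) - 1.0)


f_stat = 3.890933      # F-statistic from the White test output
den_df = 85 - 2 - 1    # observations minus two slope terms minus the intercept

p_value = f_upper_tail_num_df2(f_stat, den_df)
crit_5pct = f_critical_num_df2(0.05, den_df)

print(f"5% critical value: {crit_5pct:.2f}")
print(f"p-value: {p_value:.4f}")
print("reject homoscedasticity at 5%:", f_stat > crit_5pct)
```

Both routes give the same decision: comparing the statistic with the critical value, or comparing the p-value with 0.05.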

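c) How do you address the issue of heteroscedasticity if you find evidence of it?

Use (White’s) heteroscedasticity-robust standard errors. The OLS coefficient estimates are
unchanged; only the standard errors are recomputed. In the simple regression case, the robust
standard error of the slope estimator is the square root of the following:

\[ \frac{\sum_i (x_i - \bar{x})^2 \, \hat{u}_i^2}{\left[ \sum_i (x_i - \bar{x})^2 \right]^2} \]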
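The simple-regression robust standard error can be computed directly from the regressor and the OLS residuals. The sketch below is only an illustration: the data are hypothetical (chosen so the residuals sum to zero and are orthogonal to x, as OLS residuals are) and the function name is made up. In practice a regression package would report these standard errors.

```python
import math


def white_robust_se(x, resid):
    """Robust standard error of the OLS slope in a simple regression.

    x     : values of the single regressor
    resid : OLS residuals from regressing y on a constant and x
    """
    x_bar = sum(x) / len(x)
    dev_sq = [(xi - x_bar) ** 2 for xi in x]
    numerator = sum(d * u ** 2 for d, u in zip(dev_sq, resid))
    denominator = sum(dev_sq) ** 2
    return math.sqrt(numerator / denominator)


# Hypothetical values, for illustration only (not the housing data).
x = [1.0, 2.0, 3.0, 4.0]
resid = [1.0, -2.0, 1.0, 0.0]

robust_se = white_robust_se(x, resid)
print(f"robust standard error of the slope: {robust_se:.4f}")
```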

Q2. What are the consequences for OLS estimation if the researcher has incorrectly specified
the econometric model either by a) including variables which should be excluded or b)
including variables that are closely correlated?

In case of a), the OLS estimator is still unbiased, but its variance (and thus its standard
error) is larger. This implies that the t-ratio of the OLS estimator $\hat{\beta}$ in the false
model, $\hat{\beta} / se(\hat{\beta})$, will tend to be too small in absolute value. Thus some
coefficients might be found insignificant even when they are truly significant.

In case of b), the OLS estimator is still unbiased and consistent (high correlation among
variables does not violate any CLRM assumption, as long as the correlation is not perfect), but
the standard errors of the OLS estimator are larger. The same consequences follow as in a).

Additional comments: What if a variable that should be included is omitted?

Answer: the OLS estimator is biased and inconsistent unless the omitted variable, x, is
uncorrelated with all the variables included in the regression.

Q3. In a study relating the average final exam mark to time spent in various activities, a
researcher distributed a survey to several students. The students are asked how many hours
they spend each week in four activities: studying, sleeping, working, and leisure. Any activity
is put into one of the four categories, so that for each student, the sum of hours in the four
activities must be 168.

a) In the model

\[ \text{final\_mark} = \beta_0 + \beta_1\,\text{study} + \beta_2\,\text{sleep} + \beta_3\,\text{work} + \beta_4\,\text{leisure} + u \]

does it make sense to hold sleep, work and leisure fixed, while changing study?

No. By definition, the sum of hours in the four activities is 168. Thus if we change study,
we must change at least one of the other categories.

b) Explain why this model violates Assumption 3 of the CLRM assumptions (i.e., no
perfect collinearity assumption).

From part a), we can write, for example, study as a function of the other explanatory
variables: study = 168 – sleep – work – leisure. This holds for every explanatory variable.
Thus, Assumption 3 is violated.
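The exact linear dependence can be seen on a few rows of data. The sketch below uses hypothetical weekly hours for three students (not survey data) and shows that 168 times the constant, minus the four activity variables, is zero for every student. This is what perfect collinearity means.

```python
# Hypothetical weekly hours for three students, for illustration only.
students = [
    {"study": 30, "sleep": 56, "work": 20, "leisure": 62},
    {"study": 45, "sleep": 49, "work": 0, "leisure": 74},
    {"study": 20, "sleep": 60, "work": 35, "leisure": 53},
]

# One row of regressors per student: constant, study, sleep, work, leisure.
rows = [[1, s["study"], s["sleep"], s["work"], s["leisure"]] for s in students]

# Weights of the linear combination 168*constant - study - sleep - work - leisure.
weights = [168, -1, -1, -1, -1]

combination = [sum(w * v for w, v in zip(weights, row)) for row in rows]
print(combination)  # every entry is zero because hours always sum to 168
```

Because a non-trivial combination of the regressor columns is identically zero, OLS cannot separate the individual coefficients.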

c) How could you reformulate the model so that its parameters have a useful interpretation
and it satisfies Assumption 3 of the CLRM assumptions?

Simply drop one of the independent variables, say, leisure:

\[ \text{final\_mark} = \beta_0 + \beta_1\,\text{study} + \beta_2\,\text{sleep} + \beta_3\,\text{work} + u \]

Now, for example, $\beta_1$ corresponds to the change in the final mark when study increases by
one hour, holding sleep and work fixed. If we are holding sleep and work fixed but increasing
study by one hour, we must be reducing leisure by one hour.

Q4. Using data on smoking, the following linear probability model was estimated:

smoke = .656 − .069 log(price) + .012 log(income) − .029 educ
       (.855)  (.204)             (.026)              (.006)
       [.856]  [.208]             [.025]              [.007]

      + .020 age − .00026 age^2 − .101 smoke_ban
       (.006)      (.00007)       (.052)
       [.005]      [.00006]       [.045]

where n = 802 and R^2 = .059. The usual OLS standard errors and the heteroscedasticity-robust
standard errors are presented in parentheses and brackets, respectively. The variables are
defined as follows:

smoke: a dummy variable =1 if a person smokes, =0 otherwise;
price: per pack price of cigarettes (in cents);
income: annual income;
educ: years of schooling;
age: age measured in years;
smoke_ban: a dummy variable =1 if the person resides in a state with smoking
restrictions in restaurants, =0 otherwise.

a) Interpret the coefficient on smoke_ban.

On average, a person living in a state with restaurant smoking restrictions has a probability
of smoking that is 0.101 (10.1 percentage points) lower than that of a person living in a state
without such restrictions, holding all else constant.

b) Test whether living in a state with smoking restriction affects the probability of smoking.
Does it matter which standard errors are used?

Yes, it does. The null hypothesis is that the coefficient on smoke_ban is zero, tested against a
two-sided alternative. The t-statistic is −.101/.052 = −1.94 using the OLS standard error and
−.101/.045 = −2.24 using the robust standard error. At the 5% significance level (critical value
about 1.96), the null hypothesis is rejected using the robust standard error but not using the
OLS standard error.
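The two t-statistics can be reproduced from the reported estimate and standard errors. In the sketch below, the critical value 1.96 is the two-sided 5% value from the normal approximation, which is an assumption justified here by the large sample (n = 802). The names are made up for this example.

```python
coef_smoke_ban = -0.101   # estimated coefficient on smoke_ban
se_ols = 0.052            # usual OLS standard error (in parentheses)
se_robust = 0.045         # heteroscedasticity-robust standard error (in brackets)
critical_value = 1.96     # two-sided 5% critical value, normal approximation


def t_statistic(estimate, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


t_ols = t_statistic(coef_smoke_ban, se_ols)
t_robust = t_statistic(coef_smoke_ban, se_robust)

reject_ols = abs(t_ols) > critical_value
reject_robust = abs(t_robust) > critical_value

print(f"OLS:    t = {t_ols:.2f}, reject at 5%: {reject_ols}")
print(f"Robust: t = {t_robust:.2f}, reject at 5%: {reject_robust}")
```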

c) Does the significance level at which the null hypothesis can be rejected depend on the
standard errors used?

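No. The significance level is chosen by the researcher. What changes with the standard errors
is the t-statistic, and with it the p-value that is compared against the chosen level.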

d) Holding other factors fixed, if education increases by two years, what happens to the
estimated probability of smoking?

The estimated probability of smoking falls by 2 × .029 = .058, that is, by about 5.8
percentage points.

e) At what point does another year of age reduce the probability of smoking?

Note that age enters the regression equation as a quadratic. To compute the age at which the
estimated probability of smoking is maximised, (i) take the partial derivative of
smoke with respect to age and (ii) set it equal to zero:

\[ \frac{\partial\,\text{smoke}}{\partial\,\text{age}} = .020 - 2 \times .00026\,\text{age} = 0 \]

Solving this for age yields:

\[ \text{age} = \frac{.020}{.00052} = 38.46 \]

Thus, from about 38.5 years of age, another year of age reduces the probability of smoking. See the
following diagram regarding the relationship between smoke and age:

[Diagram: smoke (vertical axis) plotted against age (horizontal axis); an inverted-U curve peaking at 38.46 years]
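The turning point can also be checked numerically. The sketch below uses the two age coefficients from the estimated equation and takes the age term with coefficient .00026 to be the squared term, as the quadratic derivation implies. The function name is made up for this example.

```python
b_age = 0.020        # coefficient on age
b_age_sq = -0.00026  # coefficient on age squared


def marginal_effect_of_age(age):
    """Estimated change in the smoking probability from one more year of age."""
    return b_age + 2 * b_age_sq * age


# The marginal effect is zero at the peak of the quadratic.
turning_point = -b_age / (2 * b_age_sq)

print(f"turning point: {turning_point:.2f} years")
print(f"marginal effect at 30: {marginal_effect_of_age(30):+.4f}")
print(f"marginal effect at 45: {marginal_effect_of_age(45):+.4f}")
```

The marginal effect is positive before the turning point and negative after it, which matches the inverted-U shape.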
