ECON 601 Module 5 Problem Set

Fall 2019

Your solutions should be typed and well organized. You need to explain / show all of the steps you used to
arrive at your answer. Submit your work through Blackboard as a Word or pdf file.

1. Indicate whether the following statements are true or false, along with a brief explanation.

a. OLS can be used to study curvilinear relationships between X and Y.

b. A polynomial of order 3 can also be referred to as a quadratic specification.

c. Demeaning an explanatory variable that is specified as a polynomial can help reduce the
problem of heteroskedasticity.

d. The reciprocal of X is possibly a suitable transformation for a regression model if the
variable X contains negative values and positive values.

e. A “linear-log model” refers to a model where the natural log of Y is regressed on X.

f. All other factors the same, a lower value for the standard error of the regression is
preferred to a higher value.

g. The estimated slope coefficient in a log-log model can be interpreted as an elasticity.

Solution:

(a) True: OLS requires that a model be linear in the parameters (i.e., the betas), not in the
independent variables (i.e., the X’s), so transformations of X such as polynomials or logs can
capture curvature.
(b) False: a polynomial of order 2 can also be referred to as a quadratic specification.
(c) False: demeaning addresses multicollinearity, not heteroskedasticity. When multicollinearity
from a polynomial is problematic, a possible solution is to demean the explanatory variable.
(d) False: the transformation involving the reciprocal of X can only be used if the observations
for the X variable are positive OR negative. This transformation should not be used if
observations for the X variable contain positive and negative values, or zero values.
(e) False: this would be called a “log-linear model”. In contrast, a “linear-log model” is when Y
is regressed on the natural log of X.
(f) True: the standard error of the regression (also called Root MSE) represents the average
deviation between the actual value of Y and the predicted value of Y (i.e., the residual), so a
smaller value means the predictions are closer to the actual values.
(g) True: the estimated slope coefficient from a log-log-model can be interpreted as an
elasticity (i.e., if X increases by 1% then Y changes by the percent shown by the estimated
beta).
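
To illustrate the reasoning in (c), the following minimal sketch uses made-up ages (not data from this course) to show how demeaning X before squaring it lowers the correlation between the linear and squared terms. The helper name pearson and the age values are illustrative assumptions.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu_u = sum(u) / n
    mu_v = sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical explanatory variable: ages 20 through 60 (illustrative only).
x = [float(a) for a in range(20, 61)]
x_mean = sum(x) / len(x)
x_demeaned = [a - x_mean for a in x]

# Correlation between the linear and squared terms, before and after demeaning.
corr_raw = pearson(x, [a ** 2 for a in x])
corr_demeaned = pearson(x_demeaned, [a ** 2 for a in x_demeaned])

print(f"corr(X, X^2) raw:      {corr_raw:.3f}")
print(f"corr(X, X^2) demeaned: {corr_demeaned:.3f}")
```

Because these made-up ages are symmetric around their mean, the demeaned correlation is essentially zero; with real data it is typically reduced rather than eliminated.
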
2. Open the file “2018 Movie Gross.dta” on Blackboard. This contains data on movies released in
2018 that I obtained from www.boxofficemojo.com. The two variables in this file are:
Y = TG (total domestic gross revenue, $ mil.)
X = OG (opening weekend gross revenue, $ mil.)
Opening weekend gross revenue for a movie tends to be related to how much total gross
revenue that movie generates. Thus, the movie industry closely watches the opening weekend
gross revenues to help make decisions about advertising, distribution, etc.

a. Create a scatterplot showing the relationship between TG and OG. Paste this graph into
your solutions. Briefly describe what you observe in this scatterplot.

b. Estimate each of the following models and fill in the table values:

Model                               Coefficient of       Standard error of      Is OG statistically
                                    determination (R2)   the regression (RMSE)  significant at the 1%
                                                                                level? (Yes or No)

Model 1 (TG vs OG)
Model 2 (TG vs log of OG)
Model 3 (log of TG vs OG)
Model 4 (log of TG vs log of OG)

c. Provide an interpretation for the marginal effect of OG for each model. Your answers
must specify the correct unit of measurement for the predicted change in TG (i.e., either
% or millions of dollars):
i. Model 1: if OG increases by $1 million, then TG is predicted to increase by ___.
ii. Model 2: if OG increases by 1%, then TG is predicted to increase by ___.
iii. Model 3: if OG increases by $1 million, then TG is predicted to increase by ___.
iv. Model 4: if OG increases by 1%, then TG is predicted to increase by ___.

d. For each model, predict the total gross revenue for a movie that has an opening gross of
$10 million:
i. Model 1: the predicted TG is ___.
ii. Model 2: the predicted TG is ___.
iii. Model 3: the predicted TG is ___.
iv. Model 4: the predicted TG is ___.

e. Which is your best model? Justify your choice.


Solution:

(a) A scatterplot of TG and OG (Stata command: scatter TG OG):

A positive relationship exists, although a handful of movies have opening
and total gross amounts far above most other movies. This suggests a log-log transformation
is a likely candidate.

(b) The table values are shown below.

Model                               Coefficient of       Standard error of      Is OG statistically
                                    determination (R2)   the regression (RMSE)  significant at the 1%
                                                                                level? (Yes or No)

Model 1 (TG vs OG)                  0.9356               24.799                 Yes
Model 2 (TG vs log of OG)           0.3895               76.373                 Yes
Model 3 (log of TG vs OG)           0.3572               1.6605                 Yes
Model 4 (log of TG vs log of OG)    0.7541               1.027                  Yes

(c) The marginal effects are:

i. Model 1: \( \widehat{TG}_i = 5.05 + 3.00 \times OG_i \)
If OG increases by $1 million, then TG is predicted to increase by about $3 million.
ii. Model 2: \( \widehat{TG}_i = 16.46 + 29.71 \times \log(OG_i) \)
If OG increases by 1%, then TG is predicted to increase by about $0.297 million.
iii. Model 3: \( \widehat{\log(TG)}_i = 1.84 + 0.0393 \times OG_i \)
If OG increases by $1 million, then TG is predicted to increase by about 3.93%.
iv. Model 4: \( \widehat{\log(TG)}_i = 1.41 + 0.88 \times \log(OG_i) \)
If OG increases by 1%, then TG is predicted to increase by about 0.88%.
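
The scaling factors of 1/100 and 100 used in these interpretations follow from the approximation that a small change in log(X) is roughly the proportional change in X. A brief sketch, writing b_1 for the estimated slope in each model (notation assumed here, not taken from the problem set):

```latex
\begin{align*}
\text{Model 2 (linear-log):}\quad
  & \Delta \widehat{TG} = b_1 \,\Delta \log(OG) \approx b_1 \cdot \frac{\%\Delta OG}{100}
    = \frac{29.71}{100} \approx 0.297 \text{ (\$ mil.) when } \%\Delta OG = 1 \\
\text{Model 3 (log-linear):}\quad
  & \%\Delta \widehat{TG} \approx 100 \cdot \Delta \widehat{\log(TG)} = 100 \cdot b_1 \,\Delta OG
    = 100 \times 0.0393 = 3.93\% \text{ when } \Delta OG = 1 \\
\text{Model 4 (log-log):}\quad
  & \%\Delta \widehat{TG} \approx b_1 \cdot \%\Delta OG = 0.88\% \text{ when } \%\Delta OG = 1
\end{align*}
```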
(d) The predicted total gross revenue for a movie that has an opening gross of $10 million:
i. Model 1: \( \widehat{TG}_i = 5.05 + 3.00 \times 10 = \$35.1 \text{ million} \)
ii. Model 2: \( \widehat{TG}_i = 16.46 + 29.71 \times \log(10) = \$84.9 \text{ million} \)
iii. Model 3: \( \widehat{\log(TG)}_i = 1.84 + 0.0393 \times 10 = 2.232 \rightarrow e^{2.232} = \$9.3 \text{ million} \)
iv. Model 4: \( \widehat{\log(TG)}_i = 1.41 + 0.88 \times \log(10) = 3.424 \rightarrow e^{3.424} = \$30.7 \text{ million} \)
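
The four predictions can be reproduced with a short script. This is a sketch that plugs OG = 10 into the rounded coefficients shown in part (c), so the results can differ slightly from the reported values, which appear to be based on unrounded estimates (most visibly for Model 4). The variable names are illustrative.

```python
import math

OG = 10.0  # opening weekend gross, $ millions

# Rounded coefficient estimates taken from the solution to part (c).
tg_model1 = 5.05 + 3.00 * OG                       # linear: TG in $ mil.
tg_model2 = 16.46 + 29.71 * math.log(OG)           # linear-log: TG in $ mil.
tg_model3 = math.exp(1.84 + 0.0393 * OG)           # log-linear: undo the log with exp()
tg_model4 = math.exp(1.41 + 0.88 * math.log(OG))   # log-log: undo the log with exp()

for name, value in [("Model 1", tg_model1), ("Model 2", tg_model2),
                    ("Model 3", tg_model3), ("Model 4", tg_model4)]:
    print(f"{name}: predicted TG = ${value:.1f} million")
```

Models 3 and 4 predict log(TG), so the fitted value is exponentiated to return to millions of dollars. Here log means the natural log, which is what math.log computes.
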

(e) My chosen model is Model 4 (log of TG vs log of OG). The appearance of the scatterplot of
TG and OG strongly suggested this model was preferable. The regression output from model
4 can only be compared with model 3 (log of TG vs OG) because they share the same
dependent variable. Between models 3 and 4, it is model 4 that has the higher coefficient
of determination and the lower RMSE. Lastly, I created a graph of the predictions for each
model below. Model 4 is clearly preferable to models 2 and 3.
3. Suppose you have been hired by a human resources department to build a model of employee
wages. The model specifies wages as a function of an employee’s age (Age) and his/her years of
education (Edu). The estimated model is:

\[ \widehat{Wage}_i = 11.4 + 0.25\,Age_i - 0.003\,Age_i^2 + 2.3\,Edu_i \]

a. Holding Edu fixed at a value of 10, calculate the marginal effect of Age on Wage for a
person whose age changes from 23 to 24. Repeat this calculation for a person whose
age changes from 54 to 55. Round all values to the nearest penny.
b. At what Age is an employee’s Wage predicted to be greatest? Round the value to the
nearest whole number. (Note: the equation you should use is shown in slide #10 of the
chapter 5 lecture PowerPoint.)
c. What type of relationship is there between employee Age and Wage according to the
estimated regression model? Your answer should be either (i) increasing at an increasing
rate; (ii) increasing at a decreasing rate; (iii) decreasing at an increasing rate; or (iv)
decreasing at a decreasing rate. Briefly explain how you arrived at your answer.

Solution:

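(a) Marginal effects:

Age 23: \( \widehat{Wage}_i = 11.4 + 0.25 \times 23 - 0.003 \times 23^2 + 2.3 \times 10 = \$38.56 \)
Age 24: \( \widehat{Wage}_i = 11.4 + 0.25 \times 24 - 0.003 \times 24^2 + 2.3 \times 10 = \$38.67 \)
Change in Wage = $0.11

Age 54: \( \widehat{Wage}_i = 11.4 + 0.25 \times 54 - 0.003 \times 54^2 + 2.3 \times 10 = \$39.15 \)
Age 55: \( \widehat{Wage}_i = 11.4 + 0.25 \times 55 - 0.003 \times 55^2 + 2.3 \times 10 = \$39.08 \)
Change in Wage = -$0.08 (from the unrounded predictions: 39.075 - 39.152 = -0.077)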
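
The wage calculations above can be checked with a few lines of code. This sketch hard-codes the estimated coefficients from the problem; the function name predicted_wage is an illustrative choice.

```python
def predicted_wage(age, edu):
    """Predicted wage from the estimated model in problem 3."""
    return 11.4 + 0.25 * age - 0.003 * age ** 2 + 2.3 * edu

EDU = 10  # education held fixed at 10 years

# Marginal effect of one more year of age, starting at 23 and at 54.
change_23_to_24 = predicted_wage(24, EDU) - predicted_wage(23, EDU)
change_54_to_55 = predicted_wage(55, EDU) - predicted_wage(54, EDU)

print(f"Age 23 -> 24: change in Wage = {change_23_to_24:+.3f}")
print(f"Age 54 -> 55: change in Wage = {change_54_to_55:+.3f}")
```

Before rounding to the penny, the two changes are about +0.109 and -0.077, which shows the marginal effect of Age switching sign between the two age ranges.
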

(b) The Age that corresponds to the maximum value of Wage can be obtained using this
equation:
\[ Age = \frac{b_1}{-2 b_2} = \frac{0.25}{-2 \times (-0.003)} = 41.67 \approx 42 \]
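
The turning-point formula comes from setting the marginal effect of Age to zero. A short derivation, using b_1 and b_2 for the estimated coefficients on Age and Age squared as in the equation above:

```latex
\begin{align*}
\frac{\partial \widehat{Wage}}{\partial Age} &= b_1 + 2 b_2\, Age = 0.25 - 0.006\, Age \\
b_1 + 2 b_2\, Age^{*} = 0 \;&\Longrightarrow\; Age^{*} = \frac{b_1}{-2 b_2} = \frac{0.25}{0.006} \approx 41.67
\end{align*}
```

Because b_2 is negative, the marginal effect falls as Age rises, so the turning point is a maximum rather than a minimum.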
(c) The relationship is (ii), increasing at a decreasing rate. The positive estimate for the
coefficient on Age and the negative estimate for the coefficient on Age^2 imply the predicted
Wage will increase at a decreasing rate as an employee gets older, ceteris paribus (i.e.,
holding Edu constant).
How do I know this? If you are comfortable with quadratic equations, then it may be
obvious that the positive linear term (0.25Age) dominates the negative squared term
(-0.003Age^2) for smaller values of Age, so wages are predicted to increase in this range.
But as age increases, there is a point (about age 42, from part (b)) where the negative
squared term overpowers the positive linear term, after which wages are predicted to
decrease. Taken together, this implies wages are predicted to increase at a decreasing rate.
The equation can be graphed for different values of Age (I used Excel below), and it is
readily apparent that Wage is increasing at a decreasing rate with Age.
[Figure: line graph of predicted wage (vertical axis, $37.00 to $40.00) against employee age (horizontal axis, 22 to 67).]

4. Open the DERBY5 dataset that comes with the textbook. This data pertains to the famous
horseraces at the Kentucky Derby in Louisville, Kentucky each year. The dataset contains the
amount of money bet each year (in millions of dollars) on these races from 1927 to 1992.
Once you open this dta file, declare the data as a yearly time series via the following
command: tsset date, yearly

a. Build an extrapolative model for the amount bet using linear or nonlinear trends. Justify
your choice of model. In addition, provide the following: (i) the regression output from
Stata of your chosen model, and (ii) a time series line graph showing the actual amount
bet and the predicted amount.

b. Use your model to forecast the amount bet in 1993 and 1994.

Solution:

(a) First, I look at a time series plot of bets to get a better idea of the trend. It is clear that this
does not follow a linear time trend. Instead, a quadratic or an exponential trend are likely
candidates. I proceed to estimate a linear time trend (for illustrative purposes), a quadratic
trend, and an exponential trend. The regression output for each model looks decent.
However, I cannot compare the regression output between the quadratic trend and the
exponential trend since the dependent variable is not the same. Thus, I rely on a visual
comparison of the two trends.
The quadratic and exponential trends both reasonably track the actual betting. Given the
visual similarity, it is best to pick the simplest model and/or the model that is easiest for an
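audience to understand. In this regard, I am inclined to pick the quadratic trend.

Linear time trend: \( \widehat{bets}_t = -190.6 + 0.098 \times date_t \)

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(1, 64)      =  368.52
       Model |  232.359409     1  232.359409           Prob > F      =  0.0000
    Residual |  40.3536993    64  .630526552           R-squared     =  0.8520
-------------+------------------------------           Adj R-squared =  0.8497
       Total |  272.713108    65  4.19558628           Root MSE      =  .79406

------------------------------------------------------------------------------
        bets |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        date |   .0984929   .0051307    19.20   0.000     .0882431    .1087426
       _cons |   -190.574   10.05407   -18.95   0.000    -210.6593   -170.4887
------------------------------------------------------------------------------

Quadratic time trend: \( \widehat{bets}_t = 8260.6 - 8.528 \times date_t + 0.0022 \times date_t^2 \)

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(2, 63)      = 1254.44
       Model |  266.032792     2  133.016396           Prob > F      =  0.0000
    Residual |  6.68031634    63  .106036767           R-squared     =  0.9755
-------------+------------------------------           Adj R-squared =  0.9747
       Total |  272.713108    65  4.19558628           Root MSE      =  .32563

------------------------------------------------------------------------------
        bets |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        date |  -8.528201   .4840981   -17.62   0.000    -9.495594   -7.560808
     date_sq |   .0022012   .0001235    17.82   0.000     .0019544    .0024481
       _cons |   8260.631   474.2637    17.42   0.000      7312.89    9208.371
------------------------------------------------------------------------------

Exponential time trend: \( \widehat{\log(bets)}_t = -92.114 + 0.0473 \times date_t \)

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(1, 64)      =  964.16
       Model |  53.5034085     1  53.5034085           Prob > F      =  0.0000
    Residual |  3.55150481    64  .055492263           R-squared     =  0.9378
-------------+------------------------------           Adj R-squared =  0.9368
       Total |  57.0549133    65  .877767898           Root MSE      =  .23557

------------------------------------------------------------------------------
       lbets |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        date |   .0472623   .0015221    31.05   0.000     .0442216    .0503031
       _cons |  -92.11472   2.982679   -30.88   0.000    -98.07331   -86.15614
------------------------------------------------------------------------------
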
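(b) I use the quadratic model to forecast the amount bet in 1993 and 1994:
\( \widehat{bets}_t = 8260.6 - 8.528 \times date_t + 0.0022 \times date_t^2 \)
1993: \( \widehat{bets}_t = 8260.6 - 8.528 \times 1993 + 0.0022 \times 1993^2 = \$7.398 \text{ million} \)
1994: \( \widehat{bets}_t = 8260.6 - 8.528 \times 1994 + 0.0022 \times 1994^2 = \$7.647 \text{ million} \)
Note: the three terms are each in the thousands and nearly cancel, so these forecasts are very
sensitive to rounding of the coefficients. Plugging in the rounded values shown here gives
about 2.8 for 1993, so the forecasts should be taken from the full-precision estimates rather
than computed by hand from the rounded equation.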

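Alternatively, the exponential model can be used:

\( \widehat{\log(bets)}_t = -92.114 + 0.0473 \times date_t \)
1993: \( \widehat{\log(bets)}_t = -92.114 + 0.0473 \times 1993 = 2.079 \rightarrow e^{2.079} = \$7.998 \text{ million} \)
1994: \( \widehat{\log(bets)}_t = -92.114 + 0.0473 \times 1994 = 2.126 \rightarrow e^{2.126} = \$8.385 \text{ million} \)
Note: the intermediate values 2.079 and 2.126 correspond to the unrounded estimates in the
Stata output (slope 0.0472623, constant -92.11472); the rounded slope 0.0473 would give 2.155
for 1993.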
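
Both sets of forecasts can be recomputed from the coefficients printed in the Stata output. This is a sketch, not necessarily the procedure used for the reported numbers: it uses the coefficients at the precision shown in the regression tables, and the function names are illustrative. Because the quadratic model's terms nearly cancel, its result is sensitive to that precision and need not match the reported forecast exactly.

```python
import math

def quadratic_forecast(year):
    """Quadratic trend, coefficients as printed in the Stata output ($ mil.)."""
    return 8260.631 - 8.528201 * year + 0.0022012 * year ** 2

def exponential_forecast(year):
    """Exponential trend: predict log(bets), then exponentiate ($ mil.)."""
    return math.exp(-92.11472 + 0.0472623 * year)

for year in (1993, 1994):
    print(f"{year}: quadratic = {quadratic_forecast(year):.3f}, "
          f"exponential = {exponential_forecast(year):.3f}")
```

With these printed coefficients the exponential forecasts land within about 0.01 of the reported values, while the quadratic forecasts come out roughly 0.2 lower than reported, which illustrates the sensitivity to coefficient precision.
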
