Fall 2019
Your solutions should be typed and well organized. You need to explain / show all of the steps you used to
arrive at your answer. Submit your work through Blackboard as a Word or pdf file.
1. Indicate whether the following statements are true or false, along with a brief explanation.
c. Demeaning an explanatory variable that is specified as a polynomial can help reduce the
problem of heteroskedasticity.
f. All other factors the same, a lower value for the standard error of the regression is
preferred to a higher value.
Solution:
(a) True: OLS requires that a model be linear in the parameters (i.e., the betas), not in the
independent variables (i.e., the X’s).
(b) False: a polynomial of order 2 can also be referred to as a quadratic specification.
(c) False: demeaning addresses multicollinearity, not heteroskedasticity. When multicollinearity
from a polynomial is problematic, a possible solution is to demean the explanatory variable.
(d) False: the reciprocal transformation of X can be used only if the observations for the X
variable are all positive or all negative. This transformation should not be used if the
observations for the X variable include both positive and negative values, or any zero values.
(e) False: this would be called a “log-linear model”. In contrast, a “linear-log model” is when Y
is regressed on the natural log of X.
(f) True: the standard error of the regression (also called Root MSE) measures the typical size
of the residual (i.e., the deviation between the actual value of Y and the predicted value of Y),
so a lower value indicates a closer fit.
(g) True: the estimated slope coefficient from a log-log model can be interpreted as an
elasticity (i.e., if X increases by 1%, then Y changes by the percent shown by the estimated
beta).
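The point in (c) can be checked numerically. The following is a minimal Python sketch, not part of the assignment's Stata workflow: the X values (the integers 20 to 60) are made up for illustration, and the helper `pearson` is a hypothetical name. It compares the correlation between X and X^2 before and after demeaning.

```python
import math

def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Illustrative values only (not taken from any dataset in this assignment).
x = list(range(20, 61))              # 20, 21, ..., 60
x_sq = [v ** 2 for v in x]

x_mean = sum(x) / len(x)
xc = [v - x_mean for v in x]         # demeaned X
xc_sq = [v ** 2 for v in xc]         # square of the demeaned X

corr_raw = pearson(x, x_sq)          # X and X^2 are nearly collinear
corr_demeaned = pearson(xc, xc_sq)   # far weaker after demeaning

print(f"corr(X, X^2) before demeaning: {corr_raw:.4f}")
print(f"corr(X, X^2) after demeaning:  {corr_demeaned:.4f}")
```

In this example the demeaned values are symmetric around zero, so the correlation drops all the way to zero. With asymmetric data, demeaning typically reduces the correlation without eliminating it.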
2. Open the file “2018 Movie Gross.dta” on Blackboard. This contains data on movies released in
2018 that I obtained from www.boxofficemojo.com. The two variables in this file are:
Y = TG (total domestic gross revenue, $ mil.)
X = OG (opening weekend gross revenue, $ mil.)
Opening weekend gross revenue for a movie tends to be related to how much total gross
revenue that movie generates. Thus, the movie industry closely watches opening weekend
gross revenues to help make decisions about advertising, distribution, etc.
a. Create a scatterplot showing the relationship between TG and OG. Paste this graph into
your solutions. Briefly describe what you observe in this scatterplot.
b. Estimate each of the following models and fill in the table values:
c. Provide an interpretation for the marginal effect of OG for each model. Your answers
must specify the correct unit of measurement for the predicted change in TG (i.e., either
% or millions of dollars):
i. Model 1: if OG increases by $1 million, then TG is predicted to increase by ___.
ii. Model 2: if OG increases by 1%, then TG is predicted to increase by ___.
iii. Model 3: if OG increases by $1 million, then TG is predicted to increase by ___.
iv. Model 4: if OG increases by 1%, then TG is predicted to increase by ___.
d. For each model, predict the total gross revenue for a movie that has an opening gross of
$10 million:
i. Model 1: the predicted TG is ___.
ii. Model 2: the predicted TG is ___.
iii. Model 3: the predicted TG is ___.
iv. Model 4: the predicted TG is ___.
(e) My chosen model is Model 4 (log of TG vs log of OG). The appearance of the scatterplot of
TG and OG strongly suggested this model was preferable. The regression output from model
4 can only be compared with model 3 (log of TG vs OG) because they share the same
dependent variable. Between models 3 and 4, model 4 has the higher coefficient of
determination and the lower RMSE. Lastly, I created a graph of the predictions for each
model below. Model 4 is clearly preferable to models 2 and 3.
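As an illustration of how the predictions in part (d) are formed, here is a minimal Python sketch. It assumes the four models are the linear, linear-log, log-linear, and log-log specifications, which is inferred from the wording of parts (c) and (e); the function name `predict_tg` is hypothetical, and the coefficients in the usage lines are placeholders, not estimates from the movie data. For the models with log of TG as the dependent variable, the prediction is converted back to millions of dollars by simple exponentiation, with no retransformation adjustment.

```python
import math

def predict_tg(model, b0, b1, og):
    """Predicted TG ($ mil.) for a given OG ($ mil.) under each functional form.

    model 1: TG = b0 + b1*OG              (linear)
    model 2: TG = b0 + b1*ln(OG)          (linear-log)
    model 3: ln(TG) = b0 + b1*OG          (log-linear)
    model 4: ln(TG) = b0 + b1*ln(OG)      (log-log)
    """
    if model == 1:
        return b0 + b1 * og
    if model == 2:
        return b0 + b1 * math.log(og)
    if model == 3:
        # Exponentiate to move from ln(TG) back to TG.
        return math.exp(b0 + b1 * og)
    if model == 4:
        return math.exp(b0 + b1 * math.log(og))
    raise ValueError("model must be 1, 2, 3, or 4")

# Placeholder coefficients for illustration only (NOT estimates from the data).
example_linear = predict_tg(1, 2.0, 3.0, 10.0)    # 2 + 3*10
example_loglog = predict_tg(4, 0.0, 1.0, 10.0)    # exp(ln(10))

print(example_linear, example_loglog)
```

With the estimated intercept and slope from each regression substituted for the placeholders, calling `predict_tg(m, b0, b1, 10.0)` gives the four predictions requested in part (d).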
3. Suppose you have been hired by a human resources department to build a model of employee
wages. The model specifies wages as a function of an employee’s age (Age) and his/her years of
education (Edu). The estimated model is:
Solution:
(g) The Age that corresponds to the maximum value of Wage can be obtained using this
equation:
\[ Age = \frac{b_1}{-2 b_2} = \frac{0.25}{-2 \times (-0.003)} = 41.67 \approx 42 \]
(h) The positive estimate for the coefficient on Age and the negative estimate for the
coefficient on Age^2 imply the predicted Wage will increase at a decreasing rate as an
employee gets older, ceteris paribus (i.e., holding Edu constant).
How do I know this? If you are comfortable with quadratic equations, then it may be
obvious that the positive linear term (0.25Age) dominates the negative squared term
(-0.003Age^2) for smaller values of Age, so wages are predicted to increase in this range.
But as age increases, there is a point (about age 42) where the negative squared term
overpowers the positive linear term, after which wages are predicted to decrease. Taking all
of this information together, it implies wages are predicted to increase at a decreasing rate.
The equation can be graphed for different values for Age (I used Excel below), and it is
readily apparent that Wage is increasing at a decreasing rate with Age.
[Figure: predicted wage (vertical axis, $37.00 to $40.00) plotted against employee age (horizontal axis, 22 to 67).]
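The same conclusion can be checked with a few lines of arithmetic. The sketch below is in Python rather than Excel and uses only the two Age coefficients quoted in the solution (0.25 and -0.003). The intercept and the Edu coefficient are not reproduced here, so the code reports the marginal effect of Age and the Age-related part of the prediction, not the predicted wage itself. The function names are hypothetical.

```python
# Coefficients on Age and Age^2, as used in the turning-point calculation.
B_AGE = 0.25
B_AGE_SQ = -0.003

def turning_point(b1, b2):
    """Age at which b1*Age + b2*Age^2 stops rising (set the derivative to zero)."""
    return -b1 / (2 * b2)

def marginal_effect(age, b1=B_AGE, b2=B_AGE_SQ):
    """Derivative of the predicted wage with respect to Age, holding Edu constant."""
    return b1 + 2 * b2 * age

def age_contribution(age, b1=B_AGE, b2=B_AGE_SQ):
    """Age-related part of the prediction only (intercept and Edu term omitted)."""
    return b1 * age + b2 * age ** 2

peak_age = turning_point(B_AGE, B_AGE_SQ)
print(f"turning point: {peak_age:.2f}")

for age in (22, 32, 42, 52, 62):
    print(age, round(marginal_effect(age), 3), round(age_contribution(age), 3))
```

The marginal effect is positive but shrinking at younger ages, crosses zero just before age 42, and is negative afterwards, which matches the shape described in the solution.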
4. Open the DERBY5 dataset that comes with the textbook. The dataset contains the amount of
money bet (in millions of dollars) each year from 1927 to 1992 on the horse races at the
Kentucky Derby in Louisville, Kentucky. Once you open this dta file, declare the data as a
yearly time series via the following command: tsset date, yearly
a. Build an extrapolative model for the amount bet using linear or nonlinear trends. Justify
your choice of model. In addition, provide the following: (i) the regression output from
Stata of your chosen model, and (ii) a time series line graph showing the actual amount
bet and the predicted amount.
b. Use your model to forecast the amount bet in 1993 and 1994.
Solution:
(a) First, I look at a time series plot of bets to get a better idea of the trend. It is clear that this
does not follow a linear time trend. Instead, a quadratic or an exponential trend are likely
candidates. I proceed to estimate a linear time trend (for illustrative purposes), a quadratic
trend, and an exponential trend. The regression output for each model looks decent.
However, I cannot compare the regression output between the quadratic trend and the
exponential trend since the dependent variable is not the same. Thus, I rely on a visual
comparison of the two trends.
The quadratic and exponential trends both reasonably track the actual betting. Given the
visual similarity, it is best to pick the simplest model and/or the model that is easiest for an
audience to understand. In this regard, I am inclined to pick the quadratic trend.
Linear time trend: \(\widehat{bets}_t = -190.6 + 0.098 \times date_t\)
      Source |       SS           df       MS      Number of obs   =        66
-------------+----------------------------------   F(1, 64)        =    368.52
       Model |  232.359409         1  232.359409   Prob > F        =    0.0000
    Residual |  40.3536993        64  .630526552   R-squared       =    0.8520
-------------+----------------------------------   Adj R-squared   =    0.8497
       Total |  272.713108        65  4.19558628   Root MSE        =    .79406
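The summary statistics in this output all follow from the sums of squares and degrees of freedom. As a cross-check, here is a minimal Python sketch (not Stata) that recomputes them from the values printed above; the variable names are my own.

```python
import math

# Values copied from the Stata output for the linear time trend.
ss_model = 232.359409
ss_resid = 40.3536993
n_obs = 66
k = 1                                   # number of slope coefficients

df_resid = n_obs - k - 1                # 64
ss_total = ss_model + ss_resid

r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid
ms_resid = ss_resid / df_resid          # residual mean square
f_stat = (ss_model / k) / ms_resid
root_mse = math.sqrt(ms_resid)          # standard error of the regression

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(1, 64)      = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.5f}")
```

The recomputed values agree with the reported R-squared, adjusted R-squared, F statistic, and Root MSE to the precision shown in the output.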