
Econometrics 2 Exam

Green = answers found on the internet
Orange = answered by the slides / own interpretation
Yellow = still needs to be answered

I. Theory-related Questions (20 points)

Consider the following variables on the housing market:


- Price: a house's selling price
- Lotsize: the size of a house's lot
- Space: the floor space of the house (in m2)
- Bidders: the number of bids the house received the last time it was sold
- Bedroom: the number of bedrooms in a house
- Assess: the assessed value of a house, determined by an appraiser

1. Suppose we are interested in a simple understanding of the marginal effects of various
home characteristics on the selling price. Write out the model you would want to estimate.
Briefly explain (i) your model, and (ii) why you included (or didn't include) which of the
variables! (10 points)

i) Price is the dependent variable, B0 is the intercept, and lotsize, space, bedrooms, and
assess are the independent variables:
 Price = B0 + B1*lotsize + B2*space + B3*bedrooms + B4*assess + u
ii) The variable bidders is left out of the model because it refers to the last time the house
was sold and therefore should have no effect on the current selling price. The remaining
variables each capture a distinct home characteristic, so severe multicollinearity is not
expected.

2. Interpret the slope coefficient of the regression model lnYi = 𝜷0 + 𝜷1lnXi + ui (2 points)

- The slope coefficient B1 is the elasticity of Y with respect to X.
- If ln(X) changes by 1 unit, then ln(Y) changes by B1 units.
- If X grows by 1%, then Y grows by approximately B1%.
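A quick numerical sketch of this elasticity interpretation; the values of b0, b1, and the evaluation point x are made up for illustration:

```python
import numpy as np

# Hypothetical log-log model ln(Y) = b0 + b1*ln(X); b0, b1 are assumed values.
b0, b1 = 2.0, 0.8

def y(x):
    return np.exp(b0 + b1 * np.log(x))

x = 100.0
# Percentage change in Y caused by a 1% increase in X:
pct_change = 100 * (y(1.01 * x) - y(x)) / y(x)
print(round(pct_change, 3))  # close to b1 in percent, i.e. about 0.8%
```

The exact response is (1.01^b1 - 1)*100, which for small changes is approximately b1 percent, matching the elasticity reading of the coefficient.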

3. Suppose the true income-consumption model is cons = β0 + β1*inc + β2*inc^2 + u. What are the
consequences if we estimate the model without the quadratic term inc^2? (2 points)

Functional form misspecification occurs, which causes the OLS estimators to be biased; the
omitted quadratic term acts like an omitted variable. Dropping inc^2 forces a linear form on a
non-linear relationship and removes flexibility in modeling the relation between inc and cons.
It also changes the marginal effect: in the true model the marginal effect, B1 + 2*B2*inc,
varies with income, while the misspecified model forces it to be constant.

4. Why do the estimators of any regression model have a sampling distribution? (2 points)

The estimators of any regression model have a sampling distribution because they are computed
from random samples of the population: a different sample would produce different estimates.
The sampling distribution of the estimators B̂0, B̂1, ..., B̂k is needed to make inference about
the unknown parameters B0, B1, ..., Bk.

5. What is the effect of an increasing sample size on the components of a standard OLS-model,
in particular the estimated 𝜷js and the error term? (4 points)

Increasing the sample size reduces the sampling variance of the estimated Bj's, so the
estimates tend to lie closer to the population parameters and their standard errors shrink.
The variance of the error term itself is a population quantity and does not change, but its
estimate (the standard error of the regression) becomes more precise as we observe more of
the population. Any bias remains the same.
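The two answers above can be illustrated with a small Monte Carlo sketch on simulated data (the coefficients and error distribution below are assumed): repeated sampling gives a distribution of slope estimates whose spread shrinks as n grows, while its mean stays at the true value.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 2.0  # assumed true population parameters

def slope_estimates(n, reps=2000):
    """OLS slope from `reps` independent samples of size n."""
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        u = rng.normal(size=n)            # error term: its variance does not depend on n
        y = beta0 + beta1 * x + u
        est[r] = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # OLS slope = cov(x,y)/var(x)
    return est

small, large = slope_estimates(25), slope_estimates(400)
print(small.std(), large.std())    # spread of the sampling distribution shrinks with n
print(small.mean(), large.mean())  # both means stay close to beta1 = 2 (no change in bias)
```

The spread of the estimates is the (simulated) standard error; the histogram of `est` is the sampling distribution used for inference.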

II. Linear Probability Model (18 points)

Suppose you have estimated the following linear probability model explaining 401(k) eligibility
in terms of income, age, and gender:

e401k = -0.506 + 0.0124*inc - 0.000062*inc^2 + 0.0265*age - 0.00031*age^2 - 0.0035*male


(0.081) (0.0006) (0.000005) (0.0039) (0.00005) (0.0121)
n = 9275, R^2 = 0.094

…where e401k is a binary variable for eligibility in a 401(k) plan (e401k=1 if eligible for 401 (k),
and 0 otherwise), inc denotes family annual income (in $1000s), age denotes the individual’s
age (in years), and male is a binary variable for gender (male= 1 if male individual, and 0
otherwise). The numbers in parentheses below each coefficient denote the respective p-values
for each coefficient.

For your info: 401 (k) is an employer sponsored retirement savings plan in the USA
aiming at supplementing (future) pension payments for employees. It is a private
pension provision, with the money being invested into different asset classes (stocks,
bonds, money market investments). However, not every employee is eligible for
participating in the program.

1. Based on the reported regression results, would you say that 401(k) eligibility depends on
income in a statistically significant way? Explain very briefly. (2 points)

Yes, 401(k) eligibility depends on income in a statistically significant way: both terms
involving inc are significant, with p = 0.0006 and p = 0.000005, and both are < 0.05.

2. What is the estimated marginal effect of income on 401(k) eligibility? Interpret your result.
(4 points)

The marginal effect is the derivative of the estimated equation with respect to inc:
 ∂e401k/∂inc = 0.0124 - 2*(0.000062)*inc = 0.0124 - 0.000124*inc
If family income rises by 1 unit ($1,000), the estimated eligibility probability changes by
0.0124 - 0.000124*inc. Because of the negative quadratic term, the effect diminishes as
income rises further.

3. Holding other factors fixed, for an individual with family income of $ 100,000, if annual
income increases by $1000, what happens to the probability of 401(k) eligibility? (4 points)

∂e401k/∂inc = 0.0124 - 2*(0.000062)*(100) = 0. So for an individual with family income of
$100,000, if annual income increases by $1,000 (inc increases by 1), the estimated
probability of 401(k) eligibility does not change.
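The arithmetic in questions 2 and 3 can be checked directly; this small sketch just evaluates the derivative of the reported equation, with the coefficients copied from the output above:

```python
# Marginal effect of income on 401(k) eligibility from the fitted LPM:
# d(e401k)/d(inc) = 0.0124 - 2*0.000062*inc = 0.0124 - 0.000124*inc
def marginal_effect_inc(inc):
    """Estimated change in eligibility probability per additional $1,000 of income."""
    return 0.0124 - 2 * 0.000062 * inc

print(marginal_effect_inc(0))    # largest effect at zero income: 0.0124
print(marginal_effect_inc(50))   # smaller effect at $50,000
print(marginal_effect_inc(100))  # effect is (numerically) zero at $100,000
```

The declining values illustrate the diminishing marginal effect implied by the quadratic term.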

4. How would you test for a possible gender discrimination (test which hypothesis, use which
statistical test)? (2 points)

In order to test this, a t-test should be applied, because only one coefficient (on the
binary dummy variable male, which takes the values 1 and 0) is involved:
H0: Bmale = 0  → gender does not have any influence on eligibility
H1: Bmale ≠ 0  → gender does have an influence on eligibility

5. Predict the 401(k) eligibility of a person with the following characteristics: inc = 100; age=
30, and female. Does the predicted probability make sense? Why could the prediction not
make sense? Briefly explain what a prediction does. (6 points)

e401k = -0.506 + 0.0124*100 - 0.000062*100^2 + 0.0265*30 - 0.00031*30^2 - 0.0035*0 = 0.63
The prediction lies in the range (0, 1), so it makes sense as a probability: the predicted
probability of 401(k) eligibility for a female with inc = 100 and age = 30 is 63%.
The prediction could fail to make sense because a linear probability model can produce fitted
values below 0 or above 1 for other characteristics. Moreover, R^2 = 0.094 is rather low; a
low R^2 implies that the included regressors explain little of the variation in Y, so we
cannot assume the prediction is accurate for the whole population.
A prediction plugs given values of the explanatory variables into the estimated equation to
obtain the expected value of Y for those characteristics; it is a statement about this
particular case and is not verified against reality.
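The prediction can be verified by plugging the given characteristics into the reported equation, with the coefficients copied from the output above:

```python
# Predicted 401(k) eligibility from the fitted LPM (coefficients as reported).
def predict_e401k(inc, age, male):
    return (-0.506 + 0.0124 * inc - 0.000062 * inc**2
            + 0.0265 * age - 0.00031 * age**2 - 0.0035 * male)

p = predict_e401k(inc=100, age=30, male=0)  # female, inc = 100, age = 30
print(round(p, 2))  # about 0.63, inside [0, 1], so interpretable as a probability
```

Trying other characteristics (e.g. very high incomes) in the same function shows how an LPM can leave the [0, 1] range.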

III. Interpreting Estimation Results (22 points)

There are rumors claiming that during nights of either fullmoon or newmoon, there is lots of
traffic in the emergency care units of hospitals. An American TV-station wanted to investigate
those rumors and collected, for one hospital, the number of daily incoming emergency calls
(variable calls) for the time frame between January and August 1998. There is a total of n=229
observations (days), and 8 full moons (variable fullmoon) and 7 newmoons (variable
newmoon) as well as 3 holidays (variable holiday). Additionally, there is separate information
on Fridays and Saturdays (variables fri and sat). The variable trend controls for a possible
time effect over the 229 days of the study.

Below you find the Stata-output for the estimated regression equation:

calls = β0 + β1*trend + β2*holiday + β3*fri + β4*sat + β5*fullmoon + β6*newmoon

1. Write down the regression equation using all of the statistically significant coefficients (at
5%-level)! (2 points)

calls = B0 + B1*trend + B2*holiday + B3*fri + B4*sat + B5*fullmoon + B6*newmoon
Significance at the 5% level requires p < 0.05. The variables fullmoon and newmoon are not
significant and therefore will be left out of the model:
 calls = B0 + B1*trend + B2*holiday + B3*fri + B4*sat
 calls = 93.69583 + 0.0337*trend + 13.862*holiday + 6.909*fri + 10.5894*sat

2. What is the common characteristic, the variables holiday, fri, sat, fullmoon, and newmoon
share? (2 points)

They are all dummy (binary) variables: each takes the value 0 or 1, indicating the absence or
presence of some categorical effect that may affect the outcome.

3. Can you judge from the given data whether the regression faces the problem of
multicollinearity? What does multicollinearity mean and how would you check for
multicollinearity? (4 points)

Multicollinearity means that one explanatory variable is (nearly) a linear combination of
other explanatory variables, so they contain largely overlapping information.
From the reported output alone this cannot be judged reliably: insignificant coefficients
(here fullmoon and newmoon) can be a symptom of multicollinearity, but they can equally
result from the variables simply having no effect. A typical warning sign is a high R^2
combined with many insignificant t-statistics. To check, inspect the pairwise correlations of
the regressors, or compute the variance inflation factor (VIF) for each variable; a highly
correlated variable can be deleted and the regression run again.
VIF > 10 → exclude the variable
VIF < 10 → no problem, the variable does not need to be excluded
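A minimal sketch of the VIF check on simulated data (the hospital data are not available here, so the regressors below are made up): VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing regressor j on the remaining regressors.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the regressor matrix X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        # Regress column j on the other columns (plus an intercept):
        Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1 - resid.var() / y.var()
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)               # independent of the others
print(vif(np.column_stack([x1, x2, x3])))  # x1 and x2 well above 10, x3 near 1
```

In practice one would drop one of the offending variables (here x1 or x2) and re-run the regression, exactly as described above.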

4. Which criteria are you looking at, in order to be able to judge the statistical quality of the
regression? Explain briefly! (6 points)

The goodness of fit is used to check the quality of the regression model. R^2 measures how
much of the variation in Y the model explains, but adding more independent variables always
increases R^2, so the adjusted R^2 (which penalizes additional regressors) should be
considered as well.
 Criteria: R^2 and the adjusted R^2.
Furthermore, check whether the model as a whole is significant, using the F-test; significance
at the 5% level is a sign of good quality. Additionally, a Root MSE close to 0 indicates a
good fit.

5. Should the hospital management allocate more physicians for days with fullmoon or
newmoon? Why? (2 points)

No. The variables fullmoon and newmoon are not statistically significant, so there is no
evidence that they have any impact on the number of emergency calls.

6. How can you test the hypothesis of fullmoon and newmoon being of any influence on the
number of daily emergency calls? Write down the null- and alternative hypotheses and the
required test. (4 points)

H0: B5 = 0 and B6 = 0  → no influence
H1: B5 ≠ 0 or B6 ≠ 0  → at least one has an influence
 Since this is a joint hypothesis on two coefficients, an F-test (comparing the restricted
model without fullmoon and newmoon to the unrestricted model) should be used rather than
individual t-tests.
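A joint restriction like this is usually tested with an F-test. A sketch on simulated data (the data-generating process below is assumed, not the hospital data; under it fullmoon and newmoon have no true effect): the F-statistic compares the SSR of the restricted model, without fullmoon and newmoon, to that of the unrestricted model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 229
trend = np.arange(n, dtype=float)
holiday = (rng.random(n) < 3 / n).astype(float)    # roughly 3 holidays
fri = (np.arange(n) % 7 == 4).astype(float)
sat = (np.arange(n) % 7 == 5).astype(float)
fullmoon = (rng.random(n) < 8 / n).astype(float)   # roughly 8 full moons
newmoon = (rng.random(n) < 7 / n).astype(float)    # roughly 7 new moons
# Simulated calls: fullmoon and newmoon have no true effect, so H0 holds.
calls = 94 + 0.03 * trend + 14 * holiday + 7 * fri + 10 * sat + rng.normal(0, 8, n)

def ssr(regressors, y):
    """Sum of squared residuals from an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(((y - X @ b) ** 2).sum())

ssr_ur = ssr([trend, holiday, fri, sat, fullmoon, newmoon], calls)
ssr_r = ssr([trend, holiday, fri, sat], calls)
q, k = 2, 6                                        # restrictions, regressors
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(F)  # compare with the F(2, 222) critical value (about 3.04 at the 5% level)
```

If F exceeds the critical value, H0 is rejected and at least one of the two moon variables matters; with H0 true in this simulation, small values of F are the typical outcome.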

7. Looking at the R-squared (or Adjusted R-squared) of the regression output, what can you
conclude about explanatory power of the chosen regression model? (2 points)

The adjusted R^2 is preferred here because it penalizes irrelevant regressors; in this case it
is 0.1512. An adjusted R^2 of about 15% means the model explains only a small share of the
variation in daily emergency calls, so the explanatory power of the chosen regression model is
not very high. How much explanatory power is acceptable depends on the context.
