
University of Hong Kong

Introductory Econometrics (ECON0701), Fall 2010


17 September 2010

Multiple Regression Analysis: Estimation


• Remember from last time, the four basic assumptions of the multiple regression
model.
• The model in the population must be linear in parameters:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u

Multiple Regression Analysis: Estimation


• The data set

\{ (x_{i1}, x_{i2}, \ldots, x_{ik}, y_i) : i = 1, 2, \ldots, n \}

• must be a random sample from the general population.


• The error term must have zero conditional mean:

E u | x1 , x2 ,..., xk   0

Multiple Regression Analysis: Estimation


• There cannot be any perfect collinearity among the independent variables.
• When these are all true, the estimates of β0, β1, …, βk are unbiased.
• We also discussed the implications of omitting an important variable from the multiple
regression analysis.
• An “important variable” is one that is (a) correlated with the other x variables in the
model and (b) related to the y variable after the other x variables are taken into
account.
Multiple Regression Analysis: Estimation
• We will now explore what happens when we mis-specify the regression model.
• The misspecification is leaving out a variable that should be there.
• The result will be that the estimator for the coefficient on the remaining variable will
be biased.

Multiple Regression Analysis: Estimation


• For example, suppose that the true relationship is

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

• But instead of including x2 as a variable in our regression model, as we should, we
instead estimate

y = \beta_0 + \beta_1 x_1 + u

Multiple Regression Analysis: Estimation


• What happens to the estimator of β1? Writing β̃1 for the OLS slope from the short
regression that omits x2, and substituting the true model for yi:

\tilde{\beta}_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)\, y_i}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2} = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i)}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}

• Expanding the numerator:

\tilde{\beta}_1 = \frac{\beta_0 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1) + \beta_1 \sum_{i=1}^{n} x_{i1}(x_{i1} - \bar{x}_1) + \beta_2 \sum_{i=1}^{n} x_{i2}(x_{i1} - \bar{x}_1) + \sum_{i=1}^{n} u_i (x_{i1} - \bar{x}_1)}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}
Multiple Regression Analysis: Estimation
• Taking the expectation, this simplifies to

E(\tilde{\beta}_1) = \beta_1 + \beta_2 \, \frac{\widehat{\mathrm{Cov}}(x_1, x_2)}{\widehat{\mathrm{Var}}(x_1)}

• This means that if you omit x2, and x2 is correlated with x1, your estimate of β1 will be
biased.
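
• To see this bias in action, here is a minimal simulation sketch in Stata (hypothetical;
not from the lecture files). All variable names and parameter values below are made up
for illustration:

    * Hypothetical illustration of omitted variable bias
    clear
    set seed 12345
    set obs 10000
    generate x2 = rnormal(0, 1)
    generate x1 = 0.5*x2 + rnormal(0, 1)           // Cov(x1,x2) = 0.5, Var(x1) = 1.25
    generate y = 1 + 2*x1 + 3*x2 + rnormal(0, 1)   // true beta1 = 2, beta2 = 3
    regress y x1 x2   // correctly specified: coefficient on x1 should be near 2
    regress y x1      // x2 omitted: expect roughly 2 + 3*(0.5/1.25) = 3.2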

Multiple Regression Analysis: Estimation


Bias in estimate of β1          Corr(x1, x2) > 0        Corr(x1, x2) < 0
from omitting x2
β2 > 0                          positive bias           negative bias
β2 < 0                          negative bias           positive bias

Multiple Regression Analysis: Estimation


• For example, it is probably the case that one’s wages depend both on one’s education
and one’s ability:

wages = \beta_0 + \beta_1\, education + \beta_2\, ability + u


• But often we do not have data on someone’s “ability.”

Multiple Regression Analysis: Estimation


• Instead, we might estimate the model

wages = \beta_0 + \beta_1\, education + u
• using the data we have available.
• Probably, since ability and education are positively correlated, and ability has a positive
effect on wages, the estimate of β1 will be positively biased.
Multiple Regression Analysis: Estimation
• As an example, we will look at the relationship between a child’s educational
attainment, his or her IQ, and his or her mother’s educational attainment.

education = \beta_0 + \beta_1\, Meducation + \beta_2\, IQ + u
• Question: what happens to the estimate of β1 if the second variable (IQ) is omitted
from the equation?

Multiple Regression Analysis: Estimation


• Fact 1: the correlation between IQ and educational attainment is probably positive.
Therefore β2 is probably positive.
• Fact 2: The correlation between mother’s education and child’s IQ is also probably
positive.
• Therefore, omitting child’s IQ means that we expect the estimate of β1 to be positively
biased.

Multiple Regression Analysis: Estimation


. d

Contains data from wage2.dta


obs: 935
vars: 17 14 Apr 1999 13:41
size: 24,310 (97.2% of memory free)
---------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------

IQ int %9.0g IQ score

educ byte %9.0g years of education

meduc byte %9.0g mother's education

---------------------------------------------------------------------------
Multiple Regression Analysis: Estimation
. summarize IQ, detail

IQ score
-------------------------------------------------------------
Percentiles Smallest
1% 64 50
5% 74 54
10% 82 55 Obs 935
25% 92 59 Sum of Wgt. 935

50% 102 Mean 101.2824


Largest Std. Dev. 15.05264
75% 112 134
90% 120 134 Variance 226.5819
95% 125 137 Skewness -.3404246
99% 132 145 Kurtosis 2.977035

Multiple Regression Analysis: Estimation


IQ Range Description
36-50 Moderately Retarded
51-70 Mildly Retarded
70-90 Slow Learner
90-110 Average
110-120 Superior
120-140 Very Superior
140-180 Gifted

Multiple Regression Analysis: Estimation


• Now let’s look at the estimated multiple regression model:

. regress educ meduc IQ

Source | SS df MS Number of obs = 857


-------------+------------------------------ F( 2, 854) = 191.07
Model | 1277.79546 2 638.897729 Prob > F = 0.0000
Residual | 2855.60011 854 3.34379404 R-squared = 0.3091
-------------+------------------------------ Adj R-squared = 0.3075
Total | 4133.39557 856 4.82873314 Root MSE = 1.8286

------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
meduc | .1669298 .0232489 7.18 0.000 .121298 .2125615
IQ | .0651911 .0044139 14.77 0.000 .0565278 .0738544
_cons | 5.155378 .4398148 11.72 0.000 4.292133 6.018622
------------------------------------------------------------------------------
Multiple Regression Analysis: Estimation
• Note that β̂2 (the estimated coefficient on IQ) is positive, as we hypothesized would be
true for Fact 1.
• Next step is to see if mother’s education is positively correlated with children’s IQ (for
Fact 2).

. correlate IQ meduc
(obs=857)

| IQ meduc
-------------+------------------
IQ | 1.0000
meduc | 0.3318 1.0000

Multiple Regression Analysis: Estimation


. regress educ meduc

Source | SS df MS Number of obs = 857


-------------+------------------------------ F( 1, 855) = 130.78
Model | 548.378173 1 548.378173 Prob > F = 0.0000
Residual | 3585.01739 855 4.1930028 R-squared = 0.1327
-------------+------------------------------ Adj R-squared = 0.1317
Total | 4133.39557 856 4.82873314 Root MSE = 2.0477

------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
meduc | .2808636 .0245594 11.44 0.000 .2326598 .3290674
_cons | 10.57491 .271523 38.95 0.000 10.04198 11.10783
------------------------------------------------------------------------------

• Note that without IQ, the estimate of the coefficient on mother’s education is upwardly
biased – 0.28 instead of 0.16.
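
• These two sets of estimates line up with the standard omitted-variable algebra (the
implied slope below is a back-of-the-envelope calculation, not shown in the original
output). For nested OLS regressions the relationship is exact:

\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \, \tilde{\delta}_1

• where δ̃1 is the slope from regressing IQ on meduc. Plugging in the estimates:
0.281 ≈ 0.167 + 0.065 × δ̃1 implies δ̃1 ≈ 1.75, i.e., each extra year of mother’s education
is associated with roughly 1.75 additional IQ points in this sample.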

Multiple Regression Analysis: Estimation


• Now we will add a fifth assumption to the four assumptions we have discussed so far
(linear in parameters, random sampling, zero conditional mean, no perfect collinearity).
• When conditional homoskedasticity is assumed, the variance of the estimators of β0, β1,
…, βk can be calculated simply.
Multiple Regression Analysis: Estimation
• Remember the linear regression model:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u

• The conditional homoskedasticity assumption states that


Var(u \mid x_1, x_2, \ldots, x_k) = \sigma^2

Multiple Regression Analysis: Estimation


• When these five assumptions (the four basic assumptions plus conditional
homoskedasticity) are true, the variance of the estimator of any parameter βj is

Var(\hat{\beta}_j) = \frac{\sigma^2}{\left( \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \right) \left( 1 - R_j^2 \right)}

Multiple Regression Analysis: Estimation


• As in the simple regression case, σ² cannot be measured directly. It must be
estimated:

\hat{\sigma}^2 = \frac{1}{n - k - 1} \sum_{i=1}^{n} \hat{u}_i^2

• The standard error of β̂j is defined as

Se(\hat{\beta}_j) = \sqrt{ \frac{\hat{\sigma}^2}{\left( \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \right) \left( 1 - R_j^2 \right)} }
Multiple Regression Analysis: Estimation
• The Rj² term refers to the R-squared from a regression of xj on the other x variables.
• You can see why perfect collinearity is a problem: when xj is perfectly correlated with
the other x variables, this R-squared equals one, and dividing by (1 − Rj²) means dividing
by zero.
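
• In practice, Rj² can be read off directly by regressing xj on the other regressors, and
Stata reports the variance inflation factor 1/(1 − Rj²) after regress. A hypothetical
follow-up to the earlier session (these commands are not part of the original output):

    * Hypothetical follow-up: Rj^2 and the variance inflation factor
    regress educ meduc IQ
    estat vif        // reports 1/(1 - Rj^2) for each regressor

• With corr(IQ, meduc) = 0.3318 and only two regressors, Rj² = 0.3318² ≈ 0.11, so each
VIF should be about 1/(1 − 0.11) ≈ 1.12: mild collinearity.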

Multiple Regression Analysis: Estimation


• The second point is that adding variables that are correlated with xj increases Rj²,
which in turn increases the variance of the estimator of βj (for a given σ²).

Multiple Regression Analysis: Estimation


• As an example we will explore the determinants of extra-marital affairs.
• The research questions are:
– Are older people more likely to have extra-marital affairs?
– Are people who have been married longer more likely to have extra-marital affairs?

Multiple Regression Analysis: Estimation


. d

Contains data from D:\Econometrics\Statafiles\affairs.dta


obs: 601
vars: 19 22 May 2002 11:49
size: 18,030 (97.8% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------

age float %9.0g in years
yrsmarr float %9.0g years married

naffairs byte %9.0g number of affairs within last
year

-------------------------------------------------------------------------------
Sorted by: id
Multiple Regression Analysis: Estimation
. summarize age, detail

in years
-------------------------------------------------------------
Percentiles Smallest
1% 22 17.5
5% 22 17.5
10% 22 17.5 Obs 601
25% 27 17.5 Sum of Wgt. 601

50% 32 Mean 32.48752


Largest Std. Dev. 9.288762
75% 37 57
90% 47 57 Variance 86.28109
95% 52 57 Skewness .8869999
99% 57 57 Kurtosis 3.220077

Multiple Regression Analysis: Estimation


. summarize yrsmar, detail

years married
-------------------------------------------------------------
Percentiles Smallest
1% .125 .125
5% .75 .125
10% 1.5 .125 Obs 601
25% 4 .125 Sum of Wgt. 601

50% 7 Mean 8.177696


Largest Std. Dev. 5.571303
75% 15 15
90% 15 15 Variance 31.03942
95% 15 15 Skewness .0779935
99% 15 15 Kurtosis 1.432516

Multiple Regression Analysis: Estimation


. tabulate naffairs

number of |
affairs |
within last |
year | Freq. Percent Cum.
------------+-----------------------------------
0 | 451 75.04 75.04
1 | 34 5.66 80.70
2 | 17 2.83 83.53
3 | 19 3.16 86.69
7 | 42 6.99 93.68
12 | 38 6.32 100.00
------------+-----------------------------------
Total | 601 100.00
Multiple Regression Analysis: Estimation
• It is unfortunately the case, however, that age and years of marriage are highly
correlated:

. correlate age yrsmar


(obs=601)

| age yrsmarr
-------------+------------------
age | 1.0000
yrsmarr | 0.7775 1.0000

Multiple Regression Analysis: Estimation


• Regressing the number of affairs on age, it appears that older people are slightly more
likely to have affairs:

. regress naffairs age

Source | SS df MS Number of obs = 601


-------------+------------------------------ F( 1, 599) = 5.48
Model | 59.219586 1 59.219586 Prob > F = 0.0195
Residual | 6469.86194 599 10.8011051 R-squared = 0.0091
-------------+------------------------------ Adj R-squared = 0.0074
Total | 6529.08153 600 10.8818026 Root MSE = 3.2865

------------------------------------------------------------------------------
naffairs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .033822 .0144444 2.34 0.020 .0054541 .0621899
_cons | .357114 .4880375 0.73 0.465 -.6013585 1.315586
------------------------------------------------------------------------------

Multiple Regression Analysis: Estimation


• People who have been married longer are more likely to have affairs, as well:

. regress naffairs yrsmar

Source | SS df MS Number of obs = 601


-------------+------------------------------ F( 1, 599) = 21.67
Model | 227.929033 1 227.929033 Prob > F = 0.0000
Residual | 6301.1525 599 10.5194533 R-squared = 0.0349
-------------+------------------------------ Adj R-squared = 0.0333
Total | 6529.08153 600 10.8818026 Root MSE = 3.2434

------------------------------------------------------------------------------
naffairs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yrsmarr | .1106286 .0237664 4.65 0.000 .0639529 .1573043
_cons | .5512198 .2351106 2.34 0.019 .0894785 1.012961
------------------------------------------------------------------------------
Multiple Regression Analysis: Estimation
• However, when both are included together, the coefficient on age flips sign and becomes
negative. This instability suggests that the coefficient is estimated with high sampling
variance.
. regress naffairs age yrsmar

Source | SS df MS Number of obs = 601


-------------+------------------------------ F( 2, 598) = 12.86
Model | 269.275536 2 134.637768 Prob > F = 0.0000
Residual | 6259.80599 598 10.467903 R-squared = 0.0412
-------------+------------------------------ Adj R-squared = 0.0380
Total | 6529.08153 600 10.8818026 Root MSE = 3.2354

------------------------------------------------------------------------------
naffairs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0449423 .0226134 -1.99 0.047 -.0893536 -.000531
yrsmarr | .1688902 .0377022 4.48 0.000 .0948454 .242935
_cons | 1.534838 .5476808 2.80 0.005 .4592266 2.61045
------------------------------------------------------------------------------
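
• A back-of-the-envelope check (not from the original slides): with corr(age, yrsmarr) =
0.7775 and only two regressors, Rj² = 0.7775² ≈ 0.60 for each variable, so each
coefficient’s sampling variance is inflated by a factor of roughly 1/(1 − 0.60) ≈ 2.5
relative to the case of uncorrelated regressors.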

Break
Multiple Regression Analysis: Estimation
• Last time, we examined some consequences of misspecifying a multiple regression
model.
• In particular, we examined the implications of omitting an “important variable” from
the regression.
• x2 is an “important variable” if (a) it is correlated with x1 and (b) β2 is something other
than zero.
• In this case, the sign of the omitted variable bias is determined by the signs of (a) and
(b).
• So if the correlation is negative, and β2 is positive, the estimate of β1 will be negatively
biased.

Multiple Regression Analysis: Estimation


• We also examined the sampling variance of the OLS estimator in the multiple
regression case.
• In general, an estimate of the parameter βk will be more precise if:
– xk has a high variance
– The number of data points (n) is high
– xk is less correlated with the other x variables (i.e., Rk² is low).
Multiple Regression Analysis: Estimation
• When the five assumptions are true (linear form, random sampling, zero conditional
mean, no perfect collinearity, conditional homoskedasticity) the ordinary least squares
estimator is BLUE.
• BLUE = Best Linear Unbiased Estimator
• “Best” means that the OLS estimator has the minimum possible variance of any linear
unbiased estimator.

Multiple Regression Analysis: Estimation


• This result is known as the Gauss-Markov Theorem.
• If conditional homoskedasticity is violated, the parameter estimates are still unbiased,
but the OLS estimator is no longer the most efficient.
• The practice of re-weighting the estimator to regain efficiency when heteroskedasticity
is present is known as weighted least squares.
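
• In Stata, two common responses look roughly as follows (a hedged sketch, not from the
lecture; y, x1, x2 are placeholder variables and the weight variable w is hypothetical,
assumed proportional to Var(u|x)):

    * Sketch: responses to heteroskedasticity (hypothetical variables)
    regress y x1 x2, vce(robust)        // keep OLS estimates, use robust standard errors
    regress y x1 x2 [aweight = 1/w]     // weighted least squares with analytic weights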

Multiple Regression Analysis: Estimation


• As a further example of multiple regression analysis we will look at characteristics of
law schools and the starting salaries of their graduates.
• Graduates from more highly ranked law schools often earn more than those from
lower ranked law schools.
• Is it because of the law school, or because better law schools admit better applicants in
the first place?

Multiple Regression Analysis: Estimation


. d

Contains data from D:\Econometrics\Statafiles\LAWSCH85.DTA


obs: 156
vars: 21 12 Mar 1999 15:06
size: 7,800 (99.0% of memory free)
---------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------
rank int %9.0g law school ranking
salary float %9.0g median starting salary
cost int %9.0g law school cost
LSAT int %9.0g median LSAT score
GPA float %9.0g median college GPA
libvol int %9.0g no. volumes in lib., 1000s
---------------------------------------------------------------------------
Multiple Regression Analysis: Estimation
• The dependent variable will be the logarithm of median starting salary for new
graduates.
• The independent variables measuring the quality of incoming students will be LSAT
scores and college GPAs.
• The independent variables measuring the quality of the school will be the logarithm of
the number of books in the school’s library, the logarithm of the cost of attending the
school and the media ranking of the law school.

Multiple Regression Analysis: Estimation


• To start, we will try to find the relationship between graduates’ starting salaries and
school quality variables without any measure of the quality of entering students.

\ln(salary) = \beta_0 + \beta_1 \ln(libvol) + \beta_2 \ln(cost) + \beta_3\, rank + u

Multiple Regression Analysis: Estimation


. generate lsalary=ln(salary)

. generate lcost=ln(cost)

. generate llibvol=ln(libvol)

. regress lsalary llibvol lcost rank

Source | SS df MS Number of obs = 141


-------------+------------------------------ F( 3, 137) = 211.04
Model | 8.8224849 3 2.9408283 Prob > F = 0.0000
Residual | 1.90907945 137 .013934886 R-squared = 0.8221
-------------+------------------------------ Adj R-squared = 0.8182
Total | 10.7315643 140 .076654031 Root MSE = .11805

------------------------------------------------------------------------------
lsalary | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
llibvol | .1291507 .0325187 3.97 0.000 .0648471 .1934543
lcost | .0265127 .0295489 0.90 0.371 -.0319182 .0849437
rank | -.0041712 .0002976 -14.02 0.000 -.0047596 -.0035829
_cons | 9.880132 .3433113 28.78 0.000 9.201258 10.55901
------------------------------------------------------------------------------

Multiple Regression Analysis: Estimation


• Now let’s try a more complete model, including measures of student quality.

\ln(salary) = \beta_0 + \beta_1 \ln(libvol) + \beta_2 \ln(cost) + \beta_3\, rank + \beta_4\, GPA + \beta_5\, LSAT + u

• We expect β3, the coefficient on rank, to be negative (a smaller rank number means a better school), but the others should all be positive.
Multiple Regression Analysis: Estimation
. regress lsalary llibvol lcost rank GPA LSAT

Source | SS df MS Number of obs = 136


-------------+------------------------------ F( 5, 130) = 138.23
Model | 8.73362207 5 1.74672441 Prob > F = 0.0000
Residual | 1.64272974 130 .012636383 R-squared = 0.8417
-------------+------------------------------ Adj R-squared = 0.8356
Total | 10.3763518 135 .076861865 Root MSE = .11241

------------------------------------------------------------------------------
lsalary | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
llibvol | .0949932 .0332543 2.86 0.005 .0292035 .160783
lcost | .0375538 .0321061 1.17 0.244 -.0259642 .1010718
rank | -.0033246 .0003485 -9.54 0.000 -.004014 -.0026352
GPA | .2475239 .090037 2.75 0.007 .0693964 .4256514
LSAT | .0046965 .0040105 1.17 0.244 -.0032378 .0126308
_cons | 8.343226 .5325192 15.67 0.000 7.2897 9.396752
------------------------------------------------------------------------------
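
• A quick reading of these estimates (my interpretation, not text from the original slides):
because both salary and libvol are in logs, the coefficient 0.095 is an elasticity, so a 10%
larger library is associated with roughly a 0.95% higher median starting salary, holding
the other regressors fixed. Similarly, the rank coefficient implies that each one-place
improvement in the ranking is associated with about 100 × 0.0033 ≈ 0.33% higher salary.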

Multiple Regression Analysis: Inference


• So far we have discussed only the mean and variance of the parameter estimates in the
multiple regression model:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u
• We have not said anything about the distribution of the OLS estimators other than that.

Multiple Regression Analysis: Inference


• As a review, under the four basic assumptions of the linear regression model, the OLS
estimates of the parameters are unbiased:

E(\hat{\beta}_j) = \beta_j

Multiple Regression Analysis: Inference


• When conditional homoskedasticity is assumed:
– The variance of the OLS estimators can be written as

2
 
Var ˆ j 
 n 2
   xij  x j  
  1 Rj
2

 i 1 

– Additionally, the OLS estimators are BLUE – that is, among all estimators that are
both linear and unbiased, they have the minimum sampling variance possible.
Multiple Regression Analysis: Inference
• These five assumptions are collectively known as the Gauss-Markov assumptions.
• With a sixth assumption, the entire sampling distribution of the estimator can be
characterized.
• The assumption, known as the normality assumption, states that u is independent of the
explanatory variables and:

u \sim Normal(0, \sigma^2)

Multiple Regression Analysis: Inference


• The form of the multiple regression model under these six assumptions can be
summarized succinctly as

y \mid x_1, x_2, \ldots, x_k \sim Normal(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,\ \sigma^2)

• where the values of y are normal random variables with means equal to the fitted values
constructed from the true parameters and variances all equal to σ².

Multiple Regression Analysis: Inference


• These six assumptions together are known as the classical linear model assumptions.
• Collectively, they imply that the sampling distributions of the OLS estimators are
normal:

\hat{\beta}_j \sim Normal(\beta_j,\ Var(\hat{\beta}_j))

Multiple Regression Analysis: Inference


• As an example, consider an error term u that is independent of the x variables and takes
on the values −2, −1, 0, 1, and 2 with equal probability.
• This error term satisfies the Gauss-Markov assumptions: it has zero mean and constant variance.
• However, it violates the CLM assumptions, because it is not normally distributed.
Multiple Regression Analysis: Inference
• When the estimator is normally distributed, it can also be standardized:

\hat{\beta}_j \sim Normal(\beta_j,\ Var(\hat{\beta}_j)) \;\Longrightarrow\; \frac{\hat{\beta}_j - \beta_j}{\mathrm{sd}(\hat{\beta}_j)} \sim Normal(0, 1)

• where sd(β̂j) is the true standard deviation of the estimator, the square root of Var(β̂j).

Multiple Regression Analysis: Inference


• When an estimate of the standard error must be used, as is almost always the case, the
standardization is

\frac{\hat{\beta}_j - \beta_j}{\widehat{Se}(\hat{\beta}_j)} \sim t_{n-k-1}
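
• As a check (my arithmetic, using the earlier law school output and H0: βllibvol = 0):

t = \frac{\hat{\beta}_{llibvol} - 0}{\widehat{Se}(\hat{\beta}_{llibvol})} = \frac{0.1291507}{0.0325187} \approx 3.97

• which matches the t column Stata printed for llibvol.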

Multiple Regression Analysis: Inference


• This standardization is used in hypothesis testing.
• Example of a hypothesis test:
• Two candidates, A and B, are running in an election. The official results say that
candidate B won the election with 54% of the vote.
• Candidate A thinks the election is rigged and hires a polling agency to ask 100 people
how they voted. The polling agency does so and 53 of the people it polls say they
voted for candidate A.

Multiple Regression Analysis: Inference


• There are two alternatives: the election results are accurate, or they aren’t.
• Suppose that θ represents the true proportion of people who voted for candidate A
(officially, 46%).
• An example of a null hypothesis is the hypothesis that the election results are
accurate:

H_0: \theta = 0.46
Multiple Regression Analysis: Inference
• An example of an alternative hypothesis is that they are not:

H_A: \theta \neq 0.46

• Candidate A brings his poll results to the local magistrate, who devises a statistical test
to find out whether candidate A has evidence beyond a reasonable doubt that the election was
rigged.

Multiple Regression Analysis: Inference


• The standard for “beyond a reasonable doubt” most commonly used is the 5%
significance test.
• In other words, the test can have at most a 5% chance of rejecting the null hypothesis
(that the results are accurate) when the null hypothesis is in fact true.

Multiple Regression Analysis: Inference


• Suppose that the null hypothesis is true (θ = 0.46). The next question is: what is the
sampling distribution of candidate A’s poll estimate in that case?
• One way to find this out is through simulations.

[Figure: histogram of the simulated poll counts (variable xb). Vertical axis: fraction of
simulations, peaking near 0.079; horizontal axis: number of the 100 respondents voting
for A, from about 32 to 60, centered near 46.]
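
• A minimal sketch of how a histogram like this could be produced (a hypothetical
reconstruction; the original do-file is not shown). Under H0, each poll of 100 voters is a
Binomial(100, 0.46) draw:

    * Hypothetical reconstruction of the polling simulation
    clear
    set seed 98034
    set obs 100000
    generate xb = rbinomial(100, 0.46)   // votes for A in each simulated poll of 100
    summarize xb, detail                 // mean should be near 46, std. dev. near 5
    histogram xb, fraction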
Multiple Regression Analysis: Inference
• A test statistic is a function of the random sample of data.
• The outcome of this function is used to create a rejection rule for the null hypothesis.
• Usually the rejection rule is to reject the null hypothesis if the outcome of the function
exceeds some critical value.

Multiple Regression Analysis: Inference


• Let’s look at the simulated data and see if we can construct a rejection rule for the null
hypothesis that θ = 0.46.

. summarize xb, detail

xb
-------------------------------------------------------------
Percentiles Smallest
1% 34 24
5% 38 25
10% 40 26 Obs 100000
25% 43 26 Sum of Wgt. 100000

50% 46 Mean 45.9947


Largest Std. Dev. 5.001752
75% 49 66
90% 52 67 Variance 25.01752
95% 54 68 Skewness .0185113
99% 58 68 Kurtosis 2.989094

Multiple Regression Analysis: Inference


• A two-sided test might be to reject the null hypothesis if the poll results are greater or
less than certain values.
• For example, a 10% significance test would be to reject the null hypothesis if less than
38 people polled voted for A, or more than 54 people did.
• There is a 10% chance of a Type I error – rejecting the null hypothesis even if it is
true.
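
• The 10% figure can be checked against the exact Binomial(100, 0.46) null distribution
(a hedged check, not from the slides):

    * Exact Type I error probability under H0
    display binomial(100, 37, 0.46) + binomialtail(100, 55, 0.46)
    * P(X <= 37) + P(X >= 55); this should come out close to 0.10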
Multiple Regression Analysis: Inference
• Reducing the probability of Type I errors, however, increases the probability of
making a Type II error – failing to reject the null hypothesis if it is false.
• For example, suppose that 52% of the population actually voted for A. Even so, it is
fairly unlikely that a single poll of 100 people will show more than 54 votes for A. If the
poll shows that 53 people voted for A, the magistrate fails to reject the null hypothesis,
and a Type II error has been made.
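
• The Type II error probability for this particular alternative can be computed the same
way (a hedged check, not from the slides): under p = 0.52 the poll count is
Binomial(100, 0.52), and

    * Probability of failing to reject when p = 0.52 (a Type II error)
    display 1 - binomial(100, 37, 0.52) - binomialtail(100, 55, 0.52)
    * P(38 <= X <= 54); roughly 0.7, so this test has low power against p = 0.52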

Multiple Regression Analysis: Inference


• To summarize, a null hypothesis is rejected if the test statistic falls beyond the critical
values.
• If not we fail to reject the hypothesis.
• The percent chance of a Type I error (rejecting the hypothesis if it is true) is the
significance level of the test.
• Failing to reject a hypothesis is NOT “accepting” the hypothesis.

Multiple Regression Analysis: Inference


• For example, suppose that the cannon in front of the Vice-Chancellor’s house has been
stolen.
• The police know it was stolen at 2 in the morning.
• At the time, many people were taking part in a dance party nearby. A friend of yours
was there as well.

Multiple Regression Analysis: Inference


• Since your friend was there, if that is all the evidence you have, you fail to reject the
null hypothesis that your friend stole the cannon.
• However you have no evidence that your friend DID steal the cannon, either.
• Therefore you cannot “accept” the null hypothesis that your friend stole the cannon.
