
ECONOMETRICS EXERCISE 3

TRAN THI ANH NGUYET 28 March, 2013

1. The data set CEOSAL2.DTA contains information on 177 CEOs. In this sample, the average annual salary is $865,864, with the smallest and largest being $100,000 and $5,299,000, respectively. Another interesting variable is sales, with an average of $3,529.46 million and the smallest and largest being $29 million and $51,300 million. Using the data set, the following OLS regression is obtained (standard errors in parentheses):

(1)  lnsalary = 4.58 + 0.192 lnsales + 0.094 lnmktval + 0.017 ceoten - 0.009 comten
                (0.253)  (0.0398)       (0.0489)         (0.0055)       (0.0033)
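For reference, a minimal sketch of the Stata session behind the figures quoted above (assuming, as in Wooldridge's CEOSAL2, that salary is recorded in $1,000s and sales in $ millions):

. use CEOSAL2.DTA, clear
. summarize salary sales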

. reg ln_salary ln_sales ln_mktval ceoten comten


      Source |       SS       df       MS              Number of obs =     177
-------------+------------------------------           F(  4,   172) =   22.99
       Model |  22.5202805     4  5.63007012           Prob > F      =  0.0000
    Residual |  42.1259326   172  .244918213           R-squared     =  0.3484
-------------+------------------------------           Adj R-squared =  0.3332
       Total |  64.6462131   176  .367308029           Root MSE      =  .49489

------------------------------------------------------------------------------
   ln_salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ln_sales |   .1918844    .039824     4.82   0.000     .1132777    .2704911
   ln_mktval |   .0940901   .0489194     1.92   0.056    -.0024696    .1906498
      ceoten |   .0169211   .0055389     3.05   0.003     .0059882    .0278541
      comten |   -.009415   .0033341    -2.82   0.005    -.0159959    -.002834
       _cons |   4.576374   .2535074    18.05   0.000     4.075988     5.07676
------------------------------------------------------------------------------

Interpret the results.

OLS Coefficients. The main variable of interest is lnsales. Holding all other factors fixed, 10 percent more annual sales increases a CEO's salary by about 1.92%. In level terms, evaluated at the sample means, a $352.9 million increase in annual sales (10% of average sales) raises a CEO's salary by about 1.92% of $865,864, roughly $16,600 per year. This elasticity seems reasonably large. The estimate β2 = 0.094 means that whenever the market value of the firm is 10% higher, the CEO's salary is predicted to rise by about 0.94%, holding other factors fixed. In comparison with 1.92% this is not a huge effect, but it should not be considered negligible either.

The variable ceoten is years as CEO with the current company and comten is total years with the company; their OLS estimates are 0.0169 and -0.0094, respectively. Other factors fixed, one more year as CEO with the firm increases his salary by about 1.69%. However, another year with the firm but not as CEO lowers his salary by approximately 0.94%. The first effect is as expected, since longer tenure as CEO should be rewarded; the effect of comten, on the other hand, at first seems strange. The superstar effect hypothesis of Rosen (1990) states that the market must identify new talent and reassign control over careers from older to younger generations. It implies that when a firm hires a CEO from the labor market with long experience in non-CEO positions, it is likely he was not hired as a superstar (CEO) by other firms. Therefore, the firm may have no willingness to bid up his salary.

If the model is not misspecified or otherwise problematic, the intercept has little practical meaning: 4.58 is the predicted log salary when all independent variables equal zero, a point far outside the range of the data. If, on the other hand, relevant variables are omitted, this coefficient might be more meaningful.
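As a quick check of these interpretations, the approximate and exact percentage effects can be computed from the stored coefficients immediately after running the regression (a sketch; _b[] holds the estimates reported above):

. di 100*_b[ln_sales]*.10          // approx. effect of 10% higher sales: about 1.92%
. di 100*(exp(_b[ceoten]) - 1)     // exact effect of one more year as CEO: about 1.71%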

Statistical Significance. The t statistics on lnsales, ceoten and comten are 4.82, 3.05 and -2.82, all large enough in magnitude to conclude that these variables are statistically significant at the 5% level. What about the statistical significance of the market value variable? The t statistic on lnmktval is 1.92, which is not large enough to make lnmktval statistically significant at the 5% level against a two-sided alternative. However, if we relax the significance level to 10%, then with n - k - 1 = 172 df the 10% critical value for the two-sided alternative H0: β2 = 0 is about 1.65. Hence, lnmktval is almost significant against the two-sided alternative at the 10% level. In fact, we will show later that some outliers and influential observations weaken the statistical significance of lnmktval.

R-squared. R² = 0.3484, meaning the four predictors together explain about 34.84% of the sample variation in log salary across these 177 observations. This is not a very high figure, for a few possible reasons. Firstly, it is probably the result of omitting other relevant factors, for example age and profits, or of functional form misspecification. Secondly, it might be caused by influential observations and outliers, as shown later.

Test the hypothesis H0: β2 = 0 against H1: β2 > 0. I use the following command to display the critical value c for the one-sided alternative at the 5% significance level:
. di "t(175) 95% = " invttail(171,.05) t(175) 95% = 1.6538133

From the first table, the t statistic on lnmktval is 1.9234 > c = 1.6538, so we reject the null and accept H1: β2 is statistically greater than 0 at the 5% significance level.

About the p-value:

p-value = P(T > 1.92) = 0.5 P(|T| > 1.92) = 0.5(0.056) = 0.028

or by Stata command:
. display ttail(171,1.92)
.02826118

This means that, if the null hypothesis were true, we would observe a t statistic as large as this only about 2.8% of the time, which is reasonably strong evidence against H0.

2. Adding the dummy variables college and grad to the equation, we obtain the following OLS estimates:

(2)  lnsalary = 4.664 + 0.192 lnsales + 0.1003 lnmktval + 0.0167 ceoten - 0.0104 comten - 0.0642 college - 0.0945 grad

. reg ln_salary ln_sales ln_mktval ceoten comten college grad

      Source |       SS       df       MS              Number of obs =     177
-------------+------------------------------           F(  6,   170) =   15.57
       Model |  22.9276462     6  3.82127437           Prob > F      =  0.0000
    Residual |  41.7185668   170  .245403334           R-squared     =  0.3547
-------------+------------------------------           Adj R-squared =  0.3319
       Total |  64.6462131   176  .367308029           Root MSE      =  .49538

------------------------------------------------------------------------------
   ln_salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ln_sales |   .1925694   .0398827     4.83   0.000     .1138403    .2712984
   ln_mktval |   .1002791   .0492815     2.03   0.043     .0029965    .1975616
      ceoten |   .0166999   .0055566     3.01   0.003     .0057312    .0276687
      comten |  -.0104663   .0034369    -3.05   0.003    -.0172508   -.0036819
     college |  -.0642696   .2313154    -0.28   0.781      -.52089    .3923509
        grad |  -.0945008   .0789601    -1.20   0.233    -.2503694    .0613678
       _cons |   4.663686   .3539677    13.18   0.000     3.964947    5.362424
------------------------------------------------------------------------------

Test the hypothesis that education has no effect on CEO salaries:

H0: β5 = β6 = 0     H1: H0 is not true

The unrestricted model is (2) and the restricted model is (1). The F statistic is defined as

F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]

with q = 2, n - k - 1 = 170, SSR_r = 42.126 and SSR_ur = 41.7186.
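As a check, the F statistic can be computed by hand from the two residual sums of squares reported in the regression tables above:

. di ((42.1259326 - 41.7185668)/2) / (41.7185668/170)    // gives about 0.83

This matches the value returned by Stata's test command below.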

In Stata, the test command gives the result below:


. test college grad

 ( 1)  college = 0
 ( 2)  grad = 0

       F(  2,   170) =    0.83
            Prob > F =    0.4378

. di invFtail(2,170,0.05)
3.0491486

We have F = 0.83 < c = 3.049, so we fail to reject H0, the hypothesis that education has no effect on CEO salaries, at the 5% significance level. In other words, college and grad are jointly insignificant. The p-value of 0.4378 means that, if H0 were true, the chance of observing an F statistic as large as we did is 43.78%, which is very weak evidence against H0. The sample size is not very large; however, even if we are more liberal and move the significance level to 10%, college and grad remain jointly insignificant, as can be seen from the following:
. return list

scalars:
              r(drop) =  0
              r(df_r) =  170
                 r(F) =  .8299923049572177
                r(df) =  2
                 r(p) =  .4378117678453225

. di invFtail(r(df),r(df_r),0.1)
2.3340563

Furthermore, considering the statistical significance of college and grad individually, neither is statistically significant at the 5% level: their t statistics are -0.28 and -1.20, far below 1.96 in magnitude.

3. Residual analysis

(a) Inspect residuals. Firstly, recalling the OLS model (1) and inspecting its residuals, I get the following result:
. predict r, resid
. inspect r

r:  Residuals                        Number of Observations
                                       Total    Integers   Nonintegers
         Negative                         92           -            92
         Zero                              -           -             -
         Positive                         85           -            85
         Total                           177           -           177
         Missing                           -
                                         177
  (Range -2.549983 to 1.992772; more than 99 unique values)

From the inspect output, the residuals range from about -2.55 to 1.99, with 92 negative and 85 positive values; a few residuals are unusually far from zero. Secondly, consider the normality of the residuals. Note that OLS estimation itself does not require normality; normality matters for the validity of hypothesis testing with p-values for t tests and F tests.

. pnorm r
[Figure: pnorm r, standardized normal probability plot of the residuals: Normal F[(r-m)/s] against Empirical P[i] = i/(N+1).]

. qnorm r
[Figure: qnorm r, quantile-quantile plot: quantiles of Residuals against Inverse Normal.]

The results from both pnorm and qnorm show some deviation from normality. Nevertheless, this deviation seems minor rather than serious, so we can accept that the residuals are close to normally distributed.

(b) Potential Problems. To identify the three types of observation of interest (influential points, leverage points and outliers), we first look at the scatterplot matrix of the variables.
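The original log does not show the command for the matrix; in Stata it would be produced with something like:

. graph matrix ln_salary ln_sales ln_mktval ceoten comten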

[Figure: scatterplot matrix of ln_salary, ln_sales, ln_mktval, ceoten and comten.]

The graphs of ln_salary against the other variables show a few potential problems: in every plot, some points (at least 3 to 4) lie far away from the rest. To get a better view, let's make an individual scatterplot of ln_salary against ln_mktval:

. scatter ln_salary ln_mktval, mlabel(obs)
[Figure: scatter of ln_salary against ln_mktval, markers labeled with observation ids; observations 103 and 74 lie far from the main cloud of points.]
Doing the same with scatterplots of ln_salary against the other regressors, all of them suggest that observations 103 and 74 stand out from all other points.

(c) Identifying outliers. Here, I examine studentized residuals to detect outliers.

. predict rs, rstudent
. stem rs

Stem-and-leaf plot for rs (Studentized residuals)

rs rounded to nearest multiple of .1
plot in units of .1

  -5. | 8
  -5* |
  -4. |
  -4* |
  -3. |
  -3* |
  -2. | 9
  -2* | 3
  -1. | 88555
  -1* | 44444221111000
  -0. | 99988888887777777666665555555
  -0* | 44443333333332222222222221111111111
   0* | 000000000000111111112222223333334444444444
   0. | 555666666667777777788889999
   1* | 01112223333444
   1. | 566
   2* | 12
   2. | 8
   3* | 0
   3. |
   4* | 3

We see four residuals that stick out: -5.8, -2.9, 3.0 and 4.3. Let's continue by sorting the data on the residuals and listing the 10 smallest and 10 largest values along with the observation id:
. sort rs
. list obs rs in 1/10

      +------------------+
      | obs           rs |
      |------------------|
   1. | 113    -5.818924 |
   2. |  30    -2.931373 |
   3. |  38    -2.288085 |
   4. | 174    -1.814222 |
   5. | 141    -1.752105 |
      |------------------|
   6. |  86    -1.507932 |
   7. | 158    -1.497845 |
   8. | 172    -1.494411 |
   9. |  52    -1.448616 |
  10. | 118    -1.419524 |
      +------------------+

. list obs rs in -10/l

      +------------------+
      | obs           rs |
      |------------------|
 168. | 147     1.399831 |
 169. |  89     1.419326 |
 170. |  45     1.548682 |
 171. | 160     1.564893 |
 172. |  69     1.586095 |
      |------------------|
 173. |  18     2.091742 |
 174. | 130     2.246242 |
 175. | 176     2.759216 |
 176. |  74     2.980667 |
 177. | 103     4.268435 |
      +------------------+

I pay attention to studentized residuals that exceed 2.9 in absolute value. The results show four worrisome observations: 113, 30, 74 and 103.
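As a complementary check not used in the original analysis, Cook's distance combines residual size and leverage into a single influence measure; a common rule of thumb flags observations above 4/n:

. predict d, cooksd                 // Cook's distance after the regression
. list obs d if d > 4/177 & d < .   // flag potentially influential observations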

(d) Identifying leverage. Let's look at the leverage values to detect observations that may exert large effects on the regression. A point with leverage higher than (2k + 2)/n, where k is the number of predictors, should be examined carefully.
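The leverage variable lev used in the listing below is not generated in the log shown; it would come from predict after re-running regression (1):

. predict lev, leverage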
. di (2*4+2)/177
.05649718

. list obs ln_salary ln_sales ln_mktval ceoten comten lev if lev>0.0565

       | obs   ln_sal~y   ln_sales   ln_mkt~l   ceoten   comten        lev |
       |-------------------------------------------------------------------|
 162.  | 117   6.556778   3.912023   6.805723        6        6   .0565982 |
 163.  |  86   5.159055   3.367296   5.966147       13       13   .0567984 |
 164.  |  49   6.122493     5.9428   8.809863        3        9   .0596156 |
 165.  |  73   6.052089   3.583519   6.467699       13       13   .0599303 |
 166.  | 168   5.910797   4.174387   7.090077        1        4   .0606704 |
 167.  |  30   5.703783   8.839276   8.455317       26       45   .0607598 |
 168.  |  44   7.611348   10.84545   10.66663        3       34   .0607931 |
 169.  |  26   5.880533    4.59512   7.740664        4       23    .066881 |
 170.  |  43   7.304516   10.00785   7.937375        3        3   .0680141 |
 171.  |  40   7.467371   9.692766   9.792556       24       31   .0701581 |
 172.  | 113    4.60517   7.901007   9.220291       26       26    .073588 |
 173.  | 122   5.981414   6.240276   6.870053       28       58   .0904837 |
 174.  | 135   7.650645     9.2399   10.72327       20       41   .0928421 |
 175.  | 136    6.74876   6.706862   6.018593       34       34    .097917 |
 176.  |  13   7.090077   6.679599   6.519147       37       37   .1107356 |
 177.  | 121   6.672033   7.090077   8.070906       37       45   .1211273 |

So 121, 13 and 136 are the observations with the highest leverage. Furthermore, as can be seen from the outlier and leverage tables together, 113 and 86 show up with both large residuals and high leverage; such points may be the most influential.

(e) Correcting the model. Re-estimating the model without the most problematic outliers and leverage points (observations 103, 121, 113 and 13), I get regression (3) below:
. reg ln_salary ln_sales ln_mktval ceoten comten if obs!=103&obs!=121&obs!=113&obs!=13

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  4,   168) =   34.57
       Model |  25.5087652     4   6.3771913           Prob > F      =  0.0000
    Residual |  30.9890844   168  .184458836           R-squared     =  0.4515
-------------+------------------------------           Adj R-squared =  0.4384
       Total |  56.4978496   172   .32847587           Root MSE      =  .42949

------------------------------------------------------------------------------
   ln_salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    ln_sales |    .171261   .0347562     4.93   0.000     .1026457    .2398762
   ln_mktval |   .1418624   .0431555     3.29   0.001     .0566654    .2270595
      ceoten |   .0230826   .0054012     4.27   0.000     .0124196    .0337456
      comten |  -.0113112   .0029149    -3.88   0.000    -.0170657   -.0055567
       _cons |   4.370826   .2237059    19.54   0.000     3.929189    4.812463
------------------------------------------------------------------------------

In the new regression, β2 = 0.1418 with a t statistic of 3.29, much higher than the first model's t statistic of 1.92. Since this exceeds 1.96, the 5% two-sided critical value, ln_mktval is now statistically significant at the 5% level. Moreover, the magnitude of β2 increases considerably, from 0.094 to 0.1418: if the market value of a firm is 10% higher, the CEO's salary is predicted to increase by about 1.42%.

(f) Should we revise model (1)? In terms of goodness of fit, model (1) has R² = 0.3484 while model (3) has R² = 0.4515, and model (3) also has the smaller root MSE (0.429 against 0.495), so model (3) seems to fit better. At this point we can also use features other than R² and adjusted R² to decide between the models, fairly intuitively: for example, all independent variables are much more statistically significant in (3) than in (1), and the estimate of β2 in (3) is probably of more interest.

4. Heteroskedasticity. (As I am not sure which of the three models I should examine, I test for heteroskedasticity in model (3), that is, model (1) after eliminating the outliers and leverage points.)

(a) Detecting heteroskedasticity

H0: constant variance     H1: heteroskedasticity

Firstly, recall the OLS regression (3):

(3)  lnsalary = 4.37 + 0.1713 lnsales + 0.1419 lnmktval + 0.0231 ceoten - 0.0113 comten
                (0.2237)  (0.0348)       (0.0432)          (0.0054)        (0.0029)

n = 173,  R² = 0.4515

Run the auxiliary regression of the squared OLS residuals from (3) on its predictors:
. predict u, resid
. gen u2 = u*u
. reg u2 ln_sales ln_mktval ceoten comten if obs!=103&obs!=121&obs!=113&obs!=13

û² = 0.2907 - 0.0611 lnsales + 0.0384 lnmktval + 0.0082 ceoten + 0.0007 comten

n = 173,  R²u = 0.0705 (the R-squared from this auxiliary regression)

The F statistic is

F = (R²u / k) / ((1 - R²u) / (n - k - 1))

where k = 4 and n - k - 1 = 168.
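Alternatively (a sketch, not in the original log), the LM version of the Breusch-Pagan statistic is n times R²u, compared against a χ²(k) distribution:

. di "LM = " 173*0.0705
. di "p-value = " chi2tail(4, 173*0.0705)

This leads to the same conclusion as the F version computed below.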

Using the Breusch-Pagan test with the F statistic:


. scalar F = (0.0705/4)/((1 - 0.0705)/168)
. scalar lm = 168*F
. di Ftail(4,168,F)
.01489981

. test ln_sales ln_mktval ceoten comten

 ( 1)  ln_sales = 0
 ( 2)  ln_mktval = 0
 ( 3)  ceoten = 0
 ( 4)  comten = 0

       F(  4,   168) =    3.19
            Prob > F =    0.0148
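Stata also has a built-in version of this test: by default estat hettest uses the fitted values, while the rhs option uses the regressors, matching the manual computation above:

. estat hettest, rhs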

Or the White test:

. estat imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(14)     =     25.08
         Prob > chi2  =    0.0338

Both p-values, from the Breusch-Pagan test (F statistic) and the White test (χ² statistic), are sufficiently small, 0.0148 and 0.0338 respectively. Therefore we reject H0 and accept H1: heteroskedasticity is present, and the usual standard errors reported in part 1 are not reliable.

(b) Heteroskedasticity-robust standard errors

(4)  lnsalary = 4.37 + 0.1713 lnsales + 0.1419 lnmktval + 0.0231 ceoten - 0.0113 comten
                (0.2237)  (0.0348)       (0.0432)          (0.0054)        (0.0029)
                [0.2111]  [0.0339]       [0.0337]          [0.0059]        [0.0027]

n = 173,  R² = 0.4515

The usual OLS standard errors are in parentheses and the heteroskedasticity-robust standard errors are in brackets. First, in model (4), every variable that was statistically significant using the usual t statistic remains significant using the robust t statistic; this occurs because the two sets of standard errors are not very different. The largest relative change is for the coefficient on ln_mktval: 0.0432 - 0.0337 = 0.0095, about 22%. The associated p-values differ slightly, since the robust t statistics are not identical to the usual, non-robust ones. (4) also shows that most of the robust standard errors are smaller than the usual standard errors, with the exception of the one on ceoten (β3).
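The robust standard errors in brackets are presumably obtained by re-estimating (3) with Stata's vce(robust) option:

. reg ln_salary ln_sales ln_mktval ceoten comten if obs!=103&obs!=121&obs!=113&obs!=13, vce(robust)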
