QMB 6305
UNIVERSITY OF WEST FLORIDA
Submitted by
Aaron Hall
Instructor
DR. GAYLE BAUGH
TABLE OF CONTENTS
Introduction
    The Data
    Prediction And Moving Averages
    Data Manipulation
    Data Exploration
    Linear Prediction
    Checking The Model
Conclusions
    The Model
    Confidence Intervals
    Error Comparison
    Summary
References
Appendix A: Acknowledgements
    GNU/Linux/Ubuntu
    R
    OpenOffice.org
    Other Tools
Appendix B: R Code
INTRODUCTION
The goal of this project is to analyze available econometric data and find
predictors of the valuation of the United States stock market. Availability of
data is an important constraint for this analysis. Because the stock market's
value is calculated every second, predictors with monthly frequency
would have been preferable. However, macroeconomic data are usually given
in quarterly and annual forms, so all data used here have been annualized.
The proxy for stock values, the model's dependent variable, is the
Ibbotson Large Company total return series. Since these values are given as
percentage returns from 1925 onward, the figures used for the model are
transformed into the value of $1,000 invested in 1925 (Harrington, 2008).
Independent variables include projected GDP, interest rates, inflation, and
the money supply.
Projected GDP is measured by the predictions the Fed reports in the
Greenbook. Since the Greenbook is only released with a three-year delay, the data
can be analyzed only up to that date. Further, since the Fed's methodology is
secret, the results for this figure, if they are significant, cannot be accurately
reproduced. GDP is an estimate even after the period for which it is
measured, and is usually revised several times (St. Louis Fed, 2010).
Projected GDP is important because it represents the productivity of the
United States economy. It would make sense that the more productive the US
economy, the greater the chance of profitability for any US company, though
this is certainly not a guarantee. If an investor believes the economy will be more
productive in the future, perhaps this will increase the price the investor will
pay for the investment.
Since stock prices are based on the present values of future cash flows,
changes in interest rates are likely to affect the valuation of stocks. Lower
rates would increase the present value of the future cash flows. Interest rates
also affect the firm's cost of capital. Lower interest rates would
decrease the borrowing costs of firms and increase their profitability.
The original proposal sought to examine bond prices as a predictor
variable. Since changes in bond prices are directly tied to changes in interest
rates, including interest rates precludes the use of bonds as a second predictor.
Inflation may also be a predictor of stock valuations. Higher inflation
means that investors in bonds, if they are spending the bond income, are
losing capital to inflation. As a result, they may seek higher returns in the
stock market. Further, the hard assets owned by the companies
may also be increasing in value relative to the weakening dollar. As with
the Large Company stock values, the figures used for inflation are indexed to
$1,000 in 1925.
The money supply is the final predictor to examine. The money supply
represents the number of dollars in circulation. It is related to inflation as the
more dollars in circulation, the less each individual dollar may be worth.
Since it may be highly correlated to inflation, it is unlikely both will remain in
the final model. This idea is not unrelated to Keynes' idea that the
components of money demand include a speculative demand in addition to
Inflation is given by Ibbotson Inflation Return data. The Fed Funds Rate
will proxy for interest rates. Money supply may be represented variously by
Institutional Money Funds (series IMFNS from the St. Louis Fed), by M2, and by
the two series added together. (M3 was to be our time series for money
supply as it is the most encompassing definition of money, but it has been
discontinued by the Fed on the grounds that the costs of gathering the data
outweigh the value of the series.)
It should be noted that both the Large Company and Inflation return data
are given in terms of annual percentage growth, and have been transformed
to indicate the growth in the value of $1,000 in 1925. These figures therefore
indicate the value at the end of each period. For prediction, the predictor
variables (other than projected GDP and GNP) will be lagged.
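The indexing and lagging described above can be sketched in R. This is a minimal illustration only: the return figures are hypothetical placeholders rather than Ibbotson data, and the names returns_pct, index_value, and lagged are assumptions made for the example.

```r
# Hypothetical annual percentage returns (illustrative only, not Ibbotson data)
returns_pct <- c(10, -5, 20)

# Value at the end of each year of $1,000 invested at the start of the first year
index_value <- 1000 * cumprod(1 + returns_pct / 100)

# Lag by one year so each row pairs a year with the previous year's value
lagged <- c(NA, head(index_value, -1))

data.frame(index_value, lagged)
```

Compounding with cumprod keeps each figure as an end-of-period value, which is why a lagged copy is needed before the series can serve as a predictor of the following year.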
Occam's razor states that if two possible explanations are equally likely,
one should accept the least complicated explanation. When making
predictions, one should accept more complicated explanations only if the
more complicated explanation provides significantly greater prediction value.
Thus, this paper will seek to find the simplest model with the best
prediction value.
Year     LCStock       GNP       GDP
1978    89597.47    8445.3        NA
1979   106119.24    9385.1        NA
1980   140523.09   10219.1        NA
1981   133623.41   11342.9        NA
1982   162232.18   12479.4        NA
1983   198750.65   12979.4        NA
1984   211212.31   14604.2        NA
1985   279138.19   15628.9        NA
1986   330695.02   16478.2        NA
1987   347990.37   17707.1        NA
1988   406487.55   18951.2        NA
1989   534490.48   20890.4        NA
1990   517547.13   22115.7        NA
1991   675657.77   22918.9        NA
1992   727480.73   23731.8   23646.8
1993   800156.05        NA   25106.9
1994   810638.09        NA   26916.3
1995  1114059.93        NA   28464.6
1996  1371073.56        NA   29644.5
1997  1828463.70        NA   31702.1
1998  2351038.63        NA   33760.6
1999  2845697.15        NA   35286.5
2000  2586454.14        NA   39068.3
2001  2279183.39        NA   41928.5
2002  1775483.86        NA   41733.8
2003  2285047.73        NA   43518.7
2004  2533432.42        NA   46639.7
2005  2657823.95        NA        NA
2006  3077760.13        NA        NA
2007  3246729.16        NA        NA
2008  2045439.37        NA        NA
These are simple methods for forecasting time series data. They can
provide a baseline for deciding if other more complicated methods are
worthwhile. Measurement of the degree to which they fail to predict can
provide a way of eliminating the less effective methods of prediction. The
methods are demonstrated and compared on the following pages.
Year     LCStock        Last  Abs. Error    Squared Error  Abs. % Error
1978    89597.47           -           -                -       -
1979   106119.24    89597.47    16521.77     272968969.30  15.57%
1980   140523.09   106119.24    34403.86    1183625368.92  24.48%
1981   133623.41   140523.09     6899.68      47605638.59   5.16%
1982   162232.18   133623.41    28608.77     818461848.89  17.63%
1983   198750.65   162232.18    36518.46    1333598241.05  18.37%
1984   211212.31   198750.65    12461.67     155293109.25   5.90%
1985   279138.19   211212.31    67925.88    4613925152.16  24.33%
1986   330695.02   279138.19    51556.82    2658106122.24  15.59%
1987   347990.37   330695.02    17295.35     299129110.46   4.97%
1988   406487.55   347990.37    58497.18    3421920136.68  14.39%
1989   534490.48   406487.55   128002.93   16384749714.32  23.95%
1990   517547.13   534490.48    16943.35     287077043.93   3.27%
1991   675657.77   517547.13   158110.65   24998976830.29  23.40%
1992   727480.73   675657.77    51822.95    2685618284.69   7.12%
1993   800156.05   727480.73    72675.32    5281702797.86   9.08%
1994   810638.09   800156.05    10482.04     109873251.96   1.29%
1995  1114059.93   810638.09   303421.84   92064812356.11  27.24%
1996  1371073.56  1114059.93   257013.63   66056004341.90  18.75%
1997  1828463.70  1371073.56   457390.14  209205740036.57  25.01%
1998  2351038.63  1828463.70   522574.93  273084552890.20  22.23%
1999  2845697.15  2351038.63   494658.53  244687058285.70  17.38%
2000  2586454.14  2845697.15   259243.01   67206938571.71  10.02%
2001  2279183.39  2586454.14   307270.75   94415315113.52  13.48%
2002  1775483.86  2279183.39   503699.53  253713215787.74  28.37%
2003  2285047.73  1775483.86   509563.87  259655335708.01  22.30%
2004  2533432.42  2285047.73   248384.69   61694953315.94   9.80%
2005  2657823.95  2533432.42   124391.53   15473253157.22   4.68%
2006  3077760.13  2657823.95   419936.18  176346398595.85  13.64%
2007  3246729.16  3077760.13   168969.03   28550533539.91   5.20%
2008  2045439.37  3246729.16  1201289.79  1443097161504.6  58.73%

MAD: 225742.56    MSE: 115510479476.74    MAPE: 16.94%
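The "Last" method and its three error measures can be reproduced with a few lines of R. The sketch below uses only the first five LCStock values from the table (1978 to 1982), so its summary measures cover that excerpt and will differ from the full-sample MAD, MSE, and MAPE. The variable names are illustrative assumptions.

```r
# First five LCStock values from the table (1978-1982)
lcs <- c(89597.47, 106119.24, 140523.09, 133623.41, 162232.18)

# "Last" forecast: each year's forecast is the previous year's actual value
forecast <- head(lcs, -1)
actual   <- tail(lcs, -1)

abs_err <- abs(actual - forecast)
sq_err  <- (actual - forecast)^2
pct_err <- abs_err / actual

# Summary measures over this five-year excerpt only
mad  <- mean(abs_err)         # mean absolute deviation
mse  <- mean(sq_err)          # mean squared error
mape <- mean(pct_err) * 100   # mean absolute percentage error, in percent

round(abs_err, 2)
```

The absolute errors agree with the 1979 to 1982 rows of the table to within a cent; the small differences come from the table having been computed on unrounded values.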
Year     LCStock        3SMA  Abs. Error    Squared Error  Abs. % Error
1978    89597.47           -           -                -       -
1979   106119.24           -           -                -       -
1980   140523.09           -           -                -       -
1981   133623.41   112079.93    21543.48     464121451.77  16.12%
1982   162232.18   126755.25    35476.94    1258612933.63  21.87%
1983   198750.65   145459.56    53291.08    2839939693.60  26.81%
1984   211212.31   164868.75    46343.57    2147726102.60  21.94%
1985   279138.19   190731.71    88406.48    7815705416.34  31.67%
1986   330695.02   229700.38   100994.63   10199915820.03  30.54%
1987   347990.37   273681.84    74308.53    5521756957.95  21.35%
1988   406487.55   319274.53    87213.02    7606111133.42  21.46%
1989   534490.48   361724.31   172766.17   29848147904.41  32.32%
1990   517547.13   429656.13    87891.00    7724827496.83  16.98%
1991   675657.77   486175.05   189482.72   35903703032.64  28.04%
1992   727480.73   575898.46   151582.27   22977183646.41  20.84%
1993   800156.05   640228.54   159927.51   25576807786.21  19.99%
1994   810638.09   734431.52    76206.58    5807442490.69   9.40%
1995  1114059.93   779424.96   334634.98  111980567596.75  30.04%
1996  1371073.56   908284.69   462788.87  214173535872.04  33.75%
1997  1828463.70  1098590.53   729873.17  532714845282.45  39.92%
1998  2351038.63  1437865.73   913172.89  833884735153.89  38.84%
1999  2845697.15  1850191.96   995505.19  991030584614.88  34.98%
2000  2586454.14  2341733.16   244720.98   59888359287.38   9.46%
2001  2279183.39  2594396.64   315213.25   99359393130.41  13.83%
2002  1775483.86  2570444.90   794961.03  631963045960.47  44.77%
2003  2285047.73  2213707.13    71340.60    5089480910.29   3.12%
2004  2533432.42  2113238.33   420194.09  176563073690.98  16.59%
2005  2657823.95  2197988.00   459835.95  211449097709.30  17.30%
2006  3077760.13  2492101.37   585658.77  342996192310.68  19.03%
2007  3246729.16  2756338.83   490390.33  240482676908.24  15.10%
2008  2045439.37  2994104.42   948665.04  899965361827.70  46.38%

MAD: 337495.89    MSE: 204341961189.70    MAPE: 25.28%
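The three-period simple moving average (3SMA) forecast for a year is the unweighted mean of the three preceding actual values. A minimal R sketch, again on the first five LCStock values only and with illustrative variable names:

```r
# First five LCStock values from the table (1978-1982)
lcs <- c(89597.47, 106119.24, 140523.09, 133623.41, 162232.18)

# The forecast for position t is the mean of the three preceding observations,
# so the first forecast is for the fourth observation (1981)
idx  <- 4:length(lcs)
sma3 <- sapply(idx, function(t) mean(lcs[(t - 3):(t - 1)]))

abs_err <- abs(lcs[idx] - sma3)
round(cbind(actual = lcs[idx], sma3, abs_err), 2)
```

Because the average gives equal weight to values up to three years old, it lags a steadily rising series, which is consistent with the larger errors reported for this method than for the "Last" method.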
Year     LCStock      3EMA.5  Abs. Error    Squared Error  Abs. % Error
1978    89597.47           -           -                -       -
1979   106119.24    89597.47           -                -       -
1980   140523.09    97858.35           -                -       -
1981   133623.41   119190.72    14432.69     208302472.58  10.80%
1982   162232.18   126407.07    35825.12    1283438940.56  22.08%
1983   198750.65   144319.62    54431.02    2962736201.05  27.39%
1984   211212.31   171535.14    39677.18    1574278358.49  18.79%
1985   279138.19   191373.72    87764.47    7702601885.25  31.44%
1986   330695.02   235255.96    95439.06    9108613854.10  28.86%
1987   347990.37   282975.49    65014.88    4226934433.03  18.68%
1988   406487.55   315482.93    91004.62    8281840836.41  22.39%
1989   534490.48   360985.24   173505.24   30104067776.38  32.46%
1990   517547.13   447737.86    69809.27    4873334340.09  13.49%
1991   675657.77   482642.49   193015.28   37254899475.16  28.57%
1992   727480.73   579150.13   148330.59   22001964771.08  20.39%
1993   800156.05   653315.43   146840.62   21562167965.08  18.35%
1994   810638.09   726735.74    83902.35    7039605132.02  10.35%
1995  1114059.93   768686.92   345373.02  119282520409.14  31.00%
1996  1371073.56   941373.43   429700.13  184642205957.35  31.34%
1997  1828463.70  1156223.49   672240.21  451906896336.44  36.77%
1998  2351038.63  1492343.60   858695.03  737357153315.08  36.52%
1999  2845697.15  1921691.11   924006.04  853787164899.99  32.47%
2000  2586454.14  2383694.13   202760.01   41111621713.91   7.84%
2001  2279183.39  2485074.14   205890.75   42390999723.26   9.03%
2002  1775483.86  2382128.76   606644.90  368018038091.87  34.17%
2003  2285047.73  2078806.31   206241.42   42535521976.81   9.03%
2004  2533432.42  2181927.02   351505.40  123556043793.01  13.87%
2005  2657823.95  2357679.72   300144.23   90086558779.20  11.29%
2006  3077760.13  2507751.83   570008.30  324909460857.22  18.52%
2007  3246729.16  2792755.98   453973.18  206091648861.04  13.98%
2008  2045439.37  3019742.57   974303.20  949266726355.81  47.63%

MAD: 311128.82    MSE: 173819531389.31    MAPE: 23.61%
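The tabulated 3EMA.5 values are consistent with simple exponential smoothing using a smoothing constant of 0.5, seeded with the first actual value. That reading is inferred from the numbers in the table rather than stated in the text, so the sketch below should be taken as one plausible reconstruction; the variable names are assumptions.

```r
# First five LCStock values from the table (1978-1982)
lcs   <- c(89597.47, 106119.24, 140523.09, 133623.41, 162232.18)
alpha <- 0.5   # smoothing constant (assumed from the ".5" in the column label)

# ema[t] is the forecast for position t; the first forecast (for the second
# observation) is seeded with the first actual value
ema <- rep(NA_real_, length(lcs))
ema[2] <- lcs[1]
for (t in 3:length(lcs)) {
  ema[t] <- alpha * lcs[t - 1] + (1 - alpha) * ema[t - 1]
}

round(ema, 2)
```

Each forecast is the average of the latest actual value and the previous forecast, so older observations never drop out entirely; their weight simply halves with each step.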
Year     LCStock     3WMA  Abs. Error     Squared Error  Abs. % Error
1978    89597.47        -           -                 -       -
1979   106119.24        -           -                 -       -
1980   140523.09        -           -                 -       -
1981   133623.41   120568    13055.87      170455826.59   9.77%
1982   162232.18   131339    30892.91      954371666.51  19.04%
1983   198750.65   149078    49672.90     2467397310.21  24.99%
1984   211212.31   175723    35489.03     1259471001.03  16.80%
1985   279138.19   198895    80243.12     6438958847.56  28.75%
1986   330695.02   243098    87596.71     7673183321.03  26.49%
1987   347990.37   293596    54394.74     2958787899.04  15.63%
1988   406487.55   330750    75737.66     5736193038.66  18.63%
1989   534490.48   374356   160134.08    25642922636.86  29.96%
1990   517547.13   460739    56807.65     3227108677.42  10.98%
1991   675657.77   504685   170972.79    29231696566.83  25.30%
1992   727480.73   599426   128054.38    16397925184.80  17.60%
1993   800156.05   675217   124938.57    15609647468.82  15.61%
1994   810638.09   755181    55456.87     3075463885.92   6.84%
1995  1114059.93   793285   320775.42   102896866984.15  28.79%
1996  1371073.56   960602   410471.55   168486896330.42  29.94%
1997  1828463.70  1191996   636467.26   405090572707.41  34.81%
1998  2351038.63  1556933   794105.60   630603703969.31  33.78%
1999  2845697.15  2013519   832177.68   692519690655.54  29.24%
2000  2586454.14  2511272    75182.07     5652344215.05   2.91%
2001  2279183.39  2633633   354449.17   125634213850.63  15.55%
2002  1775483.86  2476026   700542.07    490759197131.8  39.46%
2003  2285047.73  2078545   206502.31    42643204645.54   9.04%
2004  2533432.42  2114216   419216.70   175742642136.79  16.55%
2005  2657823.95  2324313   333511.19   111229711943.21  12.55%
2006  3077760.13  2554231   523529.40   274083030393.65  17.01%
2007  3246729.16  2847060   399669.05   159735345716.28  12.31%
2008  2045439.37  3092255  1046815.91  1095823551868.69  51.18%

MAD: 302846.77    MSE: 170434983551.10    MAPE: 22.20%
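The 3WMA column is consistent with a three-period weighted moving average whose weights are 1/6, 2/6, and 3/6 from oldest to newest. The weights are inferred from the tabulated values and are not stated in the text, so they are an assumption of this sketch, as are the variable names.

```r
# First five LCStock values from the table (1978-1982)
lcs <- c(89597.47, 106119.24, 140523.09, 133623.41, 162232.18)

# Assumed weights, oldest to newest; they sum to one
w <- c(1, 2, 3) / 6

# The forecast for position t is the weighted mean of the three preceding values
idx  <- 4:length(lcs)
wma3 <- sapply(idx, function(t) sum(w * lcs[(t - 3):(t - 1)]))

abs_err <- abs(lcs[idx] - wma3)
round(wma3)
```

Giving half of the weight to the most recent year pulls the forecast closer to the "Last" method, which fits the pattern of its error measures falling between those of the simple moving average and the "Last" method.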
Method          MAD               MSE    MAPE
Last      225742.56   115510479476.74  16.94%
3SMA      337495.89   204341961189.70  25.28%
3EMA.5    311128.82   173819531389.31  23.61%
3WMA      302846.77   170434983551.10  22.20%
$\mathrm{LCS} = \mathrm{Year} + \text{Projected GDP} + \text{Inflation} + \text{Fed Funds Rate} + \text{Money Supply}$, is fit. GDP and
the money supply are shown to be significant at the 5% level. Adjusted
R-squared is 0.9014, and the p-value for the model is highly significant.
(Excluded observations are those before 1980 and after 2004; all
observations between and including 1980 and 2004 were included in the
regression.)
Using the original model yields a very high F statistic and R-squared.
However, only two of the independent variables are significant at the 5% level.
On visual inspection, the errors appear to fall within a narrow band on the
left and a much wider band on the right.
lambda), except when lambda equals zero, which then takes the natural log
of the response variable. The 95% confidence interval for lambda falls
between approximately -0.28 and 0.05, confirming earlier suspicions of the
appropriateness of taking the natural log of the response.
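The Box-Cox family referred to here can be written as a short R function. This is a sketch of the transformation itself, not of the likelihood profile that the boxcox function in the MASS package plots; the function name box_cox and the sample values are assumptions made for illustration.

```r
# Box-Cox transform of a positive response y for a given lambda:
# (y^lambda - 1) / lambda, with the natural log as the limiting case at zero
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 10, 100)   # illustrative values only

box_cox(y, 0)        # natural log of y
box_cox(y, 1)        # y - 1, a shift that leaves the response essentially untransformed
box_cox(y, 1e-4)     # close to log(y): the family is continuous at lambda = 0
```

A confidence interval for lambda that contains zero, as reported above, is therefore read as support for modeling the log of the response.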
Based on the suggestions of the analysis thus far, the model will be
changed and transformed before dropping insignificant variables in an
attempt to improve the model. The new model is
$\ln(\mathrm{LCS}) = \mathrm{Year} + \text{Projected GDP} + \text{Inflation} + \text{Fed Funds Rate} + \text{Money Supply} + \ln(\text{Lagged LCS})$.
The transformation of the dependent variable indicates a successful
improvement on the model. The R-squared has increased, and the p-value is
still significant. The money supply is still significant, but the untransformed
projected GDP is no longer significant.
The Shapiro-Wilk normality test now indicates normally distributed errors.
The Durbin-Watson test still suggests autocorrelation, with a p-value of .0847.
The Box-Cox test now indicates a wide range of possible transformations,
including -2 and 1, within the 95% confidence interval. Perhaps the Box-Cox
result is a symptom of the predictors not having the correct transformation.
The Money Supply, Inflation, and Gross Domestic Product are all functions
of growth over time. The next model will take their logs as well and refit.
This iteration brings no improvement for any of the variables except the lag
predictor, which corrects for autocorrelation. Up to this point, inflation appears
to be an unimportant variable, so it is removed for the next regression.
Removing inflation improves this model. Removing variables usually
decreases R-squared, but in this case, there was no change. Adjusted
R-squared actually improved.
Since the lagged dependent variable is in the model as a predictor, it
does a better job of predicting the next year's performance than the year.
Since the year is the least significant predictor now, it is the next variable to
remove from the model.
Removing the Year variable lowers R-squared an insignificant amount,
while still improving adjusted R-squared. In addition, the GDP variable
becomes significant again. The variable for the Federal Funds Rate is the lone
remaining insignificant term. Next, remove the Federal Funds Rate from the
model.
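The one-variable-at-a-time elimination described in these steps can be sketched in R with update(), which refits a model after dropping a term. The data below are synthetic and the variable names are hypothetical; the sketch shows the mechanics only and does not reproduce the paper's regressions.

```r
set.seed(1)   # synthetic data, for illustration only
n <- 25
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n, sd = 0.5)

# Full model, then the same model with one term removed
full    <- lm(y ~ x1 + x2 + noise, data = d)
reduced <- update(full, . ~ . - noise)

# R-squared can never rise when a term is dropped; adjusted R-squared can
c(r2_full      = summary(full)$r.squared,
  r2_reduced   = summary(reduced)$r.squared,
  adj_full     = summary(full)$adj.r.squared,
  adj_reduced  = summary(reduced)$adj.r.squared)
```

Comparing both measures after each removal mirrors the reasoning in the text: a drop in R-squared that leaves adjusted R-squared unchanged or higher suggests the removed term was not earning its degree of freedom.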
Removing the Fed Funds Rate creates very small decreases in R-squared
(to 0.9842) and adjusted R-squared (to 0.982). The model is now
$\ln(\mathrm{LCS}) = \ln(\text{Projected GDP}) + \ln(\text{Money Supply}) + \ln(\text{Lagged LCS})$, which is the best
iteration so far. Each term is significant at the 5% level, and both projected
GDP and the money supply were significant in the first iteration's model
(before any transformations). (See Appendix B for the regression summary of
the R object "malt5".)
The least significant term is the money supply term. Even though it is
significant at the 5% level, given the high R-squared, the model may be overfit. The problem with over-fitting a model is that it is overly sensitive to newly
sampled data. Training the model on a subset of the data and testing its
ability to predict based on data outside the subset is one way of testing for
fit, and this method shall be demonstrated at the end of this paper.
Removing the money supply data reduces both R-squared and adjusted
R-squared slightly. It also reduces the confidence level of the prediction
provided by predicting gross economic production. It may be that the money
supply adds meaning that is required for projected GDP to mean anything. An
important thing to note is that this regression includes the observations from
year 1979. A look at the data indicates nothing strange that should arise from
that year being included. (Also, there is no indication of multiplicative effects,
see Appendix B.)
Checking The Model
Since the optimal model has been found, rerunning the previous diagnostics
on the model, $\ln(\mathrm{LCS}) = \ln(\text{Projected GDP}) + \ln(\text{Money Supply}) + \ln(\text{Lagged LCS})$, will
test the strength of the model. The Shapiro-Wilk test gives a p-value of .2885,
which is little evidence to reject the assumption of normality in the errors.
The Durbin-Watson test gives a p-value of .1872, which gives no grounds for
rejecting the assumption of independent errors (other forms of data
exploration confirm this conclusion). The Box-Cox transformation test
indicates that the data are correctly transformed, with a maximal value for
lambda close to one, although the 95% confidence interval ranges from
approximately -0.75 to 2.66. Thus we can assume the model is linear in the
parameters.
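The residual checks named above can be illustrated in R. The sketch uses synthetic data and computes the Durbin-Watson statistic directly so that it needs no add-on package; the paper's appendix states that the lmtest library was used, whose dwtest function also supplies a p-value. All names and data here are assumptions for illustration.

```r
set.seed(2)   # synthetic data, for illustration only
x   <- 1:25
y   <- 3 + 0.5 * x + rnorm(25)
fit <- lm(y ~ x)
e   <- residuals(fit)

# Shapiro-Wilk test: a large p-value gives no evidence against normal errors
sw <- shapiro.test(e)

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)

c(shapiro_p = sw$p.value, durbin_watson = dw)
```

Both checks are read the same way as in the text: the null hypotheses are normal and independent errors, so large p-values mean the assumptions are not rejected rather than proven.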
To implement this model in making predictions about the stock market, first
predict the next year's GDP to the same level of accuracy as the Fed (no
small feat). Then, using the current year's end of year money supply and
stock market values, combined with the GDP projection, predict the stock
market's valuations with the following formula.
$\mathrm{LCS} = e^{-4.9809 + 1.9336\,\ln(\text{Projected GDP}) - 1.0676\,\ln(\text{Money Supply}) + 0.5758\,\ln(\text{Lagged LCS})}$
The transformation is justified by both the Box-Cox procedure as well as
the improvement in R-Squared of over 0.05.
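The prediction formula can be wrapped in a small R function using the coefficients reported for the final model. The function and argument names are hypothetical; the inputs in the usage line are the 2005 values listed in the paper's Error Comparison data.

```r
# Predicted Large Company stock index from the final model's coefficients
predict_lcs <- function(proj_gdp, money_supply, lagged_lcs) {
  exp(-4.9809 +
        1.9336 * log(proj_gdp) -
        1.0676 * log(money_supply) +
        0.5758 * log(lagged_lcs))
}

# Inputs for 2005: projected GDP, lagged money supply, previous year's index
pred_2005 <- predict_lcs(proj_gdp = 50553.5,
                         money_supply = 7546.7,
                         lagged_lcs = 2533432.42)
pred_2005
```

Because the coefficients are rounded to four decimal places, the result agrees with the paper's tabulated prediction for 2005 only approximately.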
Confidence Intervals
The minimum and maximum residuals are -0.22263 and 0.23279,
respectively. To interpret residuals of this size on the original scale: when the
exponent of e is 0.25 greater than expected, the exponentiated value is 28.4%
greater than expected. Similarly, when the exponent is 0.25 less than expected,
the result is 22.1% less than expected.
The residual standard error is 0.1368 with 21 degrees of freedom. Based
on the two-tailed t-distribution at a 95% confidence level, the critical range is plus
or minus 2.08 standard errors. Thus the 95% confidence interval for the
regression is from 24.8% less than expected to 33.0% greater than expected.
This calculation indicates that this regression is no gold mine, and that even
with some expectation of a future value, there can be very large variance.
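The arithmetic behind these percentages can be checked in R. The sketch takes the residual standard error and degrees of freedom from the text and treats the band as a simple plus-or-minus critical-value range on the log scale, which is an approximation; the variable names are assumptions.

```r
se <- 0.1368   # residual standard error on the log scale (from the text)
df <- 21       # residual degrees of freedom (from the text)

# Two-tailed 95% critical value of the t-distribution
t_crit <- qt(0.975, df)

# Half-width on the log scale, converted to percentage bounds on the original scale
half_width <- t_crit * se
bounds <- c(lower = exp(-half_width) - 1, upper = exp(half_width) - 1)
round(100 * bounds, 1)

# Effect of a log-scale error of 0.25 in either direction
round(100 * c(above = exp(0.25) - 1, below = 1 - exp(-0.25)), 1)
```

The bounds are asymmetric in percentage terms because the exponential function stretches positive errors more than it shrinks negative ones.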
Error Comparison
Based on the sample the regression was calculated from, assuming
accurate projection of GDP, the next four years would have had this result.
Year    LCSTOCK        GDP  MRM2IMFNS     AUTOCOR
2005  2657824.0    50553.5       7547  2533432.42
2006  3077760.1    53595.7       7875  2657823.95
2007  3246729.2    56290         8481  3077760.13
2008  2045439.4    57765.7       9465  3246729.16
Coefficients: Intercept -4.9809, GDP 1.9336, MRM2IMFNS -1.068, AUTOCOR 0.5758

Year    Predicted    Abs Errors  Squared Errors  Abs%Error
2005   3015493.21  357669.25767    127927297880     13.46%
2006   3316359.46  238599.33319   56929641800.5      7.75%
2007   3665965.51  419236.34554    175759113421     12.91%
2008   3534858.84  1489419.4698   2218370357114     72.82%
Sums:              2504924.4062   2578986410215
Means:             626231.10156    644746602554     26.73%
                            MAE             MSE       MAPE
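The out-of-sample error measures can be recomputed from the actual and predicted values for 2005 to 2008 given in the tables. A minimal R sketch, with illustrative variable names:

```r
# Actual and predicted Large Company stock index values, 2005-2008
actual    <- c(2657823.95, 3077760.13, 3246729.16, 2045439.37)
predicted <- c(3015493.21, 3316359.46, 3665965.51, 3534858.84)

abs_err <- abs(actual - predicted)

mae  <- mean(abs_err)                   # mean absolute error
mse  <- mean((actual - predicted)^2)    # mean squared error
mape <- mean(abs_err / actual) * 100    # mean absolute percentage error, in percent

c(MAE = mae, MSE = mse, MAPE = mape)
```

Most of the error comes from 2008, when the index fell sharply while the model predicted a further rise; that single year accounts for more than half of the summed absolute error.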
Year     LCSTOCK      GDP  MRM2IMFNS     AUTOCOR
1980   140523.09  10219.1     1488.5   106119.24
1981   133623.41  11342.9     1620     140523.09
1982   162232.18  12479.4     1798.3   133623.41
1983   198750.65  12979.4     1967.2   162232.18
1984   211212.31  14604.2     2178.7   198750.65
1985   279138.19  15628.9     2386.8   211212.31
1986   330695.02  16478.2     2574.8   279138.19
1987   347990.37  17707.1     2832.6   330695.02
1988   406487.55  18951.2     2937.7   347990.37
1989   534490.48  20890.4     3101.2   406487.55
1990   517547.13  22115.7     3283.9   534490.48
1991   675657.77  22918.9     3431.7   517547.13
1992   727480.73  23689.3     3582.3   675657.77
1993   800156.05  25106.9     3662.7   727480.73
1994   810638.09  26916.3     3722.5   800156.05
1995  1114059.93  28464.6     3733.8   810638.09
1996  1371073.56  29644.5     3934.2  1114059.93
1997  1828463.7   31702.1     4172.1  1371073.56
1998  2351038.63  33760.6     4463.3  1828463.7
1999  2845697.15  35286.5     4963.8  2351038.63
2000  2586454.14  39068.3     5336.1  2845697.15
2001  2279183.39  41928.5     5782.1  2586454.14
2002  1775483.86  41733.8     6717    2279183.39
2003  2285047.73  43518.7     7117    1775483.86
2004  2533432.42  46639.7     7256.1  2285047.73
2005  2657823.95  50553.5     7546.7  2533432.42
2006  3077760.13  53595.7     7875.2  2657823.95
2007  3246729.16  56290       8480.7  3077760.13
2008  2045439.37  57765.7     9465    3246729.16
Coefficients: Intercept -4.9809, GDP 1.9336, MRM2IMFNS -1.0676, AUTOCOR 0.5758
Summary
Since this model was arrived at over a series of iterative processes that
eliminated one variable at a time, it may be argued that the findings are
spurious, and the result of random chance. That the model is the result of
pure chance is unlikely to be the case, however.
In general, this model states that the stock market goes up when the
economy is expected to grow, and when the money supply is decreasing. The
effect for the economy is about twice as much as the effect for the money
supply.
Expectations of economic growth fuel speculation in stocks. When people
expect the economy to grow more, stock prices increase. When there is less
of an expectation for economic growth, stock prices do not increase as much.
The Fed acts to contract the money supply when the economy is growing
too fast. The stock market is known to be a leading indicator of economic
growth. It would make sense that the Fed would be tightening the money
supply as the stock market is increasing.
Sometimes time series data runs the risk of reaching a change point
where the effects being used for prediction cease to work (Chatfield, 2000). It
is unlikely that the effects found here will cease to predict, however. These
effects are the result of actions of or predictions by a United States
government chartered organization that has powerful control over
fundamental aspects of the economy.
The high correlation between the two factors is an element of concern. It
would make sense that if the Fed sees the economy growing above average
the next year, it would act today to reduce the money supply. This
reasoning would explain the high level of correlation. This interaction is
troubling, but each needs the other for its significance level in the model. And
without the two variables, the model is left with nothing but an
autocorrelation correction variable based on the previous year's market and
about a third higher residual standard error.
Low standard errors with many variables relative to the number of
observations may indicate a model that is over-fit, but the two variables (plus
the autocorrelation variable) do not seem to be too much relative to the size
of the data available. In retrospect, this model also has the lowest standard
errors, and since all of these models have a very high R-squared, optimizing
for standard errors while keeping the number of predictors small would seem
to be the best remaining approach.
REFERENCES
Chatfield, C. (2000). Time-Series Forecasting. Boca Raton: Chapman &
Hall/CRC.
Colby, R. W. (2003). The Encyclopedia of Technical Market Indicators. New
York: McGraw-Hill.
Faraway, J. J. (2005). Linear Models with R. Boca Raton: Chapman & Hall/CRC.
Harrington, J. P. (Ed.). (2008). Ibbotson SBBI 2009 Classic Yearbook: Market
Results for Stocks, Bonds, Bills, and Inflation 1926-2008. Chicago:
Morningstar.
R Development Core Team. (2009). R: A Language and Environment for
Statistical Computing. Vienna, Austria: R Foundation for Statistical
Computing. Retrieved from http://www.R-project.org
St. Louis Fed. (2010). St. Louis Fed: Download Data for Series: M2NS, M2
Money Stock. St. Louis Fed. Retrieved April 6, 2010, from
http://research.stlouisfed.org/fred2/series/M2NS/downloaddata?cid=48
Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (4th
ed.). New York: Springer. Retrieved from
http://www.stats.ox.ac.uk/pub/MASS4
Zeileis, A., & Hothorn, T. (2002). Diagnostic Checking in Regression
Relationships. R News, 2(3), 7-10.
APPENDIX A: ACKNOWLEDGEMENTS
GNU/Linux/Ubuntu
The GNU Community has developed or enabled the functioning of all of
the tools used to create this document. All of these tools are open-source
software packages that are free to use and free to modify. The Linux kernel
powered the computing. Ubuntu is a popular distribution of Linux, and the
source for software repositories that provided the operating system,
supporting software, and core tools (except Zotero).
R
This study was done in R, a powerful command-line statistical
programming package (R Development Core Team, 2009). The advantages of
a command-line interface are that one may maintain an exactly reproducible
copy of one's work (e.g. see Appendix B), while having complete access to
many powerful functions. The disadvantage is that the learning curve takes
longer to climb compared to graphical user interfaces.
OpenOffice.org
This paper was written in OpenOffice.org, an open-source version of
Sun's StarOffice. Writer was used for word processing and document
assembly. Calc was used for data manipulation, spreadsheet functions, and
table creation.
Other Tools
SciTE with R syntax highlighting was also used to manipulate the code.
Zotero Firefox and Writer plug-ins were used to manage citations.
APPENDIX B: R CODE
This is the console input/output. It requires the files to be in the location
provided, and the lmtest and MASS libraries. The command prompt is the
">" symbol, and the "#" symbol indicates a non-executing comment.
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
REvolution R enhancements not installed. For improved
performance and other extensions: apt-get install revolution-r
> comb <- read.table("/home/aaron/Desktop/MBA Statistics/combined.csv", header
= TRUE)
> cor(comb, use= "pairwise.complete.obs")
                  Year    LCSTOCK        GNP        GDP SPLICEGROSS  Inflation
Year         1.0000000  0.9261381  0.9971895  0.9939544   0.9887894  0.9975192
LCSTOCK      0.9261381  1.0000000  0.9715892  0.8080071   0.9340474  0.9154274
GNP          0.9971895  0.9715892  1.0000000         NA   0.9999981  0.9855178
GDP          0.9939544  0.8080071         NA  1.0000000   0.9999990  0.9943664
SPLICEGROSS  0.9887894  0.9340474  0.9999981  0.9999990   1.0000000  0.9782972
Inflation    0.9975192  0.9154274  0.9855178  0.9943664   0.9782972  1.0000000
FEDFUNDS    -0.8233347 -0.6218049 -0.8141758 -0.4857107  -0.7987852 -0.8081528
IMFNS        0.8859370  0.8180842  0.9549985  0.9533118   0.9287711  0.8753418
M2NS         0.9683431  0.8936886  0.9902620  0.9829273   0.9898840  0.9679235
M2IMFNS      0.9542132  0.8822151  0.9931752  0.9830307   0.9871683  0.9527529
AUTOCOR      0.9261381  0.9508532  0.9668297  0.8545207   0.9352194  0.9307222
AUTOCOR2     0.9261381  0.9060283  0.9720780  0.8874491   0.9320751  0.9238932
AUTOCOR3     0.9369234  0.8664926  0.9575144  0.9434643   0.9391325  0.9144456
MRINFL       0.9975192  0.9119503  0.9808353  0.9940487   0.9774272  0.9986158
MRFUNDS     -0.8080723 -0.6402895 -0.7588302 -0.3646230  -0.7814267 -0.7788310
MRIMFNS      0.8782463  0.8099785  0.9478959  0.9329031   0.9090652  0.8900329
MRM2NS       0.9696137  0.8894880  0.9940004  0.9675501   0.9867425  0.9741104
MRM2IMFNS    0.9547696  0.8778663  0.9947253  0.9609944   0.9814615  0.9621024
              FEDFUNDS      IMFNS       M2NS    M2IMFNS    AUTOCOR   AUTOCOR2
Year        -0.8233347  0.8859370  0.9683431  0.9542132  0.9261381  0.9261381
LCSTOCK     -0.6218049  0.8180842  0.8936886  0.8822151  0.9508532  0.9060283
GNP         -0.8141758  0.9549985  0.9902620  0.9931752  0.9668297  0.9720780
GDP         -0.4857107  0.9533118  0.9829273  0.9830307  0.8545207  0.8874491
SPLICEGROSS -0.7987852  0.9287711  0.9898840  0.9871683  0.9352194  0.9320751
Inflation   -0.8081528  0.8753418  0.9679235  0.9527529  0.9307222  0.9238932
FEDFUNDS     1.0000000 -0.6737792 -0.7703007 -0.7510230 -0.6632477 -0.7031356
 Median      3Q     Max
 -41612  106983  820959

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.897e+08  3.763e+08  -1.036   0.3134
Year         1.982e+05  1.913e+05   1.036   0.3134
SPLICEGROSS  1.757e+02  8.159e+01   2.153   0.0444 *
MRINFL      -8.347e+02  5.898e+02  -1.415   0.1732
MRFUNDS      1.792e+04  3.871e+04   0.463   0.6486
MRM2IMFNS   -6.165e+02  2.555e+02  -2.413   0.0261 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 290400 on 19 degrees of freedom
  (8 observations deleted due to missingness)
library(MASS)
boxcox(m,plotit=T)
boxcox(m,plotit=T,lambda=seq(-0.5,0.5,by=0.1))
malt <- lm(log(LCSTOCK) ~ Year + SPLICEGROSS + MRINFL + MRFUNDS + MRM2IMFNS +
log(AUTOCOR),comb, na.action= na.exclude)
summary(malt)
Call:
lm(formula = log(LCSTOCK) ~ Year + SPLICEGROSS + MRINFL + MRFUNDS +
MRM2IMFNS + log(AUTOCOR), data = comb, na.action = na.exclude)
Residuals:
      Min        1Q    Median        3Q       Max
-0.224728 -0.071272 -0.002158  0.078216  0.189551

Coefficients:
(Intercept)
Year
SPLICEGROSS
MRINFL
MRFUNDS
MRM2IMFNS
      3Q       Max
0.094161  0.211644

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      -72.479133  89.941534  -0.806  0.43085
Year               0.037791   0.048280   0.783  0.44395
log(SPLICEGROSS)   1.322121   1.589509   0.832  0.41643
log(MRINFL)       -0.004836   1.307447  -0.004  0.99709
MRFUNDS           -0.019492   0.018379  -1.061  0.30293
log(MRM2IMFNS)    -1.318564   0.621873  -2.120  0.04813 *
log(AUTOCOR)       0.620016   0.196808   3.150  0.00553 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1397 on 18 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared: 0.9859,     Adjusted R-squared: 0.9812
F-statistic: 209.8 on 6 and 18 DF,  p-value: 1.178e-15
>
> malt3 <- lm(log(LCSTOCK) ~ Year + log(SPLICEGROSS) + MRFUNDS +
log(MRM2IMFNS) + log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt3)
Call:
lm(formula = log(LCSTOCK) ~ Year + log(SPLICEGROSS) + MRFUNDS +
log(MRM2IMFNS) + log(AUTOCOR), data = comb, na.action = na.exclude)
Residuals:
      Min        1Q    Median        3Q       Max
-0.242398 -0.088944 -0.005036  0.094195  0.211602
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      -72.58974   82.56247  -0.879  0.39027
Year               0.03784    0.04502   0.841  0.41102
log(SPLICEGROSS)   1.31767    1.00937   1.305  0.20733
MRFUNDS           -0.01945    0.01461  -1.331  0.19881
log(MRM2IMFNS)    -1.31744    0.52746  -2.498  0.02185 *
log(AUTOCOR)       0.62009    0.19068   3.252  0.00419 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.136 on 19 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9859, Adjusted R-squared: 0.9822
F-statistic: 265.8 on 5 and 19 DF, p-value: < 2.2e-16
>
> malt4 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + MRFUNDS + log(MRM2IMFNS) +
log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt4)
Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + MRFUNDS + log(MRM2IMFNS) +
log(AUTOCOR), data = comb, na.action = na.exclude)
Residuals:
     Min       1Q   Median       3Q      Max
-0.24297 -0.08841  0.01259  0.11157  0.22009
Coefficients:
>
> #Final model, malt5
> malt5 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt5)
Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
log(AUTOCOR), data = comb, na.action = na.exclude)
Residuals:
     Min       1Q   Median       3Q      Max
-0.22263 -0.09233  0.01955  0.09150  0.23279
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -4.9809     2.4174  -2.060  0.05196 .
log(SPLICEGROSS)   1.9336     0.7975   2.425  0.02443 *
log(MRM2IMFNS)    -1.0676     0.5033  -2.121  0.04597 *
log(AUTOCOR)       0.5758     0.1846   3.119  0.00519 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1368 on 21 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9842, Adjusted R-squared: 0.982
F-statistic: 436.9 on 3 and 21 DF, p-value: < 2.2e-16
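The sequence of fits ending in malt5 is consistent with backward elimination: log(MRINFL) (p = 0.997) and then Year (p = 0.411) were each the least significant term when they were dropped. The report does not state its selection rule, so the following is only a sketch of that general procedure on synthetic data. The data frame d, the variables x1 to x4, and the 0.05 cutoff are all hypothetical.

```r
# Synthetic data only; the variable names x1..x4 are hypothetical.
set.seed(2)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 1.5 * d$x2 + rnorm(n, sd = 0.5)  # x3 and x4 are pure noise

fit <- lm(y ~ x1 + x2 + x3 + x4, data = d)

# Repeatedly drop the term with the largest p-value until every
# remaining term (intercept excluded) falls below alpha.
alpha <- 0.05
repeat {
  p <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
  if (nrow(p) == 0 || max(p) < alpha) break
  worst <- rownames(p)[which.max(p)]
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}

summary(fit)
```

Using update() with a ". ~ . - term" formula keeps the data and na.action arguments of the original call, which mirrors how each successive malt model differs from the previous one by a single term.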
>
> malt6 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(AUTOCOR),comb, na.action=
na.exclude)
> summary(malt6)
Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(AUTOCOR),
data = comb, na.action = na.exclude)
Residuals:
       Min         1Q     Median         3Q        Max
-3.358e-01 -1.018e-01  4.623e-05  1.066e-01  2.075e-01
Coefficients:
 Median      3Q     Max
0.01955 0.09150 0.23279
Coefficients:
 Median      3Q     Max
0.01333 0.12556 0.22081
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.86816    0.35789   2.426   0.0220 *
log(AUTOCOR)  0.94333    0.02646  35.657   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1653 on 28 degrees of freedom
(3 observations deleted due to missingness)
Multiple R-squared: 0.9785, Adjusted R-squared: 0.9777
F-statistic: 1271 on 1 and 28 DF, p-value: < 2.2e-16
> comb2 <- read.table("/home/aaron/Desktop/MBA Statistics/combined2.csv",
header = TRUE)
>
> cor(comb2, use= "pairwise.complete.obs")
                 Year   LCSTOCK SPLICEGROSS MRM2IMFNS   AUTOCOR  AUTOCOR2  AUTOCOR3
Year        1.0000000 0.9261381   0.9887894 0.9547696 0.9261381 0.9261381 0.9261381
LCSTOCK     0.9261381 1.0000000   0.9340474 0.8778663 0.9508532 0.9060283 0.8664926
SPLICEGROSS 0.9887894 0.9340474   1.0000000 0.9814615 0.9352194 0.9320751 0.9391325
MRM2IMFNS   0.9547696 0.8778663   0.9814615 1.0000000 0.8822151 0.9445584 0.9493800
AUTOCOR     0.9261381 0.9508532   0.9352194 0.8822151 1.0000000 0.9508532 0.9060283
AUTOCOR2    0.9261381 0.9060283   0.9320751 0.9445584 0.9508532 1.0000000 0.9508532
AUTOCOR3    0.9261381 0.8664926   0.9391325 0.9493800 0.9060283 0.9508532 1.0000000
> m2 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
log(AUTOCOR),comb2, na.action= na.exclude)
> summary(m2)
Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
log(AUTOCOR), data = comb2, na.action = na.exclude)
Residuals:
     Min       1Q   Median       3Q      Max
-0.22263 -0.09233  0.01955  0.09150  0.23279
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -4.9809     2.4174  -2.060  0.05196 .
log(SPLICEGROSS)   1.9336     0.7975   2.425  0.02443 *
log(MRM2IMFNS)    -1.0676     0.5033  -2.121  0.04597 *
log(AUTOCOR)       0.5758     0.1846   3.119  0.00519 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1368 on 21 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.9842, Adjusted R-squared: 0.982
F-statistic: 436.9 on 3 and 21 DF, p-value: < 2.2e-16
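Written out from the coefficient estimates printed above, the fitted equation is shown below, with log denoting the natural logarithm as in R. The second line is simply the exponentiated form of the first and ignores any retransformation bias adjustment; the hat notation is an addition for this sketch.

```latex
\begin{align*}
\log(\widehat{\mathrm{LCSTOCK}}) &= -4.9809 + 1.9336\,\log(\mathrm{SPLICEGROSS})
  - 1.0676\,\log(\mathrm{MRM2IMFNS}) + 0.5758\,\log(\mathrm{AUTOCOR}) \\
\widehat{\mathrm{LCSTOCK}} &= e^{-4.9809}\;\mathrm{SPLICEGROSS}^{1.9336}\;
  \mathrm{MRM2IMFNS}^{-1.0676}\;\mathrm{AUTOCOR}^{0.5758}
\end{align*}
```

Because both sides are in logs, each slope can be read as an elasticity: for example, a 1% increase in SPLICEGROSS is associated with roughly a 1.93% increase in fitted LCSTOCK when the other two predictors are held fixed.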
>
> plot(fitted(m2), residuals(m2), xlab="Fitted",ylab="Residuals")
> qqnorm(resid(m2))
> shapiro.test(residuals(m2))
Shapiro-Wilk normality test
data: residuals(m2)
W = 0.9527, p-value = 0.2885
> library(lmtest)
> dwtest(m2)
Durbin-Watson test
data: m2
DW = 1.8683, p-value = 0.1872
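For readers without the lmtest package, the two residual checks above can be sketched with base R alone. The example below runs on synthetic data (x, y, and the fitted model are hypothetical) and computes only the Durbin-Watson statistic from its definition, not the p-value that dwtest() reports.

```r
# Synthetic data only; x and y are hypothetical.
set.seed(3)
n <- 50
x <- seq_len(n)
y <- 2 + 0.5 * x + rnorm(n)

fit <- lm(y ~ x)
e <- residuals(fit)

# Shapiro-Wilk: a large p-value gives no evidence against normal residuals.
sw <- shapiro.test(e)

# Durbin-Watson statistic from its definition: the sum of squared successive
# residual differences divided by the residual sum of squares.
dw <- sum(diff(e)^2) / sum(e^2)

sw$p.value
dw
```

The statistic lies between 0 and 4, and values near 2 indicate little first-order autocorrelation, which is how the DW = 1.8683 reported above is usually read.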
 Median      3Q     Max
0.01519 0.08435 0.22880
Coefficients: