Predictors of Stock Market Values

PREDICTORS OF STOCK MARKET VALUES
QMB 6305
UNIVERSITY OF WEST FLORIDA
Submitted by
Aaron Hall
April 13, 2010
Instructor
DR. GAYLE BAUGH
TABLE OF CONTENTS
Introduction
    The Data
    Prediction And Moving Averages
    Data Manipulation
    Data Exploration
    Linear Prediction
    Checking The Model
Conclusions
    The Model
    Confidence Intervals
    Error Comparison
    Summary
References
Appendix A: Acknowledgements
    GNU/Linux/Ubuntu
    R
    OpenOffice.org
    Other Tools
Appendix B: R Code
INTRODUCTION
The goal of this project is to analyze available econometric data and find
predictors of the valuation of the United States stock market. Availability of
data is an important constraint for this analysis. Because the stock market's
value is calculated every second, predictors with a monthly frequency would
have been preferable; however, macroeconomic data are usually reported
quarterly or annually. Thus, all data used here have been annualized.
The proxy for stock values, the model's dependent (response) variable, is the
Ibbotson Large Company total return series. Since these values are given as
annual percentage returns from 1925 on, the figures used for the model are
transformed into the value of $1,000 invested in 1925 (Harrington, 2008).
The independent (predictor) variables include projected GDP, interest rates,
inflation, and the money supply.
Projected GDP is measured by the predictions the Fed reports in the
Greenbook. Since these predictions are only released with a five-year delay,
the data can only be analyzed up to that date. Further, since the Fed's
methodology is not public, the results for this figure, if they are significant,
cannot be accurately reproduced. GDP remains an estimate even after the
period it measures and is usually revised several times (St. Louis Fed, 2010).
Projected GDP is important because it represents the productivity of the
United States economy. It would make sense that the more productive the US
economy, the greater the chance of profitability for any US company, though
this is certainly not a guarantee. If an investor believes the economy will be
more productive in the future, the investor may be willing to pay a higher
price for the investment.
Since stock prices are based on the present values of future cash flows,
changes in interest rates are likely to affect the valuation of stocks. Lower
rates increase the present value of future cash flows. Interest rates also
affect firms' cost of capital: lower interest rates decrease firms' borrowing
costs and increase their profitability.
The original proposal sought to examine bond prices as a predictor
variable. Since changes in bond prices are driven directly by changes in
interest rates, including interest rates precludes the use of bond prices as a
separate predictor variable.
Inflation may also be a predictor of stock valuations. Higher inflation
means that bond investors who spend their bond income are losing capital to
inflation. As a result, they may seek higher returns in the stock market.
Further, the hard assets owned by companies may be rising in value relative
to the weakening dollar. As with the Large Company stock values, the figures
used for inflation are indexed to $1,000 in 1925.
The money supply is the final predictor examined. The money supply
represents the number of dollars in circulation. It is related to inflation: the
more dollars in circulation, the less each individual dollar may be worth.
Since it may be highly correlated with inflation, it is unlikely that both will
remain in the final model. This idea is related to Keynes' view that the
components of money demand include a speculative demand in addition to
the classical notions of precautionary demand and transactional demand.

Since some predictors are only available on a quarterly or annual basis,
important data are only available at a yearly frequency.
A further concern for this analysis is whether today's stock market is
affected by the same causes as it was even a decade ago.
The Data
Graphical examination of the data is expected to reveal trends and
cyclicality. Since stock market values reflect growth, a log transformation of
the data may be the best approach. However, the other variables also follow
a growth pattern, and transforming both the predictor and dependent
variables in an identical fashion will not yield any additional predictive power.
The Box-Cox procedure may reveal the optimal transformation.
Stock market values are represented by Ibbotson Large Company Returns.
Since returns are given as percentage gains or losses, the data are
transformed into values based on $1000 invested in 1925; each figure
represents the value of that investment at the end of the year (Harrington, 2008).
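The conversion described above is a running product: each year's value is the previous year's value multiplied by one plus that year's return. The Python sketch below is a minimal illustration under stated assumptions, not the author's R code; the function name `index_from_returns` is hypothetical, and returns are assumed to be expressed as decimal fractions (0.10 for a 10% gain).

```python
def index_from_returns(start_value, annual_returns):
    """Compound a starting value through a sequence of annual returns.

    annual_returns holds decimal fractions (0.10 means a 10% gain).
    The result lists the end-of-year value after each return.
    """
    values = []
    value = start_value
    for r in annual_returns:
        value *= 1.0 + r  # end-of-year value = prior value * (1 + return)
        values.append(value)
    return values


if __name__ == "__main__":
    # Hypothetical returns: a 10% gain followed by a 5% loss.
    print(index_from_returns(1000.0, [0.10, -0.05]))
```

Read in reverse, the same relationship recovers a return from two index values: the 1979 Large Company figure in Table 1 (106119.24) is about 18.4% above the 1978 figure (89597.47).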
Projected economic production is measured by Projected Gross National
Product (in billions) through 1992 and Projected Gross Domestic Product (in
billions) from 1992 on, so the two projections overlap for 1992. These
projections come from the Greenbook, which is released along with the
Federal Open Market Committee meeting transcripts after a five-year lag
(St. Louis Fed, 2010).
Inflation is given by Ibbotson Inflation Return data. The Fed Funds Rate
proxies for interest rates. The money supply may be represented by
Institutional Money Funds (series IMFNS from the St. Louis Fed), by M2, or by
the sum of the two series. (M3 was to be our time series for the money supply,
as it is the most encompassing definition of money, but the Fed discontinued
it on the grounds that the costs of gathering the data outweigh the value of
the series.)
Note that both the Large Company and Inflation return data are given as
annual percentage growth and have been transformed to show the growth in
the value of $1000 invested in 1925; these figures therefore give the value at
the end of each period. For prediction, the predictor variables (other than
projected GDP and GNP) are lagged by one year.
Occam's razor states that if two possible explanations are equally likely,
one should accept the simpler one. When making predictions, one should
accept a more complicated explanation only if it provides significantly
greater predictive value. Thus, this paper seeks the simplest model with the
best predictive value.

Year  LCStock     GNP      GDP      Inflation  FEDFUNDS  IMFNS   M2NS    M2IMFNS
1978  89597.47    8445.3   NA       3777.37    NA        NA      NA      NA
1979  106119.24   9385.1   NA       4280.14    13.78     9.5     1479.0  1488.5
1980  140523.09   10219.1  NA       4810.87    18.90     15.2    1604.8  1620
1981  133623.41   11342.9  NA       5240.97    12.37     38.0    1760.3  1798.3
1982  162232.18   12479.4  NA       5443.79    8.95      50.0    1917.2  1967.2
1983  198750.65   12979.4  NA       5650.66    9.47      42.5    2136.2  2178.7
1984  211212.31   14604.2  NA       5873.86    8.38      65.9    2320.9  2386.8
1985  279138.19   15628.9  NA       6095.30    8.27      68.2    2506.6  2574.8
1986  330695.02   16478.2  NA       6164.18    6.91      88.5    2744.1  2832.6
1987  347990.37   17707.1  NA       6436.02    6.77      95.0    2842.7  2937.7
1988  406487.55   18951.2  NA       6720.49    8.76      94.9    3006.3  3101.2
1989  534490.48   20890.4  NA       7032.99    8.45      112.5   3171.4  3283.9
1990  517547.13   22115.7  NA       7462.71    7.31      141.5   3290.2  3431.7
1991  675657.77   22918.9  NA       7691.07    4.43      191.2   3391.1  3582.3
1992  727480.73   23731.8  23646.8  7914.11    2.92      216.0   3446.7  3662.7
1993  800156.05   NA       25106.9  8131.75    2.96      221.3   3501.2  3722.5
1994  810638.09   NA       26916.3  8348.86    5.45      216.1   3517.7  3733.8
1995  1114059.93  NA       28464.6  8560.92    5.60      270.3   3663.9  3934.2
1996  1371073.56  NA       29644.5  8845.15    5.29      332.4   3839.7  4172.1
1997  1828463.70  NA       31702.1  8995.51    5.50      409.7   4053.6  4463.3
1998  2351038.63  NA       33760.6  9140.34    4.68      565.6   4398.2  4963.8
1999  2845697.15  NA       35286.5  9385.30    5.30      674.2   4661.9  5336.1
2000  2586454.14  NA       39068.3  9703.47    6.40      833.4   4948.7  5782.1
2001  2279183.39  NA       41928.5  9853.87    1.82      1248.0  5469.0  6717
2002  1775483.86  NA       41733.8  10088.39   1.24      1300.8  5816.2  7117
2003  2285047.73  NA       43518.7  10278.05   0.98      1154.7  6101.4  7256.1
2004  2533432.42  NA       46639.7  10613.12   2.16      1103.0  6443.7  7546.7
2005  2657823.95  NA       NA       10976.09   4.16      1172.1  6703.1  7875.2
2006  3077760.13  NA       NA       11254.88   5.24      1378.4  7102.3  8480.7
2007  3246729.16  NA       NA       11714.08   4.24      1934.8  7530.2  9465
2008  2045439.37  NA       NA       11724.62   0.16      2430.9  8251.3  10682.2
Table 1: Raw Data

Prediction And Moving Averages
There are various ways to predict a variable from its past values. The
simplest method is to use the last measured value. This method may put too
much emphasis on a single period's value. Various forms of moving averages
can provide a more nuanced approach to predicting the next period's value.
Simple Moving Averages (SMAs) weight all periods equally, and the only
parameter is the number of periods used for prediction.

Exponential Moving Averages (EMAs) reduce the parameters involved in
weighted moving averages to the number of periods used and a smoothing
parameter, alpha, which determines how much weight each period receives. If
alpha is restricted to $\alpha = 2/(n+1)$, the number of parameters is reduced
to one (Colby, 2003).
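The EMA recursion can be written in a few lines. In this Python sketch (the function name `ema_forecasts` is hypothetical), alpha is set to 2/(n + 1), and the recursion is assumed to be seeded with the first observation. With n = 3 (alpha = 0.5), that assumption reproduces the first forecasts in Table 4: 89597.47 for 1979 and 97858.35 for 1980.

```python
def ema_forecasts(values, n):
    """One-step-ahead exponential moving average forecasts.

    alpha is tied to the period count as alpha = 2 / (n + 1).
    The recursion is seeded with the first observation, so
    forecasts[i] is the prediction for values[i + 1].
    """
    alpha = 2.0 / (n + 1)
    forecasts = []
    smoothed = values[0]  # seed with the first observation (an assumption)
    for actual in values[1:]:
        forecasts.append(smoothed)  # forecast made before seeing `actual`
        smoothed = alpha * actual + (1 - alpha) * smoothed
    return forecasts


if __name__ == "__main__":
    lcs = [89597.47, 106119.24, 140523.09, 133623.41]  # 1978-1981, Table 1
    print(ema_forecasts(lcs, 3))  # forecasts for 1979, 1980, 1981
```

Other seeding choices (for example, starting from a 3-period SMA) are common and would change the early forecasts.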
Weighted Moving Averages (WMA) may also be considered. There are
innumerable possible choices of weights. Restricting the weight on period $x$
to $w_x = n - (t - x)$, where $n$ is the number of periods, $t$ is the most
recent period, and $x$ is the period number, provides a general rule with
linearly declining weights: the most recent period receives weight $n$, the
one before it $n - 1$, and so on. SMA and the Last method are actually special
cases of WMA: the SMA with equal weighting, and the Last method with
$n = 1$.
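The linearly declining weights can be sketched in Python as follows. The helper `wma_forecast` is hypothetical: it weights the most recent of the last n observations by n, the one before by n - 1, and so on down to 1, and it accepts explicit weights so that equal weights give the SMA and n = 1 gives the Last method. With n = 3 and the 1978-1980 values, it gives about 120567.54, which Table 5 shows rounded to 120568 for 1981.

```python
def wma_forecast(history, n, weights=None):
    """Forecast the next value from the last n observations.

    By default the weights decline linearly: the most recent period
    gets weight n, the one before n - 1, and so on down to 1.
    Passing equal weights gives the simple moving average (SMA).
    """
    window = history[-n:]  # ordered oldest ... most recent
    if len(window) < n:
        raise ValueError("need at least n observations")
    if weights is None:
        weights = list(range(1, n + 1))  # 1 for oldest, n for most recent
    if len(weights) != n:
        raise ValueError("need exactly n weights")
    total = sum(w * x for w, x in zip(weights, window))
    return total / sum(weights)


if __name__ == "__main__":
    lcs = [89597.47, 106119.24, 140523.09]  # 1978-1980, Table 1
    print(wma_forecast(lcs, 3))                     # linear weights (WMA)
    print(wma_forecast(lcs, 3, weights=[1, 1, 1]))  # equal weights (SMA)
    print(wma_forecast(lcs, 1))                     # Last method
```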
These are simple methods for forecasting time series data. They provide a
baseline for deciding whether more complicated methods are worthwhile, and
measuring how far their predictions miss makes it possible to eliminate the
less effective methods. The methods are demonstrated and compared in the
following tables.

Year  LCStock     Last        Absolute Error  Squared Error       Abs. % Error
1978  89597.47    -           -               -                   -
1979  106119.24   89597.47    16521.77        272968969.30        15.57%
1980  140523.09   106119.24   34403.86        1183625368.92       24.48%
1981  133623.41   140523.09   6899.68         47605638.59         5.16%
1982  162232.18   133623.41   28608.77        818461848.89        17.63%
1983  198750.65   162232.18   36518.46        1333598241.05       18.37%
1984  211212.31   198750.65   12461.67        155293109.25        5.90%
1985  279138.19   211212.31   67925.88        4613925152.16       24.33%
1986  330695.02   279138.19   51556.82        2658106122.24       15.59%
1987  347990.37   330695.02   17295.35        299129110.46        4.97%
1988  406487.55   347990.37   58497.18        3421920136.68       14.39%
1989  534490.48   406487.55   128002.93       16384749714.32      23.95%
1990  517547.13   534490.48   16943.35        287077043.93        3.27%
1991  675657.77   517547.13   158110.65       24998976830.29      23.40%
1992  727480.73   675657.77   51822.95        2685618284.69       7.12%
1993  800156.05   727480.73   72675.32        5281702797.86       9.08%
1994  810638.09   800156.05   10482.04        109873251.96        1.29%
1995  1114059.93  810638.09   303421.84       92064812356.11      27.24%
1996  1371073.56  1114059.93  257013.63       66056004341.90      18.75%
1997  1828463.70  1371073.56  457390.14       209205740036.57     25.01%
1998  2351038.63  1828463.70  522574.93       273084552890.20     22.23%
1999  2845697.15  2351038.63  494658.53       244687058285.70     17.38%
2000  2586454.14  2845697.15  259243.01       67206938571.71      10.02%
2001  2279183.39  2586454.14  307270.75       94415315113.52      13.48%
2002  1775483.86  2279183.39  503699.53       253713215787.74     28.37%
2003  2285047.73  1775483.86  509563.87       259655335708.01     22.30%
2004  2533432.42  2285047.73  248384.69       61694953315.94      9.80%
2005  2657823.95  2533432.42  124391.53       15473253157.22      4.68%
2006  3077760.13  2657823.95  419936.18       176346398595.85     13.64%
2007  3246729.16  3077760.13  168969.03       28550533539.91      5.20%
2008  2045439.37  3246729.16  1201289.79      1443097161504.6     58.73%
MAD: 225742.56    MSE: 115510479476.74    MAPE: 16.94%
Table 2: Prediction Method of using the Last Period's Value

Year  LCStock     3SMA        Absolute Error  Squared Error       Abs. % Error
1978  89597.47    -           -               -                   -
1979  106119.24   -           -               -                   -
1980  140523.09   -           -               -                   -
1981  133623.41   112079.93   21543.48        464121451.77        16.12%
1982  162232.18   126755.25   35476.94        1258612933.63       21.87%
1983  198750.65   145459.56   53291.08        2839939693.60       26.81%
1984  211212.31   164868.75   46343.57        2147726102.60       21.94%
1985  279138.19   190731.71   88406.48        7815705416.34       31.67%
1986  330695.02   229700.38   100994.63       10199915820.03      30.54%
1987  347990.37   273681.84   74308.53        5521756957.95       21.35%
1988  406487.55   319274.53   87213.02        7606111133.42       21.46%
1989  534490.48   361724.31   172766.17       29848147904.41      32.32%
1990  517547.13   429656.13   87891.00        7724827496.83       16.98%
1991  675657.77   486175.05   189482.72       35903703032.64      28.04%
1992  727480.73   575898.46   151582.27       22977183646.41      20.84%
1993  800156.05   640228.54   159927.51       25576807786.21      19.99%
1994  810638.09   734431.52   76206.58        5807442490.69       9.40%
1995  1114059.93  779424.96   334634.98       111980567596.75     30.04%
1996  1371073.56  908284.69   462788.87       214173535872.04     33.75%
1997  1828463.70  1098590.53  729873.17       532714845282.45     39.92%
1998  2351038.63  1437865.73  913172.89       833884735153.89     38.84%
1999  2845697.15  1850191.96  995505.19       991030584614.88     34.98%
2000  2586454.14  2341733.16  244720.98       59888359287.38      9.46%
2001  2279183.39  2594396.64  315213.25       99359393130.41      13.83%
2002  1775483.86  2570444.90  794961.03       631963045960.47     44.77%
2003  2285047.73  2213707.13  71340.60        5089480910.29       3.12%
2004  2533432.42  2113238.33  420194.09       176563073690.98     16.59%
2005  2657823.95  2197988.00  459835.95       211449097709.30     17.30%
2006  3077760.13  2492101.37  585658.77       342996192310.68     19.03%
2007  3246729.16  2756338.83  490390.33       240482676908.24     15.10%
2008  2045439.37  2994104.42  948665.04       899965361827.70     46.38%
MAD: 337495.89    MSE: 204341961189.70    MAPE: 25.28%
Table 3: Simple Moving Average, n=3
Year  LCStock     3EMA.5      Absolute Error  Squared Error       Abs. % Error
1978  89597.47    -           -               -                   -
1979  106119.24   89597.47    -               -                   -
1980  140523.09   97858.35    -               -                   -
1981  133623.41   119190.72   14432.69        208302472.58        10.80%
1982  162232.18   126407.07   35825.12        1283438940.56       22.08%
1983  198750.65   144319.62   54431.02        2962736201.05       27.39%
1984  211212.31   171535.14   39677.18        1574278358.49       18.79%
1985  279138.19   191373.72   87764.47        7702601885.25       31.44%
1986  330695.02   235255.96   95439.06        9108613854.10       28.86%
1987  347990.37   282975.49   65014.88        4226934433.03       18.68%
1988  406487.55   315482.93   91004.62        8281840836.41       22.39%
1989  534490.48   360985.24   173505.24       30104067776.38      32.46%
1990  517547.13   447737.86   69809.27        4873334340.09       13.49%
1991  675657.77   482642.49   193015.28       37254899475.16      28.57%
1992  727480.73   579150.13   148330.59       22001964771.08      20.39%
1993  800156.05   653315.43   146840.62       21562167965.08      18.35%
1994  810638.09   726735.74   83902.35        7039605132.02       10.35%
1995  1114059.93  768686.92   345373.02       119282520409.14     31.00%
1996  1371073.56  941373.43   429700.13       184642205957.35     31.34%
1997  1828463.70  1156223.49  672240.21       451906896336.44     36.77%
1998  2351038.63  1492343.60  858695.03       737357153315.08     36.52%
1999  2845697.15  1921691.11  924006.04       853787164899.99     32.47%
2000  2586454.14  2383694.13  202760.01       41111621713.91      7.84%
2001  2279183.39  2485074.14  205890.75       42390999723.26      9.03%
2002  1775483.86  2382128.76  606644.90       368018038091.87     34.17%
2003  2285047.73  2078806.31  206241.42       42535521976.81      9.03%
2004  2533432.42  2181927.02  351505.40       123556043793.01     13.87%
2005  2657823.95  2357679.72  300144.23       90086558779.20      11.29%
2006  3077760.13  2507751.83  570008.30       324909460857.22     18.52%
2007  3246729.16  2792755.98  453973.18       206091648861.04     13.98%
2008  2045439.37  3019742.57  974303.20       949266726355.81     47.63%
MAD: 311128.82    MSE: 173819531389.31    MAPE: 23.61%
Table 4: Exponential Moving Average, n=3, alpha=0.5

Year  LCStock     3WMA        Absolute Error  Squared Error       Abs. % Error
1978  89597.47    -           -               -                   -
1979  106119.24   -           -               -                   -
1980  140523.09   -           -               -                   -
1981  133623.41   120568      13055.87        170455826.59        9.77%
1982  162232.18   131339      30892.91        954371666.51        19.04%
1983  198750.65   149078      49672.90        2467397310.21       24.99%
1984  211212.31   175723      35489.03        1259471001.03       16.80%
1985  279138.19   198895      80243.12        6438958847.56       28.75%
1986  330695.02   243098      87596.71        7673183321.03       26.49%
1987  347990.37   293596      54394.74        2958787899.04       15.63%
1988  406487.55   330750      75737.66        5736193038.66       18.63%
1989  534490.48   374356      160134.08       25642922636.86      29.96%
1990  517547.13   460739      56807.65        3227108677.42       10.98%
1991  675657.77   504685      170972.79       29231696566.83      25.30%
1992  727480.73   599426      128054.38       16397925184.80      17.60%
1993  800156.05   675217      124938.57       15609647468.82      15.61%
1994  810638.09   755181      55456.87        3075463885.92       6.84%
1995  1114059.93  793285      320775.42       102896866984.15     28.79%
1996  1371073.56  960602      410471.55       168486896330.42     29.94%
1997  1828463.70  1191996     636467.26       405090572707.41     34.81%
1998  2351038.63  1556933     794105.60       630603703969.31     33.78%
1999  2845697.15  2013519     832177.68       692519690655.54     29.24%
2000  2586454.14  2511272     75182.07        5652344215.05       2.91%
2001  2279183.39  2633633     354449.17       125634213850.63     15.55%
2002  1775483.86  2476026     700542.07       490759197131.8      39.46%
2003  2285047.73  2078545     206502.31       42643204645.54      9.04%
2004  2533432.42  2114216     419216.70       175742642136.79     16.55%
2005  2657823.95  2324313     333511.19       111229711943.21     12.55%
2006  3077760.13  2554231     523529.40       274083030393.65     17.01%
2007  3246729.16  2847060     399669.05       159735345716.28     12.31%
2008  2045439.37  3092255     1046815.91      1095823551868.69    51.18%
MAD: 302846.77    MSE: 170434983551.10    MAPE: 22.20%
Table 5: Weighted Moving Average, 3 periods

Method  MAD         MSE                 MAPE
Last    225742.56   115510479476.74     16.94%
SMA     337495.89   204341961189.70     25.28%
EMA     311128.82   173819531389.31     23.61%
WMA     302846.77   170434983551.10     22.20%
Table 6: Moving Average Summary

It is clear that using the last value performs best among the methods
compared, since it has the lowest Mean Absolute Deviation (MAD), the lowest
Mean Squared Error (MSE), and the lowest Mean Absolute Percentage Error
(MAPE). In other words, the simplest prediction method, where the last period
is used to predict the next period, is the best performer. Since the data follow
an upward trend with only occasional retracements, averaging in older, lower
values makes the other methods lag behind the trend, so it makes sense that
this prediction method would provide the best performance.
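The three error measures used in Tables 2 through 6 can be computed with a short helper. This Python sketch uses a hypothetical function name, `forecast_errors`; as in the Abs. % Error columns, each absolute error is divided by the actual value, and MAPE is returned as a fraction rather than a percentage. Applied to the four out-of-sample forecasts in Table 7, it gives a MAPE of about 0.267, in line with the 26.73% reported there.

```python
def forecast_errors(actuals, forecasts):
    """Return (MAD, MSE, MAPE) for paired actual and forecast values.

    MAPE divides each absolute error by the actual value and is returned
    as a fraction (0.1694 rather than 16.94%).
    """
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    abs_errors = [abs(a - f) for a, f in zip(actuals, forecasts)]
    n = len(abs_errors)
    mad = sum(abs_errors) / n                 # mean absolute deviation
    mse = sum(e * e for e in abs_errors) / n  # mean squared error
    mape = sum(e / abs(a) for e, a in zip(abs_errors, actuals)) / n
    return mad, mse, mape


if __name__ == "__main__":
    # Actual (Table 1) and predicted (Table 7) values for 2005-2008.
    actual = [2657823.95, 3077760.13, 3246729.16, 2045439.37]
    predicted = [3015493.21, 3316359.46, 3665965.51, 3534858.84]
    mad, mse, mape = forecast_errors(actual, predicted)
    print(f"MAD={mad:.2f}  MSE={mse:.0f}  MAPE={mape:.2%}")
```

Because MSE squares each error, a single large miss such as 2008 dominates it far more than it dominates MAD or MAPE.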
This is not to say that using the last period's value is a satisfactory
approach to prediction. The variable is generally in an upward trend, and if
the troughs could be predicted, an investor could gain returns while avoiding
or even profiting from market losses.
Data Manipulation
The data are used to predict next-period Large Company Stock values from
currently available data, with the exception of projected GDP. Therefore, the
Inflation, Interest Rate, and Money Supply figures are lagged by one year.
Also, Projected GNP ends where Projected GDP begins, with one year of
overlap. These series are spliced together, with the average taken for the
overlap year (the two figures differ by less than half a percent). This series
is referred to as SPLICEGROSS in the data, and as GDP in the text from this
point on.
The lagged money supply data consist of the combined M2 and Institutional
Money Funds series, since M3 (the most expansive definition of the money
supply) was discontinued. In the data, this series appears under the
MRM2IMFNS label; in the text, it is referred to simply as the money supply.
The most recent (and thus lagged) Fed Funds and Inflation data are also
used.
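The splice and the one-year lag are simple bookkeeping steps. The Python sketch below uses year-keyed dictionaries and two hypothetical helpers, `splice_series` and `lag_series`; it is not the author's R code. Averaging the 1992 overlap reproduces the 23689.3 figure used for 1992 in the regression data, and lagging moves, for example, the 1979 money supply (1488.5) onto 1980.

```python
def splice_series(first, second):
    """Join two year-keyed series, averaging any year that appears in both."""
    spliced = dict(first)
    for year, value in second.items():
        if year in spliced:
            spliced[year] = (spliced[year] + value) / 2.0  # overlap year
        else:
            spliced[year] = value
    return dict(sorted(spliced.items()))


def lag_series(series, periods=1):
    """Shift a year-keyed series so that year t holds the value from year t - periods."""
    return {year + periods: value for year, value in series.items()}


if __name__ == "__main__":
    gnp = {1990: 22115.7, 1991: 22918.9, 1992: 23731.8}  # projected GNP, Table 1
    gdp = {1992: 23646.8, 1993: 25106.9, 1994: 26916.3}  # projected GDP, Table 1
    print(splice_series(gnp, gdp))
    print(lag_series({1979: 1488.5, 1980: 1620.0}))      # M2 + IMFNS, lagged
```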
Data Exploration
The data are highly correlated. Pairwise correlation is measured wherever
data are available for both variables. The variable of primary interest is the
dependent variable, Large Company Stock. Correlations with the dependent
variable are: Year, 0.926; GDP, 0.934; lagged Inflation, 0.912; lagged
FEDFUNDS, 0.64; lagged M2 and Institutional Money Funds, 0.878.
There are other pairwise correlations of note. FEDFUNDS is negatively
correlated with all other variables. Its strongest relationships are with the GDP (0.798; -0.781 lagged) and Inflation (-0.808) variables, both with magnitudes near 0.8.
The money supply measure is highly correlated with Inflation (0.962) as well
as GDP (0.981). These relationships may indicate multicollinearity in the data,
and they make the money supply variable an early candidate for removal.
Removing the money supply (or any other variable, for that matter) from the
model does not mean that it is unimportant; it merely means that it is not
needed to predict the response variable.
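Pairwise correlation keeps every observation where both variables are present, so different pairs can use different years (GNP and GDP, for instance, cover different ranges in Table 1). The Python sketch below, with a hypothetical function name and `None` standing in for the NA entries, shows the idea; R's `cor()` offers comparable behavior through `use = "pairwise.complete.obs"`.

```python
import math


def pairwise_correlation(x, y):
    """Pearson correlation using only positions where both values are present.

    Missing values are represented by None (the NA entries in Table 1).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Only the first three positions are complete in both series.
    print(pairwise_correlation([1, 2, 3, None, 5], [2, 4, 6, 8, None]))
```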
Illustration 1: Plots of the Key Variables

Multicollinearity is the problem of one predictor variable being a near-linear
function of another predictor variable. When the predictors are highly
correlated, the model may still be reliable so long as the relationship between
them is stable. If that relationship changes, the model may cease to be
reliable (Faraway, 2005).
Multicollinearity also makes it difficult to explain the individual importance
of each variable: small changes in the data can create large changes in the
estimated beta coefficients.
Multicollinearity shows itself not only in the correlation matrix but also in
the fitted model, for example as a high R-squared alongside few individually
significant coefficients. (Variance inflation factors are another approach to
examining collinearity, but are beyond the scope of this paper.)
There are other potential problems as well, including heteroskedasticity
(non-constant variance of the errors).
Linear Prediction
The first model,
$\text{LCS} = \text{Year} + \text{Projected GDP} + \text{Inflation} + \text{Fed Funds Rate} + \text{Money Supply}$,
is fit. GDP and the money supply are significant at the 5% level. Adjusted
R-squared is 0.9014, and the p-value for the model is highly significant.
(Observations before 1980 and after 2004 are excluded; all observations from
1980 through 2004 are included in the regression.)
The original model yields a very high F statistic and R-squared. However,
only two of the independent variables are significant at the 5% level. When
plotted, the residuals fall within a narrow band on the left and a much wider
band on the right.
Illustration 2: Residual Variance is Non-Constant

Variance appears to be non-constant, a condition called
heteroskedasticity. A Q-Q plot of the errors also indicates non-normality,
although the pattern resembles log-normal residuals (further evidence for a log
transformation of the dependent variable). The Shapiro-Wilk normality test
gives a p-value of 0.03778. Since the Shapiro-Wilk null hypothesis is that the
residuals are normal, this test provides formal evidence for rejecting the
assumption of normality. (R documentation indicates that a rejection threshold
of less than 0.1 is adequate, citing a remark in Applied Statistics by Patrick
Royston in 1995 (R Development Core Team, 2009).)
A further problem is autocorrelation. Visual inspection of the trending
data suggests autocorrelation, and the Durbin-Watson test (Zeileis & Hothorn,
2002) confirms it: it reports a p-value of 0.0002858, rejecting the hypothesis
of uncorrelated errors. One approach to dealing with autocorrelation is to add
the lagged response variable to the predictor variables. This approach is akin
to using the Last method for prediction.
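The Durbin-Watson statistic itself is easy to compute from the residuals: the sum of squared differences between successive residuals divided by the sum of squared residuals. The Python sketch below (hypothetical function name) computes only the statistic; the p-values reported in this paper come from the R implementation cited above, which also derives the statistic's null distribution, and this sketch does not attempt that step.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals.

    Values near 2 suggest no first-order autocorrelation; values well
    below 2 suggest positive autocorrelation (as in trending data).
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


if __name__ == "__main__":
    print(durbin_watson([1, -1, 1, -1]))  # alternating signs: large value
    print(durbin_watson([1, 1, 1, 1]))    # identical residuals: zero
```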
Illustration 3: Box-Cox Operation Indicates Natural Log-Transformation

The proper transformation of the data can be estimated with the Box-Cox
method (Venables & Ripley, 2002). The Box-Cox method transforms the
response variable by raising it to the power of lambda, subtracting one, and
dividing by lambda, except when lambda equals zero, in which case it takes
the natural log of the response variable. The 95% confidence interval for
lambda falls between approximately -0.28 and 0.05; since this interval
includes zero, it confirms earlier suspicions that taking the natural log of the
response is appropriate.
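The transform family itself can be sketched directly. This Python function (hypothetical name `box_cox`) applies the standard definition, (y^lambda - 1)/lambda for lambda other than zero and ln(y) at zero; subtracting one makes the zero case the limit of the general formula. Estimating the best lambda, as R's `MASS::boxcox` does by profiling the log-likelihood of the regression, is not shown here.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y.

    Returns (y**lam - 1) / lam for lam != 0 and log(y) for lam == 0,
    which is the limit of the first expression as lam approaches 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


if __name__ == "__main__":
    for lam in (1.0, 0.5, 1e-8, 0.0):
        print(lam, box_cox(4.0, lam))  # values approach log(4) as lam -> 0
```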
Based on the analysis thus far, the model will be changed and transformed
before insignificant variables are dropped. The new model is
$\ln(\text{LCS}) = \text{Year} + \text{Projected GDP} + \text{Inflation} + \text{Fed Funds Rate} + \text{Money Supply} + \ln(\text{Lagged LCS})$.
Transforming the dependent variable improves the model. R-squared has
increased, and the p-value is still significant. The money supply is still
significant, but the untransformed projected GDP is no longer significant.
The Shapiro-Wilk normality test no longer rejects normally distributed errors.
The Durbin-Watson test still suggests autocorrelation, with a p-value of 0.0847.
The Box-Cox test now indicates a wide range of possible transformations,
with both -2 and 1 inside the 95% confidence interval. Perhaps the Box-Cox
result reflects predictors that do not yet have the correct transformation.
The Money Supply, Inflation, and Gross Domestic Product all grow over time,
so the next model takes their logs as well.
This iteration does not improve the significance of any variable except the
lagged predictor, which corrects for autocorrelation. Up to this point, inflation
appears to be an unimportant variable, so it is removed for the next regression.
Removing inflation improves this model. Removing variables usually
decreases R-squared, but in this case there was no change, and adjusted
R-squared actually improved.
Since the lagged dependent variable is in the model as a predictor, it
does a better job of predicting the next year's performance than the Year
variable does. Since Year is now the least significant predictor, it is the next
variable to remove from the model.
Removing the Year variable lowers R-squared by a negligible amount while
still improving adjusted R-squared. In addition, the GDP variable becomes
significant again. The Federal Funds Rate is the lone remaining insignificant
term, so it is removed from the model next.
Removing the Fed Funds Rate causes very small decreases in R-squared
(to 0.9842) and adjusted R-squared (to 0.982). The model is now
$\ln(\text{LCS}) = \ln(\text{Projected GDP}) + \ln(\text{Money Supply}) + \ln(\text{Lagged LCS})$,
which is the best iteration so far. Each term is significant at the 5% level, and
both GDP and the money supply were significant in the first iteration's model
(before any transformations). (See Appendix B, page 33 for the regression
ANOVA with the object code "malt5".)
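Adjusted R-squared explains why dropping a weak variable can improve the model even when R-squared stays flat: it penalizes each additional predictor. A minimal Python sketch (hypothetical function name) of the standard formula follows. With 25 annual observations (1980 through 2004) and three predictors, the residual degrees of freedom are 21, matching the figure reported under Confidence Intervals, and an R-squared of 0.9842 gives roughly 0.982.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    resid_df = n_obs - n_predictors - 1  # residual degrees of freedom
    if resid_df <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / resid_df


if __name__ == "__main__":
    print(round(adjusted_r_squared(0.9842, 25, 3), 4))
```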
The least significant term is the money supply. Even though it is significant
at the 5% level, the high R-squared suggests the model may be overfit. The
problem with an overfit model is that it is overly sensitive to newly sampled
data. Training the model on a subset of the data and testing its ability to
predict data outside that subset is one way of testing for overfitting, and this
method is demonstrated at the end of this paper.
Removing the money supply reduces both R-squared and adjusted R-squared
slightly. It also weakens the significance of projected GDP as a predictor. It
may be that projected GDP carries predictive meaning only in combination
with the money supply. Note that this regression includes the observations
from 1979; a look at the data shows nothing unusual about that year that
should distort the results. (Also, there is no indication of multiplicative effects;
see Appendix B.)
Checking The Model
With the best model found, rerunning the earlier diagnostics on
$\ln(\text{LCS}) = \ln(\text{Projected GDP}) + \ln(\text{Money Supply}) + \ln(\text{Lagged LCS})$
tests its strength. The Shapiro-Wilk test gives a p-value of 0.2885, which is
little evidence against the assumption of normally distributed errors. The
Durbin-Watson test gives a p-value of 0.1872, which gives no grounds for
rejecting the assumption of independent errors (other forms of data
exploration confirm this conclusion). The Box-Cox test indicates that the data
are correctly transformed, with a maximum-likelihood value for lambda close
to one, although the 95% confidence interval ranges from approximately
-0.75 to 2.66. Thus no further transformation of the response is indicated,
and the model can be treated as linear in the transformed variables.
Illustration 4: Box-Cox Operation Maximum Likelihood: Linear Model
The one remaining problem with the data is multicollinearity. Projected GDP
has a correlation of 0.98 with the lagged money supply. Although this
correlation would ordinarily suggest removing one of the predictors, neither
can be removed at this point: upon removal of projected GDP, the money
supply variable's p-value increases to 0.3530, and removal of the money
supply variable causes projected GDP's p-value to rise to 0.267.
CONCLUSIONS
The Model
This study indicates that, together, the money supply and projected GDP
provide information about the direction of the stock market. To implement
this model in making predictions about the stock market, first predict the next
year's GDP to the same level of accuracy as the Fed (no small feat). Then,
using the current year's end-of-year money supply and stock market values,
combined with the GDP projection, predict the stock market's valuation with
the following formula:
$$\text{LCS} = e^{-4.9809 + 1.9336\ln(\text{ProjGDP}) - 1.0676\ln(\text{MoneySupply}) + 0.5758\ln(\text{LaggedLCS})}$$
The log transformation is justified both by the Box-Cox procedure and by
an improvement in R-squared of over 0.05.
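The prediction step can be sketched as a small function. The coefficients below are the ones reported for the final model; per the Beta row of Table 7 (-1.068), the money supply coefficient is negative. The function and constant names are hypothetical. Fed the 2005 inputs from Table 7, it returns about 3.015 million, in line with the predicted value of 3015493.21 shown there.

```python
import math

# Coefficients of the final log model (Beta row of Table 7, four decimals).
INTERCEPT = -4.9809
B_GDP = 1.9336
B_MONEY = -1.0676
B_LAGGED_LCS = 0.5758


def predict_lcs(proj_gdp, lagged_money_supply, lagged_lcs):
    """Point forecast of the Large Company Stock index from the fitted log model."""
    log_lcs = (
        INTERCEPT
        + B_GDP * math.log(proj_gdp)
        + B_MONEY * math.log(lagged_money_supply)
        + B_LAGGED_LCS * math.log(lagged_lcs)
    )
    return math.exp(log_lcs)  # back-transform from the log scale


if __name__ == "__main__":
    # 2005 inputs from Table 7: GDP, lagged money supply, lagged index value.
    print(round(predict_lcs(50553.5, 7547, 2533432.42), 2))
```

Exponentiating the fitted log value gives a point forecast that tracks the median rather than the mean of the dollar-scale outcome; the paper works with this point forecast directly.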
Confidence Intervals
The minimum and maximum residuals are -0.22263 and 0.23279,
respectively. Because the residuals are on the log scale, they translate into
percentage errors on the dollar scale: when the exponent of e is 0.25 greater
than expected, the result is 28.4% greater than expected, and when it is 0.25
less than expected, the result is 22.1% less than expected.
The residual standard error is 0.1368 with 21 degrees of freedom. Based
on the two-tailed t-distribution at the 95% confidence level, the critical range
is plus or minus 2.08 standard errors. Thus the 95% confidence interval for the
regression runs from 24.8% less than expected to 33.0% greater than
expected. This calculation indicates that the regression is no gold mine: even
with some expectation of a future value, the variance can be very large.
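Because the model is fit on the log scale, a symmetric band of plus or minus t times the residual standard error becomes an asymmetric multiplicative band in dollars. The Python sketch below (hypothetical function name) returns the lower and upper multipliers. With a standard error of 0.1368 and a critical value of 2.08, they come out to roughly 0.75 and 1.33, that is, about 25% below and 33% above the point forecast; differences in the last digit from the figures above can arise from rounding of the inputs. A full prediction interval would also account for uncertainty in the estimated coefficients, which this sketch ignores.

```python
import math


def log_scale_band(resid_se, t_crit):
    """Turn a +/- t_crit * resid_se band on the log scale into multiplicative factors.

    Returns (lower, upper); multiplying the point forecast by these factors
    gives the band on the original dollar scale.
    """
    half_width = t_crit * resid_se
    return math.exp(-half_width), math.exp(half_width)


if __name__ == "__main__":
    lower, upper = log_scale_band(0.1368, 2.08)
    print(f"{1 - lower:.1%} below to {upper - 1:.1%} above the point forecast")
```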
Error Comparison
Using the regression estimated from the 1980 through 2004 sample, and
assuming accurate projection of GDP, the next four years would have
produced the following results.

Year  LCSTOCK     GDP       MRM2IMFNS  AUTOCOR      Predicted    Abs Error       Squared Error    Abs % Error
2005  2657824.0   50553.5   7547       2533432.42   3015493.21   357669.25767    127927297880     13.46%
2006  3077760.1   53595.7   7875       2657823.95   3316359.46   238599.33319    56929641800.5    7.75%
2007  3246729.2   56290     8481       3077760.13   3665965.51   419236.34554    175759113421     12.91%
2008  2045439.4   57765.7   9465       3246729.16   3534858.84   1489419.4698    2218370357114    72.82%
Beta (applied to natural logs): intercept -4.9809; GDP 1.9336; MRM2IMFNS -1.068; AUTOCOR 0.5758
Sums: Abs Error 2504924.4062; Squared Error 2578986410215
Means: MAE 626231.10156; MSE 644746602554; MAPE 26.73%
Table 7: Four Year Forecast and Error
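The summary rows of Table 7 follow from the standard error definitions, with each percentage error taken relative to the actual value. A minimal base-R sketch using the actual and predicted values for 2005 through 2008 (actual values taken at two-decimal precision):

```r
# Actual and predicted market values for 2005-2008
actual    <- c(2657823.95, 3077760.13, 3246729.16, 2045439.37)
predicted <- c(3015493.21, 3316359.46, 3665965.51, 3534858.84)

errors <- predicted - actual
mae  <- mean(abs(errors))            # Mean Absolute Error
mse  <- mean(errors^2)               # Mean Squared Error
mape <- mean(abs(errors) / actual)   # Mean Absolute Percent Error

round(c(MAE = mae, MSE = mse), 2)
sprintf("MAPE = %.2f%%", 100 * mape) # "MAPE = 26.73%"
```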

This prediction uses actual cumulative quarterly GDP instead of projected
GDP, which, as noted earlier, the Fed releases only after a five-year lag.
The Mean Absolute Percent Error (MAPE) is the most suitable measure of error
here: this out-of-sample period follows substantial growth in the stock market
and the other variables, so the absolute and squared errors are naturally much
larger in scale. For the first three years, the model over-predicts by between
7.75% and 13.46%. The MAPE of 26.73% is skewed upward by the 2008
observation, whose error alone is 72.82%.
For the entire sample plus the four forecast years (1980 through 2008), the
MAPE is 12.66%. By this measure, the regression is far superior even to the
best of the moving average prediction methods.

Year  LCSTOCK      GDP       MRM2IMFNS  AUTOCOR      Predicted    Abs Error   Squared Error   Abs % Error
1980  140523.09    10219.1   1488.5     106119.24    124752.29    15770.80    248718251.35    11.22%
1981  133623.41    11342.9   1620       140523.09    163920.03    30296.62    917885351.98    22.67%
1982  162232.18    12479.4   1798.3     133623.41    171322.78    9090.599    82638994.598    5.60%
1983  198750.65    12979.4   1967.2     162232.18    187800.16    10950.49    119913272.36    5.51%
1984  211212.31    14604.2   2178.7     198750.65    237773.43    26561.12    705493083.59    12.58%
1985  279138.19    15628.9   2386.8     211212.31    254694.48    24443.71    597495000.97    8.76%
1986  330695.02    16478.2   2574.8     279138.19    305514.84    25180.18    634041355.26    7.61%
1987  347990.37    17707.1   2832.6     330695.02    349602.05    1611.683    2597523.5158    0.46%
1988  406487.55    18951.2   2937.7     347990.37    394866.81    11620.74    135041639.52    2.86%
1989  534490.48    20890.4   3101.2     406487.55    492043.93    42446.55    1801709516      7.94%
1990  517547.13    22115.7   3283.9     534490.48    605042.42    87495.29    7655425684      16.91%
1991  675657.77    22918.9   3431.7     517547.13    607121.90    68535.87    4697165498      10.14%
1992  727480.73    23689.3   3582.3     675657.77    720759.17    6721.556    45179310.683    0.92%
1993  800156.05    25106.9   3662.7     727480.73    821835.76    21679.71    470009941.49    2.71%
1994  810638.09    26916.3   3722.5     800156.05    976169.36    165531.3    27400600909     20.42%
1995  1114059.93   28464.6   3733.8     810638.09    1092297.79   21762.14    473590650.94    1.95%
1996  1371073.56   29644.5   3934.2     1114059.93   1341884.08   29189.48    852025708.45    2.13%
1997  1828463.7    31702.1   4172.1     1371073.56   1617168.71   211295.0    44645574834     11.56%
1998  2351038.63   33760.6   4463.3     1828463.7    2005824.72   345213.9    119172641764    14.68%
1999  2845697.15   35286.5   4963.8     2351038.63   2254232.08   591465.1    349830931632    20.78%
2000  2586454.14   39068.3   5336.1     2845697.15   2836037.94   249583.8    62292075342     9.65%
2001  2279183.39   41928.5   5782.1     2586454.14   2824486.32   545302.9    297355288996    23.93%
2002  1775483.86   41733.8   6717       2279183.39   2217762.22   442278.4    195610143382    24.91%
2003  2285047.73   43518.7   7117       1775483.86   1957991.01   327056.7    106966098805    14.31%
2004  2533432.42   46639.7   7256.1     2285047.73   2535676.41   2243.989    5035487.8639    0.09%
2005  2657823.95   50553.5   7546.7     2533432.42   3015493.21   357669.3    127927297880    13.46%
2006  3077760.13   53595.7   7875.2     2657823.95   3316359.46   238599.3    56929641800     7.75%
2007  3246729.16   56290     8480.7     3077760.13   3665965.51   419236.3    175759113421    12.91%
2008  2045439.37   57765.7   9465       3246729.16   3534858.84   1489419     2.21837E+12     72.82%
Beta (applied to natural logs): intercept -4.9809; GDP 1.9336; MRM2IMFNS -1.0676; AUTOCOR 0.5758
Sums: Abs Error 5818252; Squared Error 3.80170E+12
Means (29 years): MAE 200629.4; MSE 131093232143; MAPE 12.66%
Table 8: Final Model's Error

It may be considered unfair to compute the MAPE over the same years to
which the regression was fitted, since least squares chooses the coefficients
specifically to minimize the squared errors on those years. However, there is
little else to compare against. This regression may nonetheless be the best
approximation available here for predicting next year's stock market level.
Even so, the variance may be far too high to support profitable trading rules
based on these data.
Summary
Because this model was reached through an iterative process that
eliminated one variable at a time, it may be argued that the findings are
spurious, the product of random chance. That the model is the result of pure
chance seems unlikely, however.
In general, this model states that the stock market rises when the
economy is expected to grow and when the money supply is lower, holding the
other variables fixed. The estimated effect of the economy is nearly twice
that of the money supply (log-scale coefficients of 1.93 and -1.07).
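Because both the response and the predictors enter the model in natural logs, the coefficients behave as elasticities. The R sketch below, which uses only the two coefficients reported for the final model, computes the implied percentage change in the predicted market value for a 1% rise in each input with the others held fixed, and the ratio of the two effects.

```r
# Coefficients of the final model on the log scale
b_gdp <- 1.9336
b_m2  <- -1.0676

# Percentage change in predicted market value for a 1% increase in each
# input, holding the others fixed: 100 * (1.01^b - 1)
pct_effect <- 100 * (1.01^c(b_gdp, b_m2) - 1)
names(pct_effect) <- c("gdp", "money_supply")
round(pct_effect, 2)   # about 1.94 and -1.06

# Relative size of the two effects on the log scale
abs(b_gdp / b_m2)      # about 1.81
```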
Expectations of economic growth fuel speculation in stocks. When people
expect the economy to grow more, stock prices rise; when they expect less
growth, stock prices rise less.
The Fed acts to contract the money supply when the economy is growing
too fast, and the stock market is generally regarded as a leading indicator of
economic growth. It is therefore plausible that the Fed would be tightening the
money supply while the stock market is rising.
Time series models run the risk of reaching a change point where the
effects being used for prediction cease to work (Chatfield, 2000). It seems
unlikely that the effects found here will cease to predict, however, because
they stem from the actions and projections of the Fed, a United States
government-chartered organization with powerful control over fundamental
aspects of the economy.
The high correlation between the two predictors (0.98 between
SPLICEGROSS and MRM2IMFNS in Appendix B) is a concern. If the Fed expects
above-average economic growth next year, it would plausibly act today to
reduce the money supply, which would explain the strong correlation. This
collinearity is troubling, but each variable needs the other to reach
significance in the model. Without the two variables, the model is left with
nothing but an autocorrelation correction variable based on the previous
year's market, and a residual standard error about a third higher.
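A variance inflation factor gives a rough sense of how strongly the two predictors overlap. This is an approximation under stated assumptions: it uses the 0.9814615 correlation between SPLICEGROSS and MRM2IMFNS from the correlation matrix in Appendix B, which is computed on the unlogged series, and it treats the two as the only predictors, ignoring the lagged market term. The model itself uses logs, so the figure is indicative only.

```r
# Correlation between SPLICEGROSS and MRM2IMFNS (unlogged), from Appendix B
r <- 0.9814615

# Two-predictor variance inflation factor: 1 / (1 - r^2)
vif <- 1 / (1 - r^2)
round(vif, 1)         # about 27.2

# Approximate multiplicative inflation of each coefficient's standard error
round(sqrt(vif), 1)   # about 5.2
```

A value this far above the rules of thumb of 5 or 10 that are often used in practice is consistent with the concern raised above.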
Low standard errors in a model with many variables relative to the number
of observations may indicate over-fitting, but two predictors plus the
autocorrelation variable do not seem excessive for the 25 observations
available (21 residual degrees of freedom). In retrospect, this model also has
the lowest standard errors, and since all of these models have a very high
R-squared, optimizing for standard errors while keeping the number of
predictors small seems to be the best remaining approach.
REFERENCES
Chatfield, C. (2000). Time-Series Forecasting. Boca Raton: Chapman &
Hall/CRC.
Colby, R. W. (2003). The Encyclopedia of Technical Market Indicators. New
York: McGraw-Hill.
Faraway, J. J. (2005). Linear Models with R. Boca Raton: Chapman & Hall/CRC.
Harrington, J. P. (Ed.). (2008). Ibbotson SBBI 2009 Classic Yearbook: Market
Results for Stocks, Bonds, Bills, and Inflation 1926-2008. Chicago:
Morningstar.
R Development Core Team. (2009). R: A Language and Environment for
Statistical Computing. Vienna, Austria: R Foundation for Statistical
Computing. Retrieved from http://www.R-project.org
St. Louis Fed. (2010). St. Louis Fed: Download Data for Series: M2NS, M2
Money Stock. St. Louis Fed. Retrieved April 6, 2010, from
http://research.stlouisfed.org/fred2/series/M2NS/downloaddata?cid=48
Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (4th
ed.). New York: Springer. Retrieved from
http://www.stats.ox.ac.uk/pub/MASS4
Zeileis, A., & Hothorn, T. (2002). Diagnostic Checking in Regression
Relationships. R News, 2(3), 7-10.
APPENDIX A: ACKNOWLEDGEMENTS
GNU/Linux/Ubuntu
The GNU Community has developed or enabled the functioning of all of
the tools used to create this document. All of these tools are open-source
software packages that are free to use and free to modify. The Linux kernel
powered the computing. Ubuntu is a popular distribution of Linux, and the
source for software repositories that provided the operating system,
supporting software, and core tools (except Zotero).
R
This study was done in R, a powerful command-line statistical
programming environment (R Development Core Team, 2009). The advantages of
a command-line interface are that one can keep an exactly reproducible record
of one's work (see, e.g., Appendix B) while having complete access to many
powerful functions. The disadvantage is that the learning curve is steeper
than for graphical user interfaces.
OpenOffice.org
This paper was written in OpenOffice.org, an open-source version of
Sun's StarOffice. Writer was used for word processing and document
assembly. Calc was used for data manipulation, spreadsheet functions, and
table creation.
Other Tools
SciTE with R syntax highlighting was also used to manipulate the code.
Zotero Firefox and Writer plug-ins were used to manage citations.
APPENDIX B: R CODE
The following is the R console input and output. Reproducing it requires
the data files at the locations shown and the lmtest and MASS packages. The
command prompt is the ">" symbol, a "+" at the start of a line marks the
continuation of a command, and the "#" symbol begins a non-executing comment.
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
REvolution R enhancements not installed. For improved
performance and other extensions: apt-get install revolution-r
> comb <- read.table("/home/aaron/Desktop/MBA Statistics/combined.csv", header = TRUE)
> cor(comb, use= "pairwise.complete.obs")
                  Year    LCSTOCK        GNP        GDP SPLICEGROSS  Inflation
Year         1.0000000  0.9261381  0.9971895  0.9939544   0.9887894  0.9975192
LCSTOCK      0.9261381  1.0000000  0.9715892  0.8080071   0.9340474  0.9154274
GNP          0.9971895  0.9715892  1.0000000         NA   0.9999981  0.9855178
GDP          0.9939544  0.8080071         NA  1.0000000   0.9999990  0.9943664
SPLICEGROSS  0.9887894  0.9340474  0.9999981  0.9999990   1.0000000  0.9782972
Inflation    0.9975192  0.9154274  0.9855178  0.9943664   0.9782972  1.0000000
FEDFUNDS    -0.8233347 -0.6218049 -0.8141758 -0.4857107  -0.7987852 -0.8081528
IMFNS        0.8859370  0.8180842  0.9549985  0.9533118   0.9287711  0.8753418
M2NS         0.9683431  0.8936886  0.9902620  0.9829273   0.9898840  0.9679235
M2IMFNS      0.9542132  0.8822151  0.9931752  0.9830307   0.9871683  0.9527529
AUTOCOR      0.9261381  0.9508532  0.9668297  0.8545207   0.9352194  0.9307222
AUTOCOR2     0.9261381  0.9060283  0.9720780  0.8874491   0.9320751  0.9238932
AUTOCOR3     0.9369234  0.8664926  0.9575144  0.9434643   0.9391325  0.9144456
MRINFL       0.9975192  0.9119503  0.9808353  0.9940487   0.9774272  0.9986158
MRFUNDS     -0.8080723 -0.6402895 -0.7588302 -0.3646230  -0.7814267 -0.7788310
MRIMFNS      0.8782463  0.8099785  0.9478959  0.9329031   0.9090652  0.8900329
MRM2NS       0.9696137  0.8894880  0.9940004  0.9675501   0.9867425  0.9741104
MRM2IMFNS    0.9547696  0.8778663  0.9947253  0.9609944   0.9814615  0.9621024
              FEDFUNDS      IMFNS       M2NS    M2IMFNS    AUTOCOR   AUTOCOR2
Year        -0.8233347  0.8859370  0.9683431  0.9542132  0.9261381  0.9261381
LCSTOCK     -0.6218049  0.8180842  0.8936886  0.8822151  0.9508532  0.9060283
GNP         -0.8141758  0.9549985  0.9902620  0.9931752  0.9668297  0.9720780
GDP         -0.4857107  0.9533118  0.9829273  0.9830307  0.8545207  0.8874491
SPLICEGROSS -0.7987852  0.9287711  0.9898840  0.9871683  0.9352194  0.9320751
Inflation   -0.8081528  0.8753418  0.9679235  0.9527529  0.9307222  0.9238932
FEDFUNDS     1.0000000 -0.6737792 -0.7703007 -0.7510230 -0.6632477 -0.7031356
IMFNS       -0.6737792  1.0000000  0.9612459  0.9785086  0.8846286  0.9476079
M2NS        -0.7703007  0.9612459  1.0000000  0.9974368  0.9093549  0.9487988
M2IMFNS     -0.7510230  0.9785086  0.9974368  1.0000000  0.9097530  0.9551136
AUTOCOR     -0.6632477  0.8846286  0.9093549  0.9097530  1.0000000  0.9508532
AUTOCOR2    -0.7031356  0.9476079  0.9487988  0.9551136  0.9508532  1.0000000
AUTOCOR3    -0.7960527  0.9413438  0.9476510  0.9520965  0.9060283  0.9508532
MRINFL      -0.8420432  0.8805821  0.9640316  0.9495986  0.9154274  0.9307222
MRFUNDS      0.8351315 -0.5872710 -0.7318301 -0.6988825 -0.6218049 -0.6558817
MRIMFNS     -0.6824676  0.9750173  0.9567361  0.9682330  0.8180842  0.9239063
MRM2NS      -0.7584503  0.9539833  0.9985131  0.9937647  0.8936886  0.9404726
MRM2IMFNS   -0.7456812  0.9678341  0.9966462  0.9960230  0.8822151  0.9445584
              AUTOCOR3     MRINFL    MRFUNDS    MRIMFNS     MRM2NS  MRM2IMFNS
Year         0.9369234  0.9975192 -0.8080723  0.8782463  0.9696137  0.9547696
LCSTOCK      0.8664926  0.9119503 -0.6402895  0.8099785  0.8894880  0.8778663
GNP          0.9575144  0.9808353 -0.7588302  0.9478959  0.9940004  0.9947253
GDP          0.9434643  0.9940487 -0.3646230  0.9329031  0.9675501  0.9609944
SPLICEGROSS  0.9391325  0.9774272 -0.7814267  0.9090652  0.9867425  0.9814615
Inflation    0.9144456  0.9986158 -0.7788310  0.8900329  0.9741104  0.9621024
FEDFUNDS    -0.7960527 -0.8420432  0.8351315 -0.6824676 -0.7584503 -0.7456812
IMFNS        0.9413438  0.8805821 -0.5872710  0.9750173  0.9539833  0.9678341
M2NS         0.9476510  0.9640316 -0.7318301  0.9567361  0.9985131  0.9966462
M2IMFNS      0.9520965  0.9495986 -0.6988825  0.9682330  0.9937647  0.9960230
AUTOCOR      0.9060283  0.9154274 -0.6218049  0.8180842  0.8936886  0.8822151
AUTOCOR2     0.9508532  0.9307222 -0.6558817  0.9239063  0.9404726  0.9445584
AUTOCOR3     1.0000000  0.9238932 -0.6733649  0.9417894  0.9414619  0.9493800
MRINFL       0.9238932  1.0000000 -0.8081528  0.8753418  0.9679235  0.9527529
MRFUNDS     -0.6733649 -0.8081528  1.0000000 -0.6418000 -0.7506362 -0.7293701
MRIMFNS      0.9417894  0.8753418 -0.6418000  1.0000000  0.9538724  0.9741593
MRM2NS       0.9414619  0.9679235 -0.7506362  0.9538724  1.0000000  0.9970302
MRM2IMFNS    0.9493800  0.9527529 -0.7293701  0.9741593  0.9970302  1.0000000
>
> m <- lm(LCSTOCK ~ Year + SPLICEGROSS + MRINFL + MRFUNDS + MRM2IMFNS,comb, na.action= na.exclude)
> summary(m)

Call:
lm(formula = LCSTOCK ~ Year + SPLICEGROSS + MRINFL + MRFUNDS + 
    MRM2IMFNS, data = comb, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-364914 -203016  -41612  106983  820959 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept) -3.897e+08  3.763e+08  -1.036   0.3134  
Year         1.982e+05  1.913e+05   1.036   0.3134  
SPLICEGROSS  1.757e+02  8.159e+01   2.153   0.0444 *
MRINFL      -8.347e+02  5.898e+02  -1.415   0.1732  
MRFUNDS      1.792e+04  3.871e+04   0.463   0.6486  
MRM2IMFNS   -6.165e+02  2.555e+02  -2.413   0.0261 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 290400 on 19 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared: 0.9219,  Adjusted R-squared: 0.9014 
F-statistic: 44.87 on 5 and 19 DF,  p-value: 7.147e-10 
> plot(fitted(m), residuals(m), xlab="Fitted",ylab="Residuals")
> qqnorm(resid(m))
> shapiro.test(residuals(m))
Shapiro-Wilk normality test
data: residuals(m)
W = 0.9142, p-value = 0.03778
> library(lmtest)
Loading required package: zoo
Attaching package: 'zoo'
The following object(s) are masked from package:base :
as.Date.numeric
> dwtest(m)
Durbin-Watson test
data: m
DW = 1.0697, p-value = 0.0002858
alternative hypothesis: true autocorrelation is greater than 0
>
> library(MASS)
> boxcox(m,plotit=T)
> boxcox(m,plotit=T,lambda=seq(-0.5,0.5,by=0.1))
> malt <- lm(log(LCSTOCK) ~ Year + SPLICEGROSS + MRINFL + MRFUNDS + MRM2IMFNS +
+ log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt)

Call:
lm(formula = log(LCSTOCK) ~ Year + SPLICEGROSS + MRINFL + MRFUNDS + 
    MRM2IMFNS + log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.224728 -0.071272 -0.002158  0.078216  0.189551 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -4.631e+02  1.879e+02  -2.464   0.0240 *
Year          2.387e-01  9.603e-02   2.485   0.0230 *
SPLICEGROSS  -2.748e-06  3.466e-05  -0.079   0.9377  
MRINFL       -3.679e-04  2.572e-04  -1.430   0.1697  
MRFUNDS      -9.905e-04  1.666e-02  -0.059   0.9533  
MRM2IMFNS    -2.957e-04  1.246e-04  -2.373   0.0290 *
log(AUTOCOR)  3.814e-01  1.829e-01   2.085   0.0516 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.123 on 18 degrees of freedom
F-statistic: 271.3 on 6 and 18 DF,  p-value: < 2.2e-16
> shapiro.test(residuals(malt))
data: residuals(malt)
W = 0.9747, p-value = 0.7653
> dwtest(malt)
Durbin-Watson test
data: malt
DW = 1.8845, p-value = 0.0847
> boxcox(malt,plotit=T)
>
> malt2 <- lm(log(LCSTOCK) ~ Year + log(SPLICEGROSS) + log(MRINFL) + MRFUNDS + log(MRM2IMFNS) + log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt2)

Call:
lm(formula = log(LCSTOCK) ~ Year + log(SPLICEGROSS) + log(MRINFL) + 
    MRFUNDS + log(MRM2IMFNS) + log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.242438 -0.088883 -0.004927  0.094161  0.211644 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)     -72.479133  89.941534  -0.806  0.43085   
Year              0.037791   0.048280   0.783  0.44395   
log(SPLICEGROSS)  1.322121   1.589509   0.832  0.41643   
log(MRINFL)      -0.004836   1.307447  -0.004  0.99709   
MRFUNDS          -0.019492   0.018379  -1.061  0.30293   
log(MRM2IMFNS)   -1.318564   0.621873  -2.120  0.04813 * 
log(AUTOCOR)      0.620016   0.196808   3.150  0.00553 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

F-statistic: 209.8 on 6 and 18 DF,  p-value: 1.178e-15
>
> malt3 <- lm(log(LCSTOCK) ~ Year + log(SPLICEGROSS) + MRFUNDS + log(MRM2IMFNS) + log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt3)

Call:
lm(formula = log(LCSTOCK) ~ Year + log(SPLICEGROSS) + MRFUNDS + 
    log(MRM2IMFNS) + log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.242398 -0.088944 -0.005036  0.094195  0.211602 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)     -72.58974   82.56247  -0.879  0.39027   
Year              0.03784    0.04502   0.841  0.41102   
log(SPLICEGROSS)  1.31767    1.00937   1.305  0.20733   
MRFUNDS          -0.01945    0.01461  -1.331  0.19881   
log(MRM2IMFNS)   -1.31744    0.52746  -2.498  0.02185 * 
log(AUTOCOR)      0.62009    0.19068   3.252  0.00419 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
>
> malt4 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) +
+ MRFUNDS + log(MRM2IMFNS) +
+ log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt4)

Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + MRFUNDS + log(MRM2IMFNS) + 
    log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.24297 -0.08841  0.01259  0.11157  0.22009 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)      -3.22647    2.76583  -1.167   0.2571   
log(SPLICEGROSS)  1.83905    0.79046   2.327   0.0306 * 
MRFUNDS          -0.01805    0.01441  -1.253   0.2246   
log(MRM2IMFNS)   -1.24958    0.51741  -2.415   0.0254 * 
log(AUTOCOR)      0.63590    0.18835   3.376   0.0030 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

F-statistic:   337 on 4 and 20 DF,  p-value: < 2.2e-16
>
> #Final model, malt5
> malt5 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
> summary(malt5)
Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
Residuals:
     Min       1Q   Median       3Q      Max 
-0.22263 -0.09233  0.01955  0.09150  0.23279 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)       -4.9809     2.4174  -2.060  0.05196 . 
log(SPLICEGROSS)   1.9336     0.7975   2.425  0.02443 * 
log(MRM2IMFNS)    -1.0676     0.5033  -2.121  0.04597 * 
log(AUTOCOR)       0.5758     0.1846   3.119  0.00519 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
>
> malt6 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt6)

Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(AUTOCOR), 
    data = comb, na.action = na.exclude)

Residuals:
       Min         1Q     Median         3Q        Max 
-3.358e-01 -1.018e-01  4.623e-05  1.066e-01  2.075e-01 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -1.2297     1.6680  -0.737    0.468    
log(SPLICEGROSS)   0.4299     0.3777   1.138    0.267    
log(AUTOCOR)       0.7774     0.1647   4.721 9.34e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
>
> malt7 <- lm(log(LCSTOCK) ~ log(MRM2IMFNS) + log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt5)   # re-displays malt5; the summary of malt7 is not shown
Call:

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22263 -0.09233  0.01955  0.09150  0.23279 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)       -4.9809     2.4174  -2.060  0.05196 . 
log(SPLICEGROSS)   1.9336     0.7975   2.425  0.02443 * 
log(MRM2IMFNS)    -1.0676     0.5033  -2.121  0.04597 * 
log(AUTOCOR)       0.5758     0.1846   3.119  0.00519 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
>
> malt8 <- lm(log(LCSTOCK) ~ log(AUTOCOR),comb, na.action= na.exclude)
> summary(malt8)

Call:
lm(formula = log(LCSTOCK) ~ log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.48053 -0.09394  0.01333  0.12556  0.22081 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.86816    0.35789   2.426   0.0220 *  
log(AUTOCOR)  0.94333    0.02646  35.657   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

F-statistic:  1271 on 1 and 28 DF,  p-value: < 2.2e-16
> comb2 <- read.table("/home/aaron/Desktop/MBA Statistics/combined2.csv", header = TRUE)
>
> cor(comb2, use= "pairwise.complete.obs")
                 Year   LCSTOCK SPLICEGROSS MRM2IMFNS   AUTOCOR  AUTOCOR2
Year        1.0000000 0.9261381   0.9887894 0.9547696 0.9261381 0.9261381
LCSTOCK     0.9261381 1.0000000   0.9340474 0.8778663 0.9508532 0.9060283
SPLICEGROSS 0.9887894 0.9340474   1.0000000 0.9814615 0.9352194 0.9320751
MRM2IMFNS   0.9547696 0.8778663   0.9814615 1.0000000 0.8822151 0.9445584
AUTOCOR     0.9261381 0.9508532   0.9352194 0.8822151 1.0000000 0.9508532
AUTOCOR2    0.9261381 0.9060283   0.9320751 0.9445584 0.9508532 1.0000000
AUTOCOR3    0.9261381 0.8664926   0.9391325 0.9493800 0.9060283 0.9508532
             AUTOCOR3
Year        0.9261381
LCSTOCK     0.8664926
SPLICEGROSS 0.9391325
MRM2IMFNS   0.9493800
AUTOCOR     0.9060283
AUTOCOR2    0.9508532
AUTOCOR3    1.0000000
> m2 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) + log(AUTOCOR),comb2, na.action= na.exclude)
> summary(m2)

Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) + 
    log(AUTOCOR), data = comb2, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22263 -0.09233  0.01955  0.09150  0.23279 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)       -4.9809     2.4174  -2.060  0.05196 . 
log(SPLICEGROSS)   1.9336     0.7975   2.425  0.02443 * 
log(MRM2IMFNS)    -1.0676     0.5033  -2.121  0.04597 * 
log(AUTOCOR)       0.5758     0.1846   3.119  0.00519 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
>
> plot(fitted(m2), residuals(m2), xlab="Fitted",ylab="Residuals")
> qqnorm(resid(m2))
> shapiro.test(residuals(m2))
data: residuals(m2)
W = 0.9527, p-value = 0.2885
> library(lmtest)
> dwtest(m2)
Durbin-Watson test
data: m2
DW = 1.8683, p-value = 0.1872

> library(MASS)
> boxcox(m2,plotit=T)
> boxcox(m2,plotit=T,lambda=seq(-1,3,by=0.1))
> m3 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) + log(SPLICEGROSS)*log(MRM2IMFNS) + log(AUTOCOR),comb2, na.action= na.exclude)
> summary(m3)

Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) + 
    log(SPLICEGROSS) * log(MRM2IMFNS) + log(AUTOCOR), data = comb2, 
    na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.20975 -0.08643  0.01519  0.08435  0.22880 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)                     -11.86766   11.49803  -1.032  0.31432   
log(SPLICEGROSS)                  2.53960    1.27772   1.988  0.06072 . 
log(MRM2IMFNS)                   -0.10262    1.65485  -0.062  0.95117   
log(AUTOCOR)                      0.58928    0.18869   3.123  0.00536 **
log(SPLICEGROSS):log(MRM2IMFNS)  -0.08825    0.14395  -0.613  0.54673   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
> # This creates the variables plot

> comb3 <- read.table("/home/aaron/Desktop/MBA Statistics/combined3.csv", header = TRUE)
> plot(comb3)

Predictors of Stock Market Values

Hochgeladen von

Dokumentinformationen

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Predictors of Stock Market Values

Hochgeladen von

Copyright:

Verfügbare Formate

PREDICTORS OF STOCK MARKET VALUES

April 13, 2010