
Multiple Linear Regression Model For Predicting GOLD Prices

Project report submitted to

Prof. Dhiman Bhadra
Associate: Kshitij G Trivedi

For the requirements of the course

Probability and Statistics 3


On October 25, 2012 By Group E12
Vineet Singh, Sonam Yadav, Saurabh Phelix Kachhap, Arun Kumar K, Rohit Raj, Nikhil Pandey

Contents

1. Introduction
2. Data Description
3. Exploratory Analysis
4. Regression Modeling
   a. Model Building
   b. Coefficient of Multiple Determination
   c. T-tests of Regression Coefficients
   d. Residual Analysis
   e. Multicollinearity Correction
   f. Durbin Watson Test
   g. Positive Autocorrelation Correction: Lagged Variable
   h. Model Validation
5. Conclusion
6. Further Improvement

1. Introduction

GOLD has attracted interest from all sorts of investors, including several central banks, the IMF, hedge fund managers, and retail investors, especially in India. In the last 5 years, GOLD has generated cumulative returns of about 130% versus about -9% for the S&P 500 Index. Given its growing importance as an asset class, we intend to predict GOLD prices using the regression techniques learnt during the course.

2. Data Description

Gold prices have been empirically observed to be related to several macroeconomic factors. For our analysis we have considered the following key variables:

1. Value of Dollar (Euro v/s US Dollar): Gold prices are denominated in US dollars and the US dollar is the reserve currency, so demand for gold increases when the US dollar loses value.
2. Equity Indices (S&P 500 Index and BNY Mellon BRICs ADR Index): The lower gold's correlation with equity indices, the greater its demand as a diversifier in a portfolio.
3. Commodity Prices (Thomson Reuters/Jefferies CRB Index): Rising commodity prices signal that inflationary pressures in the economy are building and the purchasing power of the dollar is declining, so the gold price should be rising as well.
4. Monetary Policy (US M1 Money Supply): Expansive monetary policy increases the risk of high inflation, which increases demand for gold as an inflation hedge.
5. Inflation Expectations (University of Michigan Expected Inflation): Rising inflation increases demand for real assets like gold as investors seek to hedge against the erosion of the real value of money.
6. Interest Rates (US Treasury Rate 1 year): Higher real interest rates increase the opportunity cost of holding gold and decrease its demand.

Our data sources are listed in Table 1.

| S No | Factors affecting demand | Independent variable in multiple regression | Data source |
|---|---|---|---|
| 1 | Dollar Value | USD v/s EURO Exchange Rate | REUTERS |
| 2 | Equity Indices | S&P 500 Index; BNY Mellon BRICs ADR Index | REUTERS |
| 3 | Commodity Prices | Thomson Reuters/Jefferies (TR/J) CRB Index | REUTERS |
| 4 | Monetary Policy | US Money Supply M1 | REUTERS |
| 5 | Inflation Expectations | University of Michigan Expected Inflation | REUTERS |
| 6 | Interest Rates | US Treasury Rate 1 year | REUTERS |

Table 1: Variables and Data Sources

3. Exploratory Analysis

We used scatter plots to check whether a linear relationship exists between the gold price and each of the explanatory variables identified above. Plotting the gold price against the value of the dollar (USD/EUR rate), the equity indices (S&P 500 Index and BNY Mellon BRICs ADR Index), commodity prices (CRB Index), monetary policy (US M1 money supply) and inflation expectations gives the following plots.
[Figure: Scatterplots of Gold Price U$/Oz vs USD/EUR Rate, vs S&P 500 Index, vs BRICs ADR Index, vs CRB Index, vs M1, and vs Expected Inflation]

There appears to be a positive relationship between the gold price and the CRB Index, the BRICs ADR Index and US money supply M1: higher values of each of these variables go with higher gold prices. The scatter plot of the gold price against the value of the dollar shows a negative relationship: when the value of the dollar falls, the gold price tends to rise. For the Treasury rate, the scatter plot below does not allow any conclusion about linearity, so we also plotted the gold price against Ln(Treasury Rate), the natural logarithm of the Treasury rate. This second plot shows an approximately linear relationship with a negative slope, so the gold price and the log Treasury rate are negatively associated: as Ln(Treasury Rate) increases, the gold price tends to decrease.

[Figure: Scatterplots of Gold Price U$/Oz vs Treasury Rate and vs LN_Treasury Rate]

The scatter plot matrix of the explanatory variables is shown below. Because the variables span very different ranges, none of the pairs shows a perfect positive or negative relationship, although many of them show a broadly positive relationship.
[Figure: Matrix Plot of Explanatory Variables (USD/EUR Rate, S&P 500 Index, CRB Index, M1, BRICs ADR Index, Expected Inflation, LN_Treasury Rate)]

Some of these plots show a positive association and others a negative one, so pairwise plots on their own can be misleading. To get the correct picture we need to control for the other explanatory variables, which is what a multiple regression model does.

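In addition to the scatter plot matrix, a correlation matrix was constructed showing the correlation coefficients between the response and each predictor, and between the predictors. Each cell shows the Pearson correlation followed by its p-value in parentheses.

| | Gold Price US$/Oz | USD/EUR Rate | S&P 500 Index | CRB Index | M1 | BRICs ADR Index | Expected Inflation |
|---|---|---|---|---|---|---|---|
| USD/EUR Rate | -0.636 (0.000) | | | | | | |
| S&P 500 Index | 0.209 (0.022) | -0.445 (0.000) | | | | | |
| CRB Index | 0.602 (0.000) | -0.833 (0.000) | 0.707 (0.000) | | | | |
| M1 | 0.954 (0.000) | -0.573 (0.000) | 0.090 (0.331) | 0.476 (0.000) | | | |
| BRICs ADR Index | 0.814 (0.000) | -0.783 (0.000) | 0.578 (0.000) | 0.834 (0.000) | 0.659 (0.000) | | |
| Expected Inflation | 0.296 (0.001) | -0.501 (0.000) | 0.608 (0.000) | 0.794 (0.000) | 0.179 (0.052) | 0.540 (0.000) | |
| LN_Treasury Rate | -0.756 (0.000) | 0.245 (0.007) | 0.380 (0.000) | -0.030 (0.743) | -0.808 (0.000) | -0.350 (0.000) | 0.138 (0.136) |

Table 2: Correlation Matrix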
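The p-values in Table 2 test H0: rho = 0 for each pair. A minimal sketch of that test, assuming n = 119 monthly observations (the sample size reported for the stepwise runs) and, as a simplification, a standard normal approximation to the t distribution with n - 2 = 117 degrees of freedom. MINITAB uses the exact t distribution, so its p-values are slightly larger.

```python
import math

def corr_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and approximates the t distribution
    with n - 2 df by the standard normal (an assumption, close for large n).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))

n = 119  # assumed number of observations
checks = [("Gold vs S&P 500 Index", 0.209),
          ("Expected Inflation vs LN_Treasury Rate", 0.138),
          ("Gold vs M1", 0.954)]
for label, r in checks:
    print(f"{label}: r = {r:.3f}, approx p = {corr_p_value(r, n):.3f}")
```

The approximate p-values land close to the Table 2 entries (0.022 and 0.136 for the first two pairs).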

4. Regression Modeling

a. Model Building

The scatter plots relate the gold price to one explanatory variable at a time. In practice, however, the gold price depends on more than one explanatory variable, so all the explanatory variables were included simultaneously to estimate the response. Performing the multiple regression in MINITAB, we got the following regression equation:

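MODEL A:

$$\widehat{\text{Gold Price}} = -1498 + 572\,\text{USD/EUR Rate} - 0.0912\,\text{S\&P 500 Index} + 0.0830\,\text{BRICs ADR Index} + 1.26\,\text{CRB Index} + 0.955\,\text{M1} - 30.6\,\text{Expected Inflation} - 64.7\,\ln(\text{Treasury Rate})$$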
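MINITAB obtains these coefficients by ordinary least squares. The sketch below shows the same idea in plain Python by solving the normal equations (X'X)b = X'y with Gaussian elimination; the function name `ols` and the toy data are hypothetical and are not the report's data or MINITAB's internal algorithm. With strongly collinear predictors (as here, see the VIF column of the coefficient table) a QR-based solver is numerically safer.

```python
from typing import List

def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X'X) b = X'y by Gaussian elimination
    with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Forward elimination.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(col + 1, p):
            factor = A[i][col] / A[col][col]
            for j in range(col, p + 1):
                A[i][j] -= factor * A[col][j]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

# Hypothetical noise-free data generated from y = 10 + 2*x1 - 3*x2.
X = [[1, 2], [2, 1], [3, 5], [4, 2], [5, 7], [6, 3]]
y = [10 + 2 * a - 3 * b for a, b in X]
coef = ols(X, y)
print([round(c, 6) for c in coef])
```

With noise-free data the fit recovers the generating coefficients; with real data the same call returns the least-squares estimates.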

Each slope below is interpreted holding the other explanatory variables constant, so it describes a partial association rather than a simple correlation.

Effect of value of dollar (USD/EUR Rate): The slope is positive, so, controlling for the other explanatory variables, the gold price is positively associated with the value of the dollar. Specifically, the predicted gold price increases by 572 US$/oz for every one-unit increase in the USD/EUR rate, or about 57 US$/oz for an increase of 0.1. This sign is the opposite of the simple correlation in Table 2 (-0.636), which shows why controlling for the other variables matters.

Effect of S&P 500 Index: The slope is negative, so, controlling for the other explanatory variables, the gold price is negatively associated with the S&P 500 Index. Specifically, the predicted gold price decreases by 0.0912 for every one-point increase in the S&P 500 Index.

Effect of CRB Index: The slope is positive, so, controlling for the other explanatory variables, the gold price is positively associated with the CRB Index. Specifically, the predicted gold price increases by 1.26 for every one-point increase in the CRB Index.

Effect of money supply (M1): The slope is positive, so, controlling for the other explanatory variables, the gold price is positively associated with the money supply. Specifically, the predicted gold price increases by 0.955 for every one-unit increase in M1.

Effect of BRICs ADR Index: The slope is positive, so, controlling for the other explanatory variables, the gold price is positively associated with the BRICs ADR Index. Specifically, the predicted gold price increases by 0.0830 for every one-point increase in the BRICs ADR Index.

Effect of expected inflation: The slope is negative, so, controlling for the other explanatory variables, the gold price is negatively associated with expected inflation. Specifically, the predicted gold price decreases by 30.6 for every one-unit increase in expected inflation.

Effect of US Treasury Rate: The slope on the natural log of the US Treasury rate is negative, so, controlling for the other explanatory variables, the gold price is negatively associated with the Treasury rate. Specifically, the predicted gold price decreases by 64.7 for every one-unit increase in Ln(Treasury Rate), that is, for every e-fold (about 2.72 times) increase in the Treasury rate.
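The "controlling for the other variables" reading can be made concrete: change one predictor, keep the rest fixed, and compare Model A's predictions. A minimal sketch using the Table 4 coefficients (rounded to three decimals, so -0.091 rather than -0.0912 for the S&P 500 Index); the predictor values in `base` and the function name `predict_model_a` are hypothetical.

```python
import math

# Model A coefficients as printed in Table 4 (rounded to 3 decimals).
INTERCEPT = -1498.000
COEF = {
    "usd_eur": 571.630,
    "sp500": -0.091,
    "brics_adr": 0.083,
    "crb": 1.261,
    "m1": 0.955,
    "exp_inflation": -30.550,
}
LN_TREASURY = -64.730  # multiplies ln(Treasury Rate), not the rate itself

def predict_model_a(x: dict) -> float:
    """Predicted gold price (US$/oz) from Model A for one set of predictor values."""
    linear = sum(COEF[name] * x[name] for name in COEF)
    return INTERCEPT + linear + LN_TREASURY * math.log(x["treasury_rate"])

# Hypothetical predictor values, for illustration only.
base = {"usd_eur": 0.78, "sp500": 1400.0, "brics_adr": 3000.0, "crb": 300.0,
        "m1": 2300.0, "exp_inflation": 3.0, "treasury_rate": 0.18}

# Change one predictor at a time; every other input stays at its base value.
effect_usd = predict_model_a(dict(base, usd_eur=base["usd_eur"] + 0.1)) - predict_model_a(base)
effect_rate = (predict_model_a(dict(base, treasury_rate=2 * base["treasury_rate"]))
               - predict_model_a(base))
print(f"+0.1 in USD/EUR rate: {effect_usd:+.2f} US$/oz")
print(f"doubling the Treasury rate: {effect_rate:+.2f} US$/oz")
```

Because the model is linear in each term, the effects do not depend on the base values chosen; only the log term makes the Treasury effect depend on the ratio of the rates.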

b. Coefficient of Multiple Determination


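This coefficient measures the proportion of the variation in gold prices that is explained simultaneously by the set of predictors (USD/EUR, S&P 500, Expected Inflation, BRICs ADR Index, CRB Index, US Money Supply M1 and US Treasury Rate). It generalizes the coefficient of determination r² used in the simple regression setup. By construction 0 ≤ R² ≤ 1, with higher values of R² indicating a better-fitting model and vice versa. R² is given by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST},$$

where SSR, SSE and SST are the regression, residual and total sums of squares in the ANOVA table.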

However, R² never decreases when additional predictor variables are added to the model, even if they contribute little, while each new predictor adds a parameter to estimate and makes the model more complex. To balance goodness of fit against model size, the adjusted coefficient of multiple determination is used:

$$R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)},$$

where n is the number of observations and k the number of predictors. The following results were obtained:

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 7 | 17463193 | 2494742 | 981.56 | 0.000 |
| Residual Error | 111 | 282119 | 2542 | | |
| Total | 118 | 17745312 | | | |

Table 3: ANOVA

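From Table 3, $R^2 = 17463193/17745312 \approx 0.984$ and $R^2_{adj} = 1 - (282119/111)/(17745312/118) \approx 0.983$. Thus USD/EUR, S&P 500, Expected Inflation, BRICs ADR Index, CRB Index, US Money Supply M1 and US Treasury Rate together explain about 98.4% of the total variation in gold price.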
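The R², adjusted R² and F values follow directly from the sums of squares in Table 3. A minimal sketch, assuming n = 119 (consistent with the total DF of 118) and k = 7 predictors; the helper names are hypothetical.

```python
def r_squared(ssr: float, sst: float) -> float:
    """Coefficient of multiple determination, SSR / SST."""
    return ssr / sst

def adjusted_r_squared(sse: float, sst: float, n: int, k: int) -> float:
    """1 - [SSE / (n - k - 1)] / [SST / (n - 1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

def f_statistic(ssr: float, sse: float, n: int, k: int) -> float:
    """Overall F = MSR / MSE."""
    return (ssr / k) / (sse / (n - k - 1))

# Sums of squares from Table 3 (Model A).
SSR, SSE = 17463193, 282119
SST = SSR + SSE  # 17745312
n, k = 119, 7

r2 = r_squared(SSR, SST)
r2_adj = adjusted_r_squared(SSE, SST, n, k)
f = f_statistic(SSR, SSE, n, k)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}, F = {f:.2f}")
```

The recomputed F matches the 981.56 printed by MINITAB, which is a useful check that the sums of squares were transcribed correctly.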

c. T-tests of Regression Coefficients


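To test the significance of each predictor, we set up the null and alternative hypotheses

$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0, \qquad j = 1, \dots, 7.$$

The test statistic is

$$t = \frac{b_j}{SE(b_j)},$$

which follows a t distribution with n - k - 1 = 111 degrees of freedom under H_0.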

The following results were obtained:

| Predictor | Coefficient | SE Coefficient | T statistic | P Value | VIF |
|---|---|---|---|---|---|
| Constant | -1498.000 | 112.900 | -13.270 | 0.000 | |
| USD/EUR Rate | 571.630 | 91.170 | 6.270 | 0.000 | 5.019 |
| S&P 500 Index | -0.091 | 0.054 | -1.700 | 0.092 | 4.659 |
| BRICs ADR Index | 0.083 | 0.007 | 11.530 | 0.000 | 7.371 |
| CRB Index | 1.261 | 0.292 | 4.320 | 0.000 | 19.931 |
| M1 | 0.955 | 0.063 | 15.230 | 0.000 | 9.475 |
| Expected Inflation | -30.550 | 14.240 | -2.140 | 0.034 | 4.432 |
| Ln (Treasury Rate) | -64.730 | 14.390 | -4.500 | 0.000 | 11.337 |

Table 4: Regression Diagnostics

At the 10% significance level, every p-value in Table 4 is below 0.10 (the largest is 0.092, for the S&P 500 Index). Hence for each predictor we reject the null hypothesis in favour of the alternative: every predictor has a significant effect on the gold price at the 10% significance level. At the 5% significance level, however, the S&P 500 Index (p = 0.092) is not significant.
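The t-statistics and the 5% / 10% decisions can be reproduced from the coefficients and standard errors in Table 4. A minimal sketch, assuming a standard normal approximation to the t distribution with 111 degrees of freedom; because Table 4 rounds the standard errors to three decimals, some recomputed t values (for example BRICs ADR Index) differ slightly from MINITAB's.

```python
import math

# (coefficient, standard error) for each predictor, copied from Table 4.
TABLE4 = {
    "USD/EUR Rate": (571.630, 91.170),
    "S&P 500 Index": (-0.091, 0.054),
    "BRICs ADR Index": (0.083, 0.007),
    "CRB Index": (1.261, 0.292),
    "M1": (0.955, 0.063),
    "Expected Inflation": (-30.550, 14.240),
    "Ln(Treasury Rate)": (-64.730, 14.390),
}

def t_test(coef: float, se: float):
    """Return t = b / SE(b) and an approximate two-sided p-value.

    Assumption: the t distribution with 111 df is approximated by the
    standard normal, which is close at this sample size.
    """
    t = coef / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

results = {name: t_test(b, se) for name, (b, se) in TABLE4.items()}
for name, (t, p) in results.items():
    print(f"{name:20s} t = {t:7.2f}  p = {p:.3f}  "
          f"significant at 5%: {p < 0.05}  at 10%: {p < 0.10}")
```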

d. Residual Analysis
To test the appropriateness of the multiple regression model we used the following procedure. The residuals were plotted against the fitted values to check the linearity of the regression function and the constancy of the error variance. The residuals fluctuate more or less randomly about 0, with no noticeable trend or change in spread. Hence we conclude that the gold price can be assumed to be linearly related to the predictors, with approximately constant error variance.
[Figure: Versus Fits, standardized residuals vs fitted values (response is Gold Price U$/Oz)]

To check the validity of the normality assumption, a normal probability plot and a histogram of the residuals were produced. The following two plots were obtained:
[Figure: Normal Probability Plot of standardized residuals (response is Gold Price U$/Oz)]

[Figure: Histogram of standardized residuals (response is Gold Price U$/Oz)]

The above plots indicate that the errors can be assumed to have a symmetric, bell-shaped distribution. The normal probability plot shows an approximately linear pattern, so the error distribution can be assumed to be normal.

e. Multicollinearity Correction
To determine whether any of the variables in the model should be removed because of multicollinearity, stepwise regression was performed with the following MINITAB settings: forward selection and backward elimination, Alpha-to-Enter = 0.05, Alpha-to-Remove = 0.05; response Gold Price U$/Oz on 7 predictors; number of observations = 119. The stepwise regression terminated at the end of six steps (Table 5), and the S&P 500 Index did not enter the model. The regression equation identified by MINITAB was

MODEL B:

$$\widehat{\text{Gold Price}} = -1486.3 + 0.914\,\text{M1} + 0.0767\,\text{BRICs ADR Index} - 80.1\,\ln(\text{Treasury Rate}) + 542\,\text{USD/EUR Rate} + 1.26\,\text{CRB Index} - 33\,\text{Expected Inflation}$$

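Each predictor cell shows the coefficient followed by (T-Value; P-Value); an empty cell means the predictor had not entered at that step. R-Sq and R-Sq(adj) are in percent.

| Step | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Constant | -1650.6 | -1338.4 | -965.7 | -1314.4 | -1460.6 | -1486.3 |
| M1 | 1.624 (34.56; 0.000) | 1.257 (35.47; 0.000) | 0.992 (18.25; 0.000) | 1.049 (19.58; 0.000) | 0.963 (17.35; 0.000) | 0.914 (15.67; 0.000) |
| BRICs ADR Index | | 0.0725 (15.72; 0.000) | 0.0834 (18.78; 0.000) | 0.0948 (18.39; 0.000) | 0.0808 (13.19; 0.000) | 0.0767 (12.29; 0.000) |
| LN_Treasury Rate | | | -54.6 (-5.97; 0.000) | -45.7 (-5.09; 0.000) | -71.1 (-6.58; 0.000) | -80.1 (-7.11; 0.000) |
| USD/EUR Rate | | | | 283 (3.81; 0.000) | 449 (5.43; 0.000) | 542 (6.01; 0.000) |
| CRB Index | | | | | 0.72 (3.80; 0.000) | 1.26 (4.27; 0.000) |
| Expected Inflation | | | | | | -33 (-2.35; 0.021) |
| S | 116 | 66.0 | 57.9 | 54.8 | 51.8 | 50.8 |
| R-Sq | 91.08 | 97.15 | 97.82 | 98.07 | 98.29 | 98.37 |
| R-Sq(adj) | 91.00 | 97.10 | 97.77 | 98.00 | 98.21 | 98.28 |
| Mallows Cp | 507.8 | 86.0 | 40.9 | 25.8 | 12.5 | 8.9 |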

Table 5: Stepwise Regression Output
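The Mallows Cp row of Table 5 can be checked from the S row: for a subset model with p parameters (intercept included), SSE_p = S² (n - p) and Cp = SSE_p / MSE_full - (n - 2p), where MSE_full is the residual mean square of the model with all seven predictors (Model A, Table 3). A Cp close to p suggests little bias from omitted predictors. A minimal sketch under those definitions; small differences from Table 5 come from S being rounded.

```python
def mallows_cp(s: float, n: int, p: int, mse_full: float) -> float:
    """Mallows' Cp for a subset model with p parameters (intercept included).

    s is the subset model's residual standard error, so SSE_p = s**2 * (n - p).
    mse_full is the residual mean square of the model with all candidate predictors.
    """
    sse_p = s ** 2 * (n - p)
    return sse_p / mse_full - (n - 2 * p)

MSE_FULL = 282119 / 111  # residual MS of Model A (Table 3)
n = 119

# (step, S, number of predictors) from Table 5; p = predictors + 1 for the intercept.
steps = [(3, 57.9, 3), (4, 54.8, 4), (5, 51.8, 5), (6, 50.8, 6)]
cps = {step: mallows_cp(s, n, k + 1, MSE_FULL) for step, s, k in steps}
for step, cp in cps.items():
    print(f"step {step}: Cp = {cp:.1f}")
```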

f. Durbin Watson Test


To detect the presence of autocorrelation in the residuals, the Durbin Watson test was performed. The Durbin Watson test statistic obtained from MINITAB was 0.729. The critical values for k = 6 and n = 119 are:

| | n | dL | dU |
|---|---|---|---|
| From D-W Tables | 100 | 1.421 | 1.670 |
| From D-W Tables | 150 | 1.543 | 1.708 |
| By Interpolation | 119 | 1.499 | 1.694 |

Table 6: Durbin Watson Test

Our Durbin-Watson statistic of 0.729223 lies below the lower bound (0.729 < dL = 1.499 < dU), so we reject the hypothesis of no autocorrelation: the residuals show strong positive autocorrelation.
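The Durbin-Watson statistic and the bounds decision rule can be sketched as follows. The residual series is hypothetical; the bounds are the interpolated values from Table 6, and the two statistics checked at the end are the ones reported in this section and for Model C.

```python
def durbin_watson(residuals):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2; values near 2 suggest no lag-1 autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def dw_decision(d, d_lower, d_upper):
    """Two-sided Durbin-Watson bounds test."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > 4 - d_lower:
        return "negative autocorrelation"
    if d_upper < d < 4 - d_upper:
        return "no evidence of autocorrelation"
    return "inconclusive"

D_L, D_U = 1.499, 1.694  # interpolated bounds for k = 6, n = 119 (Table 6)

# Hypothetical residuals that stay positive, then negative: strong positive autocorrelation.
d = durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
print(f"d = {d:.3f} -> {dw_decision(d, D_L, D_U)}")
print(dw_decision(0.729, D_L, D_U))    # statistic reported above
print(dw_decision(1.96059, D_L, D_U))  # statistic reported for Model C
```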


g. Positive Autocorrelation Correction: Lagged Variable


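To correct for autocorrelation we introduced a lagged variable, Gold Price (-1), the gold price for the previous month. The whole stepwise regression process was repeated with Gold Price (-1) as an additional candidate predictor, and this time the stepwise regression terminated at the end of 7 steps. The following regression was obtained with the new variable:

MODEL C:

$$\widehat{\text{Gold Price}}_t = -731.6 + 0.533\,\text{Gold Price}_{t-1} + 0.402\,\text{M1} + 0.0371\,\text{BRICs ADR Index} - 45.3\,\ln(\text{Treasury Rate}) + 325\,\text{USD/EUR Rate} + 0.83\,\text{CRB Index} - 29\,\text{Expected Inflation}$$

Each predictor cell shows the coefficient followed by (T-Value; P-Value); an empty cell means the predictor had not entered at that step. R-Sq and R-Sq(adj) are in percent.

| Step | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Constant | 1.512 | -185.654 | -433.719 | -366.181 | -564.617 | -686.812 | -731.6 |
| Gold Price (-1) | 1.015 (84.36; 0.000) | 0.914 (24.21; 0.000) | 0.696 (12.26; 0.000) | 0.617 (10.51; 0.000) | 0.581 (9.68; 0.000) | 0.543 (8.89; 0.000) | 0.533 (8.93; 0.000) |
| M1 | | 0.177 (2.82; 0.006) | 0.401 (5.43; 0.000) | 0.377 (5.32; 0.000) | 0.436 (5.83; 0.000) | 0.434 (5.92; 0.000) | 0.402 (5.54; 0.000) |
| BRICs ADR Index | | | 0.0242 (4.83; 0.000) | 0.0346 (6.12; 0.000) | 0.0430 (6.37; 0.000) | 0.0394 (5.82; 0.000) | 0.0371 (5.57; 0.000) |
| LN_Treasury Rate | | | | -24.8 (-3.46; 0.001) | -22.8 (-3.22; 0.002) | -36.9 (-4.02; 0.000) | -45.3 (-4.76; 0.000) |
| USD/EUR Rate | | | | | 139 (2.19; 0.030) | 231 (3.15; 0.002) | 325 (4.06; 0.000) |
| CRB Index | | | | | | 0.36 (2.35; 0.021) | 0.83 (3.55; 0.001) |
| Expected Inflation | | | | | | | -29 (-2.61; 0.010) |
| S | 49.0 | 47.5 | 43.5 | 41.5 | 40.8 | 40.0 | 39.0 |
| R-Sq | 98.41 | 98.51 | 98.77 | 98.89 | 98.93 | 98.98 | 99.04 |
| R-Sq(adj) | 98.40 | 98.49 | 98.73 | 98.85 | 98.88 | 98.93 | 98.98 |
| Mallows Cp | 66.7 | 57.0 | 30.3 | 18.8 | 15.6 | 11.8 | 7.0 |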

Table 7: Stepwise Regression with Lagged Variable
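Adding Gold Price (-1) means each row of the design matrix gains the previous month's price, and the first month, which has no predecessor, is dropped. A minimal sketch of that data step; the function name and the toy series are hypothetical.

```python
def with_lagged_response(X, y, lag=1):
    """Append y[t - lag] to each predictor row and drop the first `lag` observations.

    X is a list of predictor rows and y the response, both in time order.
    """
    X_new = [list(X[t]) + [y[t - lag]] for t in range(lag, len(y))]
    y_new = list(y[lag:])
    return X_new, y_new

# Hypothetical monthly data: one predictor column and a gold price series.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1000.0, 1010.0, 1030.0, 1025.0]
X_lag, y_lag = with_lagged_response(X, y)
print(X_lag)  # [[2.0, 1000.0], [3.0, 1010.0], [4.0, 1030.0]]
print(y_lag)  # [1010.0, 1030.0, 1025.0]
```

The aligned `X_lag` and `y_lag` can then be passed to any least-squares routine, so the lagged model is fitted exactly like the original one.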


The Durbin Watson test was repeated for the new regression equation in Model C, giving a Durbin-Watson statistic of 1.96059. Since dL < dU < 1.96059 < 2 (and therefore also 4 - 1.96059 = 2.039 > dU), there is no evidence of autocorrelation. The critical values for k = 6 are:

| | n | dL | dU |
|---|---|---|---|
| From D-W Tables | 100 | 1.421 | 1.670 |
| From D-W Tables | 150 | 1.543 | 1.708 |
| By Interpolation | 119 | 1.499 | 1.694 |

Table 8: Durbin-Watson Test for Model C

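Hence, the final multiple regression model to predict the gold price is as follows:

$$\widehat{\text{Gold Price}}_t = -731.6 + 0.533\,\text{Gold Price}_{t-1} + 0.402\,\text{M1} + 0.0371\,\text{BRICs ADR Index} - 45.3\,\ln(\text{Treasury Rate}) + 325\,\text{USD/EUR Rate} + 0.83\,\text{CRB Index} - 29\,\text{Expected Inflation}$$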
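Because Model C includes the lagged response as a regressor, the Durbin-Watson statistic is known to be biased toward 2 in this setting; Durbin's h statistic is a common alternative check, with |h| < 1.96 indicating no significant first-order autocorrelation at the 5% level. A minimal sketch, assuming the standard error of the Gold Price (-1) coefficient can be backed out from Table 7, step 7 (0.533 / 8.93), and assuming n = 119; the lag may reduce the usable sample by one, which changes h only slightly.

```python
import math

def durbins_h(d: float, n: int, se_lag: float) -> float:
    """Durbin's h = (1 - d/2) * sqrt(n / (1 - n * Var(b_lag))).

    se_lag is the standard error of the lagged-response coefficient.
    h is undefined when n * se_lag**2 >= 1.
    """
    v = n * se_lag ** 2
    if v >= 1:
        raise ValueError("h is undefined when n * Var(b_lag) >= 1")
    return (1 - d / 2) * math.sqrt(n / (1 - v))

se_lag = 0.533 / 8.93  # coefficient / T-Value for Gold Price (-1), step 7
h = durbins_h(1.96059, 119, se_lag)
print(f"Durbin's h = {h:.3f}; |h| < 1.96: {abs(h) < 1.96}")
```

Under these assumptions the h test agrees with the Durbin-Watson conclusion for Model C.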

h. Model Validation
The gold prices were predicted using each of the three models discussed above and compared with the actual prices to assess forecast accuracy.
[Figure: Model Validation, ACTUAL Gold Price vs predictions of Model A, Model B and Model C, monthly from 01/12/11 to 01/09/12]

Model C, obtained after correcting for positive autocorrelation is finally used to predict the gold prices because as observed in the chart above, it has better forecasting accuracy as compared to Model A and Model B.
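The comparison above is visual. If a numeric summary is wanted, root mean squared error (RMSE) and mean absolute percentage error (MAPE) over the validation months are common choices; this is a suggestion, not the method used in this report. A minimal sketch with hypothetical prices and hypothetical model labels, not the report's predictions.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the units of the price (US$/oz)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted prices, for illustration only.
actual = [1600.0, 1700.0, 1650.0]
predicted = {"Model X": [1620.0, 1680.0, 1660.0],
             "Model Y": [1550.0, 1760.0, 1600.0]}
for name, pred in predicted.items():
    print(f"{name}: RMSE = {rmse(actual, pred):.1f}, MAPE = {mape(actual, pred):.2f}%")
```

The model with the lower RMSE and MAPE over the same months is the more accurate forecaster on that window.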

5. Conclusion

Forecasting gold prices can be useful for many investors and policy makers. We used multiple linear regression to develop Model A, which predicts gold prices from the exchange rate (USD/EUR), S&P 500 Index, BRICs ADR Index, commodity prices (CRB Index), money supply (M1), expected inflation and Ln(Treasury Rate). We then performed stepwise regression to obtain Model B as a correction for multicollinearity. However, the Durbin Watson test gave evidence of positive autocorrelation, which we corrected by adding a lagged variable, the gold price for the previous period (Gold Price (-1)). The resulting Model C has better forecasting power than Model A and Model B.

MODEL C:

$$\widehat{\text{Gold Price}}_t = -731.6 + 0.533\,\text{Gold Price}_{t-1} + 0.402\,\text{M1} + 0.0371\,\text{BRICs ADR Index} - 45.3\,\ln(\text{Treasury Rate}) + 325\,\text{USD/EUR Rate} + 0.83\,\text{CRB Index} - 29\,\text{Expected Inflation}$$

6. Further Improvement

In our study we have not applied correction for heteroskedasticity as we assumed the variance was almost constant for all observations. Also, an even more sophisticated regression model can be obtained if we choose an appropriate lagged explanatory variable such that the correlation coefficient is maximized.

