
GDP Forecasting Using Time Series Analysis

Amit Ranjan 14087


Anuj Nagpal 14116
Charu Bansal 14194
Mounica Nagavalli 14169
Raushan Joshi 14537

Course Project
MTH517 Time Series Analysis
IIT Kanpur

Contents
1 Introduction
1.1 GDP
1.1.1 Nominal GDP
1.1.2 Real GDP
1.1.3 Components of GDP

2 Mathematical Background
2.1 Time Series
2.1.1 Related Terminology
2.2 ARIMA Model
2.2.1 Criteria for choosing order
2.3 Holt-Winters Seasonal Smoothing
2.4 Augmented Dickey-Fuller Test
2.5 Kwiatkowski Phillips Schmidt Shin (KPSS) Test
2.6 Ljung Box Test

3 Experiments, Observations and Conclusions
3.1 Dataset
3.2 Holt-Winters Method
3.3 ARIMA Model

4 Conclusion

5 Acknowledgement

1 Introduction
1.1 GDP
Gross domestic product (GDP) is a monetary measure of the market value of all final goods and services
produced in a given period of time (quarterly or yearly). Governments and businesses use GDP forecasts to help
determine their strategy, multi-year plans, and budgets for the upcoming year.

1.1.1 Nominal GDP


Nominal GDP is GDP evaluated at current market prices. Therefore, nominal GDP will include all of the
changes in market prices that have occurred during the current year due to inflation or deflation.

1.1.2 Real GDP


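Real GDP is nominal GDP adjusted for changes in the price level: output is valued at the prices of a fixed
base year, so changes in real GDP reflect changes in the quantity produced rather than inflation or deflation.
It does not, however, reflect differences in the cost of living between countries; therefore using a purchasing
power parity (PPP) basis is arguably more useful when comparing differences in living standards between nations.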

1.1.3 Components of GDP


GDP (Y) is the sum of consumption (C), investment (I), government spending (G) and net exports (X - M):

$$Y = C + I + G + (X - M)$$
where

C (consumption) consists of private expenditures in the economy on durable goods, nondurable goods,
and services. Examples include food, rent, jewelry, gasoline, and medical expenses, but not the purchase of
new housing.
I (investment) includes business investment in equipment, but does not include exchanges of existing
assets. Examples include construction of a new mine, purchase of software, or purchase of machinery and
equipment for a factory.
G (government spending) is the sum of government expenditures on final goods and services. It includes
salaries of public servants, purchases of weapons for the military and any investment expenditure by a
government. It does not include any transfer payments, such as social security or unemployment benefits.
X (exports) represents gross exports. GDP captures the amount a country produces, including goods and
services produced for other nations' consumption.
M (imports) represents gross imports. Imports are subtracted since imported goods will be included in
the terms G, I, or C, and must be deducted to avoid counting foreign supply as domestic.
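The expenditure identity above can be illustrated with a minimal sketch. The function name and all figures below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def gdp_expenditure(c, i, g, x, m):
    """GDP by the expenditure approach: Y = C + I + G + (X - M)."""
    return c + i + g + (x - m)


# Hypothetical figures in arbitrary currency units, for illustration only.
y = gdp_expenditure(c=60.0, i=25.0, g=12.0, x=20.0, m=17.0)
print(y)
```

Net exports enter as a single term, so a trade deficit (M greater than X) lowers Y relative to domestic spending.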

2 Mathematical Background
2.1 Time Series
A time series is a set of observations recorded sequentially over a period of time (i.e. a collection of
observations recorded along with their time stamps), represented as $(t, X_t)$. $X_t$ may be univariate
(a single variable) or multivariate (a collection of variables).

2.1.1 Related Terminology


Trend - The smooth long-term behaviour of a time series.
Seasonality - Patterns of change in a time series within a year which tend to repeat each year.
Stationarity - A stationary time series is one whose statistical properties such as mean, variance and
autocorrelation are all constant over time.

2.2 ARIMA Model


Auto Regressive Integrated Moving Average Models are denoted by ARIMA(p, d, q) where p is the order
(number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data
have had past values subtracted), and q is the order of the moving-average model.

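$$\left(1 - \sum_{k=1}^{p} \phi_k B^k\right)(1 - B)^d X_t = \left(1 + \sum_{k=1}^{q} \theta_k B^k\right)\varepsilon_t$$

where $B$ is the backshift operator ($B X_t = X_{t-1}$), $\phi_k$ and $\theta_k$ are the autoregressive and
moving-average coefficients, and $\varepsilon_t$ is a white-noise error term.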
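As a worked instance of the general form, consider an ARIMA(1, 1, 1) model. The symbols $\phi$, $\theta$ and $\varepsilon_t$ are assumed names for the autoregressive coefficient, the moving-average coefficient and the white-noise term, and $B$ is the backshift operator with $B X_t = X_{t-1}$. Expanding the operator product gives:

$$(1 - \phi B)(1 - B) X_t = (1 + \theta B)\varepsilon_t$$

$$X_t - (1 + \phi) X_{t-1} + \phi X_{t-2} = \varepsilon_t + \theta \varepsilon_{t-1}$$

$$X_t = (1 + \phi) X_{t-1} - \phi X_{t-2} + \varepsilon_t + \theta \varepsilon_{t-1}$$

One round of differencing combined with one autoregressive term therefore makes $X_t$ depend on its two previous values and on the previous shock.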

2.2.1 Criteria for choosing order


Akaike Information Criterion (AIC)

$$\mathrm{AIC} = -2\log(L) + 2k$$

where $L$ is the maximized likelihood and $k$ is the number of parameters in the model being fitted to the
data ($p + q + 1$).
Bayesian Information Criterion (BIC)

$$\mathrm{BIC} = -2\log(L) + k\log(n)$$

where $k$ is the same as in AIC and $n$ is the sample size.
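The two criteria can be computed directly from a fitted model's log-likelihood. The sketch below is illustrative only: the function names and the log-likelihood, parameter count and sample size are hypothetical values, not results from this report.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: -2 log(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: -2 log(L) + k log(n)."""
    return -2.0 * log_likelihood + k * math.log(n)


# Hypothetical fit: log-likelihood 116.8 for a model with p = 0 and q = 1,
# so k = p + q + 1 = 2, on an assumed sample of n = 55 points.
log_l, k, n = 116.8, 2, 55
print(aic(log_l, k), bic(log_l, k, n))
```

Because log(n) exceeds 2 once n is at least 8, BIC penalizes each extra parameter more heavily than AIC for any realistic sample size, and so tends to favour smaller models.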

2.3 Holt-Winters Seasonal Smoothing


Holt-Winters Seasonal Smoothing or Triple Exponential Smoothing takes into account seasonal changes as well
as trends for forecasting. Mathematically,

$$l_x = \alpha (y_x - s_{x-L}) + (1 - \alpha)(l_{x-1} + b_{x-1})$$

$$b_x = \beta (l_x - l_{x-1}) + (1 - \beta) b_{x-1}$$

$$s_x = \gamma (y_x - l_x) + (1 - \gamma) s_{x-L}$$

$$\hat{y}_{x+m} = l_x + m b_x + s_{x-L+1+(m-1) \bmod L} \quad \text{(additive model)}$$

$$\hat{y}_{x+m} = (l_x + m b_x)\, s_{x-L+1+(m-1) \bmod L} \quad \text{(multiplicative model)}$$

where
$\hat{y}_{x+m}$ is the forecasted value $m$ points into the future
$y_x$ is the $x$th observed data point and $s_x$ is the seasonal component
$l_x$, the level, is the expected value of the $x$th data point
$b_x$ is the trend or slope
$L$ is the season length
$\alpha$ is the smoothing coefficient for series data points
$\beta$ is the trend factor or coefficient
$\gamma$ is the smoothing factor for the seasonal component

Initialize $l_0 = y_0$ and

$$b_0 = \frac{1}{L}\left(\frac{y_{L+1} - y_1}{L} + \frac{y_{L+2} - y_2}{L} + \frac{y_{L+3} - y_3}{L} + \dots + \frac{y_{L+L} - y_L}{L}\right)$$
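The level and trend recurrences can be sketched in a few lines for the special case with no seasonal term, which is the case relevant to annual data. This is a minimal illustration, not the routine used for the results in this report: the function names are hypothetical, and the start-up values (level set to the second point, trend set to the first difference) are an assumption that differs from the initialization formula above.

```python
def holt_linear(y, alpha, beta):
    """Level/trend exponential smoothing without a seasonal term.

    Returns the final level, the final trend, and the in-sample sum of
    squared one-step-ahead errors.
    """
    if len(y) < 3:
        raise ValueError("need at least three observations")
    # Assumed start-up values: level = second point, trend = first difference.
    level, trend = y[1], y[1] - y[0]
    sse = 0.0
    for obs in y[2:]:
        predicted = level + trend  # one-step-ahead forecast
        sse += (obs - predicted) ** 2
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend, sse


def forecast(level, trend, m):
    """Forecast m steps past the last observation: level + m * trend."""
    return level + m * trend


# Hypothetical, exactly linear series: the method should track it without error.
series = [2.0 + 3.0 * t for t in range(10)]
level, trend, sse = holt_linear(series, alpha=0.8, beta=0.2)
print(level, trend, sse, forecast(level, trend, 2))
```

Dropping the seasonal term is equivalent to setting every seasonal component to zero in the additive equations, which leaves only the level and trend updates.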

2.4 Augmented Dickey-Fuller Test
Augmented Dickey Fuller test (ADF) tests the null hypothesis that a unit root is present in a time series sample
and the alternative hypothesis is usually stationarity or trend-stationarity, depending on which version of the
test is used.
The intuition behind the test is as follows. If the series $X_t$ is stationary (or trend stationary), then it has a
tendency to return to a constant (or deterministically trending) mean. Therefore large values will tend to be
followed by smaller values (negative changes), and small values by larger values (positive changes). Accordingly,
the level of the series will be a significant predictor of the next period's change, and will have a negative
coefficient. The null hypothesis is usually rejected when the p-value is less than or equal to a specified
significance level, often 0.05 (5%).
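The intuition can be made concrete with the regression that underlies the test. The symbols below ($c$, $\delta$, $\rho$, $\psi_i$, $k$ and $\varepsilon_t$) are assumed names introduced for illustration; $k$ is the lag order and $\Delta X_t = X_t - X_{t-1}$:

$$\Delta X_t = c + \delta t + \rho X_{t-1} + \sum_{i=1}^{k} \psi_i \, \Delta X_{t-i} + \varepsilon_t$$

$$H_0: \rho = 0 \ \text{(unit root)} \qquad \text{against} \qquad H_1: \rho < 0$$

The test statistic is the ratio of the estimated $\rho$ to its standard error. Under the null it does not follow the usual t distribution, so it is compared against Dickey-Fuller critical values. A strongly negative statistic corresponds to the negative coefficient on the level described above.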

2.5 Kwiatkowski Phillips Schmidt Shin (KPSS) Test


Kwiatkowski Phillips Schmidt Shin (KPSS) tests are used for testing a null hypothesis that an observable time
series is trend stationary (i.e. stationary around a deterministic trend) against the alternative of a unit root.
It breaks up a series into three parts: a deterministic trend ($\beta t$), a random walk ($r_t$), and a stationary
error ($\varepsilon_t$), with the regression equation:

$$x_t = r_t + \beta t + \varepsilon_t$$

If the series is stationary, the random walk component has zero variance, so $r_t$ reduces to a fixed intercept
and the series is stationary around a fixed level (or around the deterministic trend). Again, we usually reject
the null when the p-value is less than or equal to a specified significance level, often 0.05 (5%).

2.6 Ljung Box Test


The hypotheses of the Ljung-Box test are:
Null Hypothesis or H0 : The data are independently distributed (i.e. the correlations in the population
from which the sample is taken are 0, so that any observed correlations in the data result from randomness
of the sampling process).
Alternative Hypothesis or Ha : The data are not independently distributed; they exhibit serial
correlation.
Test statistic used:

$$Q = n(n + 2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n - k}$$

where $n$ is the sample size, $\hat{\rho}_k$ is the sample autocorrelation at lag $k$, and $h$ is the number of lags
being tested. Under $H_0$, the statistic $Q$ follows a $\chi^2_h$ distribution. For significance level $\alpha$, the
critical region for rejection of the hypothesis of randomness is:

$$Q > \chi^2_{1-\alpha,h}$$

where $\chi^2_{1-\alpha,h}$ is the $1-\alpha$ quantile of the chi-squared distribution with $h$ degrees of freedom.
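The statistic is straightforward to compute from the sample autocorrelations. The sketch below uses only the Python standard library; the function names and the toy series are hypothetical, and in practice the input would be the residuals of a fitted model.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / denom


def ljung_box_q(x, h):
    """Ljung-Box statistic Q = n(n+2) * sum over k = 1..h of rho_k^2 / (n-k)."""
    n = len(x)
    if not 1 <= h < n:
        raise ValueError("h must satisfy 1 <= h < len(x)")
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, h + 1)
    )


# Hypothetical toy series; real use would pass the model residuals.
q = ljung_box_q([1.0, 2.0, 3.0, 4.0, 5.0], h=1)
print(q)
```

The resulting Q is compared with a chi-squared quantile. For example, with h = 20 lags and a 5% significance level the critical value is approximately 31.41, so the hypothesis of randomness is rejected only when Q exceeds that value.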

3 Experiments, Observations and Conclusions
3.1 Dataset
We worked with yearly real GDP data for India, in local currency units (rupees), for the period 1960-2016.
Since the GDP values are very large, we worked with the natural logarithm ($\log_e$) of the GDP values.

Figure 1: Plot of log of GDP vs Time

3.2 Holt-Winters Method


Following are the results obtained from the Holt-Winters smoothing method:
$\alpha$ = 0.8428365
$\beta$ = 0.1282433
$\gamma$ (seasonality parameter) is not applicable since we had yearly GDP data and not quarterly data.
$\alpha$ is close to 1, implying that the forecast gives more weight to the recent values.
The a and b values, where the forecast h steps ahead is a + bh, came out to be 32.43400540 and 0.06901646 respectively.
The sum of squared errors came out to be 0.04408241 for the known values.
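Given the reported a and b, point forecasts follow from the linear form a + bh. The sketch below evaluates that expression; the function and constant names are hypothetical, and the last observed year is taken to be 2016 as stated in the dataset description.

```python
A = 32.43400540   # final level a reported above (log GDP)
B = 0.06901646    # final trend b reported above (log GDP per year)
LAST_YEAR = 2016  # last year in the dataset


def point_forecast(year):
    """Point forecast of log GDP for a year after LAST_YEAR: a + b * h."""
    h = year - LAST_YEAR
    if h < 1:
        raise ValueError("year must be after the last observed year")
    return A + B * h


for year in (2017, 2021, 2026):
    print(year, round(point_forecast(year), 3))
```

The printed values can be checked against the point forecasts in the summary of forecasted values.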

Figure 2: Comparison of the HW predicted and actual values of log GDP

Summary of forecasted values:

Year   Point Forecast   80% Confidence Interval   95% Confidence Interval
2017   32.503           32.467-32.538             32.448-32.557
2018   32.572           32.522-32.621             32.496-32.647
2019   32.641           32.578-32.703             32.545-32.736
2020   32.710           32.634-32.785             32.594-32.825
2021   32.779           32.690-32.867             32.644-32.913
2022   32.848           32.746-32.949             32.693-33.002
2023   32.917           32.802-33.031             32.742-33.092
2024   32.986           32.858-33.114             32.790-33.182
2025   33.055           32.913-33.197             32.837-33.272
2026   33.124           32.967-33.280             32.885-33.363
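Because the forecasts are on the natural-log scale, they can be converted back to rupees by exponentiating, and a constant yearly step in log GDP corresponds to a constant growth rate. The following sketch shows both conversions using values copied from the table; the function names are hypothetical.

```python
import math


def to_level(log_value):
    """Invert the natural-log transform: GDP = exp(log GDP)."""
    return math.exp(log_value)


def annual_growth(log_step):
    """Growth rate implied by a constant yearly increase in log GDP."""
    return math.exp(log_step) - 1.0


# Point forecast and 95% bounds for 2017, taken from the table above.
point, low, high = 32.503, 32.448, 32.557
print(f"{to_level(point):.3e} ({to_level(low):.3e} to {to_level(high):.3e}) rupees")

# Yearly step between the 2017 and 2018 point forecasts in the table.
step = 32.572 - 32.503
print(f"implied growth: {annual_growth(step):.1%}")
```

Exponentiating the interval endpoints preserves their order, so the back-transformed interval is asymmetric around the back-transformed point forecast.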

Figure 3: Forecasted Values of log GDP

The Holt-Winters method assumes that the residuals are normally distributed with zero mean and are
uncorrelated with constant variance when computing the confidence intervals. To test this, we plot various
graphs and use the Box-Ljung test.

Figure 4: Plot of residuals (expected-observed) vs Year for Holt Winters Method

The above graph of residuals suggests a zero mean and constant variance.
The Box-Ljung test gives the following results:
X-squared = 24.382, df = 20, p-value = 0.2261
Since the p-value exceeds 0.05, the null hypothesis of independence cannot be rejected, suggesting that the
residuals are uncorrelated.

The ACF graph of the residuals also supports this, as all autocorrelation values lie within the significance bounds.

Figure 5: ACF of residuals of HW fitting

The histogram of the residual errors suggests a fairly normal distribution with a slight skewness towards
the left. Hence we conclude that the assumptions behind our estimated confidence intervals are reasonable.

3.3 ARIMA Model
We applied the Augmented Dickey-Fuller (ADF) test to check for stationarity, with the following results:
On log GDP
Dickey-Fuller = -0.46268, Lag order = 3, p-value = 0.9804
p-value > 0.05, so the unit-root null cannot be rejected, implying non-stationarity
On the first-order differenced series
Dickey-Fuller = -6.61, Lag order = 3, p-value = 0.01
p-value < 0.05, implying stationarity
On the second-order differenced series
Dickey-Fuller = -6.7314, Lag order = 3, p-value = 0.01
p-value < 0.05, implying stationarity
The plots of these series, however, suggest that the first-order differenced series is not stationary, while the
second-order differenced series is stationary.
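The differenced series tested above are obtained by repeatedly subtracting each value's predecessor. A minimal sketch of that operation follows; the function name and the toy series are hypothetical.

```python
def difference(x, d=1):
    """Apply first differencing d times: each pass maps x[t] to x[t] - x[t-1]."""
    out = list(x)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Hypothetical series with a quadratic trend: t^2 for t = 0..5.
series = [t * t for t in range(6)]
print(difference(series, 1))  # [1, 3, 5, 7, 9]  (still trending)
print(difference(series, 2))  # [2, 2, 2, 2]     (constant)
```

Each pass shortens the series by one point, so a series of 57 annual observations leaves 55 points after differencing twice.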

Figure 6: Plot of log GDP vs Time

Figure 7: Plot of $\nabla$(log GDP) vs Time

Figure 8: Plot of $\nabla^2$(log GDP) vs Time

Applying KPSS test on the series yields the following results

On log series
Level Stationarity : KPSS Level = 2.8804, Truncation lag parameter = 1, p-value = 0.01
Trend Stationarity : KPSS Trend = 0.70213, Truncation lag parameter = 1, p-value = 0.01
Data is neither trend nor level stationary
On diff of log series
Level Stationarity : KPSS Level = 1.1825, Truncation lag parameter = 1, p-value = 0.01

Trend Stationarity : KPSS Trend = 0.028799, Truncation lag parameter = 1, p-value = 0.1
Data is trend stationary but not level stationary
On diff of order 2 of log series
Level Stationarity : KPSS Level = 0.015645, Truncation lag parameter = 1, p-value = 0.1
Trend Stationarity : KPSS Trend = 0.015604, Truncation lag parameter = 1, p-value = 0.1
Data is both trend stationary and level stationary
Hence we conclude that the d parameter in ARIMA(p,d,q) process is 2.
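The same conclusions can be read directly from the reported statistics by comparing them with critical values. The sketch below assumes the commonly tabulated 5% critical values of the KPSS statistic (0.463 for level stationarity and 0.146 for trend stationarity, from Kwiatkowski et al., 1992); the names used are hypothetical.

```python
# Assumed 5% critical values of the KPSS statistic (Kwiatkowski et al., 1992).
CRITICAL_5PCT = {"level": 0.463, "trend": 0.146}

# Statistics reported above, keyed by series and by the null being tested.
REPORTED = {
    "log GDP": {"level": 2.8804, "trend": 0.70213},
    "first difference": {"level": 1.1825, "trend": 0.028799},
    "second difference": {"level": 0.015645, "trend": 0.015604},
}


def rejects_stationarity(statistic, kind):
    """True when the statistic exceeds the 5% critical value for that null."""
    return statistic > CRITICAL_5PCT[kind]


for name, stats in REPORTED.items():
    verdict = {kind: rejects_stationarity(s, kind) for kind, s in stats.items()}
    print(name, verdict)
```

Unlike the ADF test, the KPSS null is stationarity, so a large statistic (equivalently, a small p-value) is evidence against stationarity.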
To determine the p and q values, we examine the ACF and PACF of $\nabla^2$ log(GDP).

Figure 9: ACF plot of $\nabla^2$ log(GDP)

Figure 10: PACF plot of $\nabla^2$ log(GDP)

p/q    0            1
0      -196.0436    -229.5718
1      -207.7324    -228.0191
2      -217.1027    -227.7479
3      -215.9419    -225.7616
4      -221.2816    -229.9328
5      -224.6599    -228.9041
6      -227.9001    -227.9575
7      -229.4173    -227.9231

Table 1: AIC values (rows: p, columns: q)

p/q    0            1
0      -194.0363    -225.5572
1      -203.7177    -221.9971
2      -211.0807    -219.7185
3      -207.9126    -215.7249
4      -211.2449    -217.8888
5      -212.6159    -214.8528
6      -213.8488    -211.8989
7      -213.3586    -209.8571

Table 2: BIC values (rows: p, columns: q)
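Selecting an order from Tables 1 and 2 amounts to locating the smallest entry in each grid. The sketch below transcribes the two tables and performs that lookup; the function and variable names are hypothetical.

```python
# Rows are p = 0..7; each pair is (value at q = 0, value at q = 1).
AIC = [
    (-196.0436, -229.5718), (-207.7324, -228.0191),
    (-217.1027, -227.7479), (-215.9419, -225.7616),
    (-221.2816, -229.9328), (-224.6599, -228.9041),
    (-227.9001, -227.9575), (-229.4173, -227.9231),
]
BIC = [
    (-194.0363, -225.5572), (-203.7177, -221.9971),
    (-211.0807, -219.7185), (-207.9126, -215.7249),
    (-211.2449, -217.8888), (-212.6159, -214.8528),
    (-213.8488, -211.8989), (-213.3586, -209.8571),
]


def best_order(table):
    """Return ((p, q), value) for the smallest entry in the table."""
    entries = (
        ((p, q), value)
        for p, row in enumerate(table)
        for q, value in enumerate(row)
    )
    return min(entries, key=lambda item: item[1])


print("AIC:", best_order(AIC))
print("BIC:", best_order(BIC))
```

When the leading candidates under one criterion differ by only a fraction of a unit, it is common to prefer the model with fewer parameters.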

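The ACF cuts off after lag 1 while the PACF cuts off after lag 7, implying $q \le 1$ and $p \le 7$. Calculating the
AIC and BIC for all these combinations of p and q gives Tables 1 and 2.
BIC is lowest for ARIMA(0,2,1). AIC is marginally lower for ARIMA(4,2,1) (-229.9328 against -229.5718), but the
difference is negligible, so we select the more parsimonious ARIMA(0,2,1) model. Fitting this, we obtain the
coefficient shown in Figure 11.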

Figure 11: ARIMA(0,2,1) coefficient

The residual plot suggests zero mean and a fairly constant variance.
The sum of squared errors was found to be 0.04655945 for the known values.

Figure 12: Residuals of ARIMA process

Figure 13: Forecasting Using ARIMA(0,2,1)
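For an ARIMA(0,2,1) model, $(1 - B)^2 X_t = (1 + \theta B)\varepsilon_t$, point forecasts follow a simple recursion: the first step uses the last residual, and later steps extend the line through the two most recent values. The sketch below illustrates this with hypothetical inputs; the function name is an assumption, and the coefficient and residual used are not the fitted values from this report.

```python
def arima_021_forecast(x, theta, last_residual, steps):
    """Point forecasts for (1 - B)^2 X_t = (1 + theta * B) e_t.

    x             : observed series (at least two points)
    theta         : moving-average coefficient (hypothetical here)
    last_residual : one-step-ahead error at the final observation
    steps         : number of periods to forecast
    """
    if len(x) < 2:
        raise ValueError("need at least two observations")
    prev2, prev1 = x[-2], x[-1]
    forecasts = []
    for h in range(1, steps + 1):
        # Future shocks have zero expectation, so the moving-average term
        # contributes only to the first forecast.
        ma_term = theta * last_residual if h == 1 else 0.0
        nxt = 2.0 * prev1 - prev2 + ma_term
        forecasts.append(nxt)
        prev2, prev1 = prev1, nxt
    return forecasts


# Hypothetical inputs, for illustration only.
print(arima_021_forecast([1.0, 2.0, 3.0], theta=-0.5, last_residual=0.2, steps=3))
```

After the first step the forecasts lie on a straight line, which is why the ARIMA(0,2,1) forecast resembles a linear trend extrapolation.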

4 Conclusion
The log GDP series appears to follow an ARIMA(0,2,1) process.
The in-sample sum of squared errors was nearly the same for the Holt-Winters and ARIMA fits (0.04408241
and 0.04655945 respectively), so the two methods fit the known data about equally well.

5 Acknowledgement
Wikipedia, the free encyclopedia: https://en.wikipedia.org/wiki/Main_Page
Forecasting GDP growth: homepage.univie.ac.at/robert.kunst/070107_efc.pdf
Using R for Time Series Analysis: http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html
Statistics How To: www.statisticshowto.com
