Let’s take a moment to think about why we would even want to use
Bayesian techniques in the first place. Well, there are a couple of good reasons.
Source: A Bayesian Approach to Time Series Forecasting — Towards Data Science, 2018/11/12.
https://towardsdatascience.com/a-bayesian-approach-to-time-series-forecasting-d97dd4168cb7
. . .
Bayesian Theory
Before we jump right into it, let’s take a moment to discuss the basics of
Bayesian theory and how it applies to regression. Ordinarily, if
someone wanted to estimate a linear regression of the form:
y = XB + ε,   ε ∼ N(0, σ² I)

they would work with the following quantities:

Likelihood function:
P(y | B, σ²) = (2πσ²)^(−T/2) exp( −(y − XB)′(y − XB) / (2σ²) )

OLS estimator:
B̂ = (X′X)⁻¹ X′y

Variance:
σ̂² = (y − XB̂)′(y − XB̂) / T
where T is the number of rows in our data set. The main difference
between the classical frequentist approach and the Bayesian approach
is that under the frequentist approach the parameters of the model are based solely
on the information contained in the data, whereas the Bayesian approach allows us to
incorporate other information through the use of a prior. The table
below summarises the main differences between the frequentist and
Bayesian approaches.
So how do we use this prior information? Well, this is where Bayes’ rule
comes into play. Remember, the formula for Bayes’ rule is:
Bayes’ Rule:

P(θ | y) = P(y | θ) P(θ) / P(y)
Here is a very clear explanation and example of Bayes rule being used.
It demonstrates how we can combine our prior knowledge with
evidence to form a posterior probability using a medical example.
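To make this concrete, here is a small worked example in the same medical spirit (the numbers are invented purely for illustration): suppose a disease has a prior prevalence of 1 per cent, the test detects it 95 per cent of the time, and it falsely flags 5 per cent of healthy people. Then:

```latex
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
            = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99}
            \approx 0.16
```

Despite the positive test, the posterior probability is only about 16 per cent, because the low prior dominates. This is exactly the kind of prior information that a purely data-driven estimate has no clean way to use.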
Now let’s apply Bayes rule to our regression problem and see what we
get. Below is the posterior distribution of our parameters. Remember,
this is ultimately what we want to calculate.
Posterior Distribution:

P(B, σ² | y) ∝ P(y | B, σ²) P(B) P(σ²)
Gibbs Sampling
Imagine we have a joint distribution of N variables, p(θ₁, …, θ_N), whose
conditional distributions are easy to sample from even when the joint is not.
The Gibbs sampler draws each variable in turn from its conditional
distribution, given the current values of all the other variables,
repeating this until we have sampled each variable. This ends one
iteration of the Gibbs sampling algorithm. As we repeat these steps a
large number of times, the samples from the conditional distributions
converge to draws from the joint distribution, and the draws of each
individual variable converge to its marginal distribution. Once we have M runs of
the Gibbs sampler, the mean of our retained draws can be thought of as
an approximation of the mean of the posterior distribution. Below is a
visualisation of the sampler in action for two variables. You can see how
initially the algorithm starts sampling from points outside our
distribution but starts to converge to it after a number of steps.
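Written out, one full iteration of the sampler for parameters θ₁, …, θ_N at step t is:

```latex
\theta_1^{(t+1)} \sim p\!\left(\theta_1 \mid \theta_2^{(t)}, \ldots, \theta_N^{(t)}\right) \\
\theta_2^{(t+1)} \sim p\!\left(\theta_2 \mid \theta_1^{(t+1)}, \theta_3^{(t)}, \ldots, \theta_N^{(t)}\right) \\
\vdots \\
\theta_N^{(t+1)} \sim p\!\left(\theta_N \mid \theta_1^{(t+1)}, \ldots, \theta_{N-1}^{(t+1)}\right)
```

In our regression there are only two blocks: we alternate between drawing B conditional on σ² and σ² conditional on B.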
Now that we have the theory out of the way, let’s see how it works in
practice. Below is the code for implementing a linear regression using
the Gibbs sampler. In particular, I will estimate an AR(2) model on
year-over-year growth in quarterly US Gross Domestic Product (GDP). I will
then use this model to forecast GDP growth using a Bayesian
framework. Using this approach, we can construct credible intervals
around our forecasts using quantiles from the posterior density, i.e.
quantiles of the retained draws from our algorithm.
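Concretely, if ŷ⁽¹⁾, …, ŷ⁽ᴹ⁾ are the retained draws of a given forecast period, a 68 per cent credible interval is just the pair of empirical quantiles:

```latex
\left[\; \hat{q}_{0.16}\!\left(\hat{y}^{(1)}, \ldots, \hat{y}^{(M)}\right),\;
\hat{q}_{0.84}\!\left(\hat{y}^{(1)}, \ldots, \hat{y}^{(M)}\right) \;\right]
```

These are the same 16th and 84th percentiles that appear in the plotting code further down.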
Model
Our model will have the following form:

y_t = c + β₁ y_{t−1} + β₂ y_{t−2} + ε_t,   ε_t ∼ N(0, σ²)

or, stacked over observations, y = XB + ε, with B being just our vector of
coefficients, B = (c, β₁, β₂)′, and our matrix of data X having rows
(1, y_{t−1}, y_{t−2}).
This gives us the form in equation (1) up above. As I said already, our goal
here is to approximate the posterior distribution of our coefficients and
error variance, P(B, σ² | y, X).
. . .
Code
The first thing we need to do is load in the data. I downloaded US GDP
growth from the Federal Reserve Bank of St. Louis website (FRED). I chose
p = 2 as the number of lags I want to use. This choice is pretty arbitrary,
and there are formal criteria such as the AIC and BIC we can use to choose
the best number of lags, but I haven’t used them for this analysis. What
is the old saying? Do as I say, not as I do. I think this applies here. 😄
library(ggplot2)
# Load US GDP year-over-year growth downloaded from FRED (file name is illustrative)
Y <- as.matrix(read.csv("US_GDP_growth.csv")[, -1])
p = 2          # number of lags
T1 = nrow(Y)   # number of observations
Next we need a function that rearranges our coefficients into a companion
matrix like the one below (notice that the constant term is not
included): an n × n matrix with the coefficients along the top row and an
(n−1) × (n−1) identity matrix below this. Expressing our coefficients this way
allows us to check the stability of our model later on, which will be
an important part of our Gibbs sampler. I will talk more about
this later in the post when we get to the relevant code.
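For our AR(2) coefficients, this companion matrix looks like the following (a sketch; the post’s ar_companion_matrix function builds it from the draw of B, dropping the constant):

```latex
F = \begin{pmatrix} \beta_1 & \beta_2 \\ 1 & 0 \end{pmatrix},
\qquad \text{stability requires } \max_j \lvert \lambda_j(F) \rvert \le 1
```

The sampler below keeps re-drawing B until this eigenvalue condition holds, which rules out explosive forecasts.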
return(list(Y = Y, X = X, nvar = nvar, nrow = nrow))
}
################################################################
results <- regression_matrix(Y, p, TRUE)
X <- results$X
Y <- results$Y
nrow <- results$nrow
nvar <- results$nvar
# Initialise priors
B0 <- matrix(0, nrow = nvar, ncol = 1)  # prior mean for B
sigma0 <- diag(1, nvar)                 # prior variance for B
sigma2 <- 1                             # starting value for sigma^2 (illustrative)
T0 <- 1                                 # prior degrees of freedom (illustrative)
D0 <- 0.1                               # prior scale (illustrative)
What we have done here is essentially set a normal prior for our Beta
coefficients, with mean 0 and variance 1: B ∼ N(B₀, Σ₀) with B₀ = 0 and
Σ₀ = I. For our error variance we set an inverse-Gamma prior,
σ² ∼ IG(T₀/2, D₀/2).
reps = 15000    # total number of Gibbs draws
burn = 4000     # draws to discard as burn-in
horizon = 14    # forecast periods (12) plus the p = 2 lags
out = matrix(0, nrow = reps, ncol = nvar + 1)
colnames(out) <- c('constant', 'beta1', 'beta2', 'sigma')
out1 <- matrix(0, nrow = reps, ncol = horizon)
Above we set our forecast horizon and initialise some matrices to store
our results. We create a matrix called out which will store all of our
draws. It needs rows equal to the number of draws of our
sampler, which in this case is 15,000. We also need a
matrix to store the results of our forecasts. Since we are
calculating our forecasts by iterating an equation of the form

ŷ_{T+h} = c + β₁ ŷ_{T+h−1} + β₂ ŷ_{T+h−2}

we will need our last two observed periods to start the recursion.
This means our second matrix, out1, will have columns equal to the
number of forecast periods plus the number of lags, 14 in this case.
gibbs_sampler <- function(X, Y, B0, sigma0, sigma2, T0, D0, reps, out, out1){
    # Main Loop
    for(i in 1:reps){
        if (i %% 1000 == 0){
            print(sprintf("Iteration: %d", i))
        }
        # Posterior variance and mean of B conditional on sigma2
        V = solve(solve(sigma0) + as.numeric(1/sigma2) * t(X) %*% X)
        M = V %*% (solve(sigma0) %*% B0 + as.numeric(1/sigma2) * t(X) %*% Y)
        chck = -1
        while(chck < 0){ # check for stability
            B <- M + t(chol(V)) %*% rnorm(ncol(X))  # draw B ~ N(M, V)
            b = ar_companion_matrix(B)
            ee <- max(sapply(eigen(b)$values, abs))
            if(ee <= 1){
                chck = 1
            }
        }
        # compute residuals
        resids <- Y - X %*% B
        T2 = T0 + T1
        D1 = D0 + t(resids) %*% resids
        # draw sigma2 from its inverse-Gamma conditional
        z0 = rnorm(T2)
        sigma2 = as.numeric(D1 / (t(z0) %*% z0))
        # store the draws for this iteration
        out[i, ] <- c(t(B), sigma2)
        # forecast: seed yhat with the last p observations, then iterate forward
        cfactor = sqrt(sigma2)
        yhat = rep(0, horizon)
        yhat[1:p] = Y[(nrow(Y) - p + 1):nrow(Y)]
        X_mat = c(1, rep(0, p))
        for(m in (p+1):horizon){
            for (lag in 1:p){
                # create X matrix with p lags
                X_mat[(lag + 1)] = yhat[m - lag]
            }
            # use X matrix to forecast yhat
            yhat[m] = X_mat %*% B + rnorm(1) * cfactor
        }
        out1[i, ] <- yhat
    }
    return(list(out, out1))
}

results1 <- gibbs_sampler(X, Y, B0, sigma0, sigma2, T0, D0, reps, out, out1)
First of all, we need the following arguments for our function: our
dependent variable, in this case GDP growth (Y); our X matrix, which is just
Y lagged by 2 periods with a column of ones appended; all
of the priors that we defined earlier; the number of times to iterate
our algorithm (reps); and finally, our 2 output matrices.
The main loop is what we need to pay the most attention to here. This is
where all the main computations take place. The first two equations, M
and V, describe the posterior mean and variance of the normal
distribution of B conditional on σ². I won’t derive these here, but if
you are interested they are available in Time Series Analysis by
Hamilton (1994) or in Bishop’s Pattern Recognition and Machine
Learning, Chapter 3 (albeit with slightly different notation). To be
explicit, the mean of our posterior parameter Beta is defined as:

M = V (Σ₀⁻¹ B₀ + σ⁻² X′Y),   where V = (Σ₀⁻¹ + σ⁻² X′X)⁻¹
We are not done yet, though. We still need to get a random draw from
the correct distribution, but we can do this using a simple trick. To get a
random variable from a normal distribution with mean M and variance
V, we can sample a vector z from a standard normal distribution and
transform it using the equation below.

B = M + chol(V)′ z
Now that we have our draw of B, we draw sigma from the inverse-Gamma
distribution conditional on B. To sample a random variable
from the inverse-Gamma distribution with degrees of freedom T₂/2
and scale D₁/2, we can draw a vector z₀ of T₂ standard normal variables
and compute

σ² = D₁ / (z₀′ z₀)
The code below this stores our draws of the coefficients in our out
matrix. We then use these draws to create our forecasts. The code
essentially creates a vector called yhat to store our forecasts for 12
periods into the future (3 years, since we are using quarterly data). Our
equation for a one-step-ahead forecast can be written as:

ŷ_{T+1} = c + β₁ y_T + β₂ y_{T−1} + ε_{T+1}
Now we can start looking at what the algorithm produced. The code
below extracts the coefficients we need, which correspond to the
columns of the coef matrix. Each row gives us the value of each
parameter for one draw of the Gibbs sampler. Calculating the mean
of each of these columns gives us an approximation of the posterior
mean of the distribution for each coefficient. This distribution can be
quite useful for other statistical techniques, such as hypothesis testing,
and is another advantage of taking a Bayesian approach to modelling.
Below I have plotted the posterior distribution of the coefficients using
ggplot2. We can see that they closely resemble a normal distribution,
which makes sense given that we defined a normal prior and likelihood
function. The posterior means of our parameters are as follows:
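Concretely, the posterior-mean estimates described above are just averages of the retained draws: with M = 15,000 total draws and a burn-in of 4,000, each coefficient is approximated by

```latex
\hat{\beta}_j \;=\; \frac{1}{M - M_{\text{burn}}} \sum_{i = M_{\text{burn}}+1}^{M} \beta_j^{(i)}
```

The same retained draws also give us the full shape of each marginal posterior, which is what the density plots show.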
The next thing we are going to do is use these parameters to plot our
forecasts and construct our credible intervals around these forecasts.
. . .
library(matrixStats)
library(ggplot2)
library(reshape2)
# 68% credible bands: 16th and 84th percentiles of the forecast draws
# (the start of this call was lost in extraction; this line is an illustrative reconstruction)
quantiles <- apply(out2, 2, quantile, probs = c(0.16, 0.84))
Y_temp = cbind(Y, Y)
GDP Forecast
I found a very helpful blog post which creates fan charts very similar to
the Bank of England’s Inflation Reports. The library I use is called
fanplot and lets you plot different percentiles of our forecast
distribution, which looks a little nicer than the previous graph.
library(fanplot)
forecasts_mean <- as.matrix(colMeans(out2))
forecast_sd <- as.matrix(apply(out2, 2, sd))
tt <- seq(2018.25, 2021, by = .25)   # forecast quarters
y0 <- 2018.25                        # first forecast period
params <- cbind(tt, forecasts_mean[-c(1, 2)], forecast_sd[-c(1, 2)])  # drop the two lag seeds
p <- seq(0.10, 0.90, 0.05)           # percentiles to plot (note: p is reused here)
# Calculate percentiles
k = nrow(params)
gdp <- matrix(NA, nrow = length(p), ncol = k)
for (i in 1:k)
    gdp[, i] <- qsplitnorm(p, mode = params[i, 2], sd = params[i, 3])
# BOE aesthetics
# Set up the canvas and draw the fan (the original plot()/fan() calls were
# lost in extraction; these two calls are an illustrative sketch)
plot(NULL, xlim = c(2000, 2021), ylim = c(-2, 5),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")
fan(data = gdp, data.type = "values", probs = p, start = y0, frequency = 4)
axis(2, at = -2:5, las = 2, tcl = 0.5, labels = FALSE)
axis(4, at = -2:5, las = 2, tcl = 0.5)
axis(1, at = 2000:2021, tcl = 0.5)
axis(1, at = seq(2000, 2021, 0.25), labels = FALSE, tcl = 0.2)
abline(h = 0)
abline(v = y0 + 1.75, lty = 2) #2 year line
GDP Forecasts
Conclusion
Our forecast seems quite positive, with a mean prediction of annual
growth of around 3 per cent until 2021. There also seems to be quite a
bit of upside risk, with the 95 per cent credible interval going up to
nearly 5 per cent. The graph indicates that it is highly unlikely that
there will be negative growth over the period, which is quite interesting,
and judging by the expansionary economic policies in the US right now
this may turn out to be correct. The credible bands are pretty wide, as
you can see, indicating that there is a wide distribution on the values
that GDP growth could take over the forecast period. There are many
other types of models we could have used instead, and probably obtained a
more accurate forecast, such as Bayesian VARs or dynamic factor
models, which use a number of other economic variables. While
potentially more accurate, these models are much more complex and a
lot more difficult to code up. For the purposes of an introduction to
Bayesian regression, and to get an intuitive understanding of the
approach, an AR model is perfectly reasonable.
In my opinion, the best way for me to learn something complex like this
is to try to reproduce the algorithm from scratch. This really reinforces
what I have learned theoretically and forces me to put it into an applied
setting, which can be extremely difficult if I haven’t fully understood
the topic. I also find that this approach makes things stick in my head
a bit longer. In practice I probably wouldn’t use this code, as it is
likely to break quite easily and be hard to debug (as I have already
discovered), but I think this is a very effective way to learn. While
this obviously takes a lot longer than just finding a package in R or
Python for the task, the benefit from taking the time to go through it
step by step is ultimately greater, and I would recommend it to anyone
trying to learn or understand how different models and algorithms work.
OK folks, that concludes this post. I hope you all enjoyed it and learned
a bit about Bayesian Statistics and how we can use it in practice. If you
have any questions feel free to post below or connect with me via
LinkedIn.