
FRONT

NOTEBOOK

ARIMA FORECASTING WITH EXCEL AND R


Hello! Today I am going to walk you through an introduction to the ARIMA model and its
components, along with a brief explanation of the Box-Jenkins method for specifying ARIMA
models. Lastly, I'll show you how to set up and use an Excel implementation I created
using R.

Autoregressive Moving Average (ARMA) Models


The Autoregressive Moving Average model is used for modeling and forecasting
stationary, stochastic time-series processes. It combines two previously developed
statistical techniques, the Autoregressive (AR) and Moving Average (MA) models, and was
originally described by Peter Whittle in 1951. George E.P. Box and Gwilym Jenkins
popularized the model in 1970 by specifying discrete steps for model identification,
estimation, and verification. This process is outlined later in the post.
We will begin by introducing the components of the ARMA model, the AR and MA models,
and then present a popular generalization of the ARMA model, ARIMA (Autoregressive
Integrated Moving Average), along with its forecasting and model specification steps.
Lastly, I will explain an Excel implementation I created and how to use it to make your
own time series forecasts.

Autoregressive Models
The Autoregressive model is used for describing random and time-varying processes, and
specifies that the output variable depends linearly on its own previous values.
The AR(p) model is described as:

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$

where $\varphi_1, \ldots, \varphi_p$ are the parameters of the model, $c$ is a constant, and $\varepsilon_t$ is a white
noise term.


In essence, the model says that any given value X(t) can be explained by a linear function
of its own previous values. For a model of order p = 1, X(t) is explained by its past value
X(t-1) and the random error ε(t). For a model of higher order, for example p = 2, X(t) is
explained by X(t-1), X(t-2) and the random error ε(t).
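To see the AR(p) definition in action, here is a minimal sketch in R (base stats package only) that simulates an AR(2) process and recovers its parameters with arima(). The parameter values φ1 = 0.6 and φ2 = -0.3, the seed and the sample size are arbitrary assumptions for illustration; the intercept that arima() also reports is the process mean, which is 0 here.

```r
# Simulate X(t) = 0.6 * X(t-1) - 0.3 * X(t-2) + e(t), with e(t) standard normal white noise
set.seed(42)                                  # arbitrary seed for reproducibility
phi_true <- c(0.6, -0.3)                      # assumed AR parameters (illustrative)
x_ar <- arima.sim(model = list(ar = phi_true), n = 1000)

# order = c(p, d, q): two AR terms, no differencing, no MA terms
fit_ar <- arima(x_ar, order = c(2, 0, 0))
phi_hat <- coef(fit_ar)[c("ar1", "ar2")]
print(round(phi_hat, 2))                      # should land close to 0.6 and -0.3
```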


Moving Average Model

The Moving Average (MA) model is often used for modeling univariate time series and is
defined as:

$$X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

where
$\mu$ is the mean of the time series,
$\theta_1, \ldots, \theta_q$ are the parameters of the model,
$\varepsilon_t, \varepsilon_{t-1}, \ldots$ are the white noise error terms, and
$q$ is the order of the Moving Average model.

The Moving Average model is a linear regression of the current value of the series
against the current and previous white noise error terms, ε(t), ε(t-1), and so on. For
example, in an MA model with q = 1, X(t) is explained by the current error ε(t) and the
past error ε(t-1). For a model of order 2 (q = 2), X(t) is explained by the current error
and the past two error values, ε(t-1) and ε(t-2).
The AR(p) and MA(q) terms are combined in the ARMA model, which we introduce next.
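A similar sketch for the MA side, again with an assumed parameter (θ1 = 0.7) and an arbitrary seed: it simulates an MA(1) process, fits it with arima(), and compares the lag-1 sample autocorrelation with its theoretical value θ1 / (1 + θ1²). Beyond lag q the theoretical autocorrelation of an MA(q) process is zero, which is the property used later to pick q.

```r
set.seed(1)
theta_true <- 0.7                              # assumed MA parameter (illustrative)
# Simulate X(t) = e(t) + 0.7 * e(t-1)
x_ma <- arima.sim(model = list(ma = theta_true), n = 1000)

fit_ma <- arima(x_ma, order = c(0, 0, 1))     # q = 1: one lagged error term
theta_hat <- coef(fit_ma)["ma1"]

# Sample autocorrelations at lags 1, 2 and 3 (element 1 of $acf is lag 0)
rho <- acf(x_ma, lag.max = 3, plot = FALSE)$acf[2:4]
rho1_theory <- theta_true / (1 + theta_true^2)  # about 0.47; lags 2+ should be near 0

print(round(c(theta_hat = unname(theta_hat), rho1 = rho[1],
              rho1_theory = rho1_theory, rho2 = rho[2]), 2))
```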

Autoregressive Moving Average Model


Autoregressive Moving Average models combine two polynomials, AR(p) and MA(q), to
describe a stationary stochastic process. A stationary process does not change its
statistical properties when shifted in time; in particular, it has constant mean and variance.
The ARMA model is often referred to by the orders of its polynomials, ARMA(p, q). The
model is written:

$$X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$

Selecting, estimating and verifying the model is described by the Box-Jenkins process.
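Combining both parts, the sketch below simulates an ARMA(1, 1) process with assumed parameters φ1 = 0.5, θ1 = 0.4 and mean 10, and fits it with arima(x, order = c(1, 0, 1)). One detail worth knowing: arima() reports the process mean μ under the name intercept, not the constant c from the equation above; for a stationary model the two are related by c = μ(1 - φ1 - ... - φp).

```r
set.seed(7)
# ARMA(1, 1) around a mean of 10: (X(t) - 10) = 0.5 * (X(t-1) - 10) + e(t) + 0.4 * e(t-1)
x_arma <- 10 + arima.sim(model = list(ar = 0.5, ma = 0.4), n = 1000)

fit_arma <- arima(x_arma, order = c(1, 0, 1))
est <- coef(fit_arma)                                   # named: ar1, ma1, intercept (= mean)
c_hat <- unname(est["intercept"] * (1 - est["ar1"]))    # constant c in X(t) = c + ...
print(round(c(est, c = c_hat), 2))                      # c should be near 10 * (1 - 0.5) = 5
```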

Box-Jenkins Method for Model Identification


What follows is an outline of the Box-Jenkins method rather than a full treatment, as
actually finding these values can be quite overwhelming without a statistical package. The
Excel sheet included on this page automatically determines the best-fitting model.
The first step of the Box-Jenkins method is model identification. This step includes
identifying seasonality, differencing if necessary, and determining the orders p and q by
plotting the autocorrelation (ACF) and partial autocorrelation (PACF) functions.
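As a rough illustration of the identification step, the sketch below computes the sample ACF and PACF of an assumed AR(1) series with base R's acf() and pacf() (plot = FALSE returns the numbers instead of drawing them). For an AR(p) process the PACF should cut off after lag p while the ACF decays gradually; for an MA(q) process the roles are reversed. The bound used is the approximate 95% band that acf() and pacf() draw as dashed lines.

```r
set.seed(3)
x_id <- arima.sim(model = list(ar = 0.7), n = 500)    # assumed AR(1) example series

# Sample ACF (dropping lag 0, which is always 1) and sample PACF for lags 1..10
acf_vals  <- acf(x_id,  lag.max = 10, plot = FALSE)$acf[-1]
pacf_vals <- pacf(x_id, lag.max = 10, plot = FALSE)$acf[, 1, 1]

# Approximate 95% bounds: the dashed lines that acf() and pacf() draw by default
bound <- qnorm(0.975) / sqrt(length(x_id))

# AR(1) signature: the ACF decays gradually, the PACF cuts off after lag 1
print(round(acf_vals[1:4], 2))
sig_lags <- which(abs(pacf_vals) > bound)
print(sig_lags)   # lag 1, plus the occasional lag crossing the bound by chance
```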
After the model is identified, the next step is estimating the parameters. Parameter
estimation relies on statistical packages and numerical algorithms (typically maximum
likelihood) to find the best-fitting parameters.
Once the parameters are chosen, the last step is checking the model. Model checking is
done by testing whether the model conforms to a stationary univariate time series. One
should also confirm that the residuals are independent of each other and exhibit constant
mean and variance over time, which can be done by performing a Ljung-Box test or, again,
by plotting the autocorrelation and partial autocorrelation of the residuals.
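A minimal sketch of the checking step with base R's Box.test(), using an assumed AR(1) series: the correctly specified AR(1) fit should leave residuals that look like white noise (no small p-value), while a model that ignores the autocorrelation should fail the test (a tiny p-value). The choice of 10 lags is an arbitrary assumption; fitdf subtracts the number of estimated ARMA coefficients from the test's degrees of freedom.

```r
set.seed(11)
x_chk <- arima.sim(model = list(ar = 0.6), n = 500)   # assumed AR(1) example series

good_fit <- arima(x_chk, order = c(1, 0, 0))   # correctly specified
bad_fit  <- arima(x_chk, order = c(0, 0, 0))   # ignores the autocorrelation entirely

# Ljung-Box test on the first 10 residual autocorrelations;
# fitdf = p + q estimated ARMA coefficients
lb_good <- Box.test(residuals(good_fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb_bad  <- Box.test(residuals(bad_fit),  lag = 10, type = "Ljung-Box", fitdf = 0)

# A large p-value gives no evidence against independent residuals; a tiny one does
print(c(good = lb_good$p.value, bad = lb_bad$p.value))
```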
Notice that the first step involves checking for seasonality. If the data you are working with
contains trends or seasonal patterns, you "difference" it to make it stationary. This
differencing step generalizes the ARMA model into an ARIMA model, or Autoregressive
Integrated Moving Average, where 'Integrated' corresponds to the differencing step.
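In R, differencing is done with diff(). The sketch below uses made-up data to show both cases: ordinary first differencing (the d in ARIMA(p, d, q)) turns an assumed random walk with drift back into its stationary increments, and differencing at lag 12 removes an assumed fixed monthly pattern. Strictly speaking, seasonal differencing belongs to seasonal ARIMA models rather than to the d parameter.

```r
set.seed(5)
n <- 300

# A random walk with drift: X(t) = X(t-1) + 0.2 + e(t), which is not stationary
steps <- 0.2 + rnorm(n)
x_rw <- cumsum(steps)
y <- diff(x_rw)                        # (1 - L) X(t): recovers the stationary increments

# A fixed pattern repeating every 12 periods plus a little noise
season <- rep(5 * sin(2 * pi * (1:12) / 12), times = n / 12)
z <- season + rnorm(n, sd = 0.1)
dz <- diff(z, lag = 12)                # Z(t) - Z(t-12) cancels the repeating pattern

print(c(sd_z = sd(z), sd_dz = sd(dz)))
```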

Autoregressive Integrated Moving Average Models


The ARIMA model has three parameters: p, d and q. To extend the ARMA model with a
differencing term, we start by writing an ARMA model of order $(p', q)$ with the lag
operator, setting the constant c aside for now, so that $X_t$ and $\varepsilon_t$ are separated from the
summations:

$$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$

where L is the lag operator ($L^i X_t = X_{t-i}$), and $\alpha_i$, $\theta_i$ and $\varepsilon_t$ are the autoregressive
parameters, the moving average parameters, and the error terms, respectively.

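We now assume that the first polynomial, $\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right)$, has a unit root of
multiplicity d, that is, it contains the factor $(1 - L)$ d times. We can then rewrite it as:

$$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) = \left(1 - \sum_{i=1}^{p'-d} \varphi_i L^i\right)(1 - L)^d$$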
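As a small worked example with assumed values p' = 2 and d = 1: the polynomial 1 - 1.5L + 0.5L² (α1 = 1.5, α2 = -0.5) factors as (1 - 0.5L)(1 - L), so it contains one unit root and leaves φ1 = 0.5. An ARMA model with this AR polynomial is therefore an ARIMA(1, 1, q) model. The R snippet below checks this numerically: a root of the polynomial in z with modulus 1 is the unit root, and the other root, 2, is the reciprocal of φ1.

```r
# Assumed example: p' = 2 with alpha_1 = 1.5, alpha_2 = -0.5, i.e. 1 - 1.5 L + 0.5 L^2
alpha <- c(1.5, -0.5)

# polyroot() expects coefficients in increasing powers: 1 - 1.5 z + 0.5 z^2
roots <- polyroot(c(1, -alpha))
print(Mod(roots))            # moduli 1 and 2 (order may vary): 1 is the unit root

# Multiply out (1 - phi_1 z)(1 - z) = 1 - (1 + phi_1) z + phi_1 z^2 with phi_1 = 0.5
phi_1 <- 0.5
product <- c(1, -(1 + phi_1), phi_1)
print(all.equal(product, c(1, -alpha)))   # TRUE: the factorisation holds
```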

The ARIMA model expresses this polynomial factorisation with $p = p' - d$, which gives us:

$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$

Lastly, we generalize the model further by adding a drift term $\delta$, which defines the
ARIMA(p, d, q) model with drift:

$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \delta + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$

With the model now defined, we can view the ARIMA model as two separate parts, one
non-stationary and the other wide-sense stationary (its mean and autocovariance do not
change when shifted in time). The non-stationary model:

$$Y_t = (1 - L)^d X_t$$

The wide-sense stationary model:


$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$

Forecasts can now be made on $Y_t$ using a generalization of the autoregressive
forecasting method, and then converted back to forecasts of $X_t$ by undoing the
differencing.
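To make this step concrete, here is a sketch using an assumed ARIMA(1, 1, 0) series: fit an AR(1) model to Y_t = (1 - L)X_t, forecast Y, add the forecasts cumulatively to the last observed X to undo the differencing, and compare with fitting arima(order = c(1, 1, 0)) to X directly. The two routes should agree closely, though not necessarily to the last digit, since each is estimated separately by numerical optimization.

```r
set.seed(9)
# Assumed example: increments follow an AR(1) with phi = 0.5, so X is ARIMA(1, 1, 0)
y_true <- as.numeric(arima.sim(model = list(ar = 0.5), n = 300))
x_lvl <- 50 + cumsum(y_true)
h <- 10

# Route 1: model Y(t) = X(t) - X(t-1) as a zero-mean AR(1), then integrate the forecasts
fit_y <- arima(diff(x_lvl), order = c(1, 0, 0), include.mean = FALSE)
y_fc <- as.numeric(predict(fit_y, n.ahead = h)$pred)
x_fc_manual <- tail(x_lvl, 1) + cumsum(y_fc)

# Route 2: let arima() do the differencing via d = 1
fit_x <- arima(x_lvl, order = c(1, 1, 0))
x_fc_direct <- as.numeric(predict(fit_x, n.ahead = h)$pred)

print(round(cbind(manual = x_fc_manual, direct = x_fc_direct), 3))
```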
Now that we have discussed the ARMA and ARIMA models, we turn to how we can use them
to produce forecasts in practice. I've built an implementation in Excel that uses R to make
ARIMA forecasts, with an option to run a Monte Carlo simulation on the model to gauge
the likelihood of the forecasts.
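The post does not show how the sheet's Monte Carlo option works, so the following is only a generic sketch of the idea, not the workbook's implementation: simulate many future paths from a fitted AR(1) model by drawing fresh Gaussian errors, then read off empirical quantiles of the simulated values at each step ahead. The model, seed and number of paths are assumptions, and this simple version ignores uncertainty in the estimated parameters.

```r
set.seed(2024)
x_mc <- 20 + as.numeric(arima.sim(model = list(ar = 0.6), n = 400))  # assumed example data
fit_mc <- arima(x_mc, order = c(1, 0, 0))

phi   <- coef(fit_mc)[["ar1"]]
mu    <- coef(fit_mc)[["intercept"]]      # arima() reports the mean, not the constant c
sigma <- sqrt(fit_mc$sigma2)              # estimated white noise standard deviation

h <- 12
n_paths <- 5000
paths <- matrix(NA_real_, nrow = n_paths, ncol = h)
prev <- rep(tail(x_mc, 1), n_paths)       # every path starts from the last observation
for (j in seq_len(h)) {
  # AR(1) in mean-deviation form: X(t) - mu = phi * (X(t-1) - mu) + e(t)
  prev <- mu + phi * (prev - mu) + rnorm(n_paths, sd = sigma)
  paths[, j] <- prev
}

# Empirical 5%, 50% and 95% quantiles of the simulated values at each step ahead
bands <- apply(paths, 2, quantile, probs = c(0.05, 0.5, 0.95))
print(round(bands[, 1:3], 2))
```

The simulated medians should track the point forecasts from predict(fit_mc, n.ahead = 12), while the 5% and 95% rows show how wide the plausible range becomes.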

Excel Implementation and How to Use


Before using the sheet, you need R and RExcel, both available from the Statconn website. If
you already have R installed, you can download just RExcel. If you don't have R installed,
you can download RAndFriends, which contains the latest version of R and RExcel.
Please note that RExcel's non-commercial license only works with 32-bit Excel. If you
have 64-bit Excel installed, you will need a commercial license from Statconn.
RAndFriends is recommended, as it makes for the quickest and easiest installation;
however, if you already have R and would like to install RExcel manually, follow
these next steps.


Manually installing RExcel


To install RExcel and the other packages that make R work in Excel, first open R as an
administrator by right-clicking the R .exe and choosing 'Run as administrator'.

In the R console, install RExcel by typing the following statements:

library(RExcelInstaller)
installRExcel()
The above commands will install RExcel on your machine.
The next step is to install rcom, another Statconn package that RExcel relies on. To
install it, type the following commands, which also automatically install rscproxy as of
R version 2.8.0.

library(rcom)
installstatconnDCOM()
comRegisterServer()
With these packages installed, you can move on to setting up the connection between R
and Excel.
Although not necessary for the installation, a handy package to download is Rcmdr,
developed by John Fox. Rcmdr creates R menus that can become menus in Excel. This
feature comes by default with the RAndFriends installation and makes several R
commands available in Excel.
Type the following commands into R to install Rcmdr.

library(Rcmdr)
installRcmdr()
Next, we can create the link between R and Excel.
Note that in recent versions of RExcel this connection is made by simply double-clicking
the provided .bat file "ActivateRExcel2010", so you should only need to follow these
steps if you installed R and RExcel manually or if, for some reason, the connection isn't
made during the RAndFriends installation.

Create the Connection Between R and Excel


Open a new book in Excel and navigate to the options screen.

Click Options and then Add-Ins. You should see a list of all the active and inactive add-ins
you currently have. Click the 'Go' button at the bottom.

On the Add-Ins dialog box, you will see all the add-in references you have made. Click on
Browse.

Navigate to the RExcel folder, usually located in C:\Program Files\RExcel\xls or something
similar. Find the RExcel.xla add-in and select it.
The next step is to create a reference so that macros using R work properly. In your
Excel workbook, press Alt + F11 to open Excel's VBA editor. Go to Tools ->
References, and find and check the RExcel reference, 'RExcelVBAlib'. RExcel should now
be ready to use!

Using the Excel Sheet


Now that R and RExcel are properly configured, it's time to do some forecasting!
Open the forecasting sheet and click 'Load Server'. This starts the RCom server and
loads the functions needed for forecasting. A dialog box will open; select
the 'itall.R' file included with the sheet. This file contains the functions the forecasting
tool uses. Most of them were developed by Professor Stoffer at the University of
Pittsburgh; they extend the capabilities of R and give us some helpful diagnostic graphs
along with our forecasting output. There is also a function to automatically determine the
best-fitting parameters of the ARIMA model.
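The post does not show how that function chooses the orders. One common approach is a small grid search that keeps the fit with the lowest AIC; the sketch below implements that idea with a hypothetical helper, best_arima_by_aic(), which is not the sheet's own function. The order of differencing d is passed in rather than searched, because AIC values are not comparable across different orders of differencing.

```r
# Hypothetical helper (not the sheet's own function): for a fixed d, try every
# ARIMA(p, d, q) with p <= max_p and q <= max_q and keep the lowest-AIC fit.
best_arima_by_aic <- function(x, d = 0, max_p = 3, max_q = 3) {
  best <- list(aic = Inf, order = NULL, fit = NULL)
  for (p in 0:max_p) {
    for (q in 0:max_q) {
      # Some orders fail to fit (e.g. non-stationary starting values); skip them
      fit <- tryCatch(arima(x, order = c(p, d, q)), error = function(e) NULL)
      if (!is.null(fit) && is.finite(fit$aic) && fit$aic < best$aic) {
        best <- list(aic = fit$aic, order = c(p, d, q), fit = fit)
      }
    }
  }
  best
}

# Usage with an assumed AR(2) example series
set.seed(21)
x_sel <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)
sel <- best_arima_by_aic(x_sel, d = 0)
print(sel$order)
```

A failed search (every order erroring out) returns order = NULL, which is roughly the situation the post describes later when the tool cannot determine the proper order.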

After the server loads, enter your data in the Data column. Select the range of the data,
right-click and select 'Name Range'. Name the range as 'Data'.

Next, set the frequency of your data in cell C6. Frequency is the number of observations
per seasonal cycle: for daily data with a weekly cycle it would be 7, for monthly data 12,
for quarterly data 4, and so on.
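In R, this setting corresponds to the frequency argument of ts(). The post does not show exactly how the sheet passes cell C6 to R, so the sketch below only illustrates the concept with made-up data:

```r
# frequency = number of observations per seasonal cycle
monthly   <- ts(1:36, start = c(2015, 1), frequency = 12)   # three years of monthly data
quarterly <- ts(1:8,  start = c(2015, 1), frequency = 4)    # two years of quarterly data
daily     <- ts(1:28, frequency = 7)                        # four weeks of daily data

print(frequency(monthly))   # 12
print(cycle(quarterly))     # the quarter of each observation: 1, 2, 3, 4, 1, 2, 3, 4
print(end(monthly))         # 2017 12
```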
Enter the number of periods ahead to forecast. Note that ARIMA forecasts become quite
inaccurate after several successive steps ahead. A good rule of thumb is not to exceed 30
steps, as anything beyond that can be rather unreliable. This also depends on the size of
your data set: if you have limited data available, it is recommended to choose a
smaller number of steps ahead.
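The growing uncertainty shows up directly in the standard errors that predict() returns for an arima() fit. In this sketch with an assumed AR(1) series (φ = 0.8), the one-step standard error equals the estimated white noise standard deviation, and the standard errors never shrink as the horizon grows; for a stationary model they level off at the model-implied unconditional standard deviation, while for differenced (d >= 1) models they keep growing. These standard errors also exclude the uncertainty in the estimated parameters.

```r
set.seed(13)
x_h <- arima.sim(model = list(ar = 0.8), n = 200)   # assumed example series
fit_h <- arima(x_h, order = c(1, 0, 0))

fc <- predict(fit_h, n.ahead = 30)
se <- as.numeric(fc$se)

# Standard errors at 1, 5, 10 and 30 steps ahead, next to the model-implied limit
phi_h <- coef(fit_h)[["ar1"]]
limit <- sqrt(fit_h$sigma2 / (1 - phi_h^2))
print(round(c(se[c(1, 5, 10, 30)], limit = limit), 3))
```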
After entering your data, naming it, and setting the desired frequency and steps ahead to
forecast, click Run. It may take a while for the forecasting to process.
Once it's completed, you will get the predicted values for the number of steps you
specified, the standard errors of the forecasts, and two charts. The left chart plots the
predicted values along with the data, while the right contains handy diagnostics: the
standardized residuals, the autocorrelation of the residuals, a normal Q-Q plot of the
residuals, and a plot of Ljung-Box statistic p-values to help determine whether the model
is well fitted.
I won't go into too much detail on how to judge whether a model is well fitted, but on the
ACF graph you don't want any (or many) of the lag spikes crossing the dotted blue line.
On the Q-Q plot, the closer the circles lie to the line, the closer the residuals are to
normally distributed and the better fitted the model is; for larger datasets, the line will
run through a lot of circles. Lastly, the Ljung-Box test is an article in itself; however, the
more circles (p-values) that are above the dotted blue line, the better the model is.
If the diagnostics don't look good, you might try adding more data or starting at
a different point closer to the range you want to forecast.
You can easily clear the generated results by clicking the 'Clear Forecasted Values'
button.
And that's it! Currently, the date column is there only for your reference and isn't
needed by the tool. If I find time, I'll go back and make use of it so the displayed graph
shows the correct time. You might also receive an error when running the forecast. This
is usually because the function that finds the best parameters is unable to determine the
proper order. You can follow the steps above to try to arrange your data so the function
works.
I hope you get some use out of the tool! It's saved me plenty of time at work, as now all I
have to do is enter the data, load the server and run it. I also hope this shows you how
awesome R can be, especially when used with a front-end such as Excel.
Download the workbook here: Forecasting_Tool_final
Code, Excel worksheet and .bas file are also on GitHub here.

TAGS: ARIMA, TIME SERIES, AUTOREGRESSIVE, TIME SERIES ANALYSIS, EXCEL, FORECASTING, MOVING AVERAGE, PREDICTION

2 COMMENTS

WEI October 16, 2015 at 12:11 pm


I am just testing out R and RExcel, with the latest build of R (x64 3.2.2) and
Office 2007. When I enter the data and click on Run Forecast, I get
RExcel error 1002 in Class Module RExcel.RServer: Error running
expression eval(parse(text= ".rexcel.awemdam<(function(Data)find.best.arima(Data)).(rexcel..uhmhpmv)")).
Do you think the latest version of R has changed somehow?

AARON SCHLEGEL January 29, 2016 at 5:11 pm


Hi Wei,
Apologies for my delay in responding! I'm not exactly sure why it
wouldn't be working. It was originally written with an old version
of R, so it could be that. Are you using x64 R and Excel? Because
the connection with R and Excel is done through RExcel, it'll only
work with 32 bit Excel.
I also recommend you check out the R forecast package by Rob
Hyndman. It's an amazing package that can automate forecasting
with ARIMA and a bunch of other methodologies. My method of
automatically selecting the parameters is rather primitive
compared to his. I might update or write a new post that utilizes
the forecast package, but since I've moved to 64 bit Excel I can't
use RExcel. Hope that helps!
