Autoregressive Models
The autoregressive (AR) model is used to describe random and time-varying processes. It
specifies that the output variable depends linearly on its own previous values.
The model is described as:
$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$
Essentially, the model says that any given value $X_t$ can be explained by a linear
function of its previous values. For a model with one parameter, $p = 1$, $X_t$ is
explained by its past value $X_{t-1}$ and the random error $\varepsilon_t$. For a model with
more than one parameter, for example $p = 2$, $X_t$ is given by $X_{t-1}$, $X_{t-2}$ and the
random error $\varepsilon_t$.
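As a rough illustration of the recursion described above, the following base R sketch simulates an AR(2) series by hand and then recovers the coefficients with `arima()`. The coefficient values (0.6 and 0.2), the constant, the sample size and the seed are assumptions chosen for the example, not values from this post.

```r
# Simulate an AR(2) process by hand: X_t = c + phi1*X_{t-1} + phi2*X_{t-2} + e_t
set.seed(42)                       # assumed seed, for reproducibility only
n    <- 2000                       # assumed sample size
phi  <- c(0.6, 0.2)                # assumed AR coefficients (stationary)
cons <- 1                          # assumed constant c
e    <- rnorm(n)                   # white-noise errors
x    <- numeric(n)
x[1:2] <- cons / (1 - sum(phi))    # start the series at the process mean
for (i in 3:n) {
  x[i] <- cons + phi[1] * x[i - 1] + phi[2] * x[i - 2] + e[i]
}

# Fit an AR(2) model; the estimates should land near the assumed values
fit <- arima(x, order = c(2, 0, 0))
print(coef(fit))
```

Note that `arima()` reports the series mean under the name `intercept`, which corresponds to c / (1 - phi1 - phi2) rather than to c itself.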
The Moving Average (MA) model is often used for modeling univariate time series and is
defined as:
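$$X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$
where $\mu$ is the mean of the series, $\theta_1, \dots, \theta_q$ are the model parameters and
$\varepsilon_t, \varepsilon_{t-1}, \dots$ are the error terms.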
The Moving Average model is a linear regression of the current value of the series
against the current and previous error terms, $\varepsilon_t, \varepsilon_{t-1}, \dots$. For
example, in an MA model of order 1 ($q = 1$), $X_t$ is explained by the current error
$\varepsilon_t$ in the same period and the past error value $\varepsilon_{t-1}$. For a model of
order 2 ($q = 2$), $X_t$ is explained by the current error and the past two error values,
$\varepsilon_{t-1}$ and $\varepsilon_{t-2}$.
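To make the MA(1) case concrete, here is a minimal base R sketch that builds the series directly from its error terms and checks that the autocorrelation is concentrated at lag 1. The mean (10), the parameter (0.5), the sample size and the seed are assumptions for illustration only.

```r
# Simulate an MA(1) process by hand: X_t = mu + e_t + theta1 * e_{t-1}
set.seed(1)                               # assumed seed
n      <- 2000                            # assumed sample size
mu     <- 10                              # assumed series mean
theta1 <- 0.5                             # assumed MA parameter
e      <- rnorm(n + 1)                    # one extra draw for the first lagged error
x_ma   <- mu + e[2:(n + 1)] + theta1 * e[1:n]

# Sample autocorrelations at lags 1 to 5 (lag 0 dropped)
rho <- acf(x_ma, lag.max = 5, plot = FALSE)$acf[-1]
print(round(rho, 2))

# Estimate the MA(1) parameter from the simulated series
fit_ma <- arima(x_ma, order = c(0, 0, 1))
print(coef(fit_ma))
```

In theory the lag-1 autocorrelation of an MA(1) process is theta1 / (1 + theta1^2), which is 0.4 for the value assumed here, and the autocorrelations at higher lags are zero.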
The AR($p$) and MA($q$) terms are combined in the ARMA model, which will now be introduced.
$$X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$
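For instance, setting $p = 1$ and $q = 1$ in the ARMA equation above gives the ARMA(1,1) model, written out here as a small worked case using the same symbols:

$$X_t = c + \varepsilon_t + \varphi_1 X_{t-1} + \theta_1 \varepsilon_{t-1}$$

Each value is then the constant, the current error, one lagged value of the series and one lagged error.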
Selecting, estimating and verifying the model is described by the Box-Jenkins process.
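The ARIMA model has three parameters, $p$, $d$ and $q$. To extend the ARMA model with a
differencing term, we start from the standard ARMA model, drop the constant, write its
autoregressive part with coefficients $\alpha_i$ and order $p'$, and rearrange it to
separate $X_t$ and $\varepsilon_t$ from the summations.
$$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$
where $L$ is the lag operator and $\alpha_i$, $\theta_i$ and $\varepsilon_t$ are the autoregressive
parameters, the moving average parameters and the error terms, respectively.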
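We now assume that the first polynomial, $\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right)$, has
a unit root of multiplicity $d$, so it can be factored as:
$$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) = \left(1 - \sum_{i=1}^{p'-d} \varphi_i L^i\right)(1 - L)^d$$
Setting $p = p' - d$ gives the ARIMA($p$, $d$, $q$) model:
$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$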
Lastly, we generalize the model further by adding a drift term, $\delta$, which defines the
ARIMA($p$, $d$, $q$) model with drift:
$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \delta + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$
With the model now defined, we can view the ARIMA model as two separate parts, one
non-stationary and the other wide-sense stationary (its mean and autocovariance do not
change when shifted in time). The non-stationary part:
$$Y_t = (1 - L)^d X_t$$
The wide-sense stationary part:
$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$
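The two-part view can be checked numerically. The base R sketch below, under assumed values (an AR coefficient of 0.5, $d = 1$, 2000 observations), differences a simulated series by hand and fits an ARMA model to the result, then compares that with fitting the ARIMA model to the original series in one step.

```r
# Assumed example: an ARIMA(1,1,0) series built by integrating AR(1) increments
set.seed(7)                                         # assumed seed
n <- 2000                                           # assumed sample size
increments <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
x <- cumsum(increments)                             # non-stationary series X_t

# Part 1: difference d = 1 times, Y_t = (1 - L) X_t
y <- diff(x, differences = 1)

# Part 2: fit a stationary ARMA model to Y_t (no mean, matching the equation)
fit_two_step <- arima(y, order = c(1, 0, 0), include.mean = FALSE)

# One-step alternative: let arima() do the differencing itself
fit_one_step <- arima(x, order = c(1, 1, 0))

print(coef(fit_two_step))
print(coef(fit_one_step))
```

Both fits should report nearly the same `ar1` estimate, since `order = c(1, 1, 0)` amounts to an AR(1) model on the once-differenced series.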
library(RExcelInstaller)
installRExcel()
The above commands will install RExcel on your machine.
The next step is to install rcom, which is another package from Statconn for the RExcel
package. To install this, type the following commands, which will also automatically
install rscproxy as of R version 2.8.0.
library(rcom)
installstatconnDCOM()
comRegisterServer()
With these packages installed, you can move on to setting up the connection between R
and Excel.
Although not necessary to the installation, a handy package to download is Rcmdr,
developed by John Fox. Rcmdr creates R menus that can become menus in Excel. This
feature comes by default with the RAndFriends installation and makes several R
commands available in Excel.
Type the following commands into R to install Rcmdr.
library(Rcmdr)
installRcmdr()
We can now create the link between R and Excel.
Note that in recent versions of RExcel this connection is made with a simple double-click
of the provided .bat file "ActivateRExcel2010", so you should only need to follow these
steps if you manually installed R and RExcel or if for some reason the connection isn't
made during the RAndFriends installation.
Click Options and then Add-Ins. You should see a list of all the active and inactive add-ins
you currently have. Click the 'Go' button at the bottom.
On the Add-Ins dialog box, you will see all the add-in references you have made. Click on
Browse.
After the server loads, enter your data in the Data column. Select the range of the data,
right-click and select 'Name Range'. Name the range as 'Data'.
Next, set the frequency of your data in Cell C6. Frequency refers to the number of
observations per seasonal cycle. For daily data with a weekly pattern, the frequency would
be 7. Monthly data would be 12 while quarterly data would be 4, and so on.
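On the R side, the frequency is what a time series object carries with it. How the workbook passes Cell C6 to R is not shown in this post, so the following is only a sketch of the idea in base R, using the built-in monthly `AirPassengers` data as a stand-in for the 'Data' range.

```r
# Stand-in for the values in the 'Data' range (assumption: monthly observations)
values <- as.numeric(AirPassengers)

# Attach the frequency, as one would for the number entered in Cell C6
monthly <- ts(values, frequency = 12)

print(frequency(monthly))     # observations per seasonal cycle
print(cycle(monthly)[1:12])   # position of each observation within its cycle
```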
Enter the number of periods ahead to forecast. Note that ARIMA forecasts become
increasingly uncertain the further ahead you predict. A good rule of thumb is not to exceed
30 steps, as anything past that could be rather unreliable. This also depends on the size
of your data set: if you have limited data available, choose a smaller number of steps
ahead.
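The growth in uncertainty can be seen directly in the forecast standard errors. This base R sketch fits a seasonal ARIMA model to the built-in `AirPassengers` data (the model orders are an assumption picked for illustration, not the ones the tool would select) and prints the standard error at a few horizons.

```r
# Assumed model: seasonal ARIMA on the log of R's built-in AirPassengers series
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 30 steps ahead; predict() returns point forecasts and standard errors
fc <- predict(fit, n.ahead = 30)

# Standard errors at selected horizons: they widen as the horizon grows
print(round(fc$se[c(1, 10, 20, 30)], 3))
```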
After entering your data, naming it, and setting the desired frequency and steps ahead to
forecast, click Run. It may take a while for the forecasting to process.
Once it's completed, you will get predicted values out to the number you specified, the
standard error of the results, and two charts. The left is the predicted values plotted with
the data, while the right contains handy diagnostics featuring standardized residuals, the
autocorrelation of the residuals, a Q-Q plot of the residuals and a Ljung-Box statistics
graph to determine if the model is well fitted.
I won't get into too much detail on how you look for a well fitted model, but on the ACF
graph you don't want any (or a lot) of the lag spikes crossing over the dotted blue line.
On the Q-Q plot, the more circles that fall on the line, the closer the residuals are to
normally distributed and the better fitted the model is. For larger datasets, the line might
cross a lot of circles. Lastly, the Ljung-Box test is an article in itself; however, the more
p-value circles that sit above the dotted blue line, the less evidence there is of leftover
autocorrelation and the better the model is.
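The same checks can be run numerically in base R. The sketch below assumes a seasonal ARIMA fit to the built-in `AirPassengers` data as a stand-in for the tool's model; the lag choices (12 and 24) are also assumptions.

```r
# Assumed model, used only to have residuals to inspect
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
res <- residuals(fit)

# Ljung-Box test: a large p-value means no evidence of leftover autocorrelation
# (fitdf = 2 accounts for the two estimated coefficients)
lb <- Box.test(res, lag = 12, type = "Ljung-Box", fitdf = 2)
print(lb$p.value)

# Residual autocorrelations against the approximate 95% bounds
# (the dotted blue lines on an ACF plot)
r     <- acf(res, lag.max = 24, plot = FALSE)$acf[-1]   # drop lag 0
bound <- qnorm(0.975) / sqrt(length(res))
print(sum(abs(r) > bound))   # number of lags outside the bounds

# The plots themselves can be drawn with:
# tsdiag(fit); qqnorm(res); qqline(res)
```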
If the diagnostics result doesn't look good, you might try adding more data or starting at
a different point closer to the range you want to forecast.
You can easily clear the generated results by clicking the 'Clear Forecasted Values'
button.
And that's it! Currently, the date column is only there for your reference and isn't used
by the tool. If I find time, I'll go back and add that so the displayed graph shows the
correct time. You also might receive an error when running the forecast. This usually
happens because the function that finds the best parameters is unable to determine the
proper order. You can follow the above steps to try to arrange your data better for the
function to work.
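The workbook's own parameter search is not shown in this post, so the following is only a hypothetical sketch of how such a search could tolerate orders that fail to fit. The function name `best_arima_sketch`, the grid limits and the use of AIC are all assumptions.

```r
# Hypothetical helper (not the workbook's function): for a fixed d, try a small
# grid of (p, q) orders and keep the fit with the lowest AIC.
best_arima_sketch <- function(x, d = 1, max_p = 2, max_q = 2) {
  best <- NULL
  for (p in 0:max_p) {
    for (q in 0:max_q) {
      # arima() can raise an error for some orders; skip those instead of stopping
      fit <- tryCatch(arima(x, order = c(p, d, q)), error = function(e) NULL)
      if (!is.null(fit) && (is.null(best) || AIC(fit) < AIC(best))) {
        best <- fit
      }
    }
  }
  if (is.null(best)) stop("no candidate order could be fitted")
  best
}

# Usage on an assumed simulated series
set.seed(3)
x    <- cumsum(as.numeric(arima.sim(model = list(ar = 0.6), n = 500)))
best <- best_arima_sketch(x, d = 1)
print(best$arma[c(1, 6, 2)])   # selected p, d, q
```

The differencing order is held fixed because AIC values are only comparable between models fitted to the same differenced series.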
I hope you get use out of the tool! It's saved me plenty of time at work, as now all I have
to do is enter the data, load the server and run it. I also hope this shows you how
awesome R can be, especially when used with a front-end such as Excel.
Download the workbook here: Forecasting_Tool_final
Code, Excel worksheet and .bas file are also on GitHub here.
TAGS: ARIMA, Time Series, Autoregressive, Time Series Analysis, Excel, Forecasting, Moving Average, Prediction
2 COMMENTS
I am just testing out R and RExcel, with the latest build of R (x64 3.2.2) and
Office 2007. When I enter the data and click on Run Forecast, I get
RExcel error 1002 in Class Module RExcel.RServer: Error running
expression eval(parse(text= ".rexcel.awemdam<(function(Data)find.best.arima(Data)).(rexcel..uhmhpmv)")).
Do you think the latest version of R has changed somehow?
Hi Wei,
Apologies for my delay in responding! I'm not exactly sure why it
wouldn't be working. It was originally written with an old version
of R, so it could be that. Are you using x64 R and Excel? Because
the connection with R and Excel is done through RExcel, it'll only
work with 32 bit Excel.
I also recommend you check out the R forecast package by Rob
Hyndman. It's an amazing package that can automate forecasting
with ARIMA and a bunch of other methodologies. My method of
automatically selecting the parameters is rather primitive
compared to his. I might update or write a new post that utilizes
the forecast package, but since I've moved to 64 bit Excel I can't
use RExcel. Hope that helps!