
The Use of the Variogram in Time Series Analysis

George Krasadakis, University of Bath, 1997

1 General
The Variogram is a standard tool for assessing spatial dependence in Geostatistics.
Although time can be considered as a one-dimensional space, it is only
recently that the Variogram has been proposed as a useful tool in Time Series
Analysis. Since it is a function that characterises second order dependence
properties, it is regarded as an alternative tool to the Autocovariance function.
The main theoretical advantage that the Variogram holds over the
Autocovariance function is that it is defined for certain non-stationary models
where the Autocovariance is not.
In the current Chapter we will present a summary of the relevant literature.
Although most of the work that has already been done concerns the Variogram
in the Spatial Statistics context, we focus on the papers and textbooks that
suggest the use of this tool in time series problems.
Chapter 2 gives a brief description of the software package that has been
developed for the purposes of this project. It gives the most basic technical
properties as well as a detailed user guide.
In Chapter 3 we set up the necessary notation and background, and then
we derive the theoretical form of the Variogram for several models, both
stationary and non-stationary. In each case the proofs are given in detail.
We also examine, using simulation, the performance of the relevant estimators.
In Chapter 4 we apply the results obtained in the previous Chapter to
several real datasets and compare the resulting model with the one (or
more) suggested by applying the Box-Jenkins methodology.

2 Literature Review
The Variogram, although widely used in Geostatistics, has had a very limited
press in the classical Statistical literature. According to Haslett (1997), in
the Current Index to Statistics database only 29 papers (out of a total of 96089)
have the term Variogram in their title. Furthermore, only a few of these apply
the Variogram in contexts other than Geostatistics. In contrast, 210 papers,
out of the same total, have the term Autocorrelation in their title. It is due to
Cressie (1988) and Diggle (1988, 1990) that it is now becoming more familiar
in the classical Statistical literature. In the following paragraphs we discuss the
papers and textbooks that are relevant to the use of the Variogram in Time
Series Analysis.

Davies & Tremayne (1997) introduce the use of a standardised form of the
Variogram (given by formula 3.3 in Chapter 3) and its first differences as two
graphical devices that can be successfully applied in preliminary data analysis
in Time Series. They show that these two devices take recognisable forms
for large datasets and hence can be useful in identifying the appropriate model
for a given time series. They present the form of the theoretical standardised
Variogram for certain processes, including linear non-stationary ones and
random coefficient autoregressive models. Finally, they apply these results
to some real datasets and conclude that these tools are likely to be useful at
the model identification stage. The ideas illustrated in this paper were found
very helpful throughout this project.
This standardised form of the Variogram had already been suggested
by Box, Jenkins & Reinsel (1994), but in the context of Statistical Process
Control. It is mentioned there as a helpful tool for characterising process
disturbances. They point out that for an uncontrolled disturbance a
monotonically increasing Variogram would be expected, while for a process
that is perfectly stable the standardised Variogram should be constant.
Robinson (1989) considered the theoretical advantages of using the
Variogram instead of the Autocorrelation function in Time Series Analysis.
He also gave some simulation results indicating the superiority of the classical
Semivariogram estimator over the estimated Autocorrelation function, especially
for small sample sizes. Finally, he illustrates the use of the Variogram in
problems that involve kriging.
Haslett (1997) is concerned with the estimation of the covariance structure
in time series for which the classical stationarity conditions may not be
satisfied. He discusses the basic theoretical aspects of the Variogram and
also the advantages of using the sample Variogram instead of the sample
Autocorrelation function for non-stationary time series. He also suggests
an alternative estimator of the Classical Variogram (given by formula 3.4 in
Chapter 3). He mentions the additive property of the Variogram when the
process under study is a sum of independent processes, and that, unlike the
Autocorrelation function, the Variogram is well defined for data that arise
irregularly in time. Finally, the results are applied to a time series of Earth
temperatures since the mid 19th century, which has also been identified as a
long memory process (Smith 1993).
Diggle (1996) considers the use of the Variogram in the context of unequally
spaced time series. He uses the empirical Variogram as an indirect way to
estimate the Autocorrelation function, as well as a source of information on
whether the process is stationary or not.
Beran (1994) discusses the use of the Semivariogram as a tool for identifying
long memory dependence in time series. He also compares the estimated
Semivariogram with the Correlogram (the plot of the estimated autocorrelations)
and concludes that, although there are problems in the use of the estimated
Semivariogram too, it is more informative than the Correlogram. Finally,
he points out the basic theoretical advantage of the Variogram over the
Autocovariance, i.e. that it exists for certain non-stationary models.
Cressie (1988) introduces the use of Generalised Covariances in Time
Series Analysis. He mentions that exploratory techniques are particularly
useful in determining the degree of differencing required to transform
a non-stationary series into a stationary one. He points out that processes
that are difference stationary of order higher than one do not have a valid
Variogram, since they are not intrinsically stationary (their Variogram depends
on time), and hence he illustrates the use of some further devices named the
Linvariogram and the Quadvariogram. These devices are applicable to I(2) and
I(3) processes (i.e. d=2 and d=3 difference stationary processes). The main
problem with these devices is that in practical situations we do not know
whether our data come from an I(2) or an I(3) process.
Cressie (1991) gives a summary of the statistical properties of several
Variogram estimators, but in the Spatial Statistics context.
Several formal techniques for determining the degree of differencing
required to achieve stationarity have been proposed. Miller and Newbold
(1995) use the standardised Variogram to test whether or not an observed
series is generated by an ARIMA(p,1,q) model with unknown parameters.
Other formal procedures for testing the existence of unit roots are
proposed by Dickey and Fuller (1979), Said and Dickey (1984), Schwert
(1989) and many other authors. These formal procedures are beyond the scope
of this project, since we only examine exploratory techniques which can
be used as preliminary identification tools.
The software package

3 Introduction
A major part of this project was the design and implementation of a software
package for the application of the methods that will be discussed in the
following Chapters. The main technical features of this package are illustrated
in section 2.2. Since a full description of this program requires extended
terminology and knowledge of specialised programming techniques (especially
for the PHIGS part), only the major characteristics are highlighted. For further
technical details the reader should look at the appendix of this project, where
some of the most important parts of the source code are presented (of course
not all of them, since the source file is about 1500 lines long). The
relevant PHIGS references may also be found helpful in making these concepts
clear. Section 2.3.1 gives the basic directions for compiling and executing
the program, while section 2.3.2 describes the Graphical User Interface and
explains in detail how it can be used in a simple and efficient way.

4 A brief Technical Description
The package has been developed using the C language and the SunPHIGS
graphics library. Special effort has been made to keep the program computationally
efficient. A structured as well as defensive style of programming has been
followed. As a result, the application provides very short execution
times and quick updates of the graphical windows. Furthermore, this
style of programming ensures portability and enables easy extension of
the package. The design of the program consisted of two basic parts:
The Development of the Graphical Environment
This part of the program takes advantage of the SunPHIGS graphics library
in order to set up the graphical environment. It uses 20 structures organised
in a hierarchical way, leading to a structure network. Most of the structure
elements are Line, Marker or Text primitives. The structure network is
posted to the Workstation using the (void) functions set_up_views() and
built_css(). The first of these determines the size and the location of each
view on the screen, while the second creates the Centralised Structure Store
(CSS) where all the graphical information is stored. Dynamic modification
techniques have been used in order to achieve quick updates of the graphical
output. The function input_loop() controls the input devices (mouse &
keyboard), so that the user's choices activate the right procedure.
The Development of the Numerical Techniques
Several functions have been written for this part. Most of them are void
functions and will be briefly discussed in the next section. These functions
store their output in three global vectors. These vectors are then passed as
arguments to the Scaling(Yaxe,Xaxe,LEN,title) function, which scales the
results so they can be displayed in the appropriate window. The first two
arguments of this function correspond to the pairs of data (y, x) to be scaled,
the third passes the length of the vectors, and the last the title to
be displayed at the top of the plot. These vectors are:

• DT: This vector always contains the original time series. It is posted to
the upper window, after having been scaled by the Scaling(DT,XD,sze,"")
function.

• tool: This vector contains the output of one of the following functions:
VRG(cl,elim,al), ACF(acf), PART() or VRGDIF(1). Every time one of
these functions is called, the results are stored in this vector,
which is then passed as an argument to the Scaling(tool,toolx,LAG,"")
function and is always displayed in the lower window.

• TEMP: After every transformation we apply to the original data, the
results are stored in this vector, which is also passed as an argument
to the function Scaling(TEMP,TEMPx,LEN,""). It is always displayed
in the upper window.

All the functions of this part are called, directly or indirectly, through the
input_loop() function.
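To make this structure concrete, the fragment below sketches how a tool-style function might be wired. The global vectors and the Scaling() interface are taken from the description above, but the estimator body and every other detail are our own illustration, not the package's actual code:

/* Illustrative sketch only: written to match the data flow described
   in the text, not copied from the package source. */
#define NMAX 2048

float DT[NMAX], TEMP[NMAX], tool[NMAX], toolx[NMAX];
int   sze, LEN, LAG, elim;

extern void Scaling(float *Yaxe, float *Xaxe, int len, char *title);

/* A tool-style function: fill the global vector `tool` with classical
   Variogram estimates of the working series and pass them to Scaling()
   for display in the lower window. */
void example_tool(void)
{
    float *x = (elim > 0) ? TEMP : DT;  /* transformed or raw series (assumed switch) */
    int    n = (elim > 0) ? LEN  : sze;

    for (int m = 1; m <= LAG; m++) {
        float s = 0.0f;
        for (int t = 0; t < n - m; t++) {
            float d = x[t + m] - x[t];
            s += d * d;                 /* (X_{t+m} - X_t)^2 */
        }
        tool[m - 1]  = s / (float)(n - m);
        toolx[m - 1] = (float)m;
    }
    Scaling(tool, toolx, LAG, "");
}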

5 User Guide
5.1 Compiling and Loading the Package
The source file is named EiT.c, standing for Exploratory identification Tools,
and it can be found in the directory /a_mount/stork/homes2/gk/PRJ/C.
To create an executable form of it, log in on goshawk, then
type make and press <return>. The running version of the program will be
stored in the same directory under the name EiT. To run the program you have
to change the $DISPLAY parameter, then type EiT and press <return>. You
may also have to type the command unsetenv LD_LIBRARY_PATH (if
this command is not already in your .userlogin file).

5.2 Using the Package


After typing EiT a Workstation will appear on your terminal, like the one
in figure 2.1. At the top of this screen you can see the menu bar that
enables the user to enter his/her options. Just below the menu bar there
is a message box that displays information relevant to the working file and
the transformations you may use. The rest of the screen consists of two
graphical windows which display the time plot of the series and the relevant
graphical identification tools. The upper window always displays either
the time plot of the raw data or the time plot of a transformed version of
these (the possible transformations are discussed below). The lower window
displays the selected identification tool, which always corresponds to the data
appearing in the upper window. By default, when a file is opened, the
time plot and the estimated Standardised Variogram appear in the upper
and lower windows respectively. Whenever you decide to transform your
original data, by clicking the appropriate button, the lower window will
automatically display the estimated Standardised Variogram corresponding
to the transformed data. Any other identification tool will also be applied
to the transformed data. The menu bar consists of 15 buttons, which are
described below:
• OPEN By clicking this button a prompt appears in the execution
window asking you to specify the name of the file you wish to read in.
You just have to type the name and press <return>. The file should
always be in ASCII format. If the name that you gave corresponds
to an existing file, the contents of the data will appear on your
screen, and the time plot and the estimated Standardised Variogram will
be displayed. If the name you gave is not recognised, you will get an
error message and a prompt asking you to specify another filename.
The name of the successfully opened file will always be displayed in the
message box, which will also inform you about any transformation that
you may apply to the original dataset. This button calls the flopen()
function.
• VRG By clicking this button we get a plot of the estimates of the
Classical Variogram. These are produced using the estimator 3.2 described
in Chapter 3. All three Variogram estimators are in fact calculated
by the same function VRG(cl,elim,al), described in the appendix of
this project; this button calls it with arguments cl=0 and al=0.
The argument cl stands for classical and passes the value 1 or 0
to the corresponding Boolean variable, which determines the type of
Variogram estimates that will be produced. The argument elim is
determined internally and is used when we apply this function to
differenced (seasonal or not) data.
• alVRG By clicking this button we get a plot of the estimated Classical
Variogram using the alternative estimator 3.4 described in Chapter 3.
It calls the same function as the previous one, but with arguments cl=0
and al=1. The argument al stands for alternative and like before it
takes Boolean values.
• stdVRG By clicking this button we get a plot of the estimates of the
Standardised Variogram using the formula 3.3 described in Chapter 3.
It calls the same function too, but with arguments cl=1, al=0.

• ACF Clicking this button produces the plot of the estimates of the
Autocorrelation function, using formula 3.6 in Chapter 3. It calls the
function ACF(acf) with argument acf=1. This argument takes the
values 1 and 0 for the Autocorrelation and the Autocovariance
function respectively.

• ACVF Clicking this button estimates and displays the Autocovariance
function, using formula 3.5 given in Chapter 3. It calls the function
ACF(acf) with argument acf=0.

• PACF This button produces the plot of the estimates of the Partial
Autocorrelation function, using the Choleski decomposition of the Yule-Walker
linear system (a sketch of this computation appears after this list). It calls
the functions CHOL(mat) and PART().

• SDIFF Transforms the original data to their seasonal differences and
plots them. By clicking this button, a prompt appears in the execution
window asking you to specify the length of the seasonality (for example
12 or 6 months). After pressing <return> you get the plots of the
seasonally differenced data and the corresponding Standardised Variogram.
It calls the function SDIFF(diff), where the argument diff passes the
length of the seasonality. The message box always indicates the type
of seasonal differences that you might have produced.

• DIFF Produces the d-th differences of the original data. With every
successive click, d increases by one. It uses formula 3.1 given in Chapter 3.
It calls the function DIFF(d), where the argument d determines the order
of the differences that will be produced. The order d is always displayed
in the message window.

• LOG This produces and plots the log transformation of the original
data. It first checks for non-positive values and applies the transformation
only if none are found. After applying this transformation to
your data, the LOG sign will appear in the message window. It calls the
function LOG().

• RESET This function is useful for returning to the initial state.
For example, after you have produced the d=3 differences of the log
transformed data and the corresponding Autocorrelation function, clicking
this button will display the original series and its Standardised
Variogram again. It also resets the number of lags to the default value
int((n-elim)/4), as described below. It calls the function RESET().

• PRINT This prints the current graphical output to a Computer Graphics
Metafile (CGM) named cgmout.cgm. This file will be stored in the
working directory and can normally be viewed, and converted to a
Postscript file, in any UNIX environment that provides the relevant
utilities. It can also be viewed in a Microsoft Windows environment.
It calls the function PRINT().

• CONFIG This button enables the user to change the number of lags m
used to display the relevant results in the lower window. The default
number of lags is int((n-elim)/4), i.e. the integer part of the ratio
(n-elim)/4, where n is the length of the series and elim corresponds
to the reduction in n caused by any form of differencing. If
the number of lags given by the user is not inside the legal range,
an error message appears and the default value is used. It calls the
function CONFIG().

• vrgDIFF This produces the first differences of the Variogram and plots
them in the lower window. This button is active only if one of the
Variogram estimators above is already displayed in the lower window;
clicking it then produces the first differences of that specific estimator.
Otherwise, if something else is displayed in the lower window, for
example the Autocorrelation function, you will have to display one of
the three Variogram estimators first and then press this button. It calls
the function DIFF(d) with argument d=1.

• QUIT This ends the session, closes all the opened files and the Workstation,
and returns control to the UNIX prompt.
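For illustration, the Choleski route to the Partial Autocorrelations mentioned under the PACF button can be sketched as follows: at each order k the Yule-Walker system R_k a = r_k is solved and the last coefficient is taken as φ_kk. The function names and implementation details here are ours; the package's CHOL(mat) and PART() may differ:

/* Minimal sketch of the Yule-Walker/Choleski route to the PACF, under
   our own naming; the package's CHOL(mat) and PART() may differ. */
#include <math.h>

#define KMAX 20

/* Cholesky solve of a symmetric positive definite system A x = b. */
static int chol_solve(double A[KMAX][KMAX], const double b[KMAX],
                      double x[KMAX], int n)
{
    double L[KMAX][KMAX] = {{0.0}}, y[KMAX];
    for (int j = 0; j < n; j++) {
        double s = A[j][j];
        for (int k = 0; k < j; k++) s -= L[j][k] * L[j][k];
        if (s <= 0.0) return -1;              /* not positive definite */
        L[j][j] = sqrt(s);
        for (int i = j + 1; i < n; i++) {
            s = A[i][j];
            for (int k = 0; k < j; k++) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    for (int i = 0; i < n; i++) {             /* solve L y = b */
        double s = b[i];
        for (int k = 0; k < i; k++) s -= L[i][k] * y[k];
        y[i] = s / L[i][i];
    }
    for (int i = n - 1; i >= 0; i--) {        /* solve L^T x = y */
        double s = y[i];
        for (int k = i + 1; k < n; k++) s -= L[k][i] * x[k];
        x[i] = s / L[i][i];
    }
    return 0;
}

/* phi_kk: last coefficient of the order-k Yule-Walker system, built
   from the estimated autocorrelations rho[0..k]. */
double pacf_at(const double rho[], int k)
{
    double A[KMAX][KMAX], x[KMAX], b[KMAX];
    for (int i = 0; i < k; i++) {
        b[i] = rho[i + 1];
        for (int j = 0; j < k; j++)
            A[i][j] = rho[i > j ? i - j : j - i];   /* Toeplitz in |i-j| */
    }
    if (chol_solve(A, b, x, k) != 0) return 0.0;
    return x[k - 1];
}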

Figure 2.1 shows a typical screen of the program. The message box
informs us that we are working with the file IBM1.dat and that no differencing
has been applied. The LOG sign informs us that the logarithmic transformation
has been used. Finally, it gives the number of lags used (in this case the
default value becomes 92) and the length of the series, n=369.


Figure 1: A typical screen of the Software Package

Theoretical Results

6 Introduction
In this chapter we examine some theoretical aspects concerning the Variogram
and the Covariance function. We also obtain the theoretical form of the
standardised Variogram G(m) and the differenced standardised Variogram
D(m) for a range of different models. In the first section we introduce
the necessary definitions and background and in the next one we examine
the proposed estimators. Section 3.4 presents the theoretical form of the
standardised Variogram for several models and also illustrates the proofs
for these results. In the same section we examine, using simulation, the
performance of the relevant estimators.

7 Theoretical Background & Motivation
7.1 General
Let {X_t, t ∈ T} be a discrete time stochastic process. We define the mean
function μ_t = E(X_t), ∀ t ∈ T, the Variance function σ_t² ≡ V(X_t) = E(X_t − μ_t)², ∀ t ∈ T,
and the Covariance function γ(t_1, t_2) ≡ Cov(X_{t_1}, X_{t_2}) = E[(X_{t_1} − μ_{t_1})(X_{t_2} − μ_{t_2})], ∀ t_1, t_2 ∈ T.
The stochastic process {X_t, t ∈ T} is said to be strictly stationary if
the joint distribution of X_{t_1}, X_{t_2}, ..., X_{t_n} is identical to the joint distribution
of X_{t_1+τ}, X_{t_2+τ}, ..., X_{t_n+τ}. In the case where n=2 this reduces to saying that
the joint distribution of X_{t_1}, X_{t_2} depends only on the difference t_1 − t_2, and
hence Cov(X_{t_1}, X_{t_2}) = γ(t_2 − t_1), ∀ t_1, t_2 ∈ T. The process {X_t, t ∈ T} is
second order stationary if E(X_t) = μ, ∀ t ∈ T, and Cov(X_t, X_{t+k}) = γ(k).
The Autocorrelation function is then defined as ρ_k = γ(k)/γ(0), and the Partial
Autocorrelation φ_{kk} = ρ_{X_t, X_{t−k} · X_{t−1}, ..., X_{t−k+1}} can be defined via the Yule-Walker
equations. In Time Series Analysis it is usual to work under second order
stationarity conditions.
Following Matheron (1973), the process {X_t, t ∈ T} is an Intrinsic Random
Field if

E(X_{t+m} − X_t) = βm  and  V(X_{t+m} − X_t) = 2C(m)

where C(m) is a conditionally negative definite function. Cressie (1991) refers
to a process satisfying the above conditions as an intrinsically stationary
process. Note that all second order stationary models are, in fact,
intrinsically stationary.
The process {ε_t, t ∈ T} is said to be a pure random process or White
Noise if it has constant mean and variance, and γ(k) = Cov(ε_{t+k}, ε_t) = 0, ∀ k ≠ 0.
This process is widely used to describe the error structure of more
complicated models.
The stochastic process {X_t, t ∈ T} is called a General Linear Process, or
infinite order Moving Average process, if

X_t − μ = ε_t + Σ_{j=1}^{∞} λ_j ε_{t−j},  t ∈ T

where μ and the λ_j are parameters and Σ_{j=1}^{∞} |λ_j| < ∞. We write X_t ∼ MA(∞).
This representation is very useful and has been used to obtain the results
appearing in section 3.4.

A process X_t is said to be an ARMA(p,q) process if it can be written in
the following form:

X_t − φ_1 X_{t−1} − φ_2 X_{t−2} − ... − φ_p X_{t−p} = ε_t − θ_1 ε_{t−1} − θ_2 ε_{t−2} − ... − θ_q ε_{t−q}

where {ε_t} is a white noise process, as defined above. It is also assumed
that the parameters φ_i and θ_i satisfy the usual stationarity and invertibility
restrictions, respectively.
A more flexible class of models, which also contains non-stationary ones, is
the ARIMA(p,d,q) class, defined as

φ(B) ∇^d X_t = θ(B) ε_t

where φ(B) = 1 − φ_1 B − φ_2 B² − ... − φ_p B^p and θ(B) = 1 − θ_1 B − θ_2 B² − ... − θ_q B^q
are the autoregressive and moving average operators respectively, and B is the
backshift operator, such that B^j X_t = X_{t−j}. The difference operator ∇^d is
defined as

∇^d X_t = (1 − B)^d X_t = X_t − d X_{t−1} + [d(d−1)/2] X_{t−2} − ... + (−1)^d X_{t−d}    (3.1)

Here again we assume that the stationarity and invertibility restrictions on
the parameters are satisfied.
We define 2C(m) ≡ V (Xt+m − Xt ). This quantity is known as the
Variogram and C(m) as the Semivariogram and they are widely used in the
Spatial Statistics literature. We call g(m) ≡ 2C(m) the Classical Variogram
and G(m) = g(m)/g(1) the Standardised Variogram. We also define d(m) and
D(m) to be the first differences of the Classical and Standardised Variogram
respectively.

7.2 Proposed estimators


The proposed estimators of the above theoretical quantities are

ĝ(m) = Σ_{t=1}^{n−m} (X_{t+m} − X_t)² / (n − m)    (3.2)

Ĝ(m) = [(n − 1) Σ_{t=1}^{n−m} (X_{t+m} − X_t)²] / [(n − m) Σ_{t=1}^{n−1} (X_{t+1} − X_t)²]    (3.3)

and, for their differences, d̂(m) = ĝ(m) − ĝ(m−1) and D̂(m) = Ĝ(m) − Ĝ(m−1).
An alternative estimator of the Classical Variogram is (Haslett 1997):

g̃(m) = Σ_{i=1}^{n−m} (d_{mi} − d̄_m)² / (n − m − 1)    (3.4)

where d_{mi} = X_{i+m} − X_i and d̄_m is the average of the d_{mi}. It can be shown that
ĝ(m) is an unbiased estimator of g(m) for stationary time series, while g̃(m)
is always biased, since it involves estimating the mean of a set of correlated data.
The usual estimator of the Autocovariance function is

γ̂(k) = (1/N) Σ_{t=1}^{N−k} (x_t − x̄)(x_{t+k} − x̄)    (3.5)

where x̄ is the sample mean. The estimator of the Autocorrelation function
is then

ρ̂_k = γ̂(k) / γ̂(0)    (3.6)

The Partial Autocorrelation is estimated by solving the Yule-Walker equations,
or by successively fitting autoregressive models of order 1, 2, ..., k. The estimator
γ̂(k) is usually referred to as the "biased" estimator, in contrast with the
so-called "unbiased" estimator γ̃(k) = [N/(N−k)] γ̂(k). In fact both of these
estimators are biased, and not necessarily one more than the other (Percival 1993).
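The estimators (3.2)-(3.4) translate directly into code. The following C sketch (function names are ours, not the package's) computes ĝ(m), Ĝ(m) = ĝ(m)/ĝ(1), and Haslett's alternative g̃(m):

/* Sketches of the estimators (3.2)-(3.4); function names are ours. */

/* Classical Variogram estimator g_hat(m), formula (3.2) */
double g_hat(const double x[], int n, int m)
{
    double s = 0.0;
    for (int t = 0; t < n - m; t++) {
        double d = x[t + m] - x[t];
        s += d * d;
    }
    return s / (n - m);
}

/* Standardised Variogram estimator G_hat(m) = g_hat(m)/g_hat(1), formula (3.3) */
double G_hat(const double x[], int n, int m)
{
    return g_hat(x, n, m) / g_hat(x, n, 1);
}

/* Alternative estimator g_tilde(m) of Haslett (1997), formula (3.4):
   the sample variance of the lag-m differences d_mi = x[i+m] - x[i] */
double g_tilde(const double x[], int n, int m)
{
    double mean = 0.0, s = 0.0;
    for (int i = 0; i < n - m; i++)
        mean += x[i + m] - x[i];
    mean /= (n - m);
    for (int i = 0; i < n - m; i++) {
        double d = (x[i + m] - x[i]) - mean;
        s += d * d;
    }
    return s / (n - m - 1);
}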

7.3 The Box-Jenkins approach to model identification
Box, Jenkins & Reinsel (1994) proposed an iterative approach to stochastic
model building which consists of three stages: Identification, Estimation
and Diagnostic Checking. At the identification stage we use the information
we can derive from the data in order to suggest a subclass of models. First,
we need to identify the degree of differencing d required to achieve
stationarity. Having an obvious value for d (usually d = 0, 1 or 2), we
estimate the Autocorrelation and Partial Autocorrelation functions of ∇^d X_t,
where X_t is the original series, and try to obtain information about the
autoregressive order p and the Moving Average order q. This is where
graphical techniques are particularly useful, although experience in
reading them is required. According to this approach,
the main information about the parameters (p,d,q) comes from the estimated
Autocorrelation (ACF) and Partial Autocorrelation (PACF) functions. If the
estimated ACF does not die out "quickly" we have an indication of
non-stationarity (i.e. that d > 0), while the numbers of lags after which the ACF and
PACF die out indicate the orders q and p respectively. In
the case where both the ACF and PACF decay slowly, a mixed model is suggested.
At the Estimation stage we use the actual data to make inferences
about the parameters, assuming that the model identified in the previous phase
is adequate. Identification and Estimation necessarily overlap, which means
that we may identify a more elaborate model and then, on estimating the
parameters, decide about possible simplifications.
Finally, in Diagnostic Checking, we use techniques to test the model
obtained in the previous stages and, if necessary, to improve it.
Several authors have expressed dissatisfaction with this method. In the
following section we compare the theoretical properties of the Autocovariance
(and hence the Autocorrelation) function and the Variogram, while in Chapter
4 we compare the two approaches by applying them to some actual datasets.

7.4 Comparison between covariance & variogram
For a stationary process we have:

V(X_{t+m} − X_t) = V(X_{t+m}) + V(X_t) − 2 Cov(X_{t+m}, X_t)  ⟺  C(m) = γ(0) − γ(m) = σ² − γ(m)

In this case the limiting value of the Semivariogram is the variance of the
process, provided that γ(m) tends to zero as m tends to infinity, i.e. that
the covariance of two observations m steps apart vanishes for
sufficiently large m. When the process is not stationary, the autocovariance is
not defined and C(m) does not tend to an asymptote. The relation between
G(m) and γ(m), for stationary processes, becomes:

G(m) = V(X_{t+m} − X_t) / V(X_{t+1} − X_t)
     = [V(X_{t+m}) + V(X_t) − 2 Cov(X_{t+m}, X_t)] / [V(X_{t+1}) + V(X_t) − 2 Cov(X_{t+1}, X_t)]
     = [2γ(0) − 2γ(m)] / [2γ(0) − 2γ(1)] = (1 − ρ_m) / (1 − ρ_1)    (3.7)

which is just a scaled function of the autocorrelations at lags m and 1, and
the differenced Variogram is

D(m) = (ρ_{m−1} − ρ_m) / (1 − ρ_1)    (3.8)

Box, Jenkins & Reinsel (1994) suggest that the shape of the estimated
autocorrelation function can be a leading tool for identifying the degree of
differencing needed to achieve stationarity. According to this suggestion, if
the estimated autocorrelations do not die out "quickly" then we have an
indication that the data are not stationary. Several authors have expressed
dissatisfaction with this method (for example, Anderson 1985). The main
argument against the use of the estimated autocorrelation function as an
identification tool is that for non-stationary time series the theoretical
autocorrelation function is not defined. Cressie (1991) points out that, in the case of a
non-stationary time series, estimating the autocorrelation function is meaningless,
since we are estimating a parameter that does not exist. In contrast, the
Variogram or Semivariogram, as defined above, does exist for certain types
of non-stationary models. An example where the autocovariance function
does not exist but the Variogram does is the Wiener process, or Brownian
motion, which is defined as follows: let {X(t), t ≥ 0} have stationary independent
increments, and suppose that for any given time interval (t_1, t_2) the
difference X(t_2) − X(t_1) is Normally distributed with zero mean and variance
σ²(t_2 − t_1). Then it can be shown (Robinson 1989) that Cov(X_{t+m}, X_t) = σ²t,
which is not a function of the difference between the two time points. The
Classical Variogram for this process is g(m) = V(X_{t+m} − X_t) = σ²m, so that
G(m) = m.
Another important drawback of using the autocovariance for non-stationary
data is that, because of the need first to estimate the mean of a set of (possibly
highly) correlated data, we end up with biased estimates. Newbold and
Agiakloglou (1993) examine the bias of the estimated autocovariance function for models
with long memory structure and illustrate that it may be quite dramatic.

8 The Variogram for Stationary Models


For the ARMA(p,q) class of models, the Variogram is an alternative identification
tool to the autocorrelation function. This is obvious, since the Variogram, as
well as the differenced Variogram, is just a scaled version of the autocorrelation
at lag m. From the form of the Variogram we can clearly see that its limiting
value is 1/(1 − ρ_1), while the differenced Variogram D(m) tends to zero as m
increases. In the next two paragraphs we examine the form of G(m) and
D(m) for an AR(1) and an MA(1) process.

8.1 AR(1) model


For the AR(1) model X_t − φ_1 X_{t−1} = ε_t we have

G(m) = (1 − φ_1^m) / (1 − φ_1)    (3.9)

since for this model ρ_k = φ_1^k. The differenced Variogram for the same model
is

D(m) = φ_1^{m−1}    (3.10)

which tends to zero as m increases. In this case the differenced Variogram
D(m) is essentially the same as the autocorrelation function, except
that it is lagged by one step. Hence its properties will be the same.
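Before looking at the simulation results below, here is a minimal sketch of the kind of check involved (the random number generator, seed and burn-in are our own choices, not the project's actual code): simulate an AR(1) with φ_1 = 0.5, estimate G(m) through (3.2)-(3.3) and print the estimate next to the theoretical value (3.9).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double gauss(void)                     /* Box-Muller N(0,1) deviate */
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

static double g_hat(const double x[], int n, int m)   /* formula (3.2) */
{
    double s = 0.0;
    for (int t = 0; t < n - m; t++) {
        double d = x[t + m] - x[t];
        s += d * d;
    }
    return s / (n - m);
}

int main(void)
{
    const double phi = 0.5;
    const int n = 300, burn = 100;
    double x[300], w = 0.0;

    srand(1997);
    for (int t = 0; t < n + burn; t++) {      /* X_t = phi X_{t-1} + e_t */
        w = phi * w + gauss();
        if (t >= burn) x[t - burn] = w;
    }
    for (int m = 1; m <= 10; m++) {
        double Ghat = g_hat(x, n, m) / g_hat(x, n, 1);   /* (3.3) */
        double G = (1.0 - pow(phi, m)) / (1.0 - phi);    /* (3.9) */
        printf("m=%2d  estimated G=%6.3f  theoretical G=%6.3f\n", m, Ghat, G);
    }
    return 0;
}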
The first part of figure 3-1 shows the theoretical Variogram for the AR(1) process
with parameter φ1 = 0.5 and the estimated Variograms for three simulated
AR(1) processes, with n=300.

Figure 2: AR(1) with φ1 = 0.5

Figure 3: AR(1) with φ1 = 0.9

We can see that there is some variability in the
estimates of G(m). This may indicate a sensitivity of the estimator Ĝ(m) to
extreme values or outliers. The important point is that the estimates of G(m)
keep the theoretical pattern and after the 5th lag seem to vary around the
theoretical limiting value which, in this case, is 2.0. The second part of the
same figure illustrates the theoretical and estimated differenced Variograms.
For this specific process, D(m) has similar behaviour to the autocorrelation
function, as can be seen from the graph. It is clear that the estimates
follow the theoretical pattern and, the important point, seem to vary
randomly around zero.
Figure 3-2 illustrates the theoretical Variogram for the AR(1) process with
parameter φ1 = 0.9 and the estimated Variograms for three simulated AR(1)
processes, with n=300. In this case we see greater variability in the estimates
of G(m). Although these estimates seem to follow the theoretical shape, they
appear unreliable, especially for lags larger than 20. A similar problem arises
in the estimation of the autocorrelation function, where some authors suggest
that the maximum number of lags used should be n/4, where n is the length of the series.
The limiting value of the Variogram, having specified the parameter to be 0.9, is,
as shown in the graph, 10. The second part of the same figure presents the
estimated and theoretical differenced Variograms. These estimates give
a clearer picture, since they vary around the theoretical values and,
after the 20th lag, around zero.
Finally, figure 3-3 shows D(m) and G(m) for the AR(1) process with parameter
φ_1 = −0.9. In this case, the limiting value of the Variogram is 0.53. Both
Ĝ(m) and D̂(m) seem to perform quite well in this case, and the corresponding
differenced Variogram tends to zero after just a few lags.


Figure 4: AR(1) with φ1 = −0.9



Figure 5: Random Walk

8.2 MA(1) model


For the MA(1) model X_t = ε_t − θ_1 ε_{t−1} the Variogram takes the following
form:

G(m) = (1 + θ_1²) / (1 + θ_1 + θ_1²),  m > 1

with G(1) = 1 by definition, and the differenced Variogram is zero since G(m)
is constant. The above results generalise easily to any model of the ARMA(p,q)
class, since the only thing we need to know is the form of the autocorrelation
function; formulas 3.7 & 3.8 then give the form of G(m) and D(m). In general,
for stationary models, G(m) will either approach a limiting value at higher
lags or be a constant. Similarly, D(m) will either tend to zero or be
identically zero.
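For completeness, the MA(1) result follows in one line from (3.7): since ρ_1 = −θ_1/(1 + θ_1²) and ρ_m = 0 for m ≥ 2,

G(m) = (1 − ρ_m)/(1 − ρ_1) = 1 / [1 + θ_1/(1 + θ_1²)] = (1 + θ_1²)/(1 + θ_1 + θ_1²),  m ≥ 2.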

9 The Variogram for Non-Stationary Models


In this section we examine the form of the Variogram and the differenced
Variogram for some ARIMA(p,d,q) models with d > 0.

9.1 Random Walk


A simple case of a non-stationary model is the random walk X_t = X_{t−1} + ε_t.
Simple algebra yields that the differences X_{t+m} − X_t can be written as:

X_{t+m} − X_t = Σ_{i=1}^{m} ε_{t+i}

It is straightforward that V(X_{t+m} − X_t) = m σ_ε² and V(X_{t+1} − X_t) = σ_ε².
Hence G(m) = m and D(m) = 1. This process is a typical case of a variance
non-stationary model. From figure 3-4 it can be seen that the estimates of
G(m) for the three simulated random walks are satisfactory up to about the
25th lag.


Figure 6: ARIMA(0,1,1) with θ1 = 0.2

9.2 ARIMA(0,1,1)
For this model, the first differences of X_t form an MA(1)
process, i.e.

X_t − X_{t−1} = ε_t − θ_1 ε_{t−1}

The key point in evaluating G(m) is to express the difference
X_{t+m} − X_t in terms of the errors. After repeated substitution we get

X_{t+m} − X_t = −θ_1 Σ_{i=0}^{m−1} ε_{t+i} + Σ_{i=1}^{m} ε_{t+i}

Calculating the variance of the above expression we get:

V(X_{t+m} − X_t) = θ_1² m σ_ε² + m σ_ε² − 2θ_1 (m − 1) σ_ε² = m σ_ε² (1 − θ_1)² + 2θ_1 σ_ε²

and since V(X_{t+1} − X_t) = σ_ε² (1 + θ_1²), the Variogram will be:

G(m) = [2θ_1 + m (1 − θ_1)²] / (1 + θ_1²) = 2θ_1 / (1 + θ_1²) + [(1 − θ_1)² / (1 + θ_1²)] m    (3.11)

and the differenced Variogram,

D(m) = G(m) − G(m−1) = (1 − θ_1)² / (1 + θ_1²)    (3.12)

The above confirms the results given without proof by Davies and Tremayne
(1997). For this model, the theoretical Variogram is a linear function of m,
with slope and intercept depending on the value of θ_1. As can be seen from
figure 3-5, the estimator Ĝ(m) is quite good up to the 25th lag, but after that
it becomes rather poor, although it maintains the theoretical pattern. The
estimate D̂(m), however, can be misleading after the 20th lag.


Figure 7: ARIMA(0,1,2) with θ1 = 0.5 and θ2 = 0.7

9.3 ARIMA(0,1,2)
For an ARIMA(0,1,2) process, we have:

X_t − X_{t−1} = ε_t − θ_1 ε_{t−1} − θ_2 ε_{t−2}

After repeated substitution we express the difference X_{t+m} − X_t in terms of
the errors only:

X_{t+m} − X_t = Σ_{i=1}^{m} ε_{t+i} − θ_1 Σ_{j=0}^{m−1} ε_{t+j} − θ_2 Σ_{k=−1}^{m−2} ε_{t+k}

V(X_{t+m} − X_t) = V( Σ_{i=1}^{m} ε_{t+i} − θ_1 Σ_{j=0}^{m−1} ε_{t+j} − θ_2 Σ_{k=−1}^{m−2} ε_{t+k} )

We can either calculate the covariances in the above formula, yielding

Cov( Σ_{j=0}^{m−1} ε_{t+j} , Σ_{k=−1}^{m−2} ε_{t+k} ) = (m − 1) σ_ε²

Cov( Σ_{i=1}^{m} ε_{t+i} , Σ_{j=0}^{m−1} ε_{t+j} ) = (m − 1) σ_ε²

and

Cov( Σ_{i=1}^{m} ε_{t+i} , Σ_{k=−1}^{m−2} ε_{t+k} ) = 0 for m = 1,  (m − 2) σ_ε² for m > 1

or express the difference X_{t+m} − X_t as a sum of uncorrelated terms:

X_{t+m} − X_t = −θ_2 ε_{t−1} − (θ_1 + θ_2) ε_t + (1 − θ_1 − θ_2) Σ_{k=1}^{m−2} ε_{t+k} + (1 − θ_1) ε_{t+m−1} + ε_{t+m}

Both ways lead to the following result:

V(X_{t+m} − X_t) = σ_ε² (1 + θ_1² + θ_2²) for m = 1,
V(X_{t+m} − X_t) = 2σ_ε² (θ_1 + 2θ_2 − θ_1 θ_2) + σ_ε² (1 − θ_1 − θ_2)² m for m > 1

Since the difference X_{t+1} − X_t is in fact an MA(2) process, its variance will
be V(X_{t+1} − X_t) = σ_ε² (1 + θ_1² + θ_2²), and the Variogram for this process will
finally be:

G(m) = 1 for m = 1,
G(m) = 2(θ_1 + 2θ_2 − θ_1 θ_2)/(1 + θ_1² + θ_2²) + [(1 − θ_1 − θ_2)²/(1 + θ_1² + θ_2²)] m for m > 1    (3.13)

and the differenced Variogram

D(m) = (1 − θ_1 − θ_2)² / (1 + θ_1² + θ_2²),  m > 1    (3.14)

Again, the Variogram G(m) is a straight line and its first differences are
constant. The above results agree with the general formula given without
proof by Davies and Tremayne (1997); according to them, the Variogram
for the ARIMA(0,1,q) class of models is G(m) = A + Bm, where B =
(1 − θ_1 − θ_2 − ... − θ_q)² / (1 + θ_1² + ... + θ_q²) and A is a function of the θ_i's
(which is not given).
Figures 3-6 and 3-7 show realizations of this process, with parameters
θ_1 = 0.5, θ_2 = 0.7 and θ_1 = 0.9, θ_2 = −0.5 respectively. We can see that in
both cases the approximation is quite good over the range of m.


Figure 8: ARIMA(0,1,2) with θ1 = 0.9 and θ2 = −0.5



Figure 9: ARIMA(1,1,0) with φ1 = 0.5

9.4 ARIMA(1,1,0)
For the ARIMA(1,1,0) model, defined as (1 − φ_1 B)(X_t − X_{t−1}) = ε_t, the
theoretical Variogram is obtained in a similar way. After repeated substitution
we obtain the following expression for the difference X_{t+m} − X_t:

X_{t+m} − X_t = φ_1 Σ_{i=0}^{m−1} W_{t+i} + Σ_{j=1}^{m} ε_{t+j}

where W_t = X_t − X_{t−1} is the first difference of X_t, which by definition follows
an AR(1) process. The variance of the difference X_{t+m} − X_t will be:

V(X_{t+m} − X_t) = φ_1² V( Σ_{i=0}^{m−1} W_{t+i} ) + m σ_ε² + 2 Cov( φ_1 Σ_{i=0}^{m−1} W_{t+i} , Σ_{j=1}^{m} ε_{t+j} )

But

V( Σ_{i=0}^{m−1} W_{t+i} ) = Σ_{i=0}^{m−1} V(W_{t+i}) + 2 Σ_{i=0}^{m−2} Σ_{j=i+1}^{m−1} Cov(W_{t+i}, W_{t+j})
  = m γ(0) + 2 γ(0) Σ_{j=1}^{m−1} (m − j) φ_1^j
  = m γ(0) + 2 γ(0) [(m − 1) φ_1 − m φ_1² + φ_1^{m+1}] / (1 − φ_1)²

where γ(0) = σ_ε² / (1 − φ_1²) ≡ V(W_t). Writing

W_{t+i} = Σ_{k=0}^{∞} φ_1^k ε_{t+i−k} = ε_{t+i} + Σ_{k=1}^{∞} φ_1^k ε_{t+i−k}

we find

Cov( φ_1 Σ_{i=0}^{m−1} W_{t+i} , Σ_{j=1}^{m} ε_{t+j} ) = φ_1 (m − 1) σ_ε² + φ_1 σ_ε² Σ_{j=2}^{m−1} (m − j) φ_1^{j−1}
  = φ_1 σ_ε² [(m − 1) − m φ_1 + φ_1^m] / (1 − φ_1)²

Since G(m) = V(X_{t+m} − X_t) / γ(0), after some more algebra we get the final
form of the Variogram:

G(m) = m (1 + φ_1) / (1 − φ_1) + 2 φ_1 (φ_1^m − 1) / (1 − φ_1)²    (3.15)

and the differenced Variogram:

D(m) = G(m) − G(m−1) = [2 φ_1^{m+1} − 2 φ_1^m + (1 − φ_1)(1 + φ_1)] / (1 − φ_1)²    (3.16)

The above corrects the result given, without proof, by Davies and Tremayne
(1997). (N. Davies, in private correspondence, has acknowledged that the relevant
formula appearing in their paper is wrong, since it does not give G(1) = 1,
as it should.) For this process, G(m) will be a linear function of m after a
number of lags, and D(m) will tend to a constant.
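As a check that (3.15) satisfies the required normalisation, setting m = 1 gives

G(1) = (1 + φ_1)/(1 − φ_1) + 2φ_1(φ_1 − 1)/(1 − φ_1)² = (1 + φ_1)/(1 − φ_1) − 2φ_1/(1 − φ_1) = (1 − φ_1)/(1 − φ_1) = 1.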
Figure 3-8 gives Ĝ(m) and D̂(m) for three simulated ARIMA(1,1,0) series with
φ_1 = 0.5. The estimated Variogram seems to approximate the
theoretical Variogram well enough, but D̂(m) becomes a rather poor estimate of D(m)
after about 20 lags.

9.5 ARIMA(1,1,1)
For this case the model becomes (1 − φ_1 B)(X_t − X_{t−1}) = ε_t − θ_1 ε_{t−1}. Following
the same procedure as before, we express the difference X_{t+m} − X_t in terms
of the errors and of the first differences W_t = X_t − X_{t−1}. By definition W_t follows an
ARMA(1,1) process. We have:

X_{t+m} − X_t = φ_1 Σ_{i=0}^{m−1} W_{t+i} + Σ_{j=1}^{m} ε_{t+j} − θ_1 Σ_{k=0}^{m−1} ε_{t+k}

Calculating the variance of an ARMA(1,1) process we get:

γ(0) ≡ V(W_t) = σ_ε² (1 + θ_1² − 2 θ_1 φ_1) / (1 − φ_1²)

Figure 10: Simulated ARIMA(1,1,1) with φ1 = 0.5 and θ1 = 0.5

We also find that if X_t follows a stationary ARMA(1,1) process, then its
MA(∞) representation becomes:

X_t = ε_t + (φ_1 − θ_1) Σ_{k=1}^{∞} φ_1^{k−1} ε_{t−k}

Applying this to W_t, we find

Cov( φ_1 Σ_{i=0}^{m−1} W_{t+i} , Σ_{j=1}^{m} ε_{t+j} ) = σ_ε² [ (m − 1) φ_1 + φ_1 (φ_1 − θ_1) Σ_{j=2}^{m−1} (m − j) φ_1^{j−2} ]

In an analogous way we find

Cov( Σ_{i=0}^{m−1} W_{t+i} , Σ_{k=0}^{m−1} ε_{t+k} ) = m σ_ε² + (φ_1 − θ_1) σ_ε² Σ_{j=1}^{m−1} (m − j) φ_1^{j−1}

and

Cov( Σ_{j=1}^{m} ε_{t+j} , Σ_{k=0}^{m−1} ε_{t+k} ) = (m − 1) σ_ε²

After tedious algebra we end up with the following general form of G(m):

G(m) = α m + β φ_1^{m−1} − γ φ_1^m − δ    (3.17)

where α, β, γ and δ are (complicated) functions of the parameters only. For
this model, the theoretical Variogram is (after a number of lags) a linear
function of m; for large m we expect G(m) to run parallel to the line αm,
shifted according to δ. The differenced Variogram will tend to a constant.
Figure 3-9 gives the time plot of a simulated ARIMA(1,1,1) with φ_1 = 0.5 = θ_1.
From figure 3-10 we see that the linear behaviour of G(m) after a number of
lags is clearly identified in Ĝ(m), but D̂(m) loses the theoretical pattern for
lags larger than 20.

Figure 11: Ĝ(m) and D̂(m) for a simulated ARIMA(1,1,1) with φ1 = 0.5 and θ1 = 0.5

Figure 12: Pure I(2) Process

9.6 Pure I(2) process


A process X_t is said to be a pure I(2) process if its second differences are
white noise, i.e.

∇² X_t = X_t − 2 X_{t−1} + X_{t−2} = ε_t

Without loss of generality we assume that X_{−1} = X_{−2} = 0. Then by
repeated substitution we get the following form of the above model:

X_t = Σ_{j=1}^{t} (t − j + 1) ε_j = Σ_{j=1}^{t} j ε_{t−j+1}

Some algebra yields

V(X_t) = σ_ε² t (t + 1)(2t + 1) / 6

and

V(X_{t+m}) = σ_ε² (t + m)(t + m + 1)(2t + 2m + 1) / 6

Further algebra gives

Cov(X_t, X_{t+m}) = σ_ε² Σ_{i=1}^{t} i (i + m) = σ_ε² [t (t + 1)(2t + 1) + 3 m t (t + 1)] / 6

Hence the final form of the Variogram will be:

G(m) = [σ_ε² (6 t m² + 2 m³ + 3 m² + m) / 6] / [σ_ε² (t + 1)] = m² + m (2m − 1)(m − 1) / [6 (t + 1)]    (3.18)

and the differenced Variogram becomes

D(m) = 2m + (m² − 2m − t) / (t + 1)    (3.19)


Figure 13: An Example

It is clear that G(m) is no longer independent of time. Nevertheless, for large
sample sizes we expect Ĝ(m) to be a nonlinear function of m and D̂(m) a linear
function of m. Figure 3-11 shows the theoretical and estimated Variograms
for three simulated I(2) processes of length n=300, while the second part of
the same figure gives the theoretical and estimated differenced Variograms.
We can see that the estimator Ĝ(m) performs very well in this case. The
nonlinear form of Ĝ(m), as well as the linear form of D̂(m), clearly distinguishes
this model from a stationary or a first difference stationary one. The differenced
Variogram estimator D̂(m) also approximates the theoretical values very well.


Figure 14: ACF, Ĝ(m) and D̂(m) for a simulated series



Figure 15: Ĝ(m), for m=50

10 Final Remarks
Having obtained the theoretical form of G(m) and D(m) for several models,
we can clearly see that there are some qualitative differences which can be
useful in the model identification process. The key feature is the shape of the
Variogram: for stationary models G(m) is either a constant (MA case) or
tends to a constant (AR case), and correspondingly D(m) is zero or tends to
zero. For first order difference stationary series, G(m) is a linear function
of m and D(m) a constant. Furthermore, for second order difference stationary
models, G(m) is a nonlinear and D(m) a linear function of m (for large sample
sizes). Since in practice differences of order d=1 or d=2 are usually enough
to achieve stationarity, these two graphical devices can be very
helpful (in contrast with the autocorrelation function, which should not be
used with non-stationary series).
Although in most cases Ĝ(m) and D̂(m) contain the important information
that may be used in model identification, there may be practical situations
where these estimates need to be investigated very carefully. For example,
consider the series appearing in figure 3-12. The sample size is n=100. From a
visual examination of this graph we cannot really tell whether this series is
stationary or not. The first part of figure 3-13 gives the estimated autocorrelation
function, which strongly suggests that the data do not need to be differenced,
since the autocorrelations, even from the third lag, lie within the ±2/√n limits.
The second part of the same figure shows the estimated Variogram for
the same data. Although we can see a small upward tendency in this plot,
we cannot say that this is a sign of non-stationarity. The third part of
figure 3-13 gives the first differences of Ĝ(m). From this plot we could say
that D̂(m) varies randomly around zero, which, according to our results,
indicates stationarity. In fact the data generating mechanism of the above
example is an ARIMA(0,1,1) model with parameter θ_1 = 0.9, which is
non-stationary. Both the Autocorrelation function and the Variogram failed
to clearly identify the correct model in this case.

Figure 16: Ĝ(m) for samples from an ARIMA(0,1,1) model with different sample sizes

The reason for this is
that the theoretical Variogram is a linear function of m with a very small
slope: by formula 3.12, β = (1 − 0.9)² / (1 + 0.9²) ≈ 0.0055. Bearing in mind
the small sample size and the variability in the estimate Ĝ(m), it is difficult
to tell whether Ĝ(m) varies about a constant or increases linearly with m at
a small rate. It is of course even more difficult to tell whether D̂(m) varies
about zero or around a constant close to zero. To overcome this problem
we need a larger sample, so that we have more accurate estimates, and we
also need to look at Ĝ(m) for larger m, so that its linear trend is easier to
recognise. Figure 3-14 shows Ĝ(m) for m=50. Here the upward tendency in
Ĝ(m) is clearer, but a decision based on evidence obtained from that plot
alone would be too risky, bearing in mind the small sample size and the fact
that we have chosen m to be n/2. Figure 3-15 shows the estimated Variograms
for three simulations of the same non-stationary process, for several sample
sizes. We can see that even for large n it is difficult to detect such a small
slope. The linearity in Ĝ(m) becomes clear for n > 500.
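A sketch of this kind of sample-size experiment follows (the simulation choices are our own, not the project's code): simulate the ARIMA(0,1,1) with θ_1 = 0.9 for increasing n, fit a least-squares line to Ĝ(m) over m = 1, ..., n/4, and compare the fitted slope with the theoretical β = 0.0055.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double gauss(void)                     /* Box-Muller N(0,1) deviate */
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

static double g_hat(const double x[], int n, int m)   /* formula (3.2) */
{
    double s = 0.0;
    for (int t = 0; t < n - m; t++) {
        double d = x[t + m] - x[t];
        s += d * d;
    }
    return s / (n - m);
}

int main(void)
{
    const double theta = 0.9;
    const int sizes[] = {100, 300, 500, 1000};
    static double x[1000];

    srand(42);
    for (int s = 0; s < 4; s++) {
        int n = sizes[s], lags = n / 4;
        double e_prev = gauss();
        x[0] = 0.0;
        for (int t = 1; t < n; t++) {         /* X_t - X_{t-1} = e_t - theta e_{t-1} */
            double e = gauss();
            x[t] = x[t - 1] + e - theta * e_prev;
            e_prev = e;
        }
        /* least-squares slope of G_hat(m) on m, for m = 1..lags */
        double g1 = g_hat(x, n, 1), sm = 0, sg = 0, smm = 0, smg = 0;
        for (int m = 1; m <= lags; m++) {
            double G = g_hat(x, n, m) / g1;
            sm += m; sg += G; smm += (double)m * m; smg += m * G;
        }
        printf("n=%5d  fitted slope = %7.4f   (theory 0.0055)\n",
               n, (lags * smg - sm * sg) / (lags * smm - sm * sm));
    }
    return 0;
}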
Application to Real Datasets

11 Introduction
In this Chapter we apply the theoretical results presented in Chapter 3 to
some real datasets. In each case, we try to identify the appropriate
model using both the Box-Jenkins methodology and the Variogram. We
examine two series of stock prices (IBM) and four series consisting
of chemical measurements. These datasets have been downloaded from
the StatLib Web site. Although most of them have been analysed by many
authors, full details concerning the nature of these data and the measurement
techniques used are not available. Such additional information can be
helpful in data analysis, since it allows a better understanding of the
data and the interpretation of standard patterns, outliers and other "strange"
characteristics of the data. However, we proceed with the analysis of these
datasets using strictly the tools described so far.


Figure 17: Chemical process temperature readings

12 The Chemical Temperature Dataset


This dataset consists of n=226 chemical process temperature measurements.
The time unit is one minute. Box, Jenkins & Reinsel (1994) have already
analysed this time series using the Autocorrelation and Partial Autocorrelation
functions. Davies & Tremayne (1997) analysed the same dataset using the
Variogram and its first differences. We will try to identify the appropriate
model using both techniques and, of course, compare the results. Figure 4-1
presents the time plot of the raw data. This plot clearly shows a kind of
pseudo-periodic behaviour: there are two cycles with a period of about 80
minutes, and this pattern seems to continue further. A basic understanding
of the data and the definition of the Variogram G(m) suggest that this
"periodicity" will also appear in the Variogram estimate.


Figure 18: Ĝ(m) and D̂(m) for the Chemical process temperature dataset

This might make the interpretation of G(m) and D(m) difficult, and the
choice of m quite crucial. In the previous Chapter we proved that for a
stationary process G(m) should be constant, or tend to a constant, as m
increases. The importance of choosing a reasonable m, depending on the
characteristics of the dataset to be analysed, is illustrated in figure 4-2.
The upper part of this figure shows the estimated Variogram and its first
differences for m=25. Inspection of Ĝ(m) for that number of lags certainly
indicates non-stationarity, since it appears to be a nonlinear function of m for
lags up to about 5, while for larger m, Ĝ(m) becomes a linear function of m and
D̂(m) seems to have settled down to a constant. Thus, based on these plots,
we would decide that the data are non-stationary and that by taking their
first differences we would probably end up with a stationary series (which
is in fact the right decision). The lower part of figure 4-2 shows the same
functions for a larger number of lags, up to m=50. We can see how the
different choice of m leads to functions with apparently different properties.


Figure 19: ACF & PACF for the Chemical Temperature dataset

From the plot of the estimated Variogram for m=50 we get a completely
different impression: it seems to have reached a limiting value instead of
increasing with m, which might suggest exactly the opposite decision. In
fact this fall in the estimated Variogram and differenced Variogram
is not a sign of stationarity but just the result of the "cycling" behaviour of
the raw data. By inspecting the time plot of the original series again, we see
that it would be reasonable to expect the Variogram to fall after m=40
or so, i.e. after m reaches about half of the first "period". Plotting
Ĝ(m) for larger m we see that it does increase, but with a "periodicity".
The problem again is that for large m the estimator of G(m) becomes more
unreliable. On the other hand, even if m had been chosen to be 50 or more,
the experienced user should notice not only that Ĝ(m) increases rapidly at
smaller lags, but also that the supposed limiting value is a large number
(G(m) is independent of the measurement units). This, at least for simple
stationary models, is not expected, and hence it is an extra sign of
non-stationarity.

Figure 20: Differenced chemical temperature data

Hence, after a careful examination of Ĝ(m) and D̂(m), we conclude that the
series is non-stationary. The same conclusion
would have been reached if we had based our choice on the sample Autocorrelation
function, appearing in the first part of figure 4-3. The Autocorrelations
seem to decay slowly and linearly to zero, and even at about 20 lags they are
significantly different from zero. This suggests that we need to take at least
the first differences of the data in order to achieve stationarity. At this stage
Davies & Tremayne (1997) had already suggested, based on Ĝ(m) and D̂(m)
for m=25, that the raw data come from an ARIMA(1,1,0). They came to
that conclusion by considering that D̂(m), as it appears in the upper left part
of figure 4-2, has already reached a limiting value of around 8. In fact, bearing
in mind the form of the theoretical Variogram and differenced Variogram,
given in the previous Chapter by formulas 3.15 & 3.16, we see that this may
well be the case.


Figure 21: Ĝ(m), D̂(m), ACF and PACF for d=1 differenced Chemical
Temperature data

Hence, based on the Variogram as well as on the Autocorrelation function,
we proceed with differencing the data once. The Variogram, in this case,
gave us additional information about the final form of the model, which may
be useful in the following stages. Figure 4-4 presents the first differences
of the original data. We can see that the "periodicity" appearing in the
original time plot has been eliminated. Figure 4-5 gives the estimated
Variograms and Autocorrelations for the differenced series. As we can see,
Ĝ(m) approaches a limiting value of about 6 and D̂(m) tends to zero.
This strongly suggests that the differenced series is a stationary one. From
the shape of the estimated Variogram we can also see that the transformed
series is definitely not an MA(q) process, since in that case both Ĝ(m) and
D̂(m) should be constant for m > 1; rather, it indicates an AR(1)
or a mixed model.

From figure 4-5 we can see that the estimated Autocorrelation function
follows a roughly exponential falloff. The Partial Autocorrelations are also
within the Normal limits (except at the first lag), suggesting that the model
contains one autoregressive term (if we accept that it is stationary). In their
analysis, Box, Jenkins & Reinsel suggest one of the following two models:

(1 − 0.8B)(1 − B) X_t = ε_t

or

(1 − B)² X_t = ε_t

i.e. an ARIMA(1,1,0) model with φ_1 = 0.8 or an ARIMA(0,2,0), which in
fact are quite similar. The Variogram in this case proved its superiority
in determining, with less uncertainty, the required degree of differencing.
Furthermore, with careful handling we can also get an indication of the
final model. Following Davies & Tremayne, if we accept that D̂(m) for
the original series did reach a limiting value (figure 4-2), then using
formula 3.16 we get an initial value of φ_1 = 0.8, exactly the one suggested
by Box, Jenkins & Reinsel. Thus, using the estimated Variogram
and Autocorrelation functions in a complementary way, we can have a more
precise basis for the model building process.

13 The Chemical Concentration Dataset


This dataset consists of n=197 observations taken every two hours. The time
plot of this series is displayed in figure 4-6. A first look at this plot suggests
non-stationarity. In figure 4-7 we can see the estimated Variograms & Autocorrelations.
The linear form of Ĝ(m) strongly suggests that the data are non-stationary.
The differenced estimated Variogram seems to vary around a positive constant
near zero, as it should. The important information again is the linear
form of Ĝ(m), since this is easier to see than the fact that D̂(m) varies around a
small constant rather than around zero. The second part of the same figure shows
the estimated Autocorrelations and Partial Autocorrelations.


Figure 22: The Chemical Concentration readings dataset

In this case too, the Autocorrelation function does not lead to a unique
solution: inspecting the relevant plot in the lower left part of figure 4-7, we
can see that the estimated Autocorrelations decay quite slowly to zero, leading
to uncertainty about whether differencing is needed. Box, Jenkins & Reinsel
suggest that the appropriate model is either an ARIMA(1,0,1) or an ARIMA(0,1,1).
Figure 4-8 shows the relevant estimates for the d=1 differenced series. The
estimated Variogram clearly suggests stationarity, and also that the underlying
mechanism is a Moving Average process. The same can of course be concluded
by inspecting the differenced estimated Variogram: it seems to vary randomly
around zero.


Figure 23: Ĝ(m), D̂(m), ACF and PACF for the Chemical Concentration
data

Finally, we look at the estimated Autocorrelations and Partial Autocorrelations
given in figure 4-8. The first order differenced series is stationary, since the
estimated Autocorrelations "die out" from the second lag. In addition, the
behaviour of the estimated Partial Autocorrelations suggests that the d=1
differenced series is an MA(1) process, which means that the underlying model
for the original series is an ARIMA(0,1,1) process. Again the Variogram gives
a clearer picture of the stationarity properties of the time series. Based
on that, we difference the data and, applying the Box-Jenkins method, we end
up with an appropriate solution.


Figure 24: Ĝ(m), D̂(m), ACF and PACF for d=1 differenced Chemical
Concentration data

14 The Chemical Process Viscosity Dataset


This dataset consists of n=310 measurements of chemical viscosity. It has
also been analysed by Box, Jenkins & Reinsel (1994). A time plot of the
raw data can be seen in figure 4-10. Again, it is not easy to tell whether this
series is stationary or not. Figure 4-11 shows the estimated Variograms &
Autocorrelations. We see that Ĝ(m) tends to an asymptote after about the
15th lag and, correspondingly, D̂(m) tends to zero after the same number of
lags.


Figure 25: The Chemical Viscosity measurements

These two signs indicate that the process is stationary and that the
generating mechanism is either an AR or a mixed model. On the other hand,
the estimated Autocorrelation function does not give a clear answer: it
seems to die out quite quickly, but not in a way that definitely suggests
stationarity. Again, Box, Jenkins & Reinsel express uncertainty over whether
d=0 or d=1. Based on Ĝ(m) and D̂(m), and also using the fact that the
estimated Autocorrelations die out while the estimated Partial Autocorrelations
are zero (except for the first one), we decide that the appropriate model
is an AR(1). Taking the limiting value of Ĝ(m) to be around 5 and
using the formula for the theoretical Variogram of an AR(1) process, given
by formula 3.9, we can also obtain an initial estimate of the AR(1) parameter;
this was found to be φ_1 = 0.8.


Figure 26: Ĝ(m), D̂(m), ACF and PACF for the Chemical Viscosity dataset

Figure 27: Time plot, Ĝ(m), D̂(m), ACF and PACF for the Methanol
measurements

15 The Methanol Dataset


This is the first part of the Gas Furnace Data (Box, Jenkins & Reinsel,
1994). It consists of 296 methanol measurements, taken at a time unit of
9 seconds. Figure 27 presents the time plot of this dataset as well as the
estimated Variograms and Autocorrelation functions. Once again it is not
easy to decide from the time plot whether this series is stationary or not.
The lower part of the same figure gives a clear answer: the estimated Variogram
and its first differences have the pattern they should have for an AR or a
mixed model. Ĝ(m) approaches a limiting value after about the 20th lag
and D̂(m) varies around zero from that point on, even for m > 50. The
estimated Autocorrelations in this case suggest stationarity, while the Partial
Autocorrelations are zero except at the first two lags, indicating that there
are two parameters in the AR model, i.e. an AR(2). Hence, once more, combined
use of Ĝ(m) and the ACF led to a reasonable initial model.
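A quick numerical check of this suggestion can be made by fitting
autoregressions of increasing order; a minimal sketch in Splus, assuming the
standard ar function is available and that methanol is the name given to the
series:

fit <- ar(methanol, order.max = 5)   # order selected by AIC up to 5
fit$order                            # expected to be 2 if the PACF reading is right
fit$ar                               # the corresponding AR coefficient estimates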


Figure 28: The IBM Stock Prices series (first dataset)

16 The IBM Common Stock Closing Prices


16.1 The first dataset
This is a series of n=369 daily common stock closing prices of IBM, from
17/05/1961 to 02/11/1962. This dataset has been analysed by many authors,
for example Box, Jenkins & Reinsel (1994). A time plot of these data
can be seen in figure 28. A first investigation of this plot suggests that
the series is not stationary. Figure 29 presents the estimated Variogram
and its first differences, and also the estimated Autocorrelation and Partial
Autocorrelation functions for this dataset. Inspecting the first part of this
figure, we see that the estimated Variogram increases steadily with the lag
m instead of settling to an asymptote, which clearly indicates non-stationarity.


Figure 29: Ĝ(m), D̂(m), ACF and PACF for the first IBM dataset

The shape of the estimated Variogram is non-linear, which might at first
suggest that the data are second-order difference stationary. This
non-linearity, however, seems to disappear after the first twenty lags. The
same behaviour is reflected in the differenced Variogram plot (second part of
the same figure), which also clearly indicates non-stationarity: it seems at
first to increase linearly with the lag m, but after the 30th lag it appears
to settle down to a limiting value. Since the large sample size in this case
allows Ĝ(m) and D̂(m) to be investigated over a larger number of lags,
plotting these two estimates for m up to 100 clearly shows that Ĝ(m) becomes
a linear function of m and that D̂(m) reaches a limiting value of about 3.
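This is the pattern one would expect for a once-integrated series: if
w_t = X_t − X_{t-1} is stationary with mean zero and summable autocovariances,
then E(X_{t+m} − X_t)² = Var(w_{t+1} + ... + w_{t+m}), which grows approximately
linearly in m for large m, the slope being governed by the long-run variance
of w_t. Hence Ĝ(m) eventually becomes a linear function of m while D̂(m)
settles down to a constant.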


Figure 30: The d=1 differenced IBM series (first dataset)

Hence the only safe decision that we can take at this stage, based on the
behaviour of Ĝ(m) and D̂(m), is that the series is not stationary. We would
reach the same decision by using the estimated Autocorrelation function,
shown in the third part of figure 29. The estimated Autocorrelations decay
very slowly, in a linear way, and even at 40 lags they are all significantly
different from zero.

Figure 30 shows the first-order differenced IBM data. It is not obvious
from inspection of the time plot whether the data are stationary or not.
Although the series varies around zero, it might indicate variance
non-stationarity at around 260 days. Figure 31 presents the estimated Variogram
and its first differences, and also the estimated Autocorrelation and Partial
Autocorrelation functions for the first-order differenced data. By inspecting
the behaviour of Ĝ(m) we can see that it possibly varies around a constant.


Figure 31: Ĝ(m), D̂(m), ACF and PACF for d=1 differenced IBM data (first
dataset)

Figure 32: Time plot, Ĝ(m), D̂(m), ACF and PACF for the second IBM
stock prices dataset

In fact there is a small upward tendency in Ĝ(m) as the lag m increases.
This might be a similar case to the one discussed in the previous Chapter,
where both the Variogram and the Autocorrelation function failed to clearly
recognise the non-stationarity of the data. The differenced Variogram seems
to vary randomly around zero, or around a constant very close to zero. Hence
either we have achieved stationarity after differencing the data once, or the
data are non-stationary but with a root close to unity. The most plausible
conclusion is that the first-order differenced series is stationary, possibly
generated by a Moving Average model. Box, Jenkins & Reinsel suggest that the
data generating mechanism is an ARIMA(0,1,1).


Figure 33: Ĝ(m), D̂(m), ACF and PACF for d=1 differenced IBM data
(second dataset)

16.2 The second dataset


The time plot of this dataset appears in the upper part of figure 32. It
consists of n=255 IBM common stock closing prices, from 29/06/1959 to
30/06/1960. The time unit is again one day. Initial inspection of the time
plot clearly suggests non-stationarity. The same indication is obtained from
the estimated Variogram and its first differences, appearing in the middle of
the same figure. Ĝ(m) seems to increase linearly with m. In fact it seems
to behave like m, except for lags between 20 and 40, where there is some
deviation from that pattern. If this deviation is due to sampling variation,
then Ĝ(m) suggests that the first differences of the data will be white noise,
which means that the original series is a simple random walk (section 3.4.1),
or, recalling formula 3.17, that the original series is an ARIMA(1,1,1)
model. Looking at D̂(m) we see that it does not vary randomly around 1
(as it should for a random walk) or around another constant (as it should for
an ARIMA(1,1,1)). Nevertheless, having in mind the amount of variability
in the estimates of D(m) for the simulations of these two models, we should
not consider them inappropriate. The non-stationarity of the series is also
obvious from the behaviour of the estimated Autocorrelation function, in
the lower left part of figure 32. Figure 33 shows Ĝ(m), D̂(m), ACF and
PACF for the differenced series. Ĝ(m) seems to settle down to a constant
of about 0.8 after about the 25th lag, while D̂(m) varies around zero. Again
the parameter d was identified to be 1 using both methods. Neither Ĝ(m)
nor D̂(m) gives a clear answer about the specific underlying model, while the
estimated Autocorrelation and Partial Autocorrelation functions of the d=1
differenced series might also indicate a random walk for the original series.
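For reference, the random walk benchmark used above is immediate from the
definitions: if X_t = X_{t-1} + e_t with e_t white noise of variance σ², then
E(X_{t+m} − X_t)² = mσ², so the standardised Variogram is G(m) = mσ²/σ² = m
and D(m) = G(m) − G(m − 1) = 1 at every lag.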

17 Conclusions
In Chapter 3 we derived the form of the theoretical Variogram G(m) and its
first differences D(m) for several stationary and non-stationary models.
Although the Variogram is not defined for difference stationary processes
with d > 1, its qualitative characteristics for non-stationary processes
provide a clear distinction between these and the stationary ones. Furthermore,
in specific cases it suggests the type of the underlying model. Having in
mind that the estimator 3.3 of the Standardised Variogram performs
satisfactorily for most of the simulated series we examined, we see that it
can be a promising tool for Time Series Analysis. This has been confirmed in
the current Chapter: in most of the cases, using the estimated Standardised
Variogram and its first differences, the degree of differencing d required in
order to achieve stationarity was straightforward to determine, while at the
same time the Box Jenkins approach led to uncertainty as to whether the series
was stationary or not. Overall, the Variogram and the Autocorrelation function
should not be considered as competing but as complementary tools: the Variogram
can be more useful in determining the degree of differencing d, and then both
the Variogram and the Autocorrelation function, applied to the d-differenced
series, can help to suggest an appropriate model.

Appendix
Basic C functions
This appendix presents only some of the basic parts of the code written for this
project. Further details can be obtained from the source file.
The following function is used to produce estimates of the classical and
standardised Variogram. Calling this function with cl=0 and al=0 produces
estimates using formula 3.2. With arguments cl=1 and al=0 it produces estimates
of the standardised Variogram using formula 3.3. Finally, calling this function
with arguments cl=0 and al=1 we get estimates of the classical Variogram using
the alternative estimator given by formula 3.4.

void Variogram(cl,elim,al)
int cl,elim,al;
{
    int i,j,m;
    float ss,V1,ssmne,mne;
    ss=0;ssmne=0;V1=0;
    /* V1 accumulates the sum of squared first differences, */
    /* used to standardise the estimates when cl=1 */
    for(j=0;j<sze-1-elim;j++){
        V1=V1+(TEMP[j+1]-TEMP[j])*(TEMP[j+1]-TEMP[j]);}
    for(m=1;m<LAG+1;m++){
        ssmne=0;
        /* the next loop is only executed for al=1 */
        for(i=0;i<(sze-m-elim)*al;i++){ssmne=ssmne+(TEMP[i+m]-TEMP[i]);}
        mne=ssmne/(sze-m-elim);
        /* mne is the average of the lag-m differences; */
        /* for al=0 the loop above is skipped and mne=0 */
        for(i=0;i<(sze-m-elim);i++){
            ss=ss+(TEMP[i+m]-TEMP[i]-mne)*(TEMP[i+m]-TEMP[i]-mne);}
        tool[m-1]=(ss*(cl*(sze-2)+1))/((1+cl*(V1-1))*(sze-m-1*al));
        /* for cl=0 the above relation reduces to the classical estimator */
        ss=0;toolx[m-1]=m;
    }
}

The next function produces the d-th differences of the original data, using
formula 3.1. The advantage of this function is that it can produce the d-th
differences of the data directly, in a single pass. It calls the function
Combinations, which in turn calls the function factor; both are given below.
The latter returns the factorial of its argument, while the former calculates
the number of combinations of n out of N.

void DIFF(dim)
int dim;
{
    int i,coef,j,a,b=-1;
    float ss;
    /* the dim-th difference of sze points yields sze-dim values */
    for(i=0;i<sze-dim;i++){
        a=1;ss=0;
        /* expand (1-B)^dim with binomial coefficients of alternating sign */
        for(j=1;j<dim+2;j++){
            coef=Combinations(dim,j-1);
            ss=ss+(float)a*(float)coef*DT[i+dim-j+1];
            a=a*b;
        }
        TEMP[i]=ss;
    }
}
int Combinations(N,n)
int N,n;{
    int numer,denumer1,denumer2;
    numer=factor(N);denumer1=factor(n);
    denumer2=factor(N-n);
    /* integer factorials overflow for N>12; adequate for small orders d */
    return(numer/(denumer1*denumer2));
}
int factor(order)
int order;{
    int i,prd=1;
    for(i=1;i<order+1;i++){prd=prd*i;}
    return(prd);
}

The following piece of code is a typical PHIGS structure. This one is called
GRAPH2 and determines the graphical output of the lower window. It executes the
structure AXES2, which produces the axes and the labels for the lower window,
and also the structure TOOLS, which contains the graphical information of the
relevant estimates.

/* Open structure GRAPH2, to draw the axes and plot the estimates */
popen_struct( GRAPH2 );
pset_view_ind( GRAPH2 );
plabel(OBJECT_TRANSFORM);
pexec_struct(AXES2);
pexec_struct(TOOLS);
pclose_struct();

Basic Splus functions


The function vrg returns a numerical vector containing the estimates of the
classical Variogram for the vector data passed as its first argument. The
second argument determines the number of lags m. This function uses only one
loop, over the range of m, and hence gives quite short execution times. To
produce estimates of the standardised Variogram one simply divides the result
by vrg(data, 1).

vrg <- function(data, m){
    n <- length(data)
    store <- rep(0, m)
    for(i in 1:m) {
        d <- rep(i, n - i)
        p <- seq(1, n - i)
        fpoints <- data[d + p]          # the series shifted i steps ahead
        points <- data[1:(n - i)]
        store[i] <- sum((fpoints - points)^2)/(n - i)
    }
    store
}
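As a hypothetical usage example, the estimator can be checked against the
random walk benchmark G(m) = m using a simulated series (cumsum and rnorm are
standard Splus functions):

x <- cumsum(rnorm(500))      # a simulated random walk
sv <- vrg(x, 50)/vrg(x, 1)   # estimated standardised Variogram
plot(1:50, sv)               # points should lie close to the line G(m) = m
abline(0, 1)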

The function vrgAL is equivalent to the previous one, except that it produces
the estimates of the classical Variogram using estimator 3.4. The function msim
produces and plots the Variograms, as well as the autocorrelation functions, for
a given number of simulated series. The type of the model is determined by the
user through the lista argument. The arguments mn and sd pass the mean and
the standard deviation of the error term to be used.

vrgAL <- function(data, m){
    n <- length(data)
    store <- rep(0, m)
    for(i in 1:m) {
        d <- rep(i, n - i)
        p <- seq(1, n - i)
        fpoints <- data[d + p]
        points <- data[1:(n - i)]
        av <- mean(fpoints - points)    # correction term of estimator 3.4
        store[i] <- sum((fpoints - points - av)^2)/(n - 1 - i)
    }
    store
}

msim <- function(times = 9, lista = list(ar = 0.9), n = 210, mn = 0, sd = 1){
    lags <- floor(n/4)
    points <- seq(1, lags)
    for(j in 1:times) {
        x <- SimAr(n, model = lista, rnorm(n, mn, sd))
        # standardised Variogram of the series, first 9 values discarded
        lines(points, vrg(x[10:n], lags)/vrg(x[10:n], 1), lty = 2)
        autocors <- Gacf(x, plot = F, lmax = lags)$gacf
        # the same quantity computed from the autocorrelations,
        # G(m) = (1 - rho(m))/(1 - rho(1))
        lines(points, (1 - autocors[2:(lags + 1)])/(1 - autocors[2]))
    }
}
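A hypothetical call, overlaying nine simulated AR(1) series with parameter 0.5,
would be msim(times = 9, lista = list(ar = 0.5), n = 300). Note that msim draws
with lines only, so a plot with suitable axes must already be open, and that
SimAr and Gacf are project-specific functions defined in the source file.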
