
GABRIELE CANTALUPPI

COMPUTATIONAL LABORATORY
FOR ECONOMICS

Notes for the students

Milano 2013

© 2012-2013 EDUCatt - Ente per il Diritto allo Studio Universitario dell'Università Cattolica
Largo Gemelli 1, 20123 Milano - tel. 02.7234.22.35 - fax 02.80.53.215
e-mail: editoriale.dsu@educatt.it (production); librario.dsu@educatt.it (distribution)
web: www.educatt.it/libri
ISBN print edition: 978-88-6780-021-6
ISBN electronic edition: 978-88-6780-022-3
The print edition of this volume was printed in September 2013 at Litografia Solari (Peschiera Borromeo - Milano).

CONTENTS

Preface

1 Some Elements of Statistical Inference
1.1 On the Properties of the Sample Mean
1.1.1 The Normal Distribution Case
1.1.2 The Central Limit Theorem

2 An Introduction to Linear Regression
2.1 Example: Individual wages (2.1.2)
2.1.1 Data Reading and summary statistics
2.1.2 Some graphical representations and grouping statistics
2.1.3 Simple Linear Regression
2.1.4 Confidence intervals (Section 2.5.2)
2.2 Multiple Linear Regression (Section 2.5.5)
2.2.1 Parameter estimation
2.2.2 ANOVA to compare the two models (Section 2.5.5)
2.3 CAPM example (Section 2.7)
2.3.1 CAPM regressions (without intercept) (Table 2.3)
2.3.2 Testing an hypothesis on β1
2.3.3 CAPM regressions (with intercept) (Table 2.4)
2.3.4 CAPM regressions (with intercept and January dummy) (Table 2.5)
2.4 The World's Largest Hedge Fund (Section 2.7.3)
2.5 Dummy Variables Treatment and Multicollinearity (Section 2.8.1)
2.6 Missing Data, Outliers and Influential Observations
2.7 How to check the form of the distribution
2.7.1 Data histogram with the theoretical density function
2.7.2 The χ² goodness-of-fit test
2.7.3 The Kolmogorov-Smirnov test
2.7.4 The PP-plot and the QQ-plot
2.7.5 Use of the function fit.cont
2.8 Two tests for assessing normality
2.8.1 The Jarque-Bera test
2.8.2 The Shapiro-Wilk test
2.9 Some further comments on the QQ-plot


2.9.1 Positively skewed distributions
2.9.2 Negatively skewed distributions
2.9.3 Leptokurtic distributions
2.9.4 Platykurtic distributions

3 Interpreting and comparing Linear Regression Models
3.1 Explaining House Prices (Section 3.4)
3.1.1 Testing the functional form: construction of the RESET test
3.1.2 Testing the functional form: a direct function to perform the RESET test
3.1.3 Testing the functional form: the RESET test for the extended model
3.1.4 Testing the functional form: the interaction term
3.1.5 Prediction
3.1.6 Model with price instead of log(price) as dependent variable and lotsize instead of log(lotsize) among the predictors
3.1.7 The PE test to compare a loglinear specification with the linear specification
3.2 Selection procedures: Predicting Stock Index Returns (Section 3.5)
3.2.1 The full model
3.2.2 The max R̄² criterion
3.2.3 Stepwise
3.2.4 An algorithm to perform a stepwise backward elimination of regressors
3.2.5 AIC
3.2.6 BIC
3.2.7 A better output to compare the results
3.2.8 Some remarks on the AIC and BIC values
3.2.9 Out of sample forecasting performance (Table 3.5)
3.3 Explaining Individual Wages (Section 3.6)
3.3.1 Linear Models (Section 3.6.1)
3.3.2 Loglinear Models (Section 3.6.2)
3.3.3 The Effects of Gender (Section 3.6.3)

4 Heteroscedasticity and Autocorrelation
4.1 Explaining Labour Demand (Section 4.5)
4.1.1 Linear Model
4.1.2 Breusch-Pagan test - construction
4.1.3 Breusch-Pagan test - direct function
4.1.4 Loglinear model
4.1.5 White Heteroscedasticity test
4.1.6 Heteroscedasticity consistent covariance matrix
4.1.7 Estimated Generalized Least Squares
4.1.8 Types of Heteroscedasticity consistent covariance matrices
4.2 The Demand for Ice Cream (Section 4.8)
4.2.1 The Durbin-Watson statistic - construction


4.2.2 The Durbin-Watson statistic - direct function
4.2.3 Estimation of the first-order autocorrelation coefficient
4.2.4 The Breusch-Godfrey test to test the presence of autocorrelation - construction
4.2.5 The Breusch-Godfrey test to test the presence of autocorrelation - direct function
4.2.6 Some remarks on the procedure presented by Verbeek on page 113
4.2.7 The EGLS (iterative Cochrane-Orcutt) procedure
4.2.8 The model with the lagged temperature
4.3 Risk Premia in Foreign Exchange Markets (Section 4.11)
4.3.1 Tests for Risk Premia in the 1 month Market
4.3.2 Tests for Risk Premia using Overlapping Samples


5 Endogeneity, Instrumental Variables and GMM
5.1 Estimating the Returns to Schooling (Section 5.4)
5.2 Example of an application of the Generalized Method of Moments
5.3 Estimating Intertemporal Asset Pricing Models (Section 5.7)

6 Maximum Likelihood Estimation and Specification Tests
6.1 Normal distribution
6.2 Bernoulli distribution
6.3 Exponential distribution
6.4 Poisson distribution
6.5 Linear model
6.6 Individual wages (Section 2.5.5)

7 Models with Limited Dependent Variables
7.1 The Impact of Unemployment Benefits on Recipiency (Section 7.1.6)
7.1.1 Estimation of the linear probability model
7.1.2 Estimation of the Logit model
7.1.3 Estimation of the Probit model
7.1.4 A unique table for comparing model estimates
7.1.5 Some additional goodness of fit measures
7.2 Some remarks on the interpretation of a parameter in a logit model
7.3 Explaining Firms' Credit Ratings (Section 7.2.1)
7.4 Willingness to Pay for Natural Areas (Section 7.2.4)
7.5 Patent and R&D Expenditures (Section 7.3.2)
7.6 Expenditures on Alcohol and Tobacco (Part 1) (Section 7.4.3)
7.7 Expenditures on Alcohol and Tobacco (Part 2) (Section 7.5.4)
8 Univariate Time Series Models
8.1 Some examples of stochastic processes
8.1.1 The Gaussian White Noise
8.1.2 The Autoregressive Process
8.1.3 The Moving Average Process
8.1.4 Simulation of a realization from an AR(1) process with drift

8.2 Autocorrelation, Partial autocorrelation functions and ARMA model identification
8.2.1 Autocorrelation and Partial autocorrelation functions for an AR(1) process with drift
8.2.2 Autocorrelation and Partial autocorrelation functions for some AR(p) processes with drift
8.2.3 Autocorrelation and Partial autocorrelation functions for a MA(1) process
8.2.4 Autocorrelation and Partial autocorrelation functions for some MA(p) processes
8.2.5 Autocorrelation and Partial autocorrelation functions for an ARMA(1,1) process
8.2.6 Problems in identifying an ARMA model for a time series
8.3 On the bias of the OLS estimator of the autoregressive coefficient for an AR(1) process with AR(1) errors
8.3.1 Some remarks on the use of the function curve
8.4 Estimation of ARIMA Models with the function arima
8.4.1 No unit roots in the characteristic equation φp(z) = 0
8.4.2 1 unit root in the characteristic equation φp+1(z) = 0
8.4.3 2 unit roots in the characteristic equation φp+2(z) = 0
8.5 Some other R functions for ARMA model parameter estimation
8.5.1 The arima function
8.5.2 The sarima function in the package astsa
8.5.3 The Arima function in the package forecast
8.5.4 The armaFit function
8.5.5 The FitARMA function
8.5.6 The ar function
8.5.7 The arima function in the package TSA
8.6 R functions for predicting with ARMA models
8.7 Stock Prices and Earnings (Section 8.4.4)
8.7.1 Dickey-Fuller test - construction
8.7.2 Dickey-Fuller test - direct function
8.7.3 How to produce the Dickey-Fuller statistic for different lags
8.7.4 Other tests for unit roots detection
8.7.5 Testing for multiple unitary roots
8.8 Some remarks on the function ur.df
8.8.1 The Dickey-Fuller test for a unit root, type "none"
8.8.2 Dickey-Fuller test for a unit root, type drift
8.8.3 Dickey-Fuller test for a unit root, type trend
8.8.4 Example
8.8.5 Exercise
8.8.6 Exercise
8.9 Long-run Purchasing Power Parity (Part 1) (Section 8.5)
8.10 The Persistence of Inflation (Section 8.8)
8.10.1 AR estimation
8.10.2 The Ljung-Box statistic - construction


8.10.3 The Ljung-Box statistic - direct function
8.10.4 AR estimation via Maximum Likelihood
8.10.5 AR(4) estimation
8.10.6 ARMA estimation
8.10.7 AR(6) estimation
8.10.8 Non complete models
8.11 The Expectations Theory of the Term Structure (Section 8.10)
8.12 Autoregressive Conditional Heteroscedasticity
8.12.1 A Brief Presentation of ARCH Processes
8.12.2 A First Example
8.13 Volatility in Daily Exchange Rates (Section 8.11.3)


9 Multivariate Time Series Models
9.1 Spurious Regression (Section 9.2.1)
9.2 Long-run Purchasing Power Parity (Part 2) (Section 9.3)
9.3 Long-run Purchasing Power Parity (Part 3) (Section 9.5.4)
9.4 Money Demand and Inflation (Section 9.6)

10 Models based on panel data
10.1 Explaining Individual Wages (Section 10.3)
10.2 Explaining Capital Structure (Section 10.5)

References

A Some useful R functions
A.1 How to Install R
A.2 How to Install and Update Packages
A.3 Data Reading
A.3.1 zip files
A.3.2 Reading from a text file
A.3.3 Reading from a Stata file
A.3.4 Reading from an EViews file
A.3.5 Reading from a Microsoft Excel file
A.4 formula{stats}
A.5 linear model
A.6 Deducer

B Addendum 3rd edition
B.1 Annual Price/Earnings Ratio (Section 8.4.4 third edition)
B.1.1 Dickey-Fuller test
B.1.2 Testing for multiple unitary roots
B.2 Modelling the Price/Earnings Ratio (Section 8.7.5 third edition)
B.2.1 AR estimation
B.2.2 The Ljung-Box statistic
B.2.3 AR estimation via Maximum Likelihood
B.2.4 MA estimation
B.2.5 Non complete models
B.3 Volatility in Daily Exchange Rates (Section 8.10.3 third edition)
B.4 Long-run Purchasing Power Parity (Part 1) (Section 8.5 third edition)
B.5 Long-run Purchasing Power Parity (Part 2) (Section 9.3)
B.6 Long-run Purchasing Power Parity (Part 3) (Section 9.5.4)

PREFACE
These Lecture Notes refer to the examples and illustrations proposed in the book A Guide to Modern Econometrics by Marno Verbeek (4th and 3rd editions).
The source code described here is written in the R language (R Development Core Team 2012); R version 3.0.1 was used.
The subjects are presented in the course Computational Laboratory for Economics held at Università Cattolica del Sacro Cuore, Graduate Program in Economics. The course runs in parallel with the course Empirical Economics, where the methodological background is assessed.
Care was taken to obtain results first according to their mathematical structure, and then by using appropriate built-in R functions, in any case aiming at an efficient and elegant programming style.
The reader is assumed to possess a basic knowledge of R. An Introduction to R by Longhow Lam, available at http://www.splusbook.com/RIntro/RCourse.pdf, may represent a good reference.
Chapters 2 to 10 recall the contents of Verbeek's Guide. Appendix A describes how to read data from text, Stata and EViews files, which are the formats used by Verbeek on his book website, where the data sets are available. Appendix B contains results for examples which were present in the 3rd edition of Verbeek's Guide.
Some companion materials to these Lecture Notes can be downloaded from the book site www.educatt.it/libri/materiali.
I warmly thank Diego Zappa and Giuseppe Boari for having read parts of the manuscript. I wish to thank Stefano Iacus for his short course on an efficient and advanced use of R, and Achim Zeileis, Giovanni Millo and Yves Croissant for having improved their packages lmtest and plm in order to properly handle some of the problems presented here.

1 Some Elements of Statistical Inference

1.1 On the Properties of the Sample Mean

Consider a random variable X with mean E(X) = μ and variance Var(X) = σ².
Let (x1, . . . , xn) be a realization of the n-dimensional random variable (X1, . . . , Xn), whose components are identically and independently distributed as X.
The random variable sample mean

    X̄ = (1/n) Σ_{i=1}^{n} Xi                                    (1.1)

has the properties:

    E(X̄) = μ                                                     (1.2)

and

    Var(X̄) = σ²/n.                                               (1.3)

1.1.1 The Normal Distribution Case

We consider, as an example giving evidence of the properties of the sample mean, the empirical distribution of the sample means for k = 100 replications of samples of size n = 5 of pseudo-random numbers from a Normal distribution with mean μ = 4 and variance σ² = 2. We recall that, since the sum of normally distributed random variables is a normally distributed random variable, the sample mean X̄ will also be normally distributed.
By means of the following code it is possible to create an array x whose elements are the sample means evaluated for the k replications of n pseudo-random numbers from X ∼ N(μ = 4, σ² = 2).

> set.seed(1000)
> k <- 100
> n <- 5
> mean <- 4
> sigma2 <- 2


Table 1.1 Four samples of size 5 from X ∼ N(μ = 4, σ² = 2) and their sample mean

     x1    x2    x3    x4    x5    x̄
1   3.37  2.29  4.06  4.90  2.89  3.50
2   3.45  3.33  5.02  3.97  2.06  3.57
3   2.61  3.22  4.17  3.83  2.11  3.19
4   4.24  4.22  4.04  1.11  4.30  3.58

> sd <- sigma2^0.5


> x <- replicate(k, mean(rnorm(n, mean, sd = sigma2^0.5)))
By initializing the seed with the instruction set.seed(), we fix the starting point of the pseudo-random number generation routine; so, when needed, it is possible, by invoking this instruction, to reproduce the same sequence of pseudo-random numbers.
The instruction rnorm(n, mean, sd) generates n pseudo-random numbers from a normal random variable with mean = mean and standard deviation = sd.
Note that the third argument of the instruction refers to the standard deviation σ and not to the variance σ² of the distribution. So in the preceding code the command rnorm may also be stated as rnorm(n, mean, sd), which is equivalent to rnorm(n = n, mean = mean, sd = sd), having respected the ordering of the arguments.
The instruction replicate(n, expr) repeats, for the number of times n specified as first argument, the expression expr (possibly a function) given as second argument. The result is a vector with the values returned by expr in the n replications.
Table 1.1 reports 4 realizations of X1, . . . , X5 (Xi ∼ N(μ = 4, σ² = 2)) and in the last column the corresponding value of the sample mean. We remark that each replication differs, with probability 1, from the other ones; the stochastic nature of the sample mean follows as well.
Table 1.1 may be reproduced by using the following code:
> set.seed(1000)
> sampletable <- matrix(rnorm(20, mean, sd), 4, 5,
byrow = TRUE)
> xbar <- rowMeans(sampletable)
> sampletable <- cbind(sampletable, xbar)
Figure 1.1, obtained with the code hist(x, breaks = 15), reports the histogram of the k = 100 realizations of the random variable sample mean, X̄, generated above.
We can study in more depth the behaviour in presence of k = 50, 100, 500, 1000 replications for each of the sample sizes n = 9, 25, 64, 100 from the same Normal distribution N(μ = 4, σ² = 2), see Fig. 1.2, and we can observe that, according to relationship (1.3), the dispersion of the sample mean estimator reduces when n increases, while for large values of k the histograms better resemble the behaviour of a Normal density function.
Fig. 1.2 may be obtained by means of the following code.

Figure 1.1 Distribution of the sample mean from X ∼ N(4, 2); sample size: n = 5, number of replications: k = 100

> set.seed(1000)
> kvals <- c(50, 100, 500, 1000)
> nvals <- c(9, 25, 64, 100)
> X <- data.frame(k = NA, n = NA, xbar = NA)
> for (k in kvals) {
      for (n in nvals) {
          set.seed(1000)
          X <- rbind(X, cbind(k = k, n = n, xbar = replicate(k,
              mean(rnorm(n, 4, 1)))))
      }
  }
> X <- X[-1, ]
> X$k <- factor(X$k)
> X$n <- factor(X$n)
> library(lattice)
> histogram(~xbar | k:n, data = X, breaks = seq(from = min(X$xbar),
      to = max(X$xbar), length = 25), type = "density",
      as.table = TRUE, xlab = paste("n = ", paste(nvals,
          collapse = ", ")), ylab = paste("k = ", paste(rev(kvals),
          collapse = ", ")))

kvals and nvals are arrays containing respectively the values of the variables k and n in the 16 situations depicted in Fig. 1.2.
X <- data.frame(k = NA, n = NA, xbar = NA) defines a data.frame X, with three columns named k, n and xbar; the rows of X will contain the number (k) of replications and the sample size (n) of the experiment which the sample mean (xbar) refers to.
The sample means are evaluated for the k = 50, 100, 500, 1000 replications of n = 9, 25, 64, 100 pseudo-random numbers from X ∼ N(μ = 4, σ² = 1). The construction of X, which may seem a bit clumsy, will simplify the production of the graphs in Fig. 1.2 by means of the function histogram in the package lattice.
The assignment of the rows of X is obtained by using a double for loop.
Observe that the pseudo-random numbers are generated, by the function replicate, in blocks (arrays) of dimension k.
cbind binds column/matrix elements into a single matrix: in the present case blocks are constructed, which contain in the first and second column the k and n identifiers and in the third column the values of the sample means. All the blocks are subsequently stacked in the X matrix by means of the function rbind.
By initializing the seed for each n, the first generated samples do not vary when we increase the number k of replications.
Variables k and n in the data.frame X are then assigned the nature (class) of factors, that is categorical variables, to simplify the graphical representation by means of the function histogram available in the package lattice.

Figure 1.2 Distribution of the sample mean from X ∼ N(4, 1); n: sample size, k: number of replications

The function histogram is applied to represent the values of the sample means (xbar) classified according to the different levels of the interaction of k and n, see the R help ?lattice::histogram for more information on the function.

1.1.2 The Central Limit Theorem

We now consider what happens when X, a random variable with E(X) = μ and variance Var(X) = σ², is not normally distributed.
If X̄ is the sample mean from (x1, . . . , xn), a realization of the n-dimensional random variable (X1, . . . , Xn), whose components are identically and independently distributed as X, by invoking the central limit theorem we have asymptotically that:

    X̄ ∼ N(μ, σ²/n).                                              (1.4)

We remark that in this instance we have not required X to be normally distributed.
To give evidence of the central limit theorem result, let X1, . . . , Xn be identically and independently distributed as a Uniform(0, 1) or an Exponential(λ = 4) random variable, whose density functions are respectively:

    Y ∼ U(0, 1):    f(y) = 1 for 0 < y < 1, 0 elsewhere
    W ∼ Exp(λ):     f(w; λ) = λ e^(−λw) for 0 < w < ∞, 0 elsewhere

with expected values:

    E(Y) = 1/2    and    E(W) = 1/λ

and variances:

    Var(Y) = 1/12    and    Var(W) = 1/λ².

If we study the behaviour of the sample mean in presence of k = 50, 100, 500, 1000 replications for the sample sizes n = 9, 25, 64, 100 from the above distributions X ∼ U(0, 1) and X ∼ Exp(λ), we can observe that, according to relationship (1.4), the dispersion of the sample mean estimator reduces when n increases, while when k gets larger the distribution of the sample mean is better approximated by a Normal random variable.
Figures 1.3 and 1.4 give evidence of the result and can be obtained by using the same code producing Fig. 1.2, after having substituted the instruction
X <- rbind(X, cbind("k" = k, "n" = n, "xbar" = replicate(k, mean(rnorm(n, 4, 1)))))
with the code:
X <- rbind(X, cbind("k" = k, "n" = n, "xbar" = replicate(k, mean(runif(n)))))
for the uniform case and
X <- rbind(X, cbind("k" = k, "n" = n, "xbar" = replicate(k, mean(rexp(n, 4)))))
for the exponential case.

Figure 1.3 Distribution of the sample mean from Y ∼ U(0, 1); n: sample size, k: number of replications

Figure 1.4 Distribution of the sample mean from W ∼ Exp(4); n: sample size, k: number of replications

2 An Introduction to Linear Regression

2.1 Example: Individual wages (2.1.2)

We have first to read the data, available in the file wages1.dat, included in the
compressed file ch02.zip.

2.1.1 Data Reading and summary statistics

The function read.table allows one to read from a text data set file, where data
have been stored in text format, and create a data.frame, see Appendix A.3. The
data set file is assumed to be in a tabular form with one or more spaces or a tab as
field separator. The function unzip extracts a file from a compressed archive.
> wages1 <- read.table(unzip("ch02.zip", "Chapter 2/wages1.dat"),
header = T)
The description of the variables is provided in the file wages1.txt:

exper: experience in years

male: 1 if male, 0 if female

school: years of schooling

wage: wage (in 1980 $) per hour

To explore the initial and the final part of a data frame use the functions head and
tail.
> head(wages1)
  EXPER MALE SCHOOL     WAGE
1     9    0     13 6.315296
2    12    0     12 5.479770
3    11    0     11 3.642170
4     9    0     14 4.593337
5     8    0     14 2.418157
6     9    0     14 2.094058

> tail(wages1)
     EXPER MALE SCHOOL     WAGE
3289     5    1      8 5.512004
3290     6    1      9 4.287114
3291     5    1      9 7.145190
3292     6    1      9 4.538784
3293    10    1      8 2.909113
3294     7    1      7 4.153974

The function summary produces some statistics summarizing the columns (variables)
of the data frame. The results may be compared with the sample statistics provided
by Verbeek in the file wages1.txt.
> summary(wages1)
     EXPER             MALE            SCHOOL           WAGE          
 Min.   : 1.000   Min.   :0.0000   Min.   : 3.00   Min.   : 0.07656  
 1st Qu.: 7.000   1st Qu.:0.0000   1st Qu.:11.00   1st Qu.: 3.62157  
 Median : 8.000   Median :1.0000   Median :12.00   Median : 5.20578  
 Mean   : 8.043   Mean   :0.5237   Mean   :11.63   Mean   : 5.75759  
 3rd Qu.: 9.000   3rd Qu.:1.0000   3rd Qu.:12.00   3rd Qu.: 7.30451  
 Max.   :18.000   Max.   :1.0000   Max.   :16.00   Max.   :39.80892  

If you want all the sample statistics provided in the file wages1.txt you can use the function vsummary defined by the following code¹:
> vsummary0 <- function(x) c(Obs = length(x), Mean = mean(x),
      Std.Dev. = sd(x), Min = min(x), Max = max(x),
      na = sum(is.na(x)))
> vsummary <- function(x) t(apply(x, 2, vsummary0))
> vsummary(wages1)
        Obs       Mean  Std.Dev.        Min      Max na
EXPER  3294  8.0434123 2.2906610 1.00000000 18.00000  0
MALE   3294  0.5236794 0.4995148 0.00000000  1.00000  0
SCHOOL 3294 11.6305404 1.6575447 3.00000000 16.00000  0
WAGE   3294  5.7575850 3.2691858 0.07655561 39.80892  0

1 We add the information regarding the possible presence of missing values. The function is.na
returns the logical value TRUE if its argument is identified as not available (NA), otherwise FALSE.

Figure 2.1 Box & Whiskers plot of wages for males and females

2.1.2 Some graphical representations and grouping statistics

Let's compare the wages for males and females. A useful graphical representation is the Box & Whiskers plot, see Fig. 2.1. Recall that the levels of the three lines defining the box correspond respectively to the first, the second and the third Quartile of the data (the second Quartile is the median). The values placed outside the two whiskers may be considered anomalous with respect to the other data, see Chambers et al. (1983).
We can obtain the graph by having recourse to the function boxplot. The first argument in this function is a formula, see Appendix A.4, establishing that we are studying WAGE as a function (~) of the gender (dummy variable MALE). The second argument is the name of the data.frame containing the involved variables. By using the third argument we attribute proper names, which will appear on the graph, to the values 0 and 1 assumed by the variable MALE.
> boxplot(WAGE ~ MALE, data = wages1, names = c("females",
      "males"))
We can also represent the wage as a function of the years of experience, see Fig. 2.2

Figure 2.2 Scatterplot and Box & Whiskers plot of wages by the number of years of experience

> layout(1:2)
> plot(WAGE ~ EXPER, data = wages1)
> boxplot(WAGE ~ EXPER, data = wages1)
The function plot results in a scatter plot of the involved variables. The function layout(matrix) creates a multi-figure environment; the numbers in the matrix (in our instance a column vector) define the pointer sequence specifying the order in which the different graphs will appear.
We may wish to produce separate graphs, for males and females, representing the wage as a function of the years of experience, see Fig. 2.3. It is preferable to first recode the dummy variable MALE into a categorical one, e.g. gender, that is a factor whose levels are f and m.
> wages1$gender <- as.factor(wages1$MALE)
> levels(wages1$gender) <- c("f", "m")
Finally we can produce the boxplot by studying the wage as a function of the
interaction (:) between experience and gender, that is as a function of the set of

pairs of levels assumed by the two variables.

Figure 2.3 Scatterplot and Box & Whiskers plot of wages by gender and the number of years of experience


> layout(1:2)
> boxplot(WAGE ~ EXPER, data = wages1)
> boxplot(WAGE ~ EXPER:gender, data = wages1)
An easy way to obtain summary results for the variables in the data.frame, separately
for females and males is by means of the instruction by.
The first argument is an array or a data.frame or a matrix on whose columns
the function specified as third argument will be applied. The second argument is a
grouping variable whose length must be equal to the number of rows of the object
given as first argument.
We omit from the analysis the categorical variable gender (fifth column in the
data.frame wages1).
> by(wages1[, -5], wages1$MALE, summary)
wages1$MALE: 0
     EXPER             MALE       SCHOOL           WAGE          
 Min.   : 1.000   Min.   :0   Min.   : 5.00   Min.   : 0.07656  
 1st Qu.: 6.000   1st Qu.:0   1st Qu.:11.00   1st Qu.: 3.17564  
 Median : 8.000   Median :0   Median :12.00   Median : 4.69326  
 Mean   : 7.732   Mean   :0   Mean   :11.84   Mean   : 5.14692  
 3rd Qu.: 9.000   3rd Qu.:0   3rd Qu.:13.00   3rd Qu.: 6.53275  
 Max.   :16.000   Max.   :0   Max.   :16.00   Max.   :32.49740  
------------------------------------------------
wages1$MALE: 1
     EXPER             MALE       SCHOOL           WAGE         
 Min.   : 2.000   Min.   :1   Min.   : 3.00   Min.   : 0.1535  
 1st Qu.: 7.000   1st Qu.:1   1st Qu.:10.00   1st Qu.: 4.0290  
 Median : 8.000   Median :1   Median :12.00   Median : 5.6543  
 Mean   : 8.326   Mean   :1   Mean   :11.44   Mean   : 6.3130  
 3rd Qu.:10.000   3rd Qu.:1   3rd Qu.:12.00   3rd Qu.: 7.8913  
 Max.   :18.000   Max.   :1   Max.   :16.00   Max.   :39.8089  

2.1.3 Simple Linear Regression

Let's study, by means of a linear regression model, how the mean level of the variable WAGE changes as a function of the gender: we can regress the variable WAGE on the dummy variable MALE, which assumes value 1 when the subject is male and 0 when she is female. We make use of the function linear model (lm); the first argument is the regression formula, where the ~ symbol separates the dependent variable from the independent one. The intercept is included by default. The data argument specifies the name of the data.frame containing the data.
We are thus studying the model

    WAGE = β1 + β2 MALE + ERROR                                  (2.1)

whose parameter estimates are reported in Verbeek's Table 2.1.


> regr2.1 <- lm(WAGE ~ MALE, data = wages1)
With the function summary it is possible to summarize the results contained in the
lm object regr2.1 produced by the linear model estimation.
> summary(regr2.1)
Call:
lm(formula = WAGE ~ MALE, data = wages1)
Residuals:
   Min     1Q Median     3Q    Max 
-6.160 -2.102 -0.554  1.487 33.496 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.14692    0.08122   63.37   <2e-16 ***
MALE         1.16610    0.11224   10.39   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.03175,    Adjusted R-squared: 0.03145
F-statistic: 107.9 on 1 and 3292 DF, p-value: < 2.2e-16
Recall that, in R, every result is an object and that the instructions names and str² allow one to discover respectively the element names and the structure of any object.
> names(regr2.1)
 [1] "coefficients"  "residuals"     "effects"      
 [4] "rank"          "fitted.values" "assign"       
 [7] "qr"            "df.residual"   "xlevels"      
[10] "call"          "terms"         "model"        

Thus the object regr2.1 is a list containing 12 elements. If we want to extract one of
its elements, e.g. the coefficients, we may invoke one of the 3 following commands:
> regr2.1$coefficients
(Intercept)
MALE
5.146924
1.166097
> regr2.1["coefficients"]
$coefficients
(Intercept)
MALE
5.146924
1.166097
> regr2.1[["coefficients"]]
(Intercept)
MALE
5.146924
1.166097
obtaining respectively a vector, a list and again a vector.
Pay attention! The command³
> regr2.1["coefficients"] %*% c(1,2)
returns an Error, since the result of regr2.1["coefficients"] is a list and not a vector and cannot be used as an argument of a matrix product. See Chapter 2 of Longhow Lam (2010) for the definition of the Data Objects: list and vector.
Always remember to use double square brackets to extract elements in the form of vectors from a list object. The following instructions are correct:
> regr2.1[["coefficients"]] %*% c(1, 2)
² We omit to report the call and the result of the function str(regr2.1).
³ See the help ?Arithmetic to have information on arithmetic operators in R: here %*% stands for the matrix product.


[,1]
[1,] 7.479118
> regr2.1$coefficients %*% c(1, 2)
[,1]
[1,] 7.479118
Other useful statistics resulting from a regression analysis are available in the
object obtained by applying the function summary to the result of lm; so
names(regr2.1) and names(summary(regr2.1)) give different information. The
result of summary(regr2.1) is itself a list containing 11 elements.
> output <- summary(regr2.1)
> names(output)
 [1] "call"          "terms"         "residuals"    
 [4] "coefficients"  "aliased"       "sigma"        
 [7] "df"            "r.squared"     "adj.r.squared"
[10] "fstatistic"    "cov.unscaled" 
> output$fstatistic
    value     numdf     dendf 
 107.9338    1.0000 3292.0000 

2.1.4 Confidence intervals (Section 2.5.2)

To test whether the parameter β2 is zero, that is to test the null hypothesis H0: β2 = 0, we can construct a confidence interval at level (1 − α).
We have first to recall the coefficient estimates, their standard errors and the degrees of freedom; then we must choose a value for α and determine the corresponding percentage points of the t random variable.
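In other words, the bounds of the interval take the standard textbook form (stated here only for reference; b2 denotes the estimate of β2 and df the residual degrees of freedom):

    b2 ± t(1 − α/2; df) · se(b2)

which is exactly what the code below evaluates for the MALE coefficient.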

As we have just recalled, regr2.1$coefficients and regr2.1$df extract respectively the coefficients and the degrees of freedom from the object regr2.1. output$cov.unscaled extracts from the object output the matrix (X′X)⁻¹. The instruction is equivalent to summary(regr2.1)$cov.unscaled, remembering that we have assigned to the object output the result of summary(regr2.1). Finally, the function diag extracts the main diagonal from a matrix, and by means of qt(p, df) it is possible to obtain the p quantile of a t distribution with df degrees of freedom.

> regr2.1$coefficients
(Intercept)
MALE
5.146924
1.166097
> coefse <- output$sigma * diag(output$cov.unscaled)^0.5
> coefse
(Intercept)        MALE 
 0.08122482  0.11224216 

> regr2.1$df
[1] 3292
> alpha <- 0.05
> qt(1 - alpha/2, regr2.1$df)
[1] 1.960685
The lower and upper bounds of the MALE coefficient result respectively:
> regr2.1$coefficients[2] + c(-1, 1) * qt(1 - alpha/2,
regr2.1$df) * output$sigma * output$cov.unscaled[2,
2]^0.5
[1] 0.946 1.386
The confidence intervals, based on the t distribution, may also be obtained directly
for all parameter estimates, by using the function confint:
> confint(regr2.1, level = 1 - alpha)
            2.5 % 97.5 %
(Intercept) 4.988  5.306
MALE        0.946  1.386

2.2 Multiple Linear Regression (Section 2.5.5)

2.2.1 Parameter estimation

We want to obtain the parameter estimates of the following linear model:

    WAGE = β1 + β2 MALE + β3 SCHOOL + β4 EXPER + ERROR           (2.2)

The function lm allows us to perform also a linear regression with more variables as regressors.
As we have already stated, the symbol ~ separates in a formula the dependent variable from the independent ones and the + symbol, preceding a variable, indicates the presence of that variable in the model. The intercept is included by default. See Appendix A.4.
With the following syntax we declare that we wish to study, by making use of a linear model (lm), the relationship between the variable WAGE and the set of independent variables MALE, SCHOOL and EXPER for the data.frame wages1.
> regr2.2 <- lm(WAGE ~ MALE + SCHOOL + EXPER, data = wages1)
> summary(regr2.2)
Call:
lm(formula = WAGE ~ MALE + SCHOOL + EXPER, data = wages1)
Residuals:
   Min     1Q Median     3Q    Max 
-7.654 -1.967 -0.457  1.444 34.194 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.38002    0.46498  -7.269 4.50e-13 ***
MALE         1.34437    0.10768  12.485  < 2e-16 ***
SCHOOL       0.63880    0.03280  19.478  < 2e-16 ***
EXPER        0.12483    0.02376   5.253 1.59e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.046 on 3290 degrees of freedom
Multiple R-squared: 0.1326,    Adjusted R-squared: 0.1318
F-statistic: 167.6 on 3 and 3290 DF, p-value: < 2.2e-16

2.2.2 ANOVA to compare the two models (Section 2.5.5)

To establish if the variables SCHOOL and EXPER add a significant joint effect to the
variable MALE for explaining the dependent variable WAGE, we can compare the latter
model we have estimated (2.2) with (2.1) by using the function anova which performs
an analysis of variance in presence of nested models, see Verbeek p. 27. The first
argument of anova is the object resulting from lm applied to the simpler model, the
second argument is the lm object from the estimation of the more complex model.
> anova(regr2.1, regr2.2)
Analysis of Variance Table

Model 1: WAGE ~ MALE
Model 2: WAGE ~ MALE + SCHOOL + EXPER
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
1   3292 34077                                  
2   3290 30528  2      3549 191.24 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

2.3 CAPM example (Section 2.7)

We can import the data from the data set capm.dat, as we did in Section 2.1.
> capm <- read.table(unzip("ch02.zip", "Chapter 2/capm.dat"),
header = T)
We remind that by using the functions head(), tail() and summary() applied to
the data.frame capm it is possible to explore the beginning and the final sections of
the data.frame and to obtain the summary statistics for all the variables included
in capm.


The data set contains information on stock market data, see the file capm.dat. Data, pertaining to the following variables, were collected from January 1960 to December 2006.

foodrf: excess returns food industry

durblrf: excess returns durables industry

constrrf: excess returns construction industry

rmrf: excess returns market portfolio

rf: risk free return

jan: dummy for January

smb: excess return on the Fama-French size (small minus big) factor

hml: excess return on the Fama-French value (high minus low) factor

2.3.1 CAPM regressions (without intercept) (Table 2.3)

Verbeek first considers the parameter estimation of the following three linear regression models where the intercept is not included.

    foodrf = β1 rmrf + ERROR                                     (2.3)
    durblrf = β1 rmrf + ERROR                                    (2.4)
    constrrf = β1 rmrf + ERROR                                   (2.5)

Observe the presence of the element -1 in the following formulae, first arguments of the call to lm. It drops the intercept from the list of the regressors. See Appendix A.4.
> regr2.3f <- lm(foodrf ~ -1 + rmrf, data = capm)
> regr2.3d <- lm(durblrf ~ -1 + rmrf, data = capm)
> regr2.3c <- lm(constrrf ~ -1 + rmrf, data = capm)
Food
> summary(regr2.3f)
Call:
lm(formula = foodrf ~ -1 + rmrf, data = capm)

Residuals:
    Min      1Q  Median      3Q     Max 
-13.539  -1.026   0.141   1.745  15.924 

Coefficients:
     Estimate Std. Error t value Pr(>|t|)    
rmrf  0.75774    0.02579   29.39   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.884 on 609 degrees of freedom
Multiple R-squared: 0.5864,    Adjusted R-squared: 0.5857
F-statistic: 863.5 on 1 and 609 DF, p-value: < 2.2e-16

Durables
> summary(regr2.3d)
Call:
lm(formula = durblrf ~ -1 + rmrf, data = capm)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.6504 -1.9420 -0.3069  1.7332 17.8871 

Coefficients:
     Estimate Std. Error t value Pr(>|t|)    
rmrf  1.04736    0.02775   37.74   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.105 on 609 degrees of freedom
Multiple R-squared: 0.7005,    Adjusted R-squared: 0.7
F-statistic: 1424 on 1 and 609 DF, p-value: < 2.2e-16

Construction
> summary(regr2.3c)
Call:
lm(formula = constrrf ~ -1 + rmrf, data = capm)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.9414  -1.7193  -0.1866   1.4458  11.6551 

Coefficients:
     Estimate Std. Error t value Pr(>|t|)    
rmrf  1.16662    0.02535   46.01   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.836 on 609 degrees of freedom
Multiple R-squared: 0.7766,    Adjusted R-squared: 0.7763
F-statistic: 2117 on 1 and 609 DF, p-value: < 2.2e-16


How to produce results more appealing to read


The three preceding outputs are useful to separately interpret the three models we
had to estimate, regarding respectively the food, durables and construction industries.
We can present the results in a more efficient way to compare the three models,
by making use of the function mtable that is available in the package memisc.
The arguments to pass to mtable are the three objects we obtained applying the
instruction linear model lm to the food, durables and construction industries.
> library(memisc)
> mtable(regr2.3f, regr2.3d, regr2.3c)
Calls:
regr2.3f: lm(formula = foodrf ~ -1 + rmrf, data = capm)
regr2.3d: lm(formula = durblrf ~ -1 + rmrf, data = capm)
regr2.3c: lm(formula = constrrf ~ -1 + rmrf, data = capm)
=============================================
                 regr2.3f   regr2.3d   regr2.3c
---------------------------------------------
rmrf               0.758***   1.047***   1.167***
                  (0.026)    (0.028)    (0.025)  
---------------------------------------------
R-squared          0.586      0.700      0.777
adj. R-squared     0.586      0.700      0.776
sigma              2.884      3.105      2.836
F                863.524   1424.100   2117.287
p                  0.000      0.000      0.000
Log-likelihood -1511.236  -1556.104  -1500.924
Deviance        5066.744   5869.713   4898.298
AIC             3026.472   3116.207   3005.847
BIC             3035.299   3125.034   3014.674
N                610        610        610
=============================================


We can change the title and the labels in the preceding table, specify which statistics
have to appear in the final part of the table, and also relabel the name of the
independent variable rmrf:
> mtable2.3fdc <- mtable(Food = regr2.3f, Durables = regr2.3d,
Construction = regr2.3c, summary.stats = c("R-squared",
"sigma"))
> mtable2.3fdc <- relabel(mtable2.3fdc, rmrf = "excess market return")
> mtable2.3fdc
Calls:
Food: lm(formula = foodrf ~ -1 + rmrf, data = capm)
Durables: lm(formula = durblrf ~ -1 + rmrf, data = capm)
Construction: lm(formula = constrrf ~ -1 + rmrf, data = capm)
============================================================
                          Food       Durables   Construction
------------------------------------------------------------
excess market return       0.758***    1.047***    1.167***
                          (0.026)     (0.028)     (0.025)   
------------------------------------------------------------
R-squared                  0.586       0.700       0.777
sigma                      2.884       3.105       2.836
============================================================
Evaluation of the uncentered R²s

According to relationship (2.43) in Verbeek, the uncentered R² is to be evaluated when a linear model has no intercept. The uncentered R²s are automatically produced by R for the three models and figure in the previous output as R-squared (the R software takes into account the information that the models are constrained).
> 1 - sum(regr2.3f$residuals^2)/sum(capm$foodrf^2)
[1] 0.5864245
> 1 - sum(regr2.3d$residuals^2)/sum(capm$durblrf^2)
[1] 0.7004574
> 1 - sum(regr2.3c$residuals^2)/sum(capm$constrrf^2)
[1] 0.7766193

2.3.2 Testing an hypothesis on β1

To test whether the coefficients β1 in the linear models (2.3)-(2.5) can be assumed different from 1 we have to evaluate the statistic:

    (b1 − 1) / se(b1),

where b1 denotes the OLS estimate of β1.
The estimate of the variance of b1 may be obtained by using the instruction vcov, which returns the covariance matrix of the parameter estimates. The matrix reduces in the present case to a scalar, since we are considering a linear model with only one predictor and without the constant term.
> vcov(regr2.3f)
rmrf
rmrf 0.0006649123
We can thus evaluate the above statistic for the three situations:
> sampletf <- (regr2.3f$coefficients[[1]] - 1)/vcov(regr2.3f)^0.5
> sampletd <- (regr2.3d$coefficients[[1]] - 1)/vcov(regr2.3d)^0.5
> sampletc <- (regr2.3c$coefficients[[1]] - 1)/vcov(regr2.3c)^0.5
and by using the code:
> paste("(Food) statistic: ", round(sampletf, 4),
"
p-value: ", round(2 * (1 - pt(abs(sampletf),
regr2.3f$df)), 4))
> paste("(Durables) statistic: ", round(sampletd,
4), "
p-value: ", round(2 * (1 - pt(abs(sampletd),
regr2.3d$df)), 4))
> paste("(Construction) statistic: ", round(sampletc,
4), "
p-value: ", round(2 * (1 - pt(abs(sampletc),
regr2.3c$df)), 4))
we obtain
[1] "(Food) statistic:
-9.3951
p-value: 0"
[1] "(Durables) statistic:
1.7065
p-value: 0.0884"
[1] "(Construction) statistic:
6.5719
p-value: 0"
The function linearHypothesis in the package car performs directly an F test. The
first argument is the lm object and the second one specifies the hypothesis to be tested
in matrix or symbolic form (see the help ?car::linearHypothesis).
Observe that the values of the statistic F are equal to the squared values of the t
statistics obtained above, while the p-values do coincide, since the proposed tests are
similar.
> library(car)
> linearHypothesis(regr2.3f, "rmrf=1")

Linear hypothesis test

Hypothesis:
rmrf = 1

Model 1: restricted model
Model 2: foodrf ~ -1 + rmrf

  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1    610 5801.1                                  
2    609 5066.7  1    734.37 88.268 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

> linearHypothesis(regr2.3d, "rmrf=1")
Linear hypothesis test

Hypothesis:
rmrf = 1

Model 1: restricted model
Model 2: durblrf ~ -1 + rmrf

  Res.Df    RSS Df Sum of Sq     F  Pr(>F)  
1    610 5897.8                             
2    609 5869.7  1    28.067 2.912 0.08843 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

> linearHypothesis(regr2.3c, "rmrf=1")
Linear hypothesis test

Hypothesis:
rmrf = 1

Model 1: restricted model
Model 2: constrrf ~ -1 + rmrf

  Res.Df    RSS Df Sum of Sq     F    Pr(>F)    
1    610 5245.7                                 
2    609 4898.3  1    347.39 43.19 1.068e-10 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
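As a quick numerical check of the remark above that each F statistic equals the square of the corresponding t statistic, one can square the statistics computed earlier (a minimal sketch reusing the objects sampletf, sampletd and sampletc already in memory; it is not part of Verbeek's procedure):

> c(Food = sampletf^2, Durables = sampletd^2, Construction = sampletc^2)

The three values reproduce, up to rounding, the F statistics 88.268, 2.912 and 43.19 reported by linearHypothesis.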

2.3.3 CAPM regressions (with intercept) (Table 2.4)

Verbeek then considers the parameter estimation of the following three linear regression models:

    foodrf = β1 + β2 rmrf + ERROR                                (2.6)
    durblrf = β1 + β2 rmrf + ERROR                               (2.7)
    constrrf = β1 + β2 rmrf + ERROR                              (2.8)

> regr2.4f <- lm(foodrf ~ rmrf, data = capm)
> regr2.4d <- lm(durblrf ~ rmrf, data = capm)
> regr2.4c <- lm(constrrf ~ rmrf, data = capm)
> library(memisc)
> mtable2.4 <- mtable(Food = regr2.4f, Durables = regr2.4d,
      Construction = regr2.4c, summary.stats = c("R-squared",
          "sigma"))
> mtable2.4 <- relabel(mtable2.4, "(Intercept)" = "constant",
      rmrf = "excess market return")
> mtable2.4
Calls:
Food: lm(formula = foodrf ~ rmrf, data = capm)
Durables: lm(formula = durblrf ~ rmrf, data = capm)
Construction: lm(formula = constrrf ~ rmrf, data = capm)

============================================================
                          Food       Durables   Construction
------------------------------------------------------------
constant                   0.325**    -0.131      -0.073
                          (0.117)     (0.126)     (0.115)   
excess market return       0.751***    1.050***    1.168***
                          (0.026)     (0.028)     (0.025)   
------------------------------------------------------------
R-squared                  0.583       0.700       0.776
sigma                      2.869       3.104       2.837
============================================================

2.3.4 CAPM regressions (with intercept and January dummy) (Table 2.5)

The following models are considered to verify the presence of the January effect:

    foodrf = β1 + β2 jan + β3 rmrf + ERROR                       (2.9)
    durblrf = β1 + β2 jan + β3 rmrf + ERROR                      (2.10)
    constrrf = β1 + β2 jan + β3 rmrf + ERROR                     (2.11)

> regr2.5f <- lm(foodrf ~ jan + rmrf, data = capm)
> regr2.5d <- lm(durblrf ~ jan + rmrf, data = capm)
> regr2.5c <- lm(constrrf ~ jan + rmrf, data = capm)
> library(memisc)
> mtable2.5 <- mtable(Food = regr2.5f, Durables = regr2.5d,
      Construction = regr2.5c, summary.stats = c("R-squared",
          "sigma"))
> mtable2.5 <- relabel(mtable2.5, "(Intercept)" = "constant",
      rmrf = "excess market return", jan = "January dummy")
> mtable2.5
Calls:
Food: lm(formula = foodrf ~ jan + rmrf, data = capm)
Durables: lm(formula = durblrf ~ jan + rmrf, data = capm)
Construction: lm(formula = constrrf ~ jan + rmrf, data = capm)

============================================================
                          Food       Durables   Construction
------------------------------------------------------------
constant                   0.397**    -0.143      -0.122
                          (0.121)     (0.132)     (0.120)   
January dummy             -0.878*      0.139       0.604
                          (0.419)     (0.455)     (0.415)   
excess market return       0.753***    1.050***    1.167***
                          (0.026)     (0.028)     (0.025)   
------------------------------------------------------------
R-squared                  0.586       0.700       0.776
sigma                      2.861       3.107       2.835
============================================================

2.4 The World's Largest Hedge Fund (Section 2.7.3)

Data are available in the file madoff.dat in the zip file ch02.zip.
> madoff <- read.table(unzip("ch02.zip", "Chapter 2/madoff.dat"),
header = T)


The following variables are included:

fsl: return (in %) on Fairfield Sentry

fslrf: excess returns

rf: risk free rate

rmrf: excess return on the market portfolio

hml: excess return on the Fama-French value (high minus low) factor

smb: excess return on the Fama-French size (small minus big) factor

Verbeek observes that a simple inspection of the return series produces some suspicious results that are evident by considering some summary statistics:

the mean and the standard deviation that can be obtained by using the functions
mean and sd
> mean(madoff$fsl)
[1] 0.8422326
> sd(madoff$fsl)
[1] 0.7086928

the number of months with a negative return, computed by summing up the elements of the logical variable resulting from madoff$fsl < 0 (which is TRUE = 1 when the return is negative)
> sum(madoff$fsl < 0)
[1] 16

and the fraction of months with a negative return over the whole period considered, that is the ratio between the last result we obtained and the length of the series (number of periods)
> sum(madoff$fsl < 0)/length(madoff$fsl)
[1] 0.0744186

A CAPM analysis is then performed, see Verbeek's Table 2.6, by considering the following linear model

    fslrf = β1 + β2 rmrf + ERROR

> regr2.6 <- lm(fslrf ~ rmrf, data = madoff)
> summary(regr2.6)

Call:
lm(formula = fslrf ~ rmrf, data = madoff)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.34773 -0.48005 -0.08337  0.38865  2.97276 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.50495    0.04570  11.049  < 2e-16 ***
rmrf         0.04089    0.01072   3.813  0.00018 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.6658 on 213 degrees of freedom
Multiple R-squared: 0.06388,    Adjusted R-squared: 0.05949
F-statistic: 14.54 on 1 and 213 DF, p-value: 0.0001801

2.5 Dummy Variables Treatment and Multicollinearity (Section 2.8.1)

With regard to the data set Wages in the USA it is now considered the parameter estimation of the following three equivalent linear regression models⁴:

    WAGE = const + βM MALE + ERROR                               (2.12)
    WAGE = const + βF FEMALE + ERROR                             (2.13)
    WAGE = βM MALE + βF FEMALE + ERROR                           (2.14)

⁴ Recall that the parameters in the model

    WAGE = const + βM MALE + βF FEMALE + ERROR,

where MALE is a dummy variable with values 0 and 1 and FEMALE satisfies FEMALE = 1 - MALE, are not identified, since there is exact collinearity among the constant and the dummy variables MALE and FEMALE; so one of the variables has to be omitted from the model.
In (2.12) the substitution FEMALE = 1 - MALE has been performed, so dropping the variable FEMALE:
    (const + βF) + (βM - βF) MALE = const′ + β′M MALE.
In (2.13) the substitution MALE = 1 - FEMALE has been performed, so dropping the variable MALE:
    (const + βM) + (βF - βM) FEMALE = const″ + β″F FEMALE.
In (2.14) the identity FEMALE + MALE = 1 has been taken into account; it follows that
    WAGE = const (MALE + FEMALE) + βM MALE + βF FEMALE + ERROR
and:
    (const + βM) MALE + (const + βF) FEMALE = β*M MALE + β*F FEMALE.
Finally we have:
    const′ = β*F    and    const″ = β*M,
that is, the constant in (2.12) equals the FEMALE coefficient in (2.14) and the constant in (2.13) equals the MALE coefficient in (2.14).


to which correspond the following three model formulae.

WAGE ~ MALE

WAGE ~ I(1 - MALE)

WAGE ~ -1 + MALE + I(1 - MALE)

Remember that the dummy variable MALE assumes value 1 when the statistical unit
is male and 0 when she is female; so we can define a new dummy variable FEMALE as
1 - MALE.
To write the formula for the second regression model we have to use WAGE ~ I(1 - MALE), unless we explicitly define the new variable FEMALE <- 1 - MALE and use it in the formula WAGE ~ FEMALE, but this can be avoided.
Observe that with the "as is" function I() we tell R that the minus sign (-) is to be interpreted in the arithmetic sense and not in the formula sense, which would drop the variable MALE from the model.
The function lm(WAGE ~ 1 - MALE, data = wages1) would in fact result in a model containing only the intercept, since the minus sign indicates to drop the variable MALE from the model.
In specifying the third model the presence of the term -1 in the formula excludes
the intercept.
> wages1 <- read.table(unzip("ch02.zip", "Chapter 2/wages1.dat"),
header = T)
Regression 2.7A
> regr2.7A <- lm(WAGE ~ MALE, data = wages1)
> summary(regr2.7A)
Call:
lm(formula = WAGE ~ MALE, data = wages1)
Residuals:
   Min     1Q Median     3Q    Max 
-6.160 -2.102 -0.554  1.487 33.496 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.14692    0.08122   63.37   <2e-16 ***
MALE         1.16610    0.11224   10.39   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.03175,    Adjusted R-squared: 0.03145
F-statistic: 107.9 on 1 and 3292 DF, p-value: < 2.2e-16
Regression 2.7B


> regr2.7B <- lm(WAGE ~ I(1 - MALE), data = wages1)


> summary(regr2.7B)
Call:
lm(formula = WAGE ~ I(1 - MALE), data = wages1)
Residuals:
   Min     1Q Median     3Q    Max 
-6.160 -2.102 -0.554  1.487 33.496 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.31302    0.07747   81.50   <2e-16 ***
I(1 - MALE) -1.16610    0.11224  -10.39   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.03175,    Adjusted R-squared: 0.03145
F-statistic: 107.9 on 1 and 3292 DF, p-value: < 2.2e-16
Regression 2.7C
> regr2.7C <- lm(WAGE ~ -1 + MALE + I(1 - MALE), data = wages1)
> summary(regr2.7C)
Call:
lm(formula = WAGE ~ -1 + MALE + I(1 - MALE), data = wages1)
Residuals:
   Min     1Q Median     3Q    Max 
-6.160 -2.102 -0.554  1.487 33.496 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
MALE         6.31302    0.07747   81.50   <2e-16 ***
I(1 - MALE)  5.14692    0.08122   63.37   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.764,    Adjusted R-squared: 0.7638
F-statistic: 5328 on 2 and 3292 DF, p-value: < 2.2e-16


Presenting the results in a nicer way

As we have already recalled in Section 2.3.1, we can produce an output to compare the results for the three models, like Verbeek's Table 2.7, by having recourse to the function mtable in the package memisc.
> library(memisc)
> mtable2.7 <- mtable(A = regr2.7A, B = regr2.7B, C = regr2.7C,
summary.stats = c("R-squared", "sigma"))
> mtable2.7 <- relabel(mtable2.7, "(Intercept)" = "constant",
MALE = "male", "I(1 - MALE)" = "female")
> mtable2.7
Calls:
A: lm(formula = WAGE ~ MALE, data = wages1)
B: lm(formula = WAGE ~ I(1 - MALE), data = wages1)
C: lm(formula = WAGE ~ -1 + MALE + I(1 - MALE), data = wages1)
========================================
             A          B          C        
----------------------------------------
constant      5.147***   6.313***           
             (0.081)    (0.077)             
male          1.166***               6.313***
             (0.112)                (0.077) 
female                  -1.166***    5.147***
                        (0.112)     (0.081) 
----------------------------------------
R-squared     0.032      0.032       0.764
sigma         3.217      3.217       3.217
========================================

2.6 Missing Data, Outliers and Influential Observations

See Section 4.1.8.
The Least Absolute Deviation approach to parameter estimation has been implemented in R by Koenker in the package quantreg, see Koenker (2012).
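As an illustration of how this can be applied (a minimal sketch, not taken from Verbeek: it assumes the wages1 data.frame of Section 2.1 is still in memory and reuses the specification of model (2.2); the object name regr2.2lad is only illustrative), the function rq of quantreg fits quantile regressions and yields the LAD estimates for the median quantile, tau = 0.5:

> library(quantreg)
> # LAD fit of model (2.2): median (tau = 0.5) regression
> regr2.2lad <- rq(WAGE ~ MALE + SCHOOL + EXPER, tau = 0.5, data = wages1)
> summary(regr2.2lad)

Comparing these coefficients with the OLS estimates obtained in Section 2.2.1 gives an indication of how sensitive the fit is to the outlying wage observations.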

2.7 How to check the form of the distribution

In statistical analyses it is important to check whether data follow some given distribution. For example, the classical assumptions on the linear model require that errors are distributed according to a Normal random variable. Thus, after having estimated a linear model, one has to check whether this distributional hypothesis is rejected for the residuals. The same issue is present in the analysis of time series when e.g. a distributional assumption on white noise is made, see Chapter 8.


In what follows, let data be a series with elements x1, . . . , xn. For the sake of simplicity we work with data simulated from a normal distribution with mean equal to 50 and unitary variance.
> set.seed(123)
> data <- rnorm(100) + 50
We first consider a graphical inspection of the distribution by plotting a histogram of data together with the theoretical density function of a Normal random variable; then the χ² goodness-of-fit test is introduced. The discussion will proceed by comparing the empirical cumulative distribution function of data with the theoretical cumulative distribution function of a Normal random variable; the Kolmogorov-Smirnov test is based on this comparison. Two graphical tools, the QQ-plot and the PP-plot, will be derived from the comparison of the empirical and the theoretical distribution functions. All the reasoning applies also in case we want to test some distributional assumption different from the Normal one.

2.7.1 Data histogram with the theoretical density function

We can obtain the histogram of data by using the function hist; pay attention to setting the argument freq = FALSE: in this way relative frequencies or densities (in case of equal-length and different-length intervals respectively) will be plotted
> data.hist <- hist(data, freq = FALSE)
We can add the density of a Normal distribution, by setting the mean and the standard
deviation arguments equal to the sample mean and the sample standard deviation
values of data, see Fig. 2.4.
> curve(dnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE)

2.7.2 The χ² goodness-of-fit test

The object data.hist contains all information necessary to create the histogram.
Namely data.hist$breaks gives the limits of the intervals (classes) in the histogram,
and data.hist$counts the count corresponding to each class.
> data.hist$breaks
[1] 47.5 48.0 48.5 49.0 49.5 50.0 50.5 51.0 51.5 52.0 52.5
> data.hist$counts
[1] 1 3 10 11 23 22 13 9 5 3
We can thus build the following table by considering the same classes of the histogram (the lowest and highest bounds of the histogram are replaced with −∞ and +∞ respectively)


Figure 2.4 Histogram of data with the theoretical density function under the hypothesis of normality

> data.hist$breaks[1] <- -Inf
> data.hist$breaks[length(data.hist$breaks)] <- Inf
> table <- cbind(inf = data.hist$breaks[-length(data.hist$breaks)],
    sup = data.hist$breaks[-1], "observed count" = data.hist$counts,
    "theoretical count" = sum(data.hist$counts) *
      diff(pnorm(data.hist$breaks, mean = mean(data), sd = sd(data))))
> table
       inf  sup observed count theoretical count
 [1,] -Inf 48.0              1          1.100883
 [2,] 48.0 48.5              3          2.971850
 [3,] 48.5 49.0             10          7.540375
 [4,] 49.0 49.5             11         14.275082
 [5,] 49.5 50.0             23         20.167108
 [6,] 50.0 50.5             22         21.262835
 [7,] 50.5 51.0             13         16.730787
 [8,] 51.0 51.5              9          9.824401
 [9,] 51.5 52.0              5          4.304671
[10,] 52.0  Inf              3          1.822008

The first two columns of table contain the class bounds zj−1 and zj. The third column contains the observed frequencies and the fourth column the theoretical frequencies under the assumption of normality. These theoretical frequencies are obtained as n·p̂j, where the probabilities p̂j are defined as

    p̂j = Φ((zj − x̄)/s) − Φ((zj−1 − x̄)/s),

being zj−1 and zj the class limits, Φ the standard Normal cdf, and x̄ and s² the sample mean and the sample variance.
For testing the null hypothesis of Normality we can have recourse to the χ² goodness-of-fit test, see Mood, Graybill and Boes (1974), which is based on the statistic

    Q'k = Σ_{j=1}^{k+1} (nj − n·p̂j)² / (n·p̂j)

where k + 1 is the number of classes. Q'k is asymptotically distributed according to a χ² random variable with k degrees of freedom. With reference to data we have
> (qstat <- sum((table[, 3] - table[, 4])^2/table[,
4]))
[1] 3.761746
with a corresponding p-value equal to
> 1 - pchisq(qstat, nrow(table) - 1)
[1] 0.9263825
so we will not reject the null hypothesis that the elements of data are distributed
according to a Normal random variable.
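As a cross-check, not part of the original treatment, the same statistic can be obtained with the built-in function chisq.test by passing it the observed counts and the theoretical class probabilities; note that the degrees of freedom it reports do not take into account that the mean and the standard deviation were estimated from the data, and R may warn that some expected counts are small.

> # theoretical class probabilities; the breaks were extended to -Inf and +Inf
> # above, so these probabilities sum to 1
> p.theor <- diff(pnorm(data.hist$breaks, mean = mean(data), sd = sd(data)))
> chisq.test(x = data.hist$counts, p = p.theor)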

2.7.3 The Kolmogorov-Smirnov test

Let

    Fn(x) = #{xi ≤ x} / n

be the empirical cumulative distribution function (cdf) of data and F0(x) a theoretical cumulative distribution function, see Fig. 2.5, where the empirical cdf is the step function and the theoretical cdf is the continuous one.
The Kolmogorov-Smirnov statistic to test the null hypothesis X ∼ F0(·), where F0(·) is some completely specified continuous cumulative distribution function, is

    Kn = sup_{−∞ < x < ∞} |Fn(x) − F0(x)|.    (2.15)

Figure 2.5 Empirical cumulative distribution function (the step function) and the theoretical distribution function under the null hypothesis of normality

This test can also be used to check if the observations in two data sets (x1, …, x_nx) and (y1, …, y_ny) come from the same distribution; in this case F0(x) is replaced with the empirical cdf calculated on (y1, …, y_ny).
The Kolmogorov-Smirnov statistic is based on the maximum absolute distance
between the empirical cdf Fn () and the theoretical one F0 (), see Fig. 2.6.
> plot(ecdf(data), xlim = c(47, 53), cex = 0.5, main = "",
ylab = expression(F[n](x)~~and~~F[0](x)))
> curve(pnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE)
> x <- sort(data)
> curve(ecdf(data)(x) - pnorm(x, mean = mean(data),
sd = sd(data)), n = 10000, xlim = c(47, 53),
ylim = c(-0.06, 0.06), ylab = "distance")
> abline(h = 0)

Figure 2.6 Distance between the empirical cumulative distribution function and the theoretical distribution function under the null hypothesis of normality

The Kolmogorov-Smirnov test can be performed by having recourse to the function


ks.test, whose arguments are: x the data whose distribution we want to test; y
either a numeric vector of data values (in case one wants to compare y to x), or
a character string naming a cdf given by the user or one of the cdfs available in
R such as pnorm (only continuous cdfs are valid); ... are additional arguments
specifying the parameters of the distribution given (as a character string) by y;
alternative indicates the alternative hypothesis and must be one of "two.sided"
(default), "less", or "greater"; exact, which is NULL by default, can be a logical
indicating whether an exact p-value should be computed.
Relationship (2.15) makes reference to the two.sided alternative hypothesis. By setting the option alternative = "less" the null hypothesis is specified as FX(·) ≥ FY(·), that is X ≤ Y, i.e. X is stochastically smaller than Y; while if we set the option alternative = "greater" the null hypothesis is specified as FX(·) ≤ FY(·), that is X ≥ Y, i.e. X is stochastically greater than Y.
The corresponding Kolmogorov-Smirnov statistics are respectively

    D⁻ = sup_{−∞ < x < ∞} {FY(x) − FX(x)} for the set of hypotheses
        H0: X ≤ Y, i.e. FX(x) ≥ FY(x)
        H1: X > Y, i.e. FX(x) < FY(x)            alternative = "less",

    D⁺ = sup_{−∞ < x < ∞} {FX(x) − FY(x)} for the set of hypotheses
        H0: X ≥ Y, i.e. FX(x) ≤ FY(x)
        H1: X < Y, i.e. FX(x) > FY(x)            alternative = "greater",

    max(D⁻, D⁺) = sup_{−∞ < x < ∞} |FY(x) − FX(x)| for the set of hypotheses
        H0: X = Y, i.e. FX(x) = FY(x)
        H1: X ≠ Y, i.e. FX(x) ≠ FY(x)            alternative = "two.sided".

By applying the function ks.test to our data we have


> ks.test(data, "pnorm", mean = mean(data), sd = sd(data))
One-sample Kolmogorov-Smirnov test
data: data
D = 0.0581, p-value = 0.8884
alternative hypothesis: two-sided
and according to this result we will not reject the null hypothesis of normality.
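The statistic in (2.15) can also be computed by hand, since the supremum is attained at the observed data points; the following sketch should reproduce the value of D reported by ks.test.

> x.sorted <- sort(data)
> F0 <- pnorm(x.sorted, mean = mean(data), sd = sd(data))
> n <- length(data)
> # largest positive and negative deviations of the empirical cdf from F0,
> # evaluated just after and just before each ordered observation
> Dplus <- max((1:n)/n - F0)
> Dminus <- max(F0 - (0:(n - 1))/n)
> max(Dplus, Dminus)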

2.7.4 The PP-plot and the QQ-plot

We now focus our attention, following an idea suggested by Diego Zappa, on the comparison of the empirical cdf with the theoretical cdf to obtain the so-called PP-plot and QQ-plot, see Zappa, Bramante and Nai Ruscone (2012). Figure 2.7 shows a zoom of the graph in Fig. 2.5.
Figure 2.7 can be obtained with the following code.
> xlim <- c(47, 50)
> xtextshift <- 0.15
> plot(ecdf(data), xlim = xlim, ylim = ecdf(data)(xlim),
xaxs = "i", yaxs = "i", main = "",
ylab = expression(F[n](x)~~and~~F[0](x)), cex = 0.5)
> curve(pnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE, xlim = xlim)
> point <- data[which.max(abs(ecdf(data)(data) - pnorm(data,
mean = mean(data), sd = sd(data))))]
> arrows(point, 0, point, ecdf(data)(point), length = 0.1,
angle = 22)
> arrows(point, 0, point, pnorm(point, mean = mean(data),
sd = sd(data)), length = 0.1, angle = 22)


> arrows(point, ecdf(data)(point), xlim[1], ecdf(data)(point),


length = 0.1, angle = 22)
> arrows(point, pnorm(point, mean = mean(data), sd = sd(data)),
xlim[1], pnorm(point, mean = mean(data), sd = sd(data)),
length = 0.1, angle = 22)
> text(point + 0.05, ecdf(data)(xlim[1]) + 0.01, expression(x[p]))
> text(xlim[1] + xtextshift, ecdf(data)(point) + 0.01,
expression(F[n](x[p])))
> text(xlim[1] + xtextshift, pnorm(point, mean = mean(data),
sd = sd(data)) + 0.01, expression(F[0](x[p])))
> point <- which(ecdf(data)(sort(data)) == 0.2)
> point <- (sort(data))[point - 6]
> fpoint <- ecdf(data)(point)
> arrows(xlim[1], fpoint, qnorm(fpoint, mean = mean(data),
sd = sd(data)), fpoint, length = 0.1, angle = 22)
> arrows(xlim[1], fpoint, point, fpoint, length = 0.1,
angle = 22)
> arrows(point, fpoint, point, ecdf(data)(xlim[1]),
length = 0.1, angle = 22)
> arrows(qnorm(fpoint, mean = mean(data), sd = sd(data)),
fpoint, qnorm(fpoint, mean = mean(data), sd = sd(data)),
ecdf(data)(xlim[1]), length = 0.1, angle = 22)
> text(xlim[1] + xtextshift/2, fpoint + 0.01, expression(tilde(p)))
> text(point - xtextshift, ecdf(data)(xlim[1]) + 0.01,
expression(x[tilde(p)]))
> text(qnorm(fpoint, mean = mean(data), sd = sd(data)) +
xtextshift, ecdf(data)(xlim[1]) + 0.01,
expression(x[0][tilde(p)]))
We have seen above that the Kolmogorov-Smirnov statistics are defined as a function of the largest absolute, positive or negative difference between the two functions as x varies.
For each xp in the ordered data set let p = Fn(xp) be the value assumed by the empirical cdf, see Fig. 2.7: xp is the p percentage point of the data.
Let now p∗ = F0(xp) be the value assumed by the theoretical cdf in xp.
One way to compare the empirical cdf with the theoretical cdf is to obtain a scatter plot diagram representing the pairs (p∗, p). This graphical representation is named the probability-probability plot (PP-plot), see Fig. 2.8.
> p.orders <- 1:length(data)/length(data)
> plot(pnorm(sort(data), mean = mean(data), sd = sd(data)),
p.orders,pch=16, xlab = "p* (theoretical probabilities)",
ylab = "p (sample probabilities)")
> abline(0, 1)
Observe that if the points lie on the straight line through (0,0) and (1,1) then F0 could represent the data generating model. The PP-plot is particularly effective in detecting


Figure 2.7 Empirical cumulative distribution function and theoretical cumulative distribution function: introduction to the PP-plot and QQ-plot graphical representations

deviations from F0 in regions of high probability density (typically in the middle of the distribution), see Section 2.9.
A dual way to compare the empirical cdf with the theoretical cdf is to start from a generic value p̃ assumed by the empirical cdf, see Fig. 2.7. We have two inverse images of p̃: the value xp̃ whose image through the empirical cdf Fn(x) is p̃, and the value x0p̃ whose image through the theoretical cdf F0 is p̃. The scatter plot diagram of the pairs (x0p̃, xp̃) is named the Quantile-Quantile plot (QQ-plot), see Fig. 2.9.
> plot(qnorm(p.orders, mean = mean(data), sd = sd(data)),
sort(data), pch = 16,
xlab = expression(x[0][tilde(p)]~~(theoretical~quantiles)),
ylab = expression(x[tilde(p)]~~(sample~quantiles)))
> abline(0, 1)
The same graph can be obtained by applying the function qqnorm to data. Also in this case, if the points lie on a straight line then F0 could represent the data generating model. The QQ-plot is particularly effective in detecting deviations from F0 on the
tails of the distribution, see Section 2.9.
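As a minimal sketch:

> # Normal QQ-plot of data; the theoretical quantiles refer to a standard
> # Normal, so the axes differ from Fig. 2.9 only by a linear rescaling
> qqnorm(data)
> # reference line through the first and third quartiles
> qqline(data)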

Figure 2.8 PP-plot

2.7.5 Use of the function fit.cont

The function fit.cont, available in the package rriskDistributions, gives several goodness-of-fit statistics (loglikelihood, AIC, BIC, Chi-squared, Anderson-Darling and Kolmogorov-Smirnov) to check if the data follow some theoretical cdf. The Beta, Cauchy, chi-square, non-central chi-square, exponential, F, gamma, Gompertz, hypergeometric, lognormal, logistic, negative binomial, Normal, pert, Poisson, Student's t, truncated normal, triangular, uniform and Weibull models are implemented.
A theoretical cdf appears in the output only when the procedure succeeds in estimating its parameters, otherwise a warning message is returned. Have a look at the help system for more information.
The function fit.cont also produces the histogram with the theoretical density, the QQ-plot, the empirical and theoretical cdfs and the PP-plot, see Fig. 2.10.
We observe that other statistical software packages draw the PP- and QQ-plots by switching the x and y axes, so theoretical probabilities and theoretical quantiles will appear on the y axis.


Figure 2.9 QQ-plot

2.8 Two tests for assessing normality

We consider two tests for assessing the normality distributional assumption.

2.8.1 The Jarque-Bera test

The Jarque-Bera test, see Jarque and Bera (1987), is obtained as a Lagrange Multiplier statistic, see Verbeek's Chapter 6, and has the following forms:

- in case of a sample of n observations (x1, …, xn) the Jarque-Bera statistic is defined as:

      JB = n [ b1²/6 + (b2 − 3)²/24 ]

  where

      b1 = μ̂3 / μ̂2^(3/2),    b2 = μ̂4 / μ̂2²,    μ̂j = (1/n) Σ_{i=1}^{n} (xi − x̄)ʲ    and    x̄ = (1/n) Σ_{i=1}^{n} xi.


Figure 2.10 Fitting a continuous distribution by using the function fit.cont

Observe that b1 and b2 are respectively the sample skewness and kurtosis coefficients; under the normality assumption b1 = 0 and b2 = 3, so that both terms of the statistic are null.


- in case of a sample of n OLS residuals (e1, …, en) the Jarque-Bera statistic is defined as:

      JB = n [ (μ̃3 − 3 μ̃1 μ̃2)² / (6 μ̃2³) + ( (μ̃4 − 4 μ̃1 μ̃3) / μ̃2² − 3 )² / 24 ] + n μ̃1² / (2 μ̃2)

  where

      μ̃j = (1/n) Σ_{i=1}^{n} eiʲ.

When the linear model includes a constant the residuals have zero mean, that is μ̃1 = 0, and the Jarque-Bera statistic reduces to the former definition.
In both cases the Jarque-Bera statistic is asymptotically distributed as a χ² random variable with 2 degrees of freedom.
In the package tseries the function jarque.bera.test is available to perform the
Jarque-Bera test on a set of observations. By applying it on data we obtain
> library(tseries)
> jarque.bera.test(data)
Jarque Bera Test
data: data
X-squared = 0.1691, df = 2, p-value = 0.9189
and the null hypothesis of normality will not be rejected.
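A by-hand computation of the statistic, following the first definition above, shows where this value comes from (a sketch; it should reproduce the X-squared value returned by jarque.bera.test up to rounding):

> n <- length(data)
> mu <- function(j) mean((data - mean(data))^j)  # central sample moment of order j
> b1 <- mu(3)/mu(2)^(3/2)                        # sample skewness
> b2 <- mu(4)/mu(2)^2                            # sample kurtosis
> JB <- n * (b1^2/6 + (b2 - 3)^2/24)
> c(JB = JB, p.value = 1 - pchisq(JB, df = 2))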

2.8.2 The Shapiro-Wilk test

The Shapiro-Wilk normality test, see Shapiro and Wilk (1965), is implemented in the
function shapiro.test; applying this function to data we obtain
> shapiro.test(data)
Shapiro-Wilk normality test
data: data
W = 0.9939, p-value = 0.9349
which does not reject the null hypothesis of normality.

2.9 Some further comments on the QQ-plot

We now consider the behaviour of the QQ-plot (and of the PP-plot), under the null hypothesis of normality, in the presence of data characterized by skewed, leptokurtic or platikurtic behaviour.


2.9.1 Positively skewed distributions

Let X be distributed according to a Gamma distribution. The density function is

    f(x; α, λ) = (λ^α / Γ(α)) x^(α−1) e^(−λx) I_(0,∞)(x),    α > 0, λ > 0

and we have E(X) = α/λ and Var(X) = α/λ².


Figure 2.11 shows the density functions and the cdfs of a Gamma random variable,
X, with parameters = 4 and = 2 and of a Normal random variable, Y , with mean
4
4
2 = 2 and variance 22 = 1
> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> alpha = 4
> lambda = 2
> curve(dgamma(x, alpha, lambda), xlim = c(-2, 6),
    ylab = expression(f[X](x)~~and~~f[Y](x)))
> curve(dnorm(x, mean = alpha/lambda), add = TRUE)
> text(0.75, 0.4, expression(f[X](x)), cex = 0.75)
> text(3, 0.35, expression(f[Y](x)), cex = 0.75)
> curve(pgamma(x, alpha, lambda), xlim = c(-2, 6),
    ylab = expression(F[X](x)~~and~~F[Y](x)))
> curve(pnorm(x, mean = alpha/lambda), add = TRUE)
> text(2, 0.75, expression(F[X](x)), cex = 0.75)
> text(2, 0.35, expression(F[Y](x)), cex = 0.75)

We can establish the behaviour of the PP- and QQ-plots by considering the cumulative
distribution functions as was shown in Section 2.7.4
> layout(1:2)
> par(mai = c(0.9, 0.82, 0.1, 0.42))
> x <- seq(-2, 6, length = 500)
> plot(pnorm(x, mean = alpha/lambda), pgamma(x, alpha,
    lambda), type = "l", xaxs = "i", yaxs = "i",
    xlab = "theoretical probabilities",
    ylab = "sample probabilities", ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = alpha/lambda), qgamma(x, alpha,
    lambda), xlim = c(-2, 6), ylim = c(-2, 6), type = "l",
    xlab = "theoretical quantiles", ylab = "sample quantiles")
> abline(0, 1)
> text(-0.75, 1.5, "left tail thinner than the normal tail",
    cex = 0.75)
> text(3, 5.5, "right tail fatter than the normal tail",
    cex = 0.75)


In this situation the left tail of X is thinner than that of Y while the right tail of X
is fatter than that of Y . Thus the quantiles on the tails of the two distributions will
have the following behaviour: for any given p (close to 0 or to 1) the quantiles of X
are larger than those of Y . The behaviour is evident by examining the QQ plot.
The PP-plot clearly detects a different behaviour of the two distributions in the
middle of the domain.
We now apply the function fit.cont to some simulated data, see Fig. 2.15.
> set.seed(123)
> skew.data <- rgamma(100, alpha, lambda)
> library(rriskDistributions)
> fit.cont(skew.data)

2.9.2 Negatively skewed distributions

Figure 2.12 shows the density functions and the cdfs of X = −W, being W the Gamma random variable with parameters α = 4 and λ = 2 considered in the previous section, and of a Normal random variable, Y, with mean −4/2 = −2 and variance 4/2² = 1
> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> alpha = 4
> lambda = 2
> curve(dgamma(-x, alpha, lambda), xlim = c(-6, 2),
    ylab = expression(f[X](x)~~and~~f[Y](x)))
> curve(dnorm(x, mean = -alpha/lambda), add = TRUE)
> text(-0.75, 0.4, expression(f[X](x)), cex = 0.75)
> text(-3, 0.35, expression(f[Y](x)), cex = 0.75)
> curve(1 - pgamma(-x, alpha, lambda), xlim = c(-6, 2),
    ylab = expression(F[X](x)~~and~~F[Y](x)))
> curve(pnorm(x, mean = -alpha/lambda), add = TRUE)
> text(-1.75, 0.75, expression(F[X](x)), cex = 0.75)
> text(-1.75, 0.35, expression(F[Y](x)), cex = 0.75)

> layout(1:2)
> par(mai = c(0.9, 0.82, 0.1, 0.42))
> x <- seq(-6, 2, length = 500)
> plot(pnorm(x, mean = -alpha/lambda), 1 - pgamma(-x,
    alpha, lambda), type = "l", xaxs = "i", yaxs = "i",
    xlab = "theoretical probabilities",
    ylab = "sample probabilities", ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = -alpha/lambda), -qgamma(1 - x,
    alpha, lambda), xlim = c(-6, 2), ylim = c(-6, 2),
    type = "l", xlab = "theoretical quantiles",
    ylab = "sample quantiles")
> abline(0, 1)
> text(-2.5, -5, "left tail fatter than the normal tail",
    cex = 0.75)
> text(0.75, -1.75, "right tail thinner than the normal tail",
    cex = 0.75)
In this situation the left tail of X is fatter than that of Y while the right tail of X is
thinner than that of Y . Thus the quantiles on the tails of the two distributions will
have the following behaviour: for any given p (close to 0 or to 1) the quantiles of X
are smaller than those of Y . The behaviour is evident by examining the QQ plot.
As above the PP-plot clearly detects a different behaviour of the two distributions
in the middle of the domain.
We apply the function fit.cont to some simulated data, see Fig. 2.16.
> set.seed(123)
> skew.data <- -rgamma(100, alpha, lambda)
> library(rriskDistributions)
> fit.cont(skew.data)

2.9.3 Leptokurtic distributions

Let X be distributed according to a t distribution with k degrees of freedom. We have E(X) = 0 and Var(X) = k/(k − 2).
The t distribution is used in finance since it is able to capture the fat tails which often characterize the distribution of the residuals.
Figure 2.13 shows the density functions and the cdfs of a t random variable with k = 4 degrees of freedom and of a Normal random variable, Y, with mean 0 and variance 4/(4 − 2) = 2
> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> k = 4
> curve(dt(x, k), xlim = c(-8, 8),
    ylab = expression(f[X](x)~~and~~f[Y](x)))
> curve(dnorm(x, mean = 0, sd = (k/(k - 2))^0.5), add = TRUE)
> text(0.75, 0.35, expression(t[4]), cex = 0.75)
> text(0, 0.24, "normal", cex = 0.75)
> curve(pt(x, k), xlim = c(-8, 8),
    ylab = expression(F[X](x)~~and~~F[Y](x)))
> curve(pnorm(x, mean = 0, sd = (k/(k - 2))^0.5), add = TRUE)
> text(0, 0.2, expression(F[X](x)), cex = 0.75)
> text(1.5, 0.7, expression(F[Y](x)), cex = 0.75)

> layout(1:2)
> par(mai = c(0.9, 0.82, 0.1, 0.42))
> x <- seq(-8, 8, length = 500)
> plot(pnorm(x, mean = 0, sd = (k/(k - 2))^0.5), pt(x,
    k), type = "l", xaxs = "i", yaxs = "i",
    xlab = "theoretical probabilities",
    ylab = "sample probabilities", ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = 0, sd = (k/(k - 2))^0.5), qt(x,
    k), xlim = c(-8, 8), ylim = c(-8, 8), type = "l",
    xlab = "theoretical quantiles", ylab = "sample quantiles")
> abline(0, 1)
> text(-1.25, -7.5, "left tail fatter than the normal tail",
    cex = 0.75)
> text(1, 7.5, "right tail fatter than the normal tail",
    cex = 0.75)

In this situation the tails of X are fatter than those of Y. Thus the quantiles on the tails of the two distributions will have the following behaviour: for any given p close to 0 the quantiles of X are smaller than those of Y; for any given p close to 1 the quantiles of X are larger than those of Y. The behaviour is evident by examining the QQ-plot.
The density functions are now symmetric and thus the PP-plot intersects the 0-1 line at the centre of the distributions; however it can still detect the different behaviour of the two distributions in the middle of their domain.
We apply the function fit.cont to some simulated data, see Fig. 2.17.
> set.seed(123)
> leptokurtic.data <- rt(100, k)
> library(rriskDistributions)
> fit.cont(leptokurtic.data)

2.9.4 Platikurtic distributions

Let X be distributed according to a uniform distribution on (0, 1). We have E(X) = 0.5 and Var(X) = 1/12.
Figure 2.14 shows the density functions and the cdfs of X and of a Normal random variable, Y, with mean 0.5 and variance 1/12
> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> curve(dunif(x), xlim = c(-1, 2), ylim = c(0, 1.5),
ylab = expression(f[X](x)~~and~~f[Y](x)))
> curve(dnorm(x, mean = 0.5, sd = 1/12^0.5), add = TRUE)
> text(-0.1, 1, expression(f[X](x)), cex = 0.75)
> text(0.75, 1.25, expression(f[Y](x)), cex = 0.75)


> curve(punif(x), xlim = c(-1, 2),
    ylab = expression(F[X](x)~~and~~F[Y](x)))
> curve(pnorm(x, mean = 0.5, sd = 1/12^0.5), add = TRUE)
> text(0.9, 0.75, expression(F[X](x)), cex = 0.75)
> text(0.4, 0.2, expression(F[Y](x)), cex = 0.75)
> layout(1:2)
> par(mai = c(0.9, 0.82, 0.1, 0.42))
> x <- seq(-1, 2, length = 500)
> plot(pnorm(x, mean = 0.5, sd = 1/12^0.5), punif(x),
    type = "l", xaxs = "i", yaxs = "i",
    xlab = "theoretical probabilities",
    ylab = "sample probabilities", ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = 0.5, sd = 1/12^0.5), qunif(x),
    xlim = c(-1, 2), ylim = c(-1, 2), type = "l",
    xlab = "theoretical quantiles", ylab = "sample quantiles")
> abline(0, 1)
> text(-0.5, 0.5, "left tail thinner than the normal tail",
    cex = 0.75)
> text(1.5, 0.5, "right tail thinner than the normal tail",
    cex = 0.75)

In this situation the tails of Y are fatter than those of X. Thus the quantiles on the tails of the two distributions will have the following behaviour: for any given p close to 0 the quantiles of X are larger than those of Y; for any given p close to 1 the quantiles of X are smaller than those of Y. The behaviour is evident by examining the QQ-plot.
As above, the density functions are symmetric and thus the PP-plot intersects the 0-1 line at the centre of the distributions; however it can still detect the different behaviour of the two distributions in the middle of their domain.
We apply the function fit.cont to some simulated data, see Fig. 2.18.
> set.seed(123)
> platikurtic.data <- runif(100)
> library(rriskDistributions)
> fit.cont(platikurtic.data)


Figure 2.11 Density and cumulative distribution functions of a positively skewed distribution (Gamma(α = 4, λ = 2)) and of a Normal random variable. Theoretical PP-plot and QQ-plot for the comparison of the two distributions


Figure 2.12 Density and cumulative distribution functions of a negatively skewed distribution and of a Normal random variable. Theoretical PP-plot and QQ-plot for the comparison of the two distributions


Figure 2.13 Density and cumulative distribution functions of a t random variable with 4 degrees of freedom (leptokurtic distribution) and a Normal random variable. Theoretical PP-plot and QQ-plot for the comparison of the two distributions

Figure 2.14 Density and cumulative distribution functions of a uniform random variable (platikurtic distribution) and a Normal random variable. Theoretical PP-plot and QQ-plot for the comparison of the two distributions


Figure 2.15 Fitting positively skewed data, see Section 2.9.1, by using the function fit.cont

Figure 2.16 Fitting negatively skewed data, see Section 2.9.2, by using the function fit.cont

Figure 2.17 Fitting leptokurtic data, see Section 2.9.3, by using the function fit.cont

Figure 2.18 Fitting platikurtic data, see Section 2.9.4, by using the function fit.cont

3 Interpreting and comparing Linear Regression Models
3.1 Explaining House Prices (Section 3.4)

The variable names in the file housing.dat provided in the zip file ch03.zip are not stored only on the first line (the last two names are reported on the second line of the text file), so if one wishes to read the data with the command read.table, see Section 2.1, one first has to arrange, with a text editor, all the variable names on the first line of housing.dat.
Here, we import data from the file housing.dta, which is saved in the Stata format.
We have first to invoke the package foreign and next the command read.dta.
Remember that the function unzip extracts a file from a compressed archive.
> library(foreign)
> housing <- read.dta(unzip("ch03.zip", "Chapter 3/housing.dta"))
Recall that it is possible to explore the beginning section and the final section of the
data.frame and to obtain summary statistics for all the variables included in the
data-frame by using the functions head(), tail() and summary().
The first linear model proposed by Verbeek studies the interpretation of
log(price) as a function of the log(lotsize), the number of bedrooms, the number
of bathrooms and the presence of air conditioning. The corresponding parameter
estimates may be obtained by using the command lm; the log transformation of the specified variables can be written directly in the formula, without wrapping it in the as-is operator I() (see Section 2.5), since for the logarithm there is no ambiguity between mathematical operators and the symbolic operators proper of the formula interface. See Appendix A.4.
> regr3.1 <- lm(log(price) ~ log(lotsize) + bedrooms +
bathrms + airco, data = housing)
> summary(regr3.1)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco, data = housing)


Residuals:
     Min       1Q   Median       3Q      Max 
-0.81782 -0.15562  0.00778  0.16468  0.84143 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.09378    0.23155  30.636  < 2e-16 ***
log(lotsize)  0.40042    0.02781  14.397  < 2e-16 ***
bedrooms      0.07770    0.01549   5.017 7.11e-07 ***
bathrms       0.21583    0.02300   9.386  < 2e-16 ***
airco         0.21167    0.02372   8.923  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2456 on 541 degrees of freedom
Multiple R-squared: 0.5674,    Adjusted R-squared: 0.5642
F-statistic: 177.4 on 4 and 541 DF, p-value: < 2.2e-16
The estimate of s = 0.2456 (Residual standard error) is also available by invoking the instruction summary(regr3.1)$sigma.
The expected log(price) of a house with specific characteristics may be obtained by applying the function predict: the first argument is the lm object containing the parameter estimates to use in the prediction; the second argument specifies, in the form of a data.frame, the values of the regressors for which the prediction of the response is desired.
> predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0))
1
11.03088
One may obtain1 the prediction of the price by calculating the exp of the preceding
value or directly by means of the following expression:
> exp(predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0)))
1
61751.63
To include one half of the residual variance s2 in the prediction use:
> exp(predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0)) + summary(regr3.1)$sigma^2/2)
1
63641.78
1 Values different from those proposed by Verbeek are due to approximations.


Verbeek observes how the average of predicted prices, which can be extracted from
the lm object regr3.1 as regr3.1$fitted
> mean(exp(regr3.1$fitted))
[1] 66152.74
underestimates the sample average of observed prices
> mean(housing$price)
[1] 68121.6
concluding that the bias can be reduced by adding the half-variance term
> mean(exp(regr3.1$fitted + summary(regr3.1)$sigma^2/2))
[1] 68177.61

3.1.1 Testing the functional form: construction of the RESET test

According to the RESET test procedure for checking the functional form of the
preceding model, one has to include, as predictors in the model specification, the
first Q powers, e.g. the second and the third ones, of the values of y estimated by the
model. The fitted y values may be obtained with the instruction:
> yhat <- predict(regr3.1)
Observe that when no value of the regressors is given as argument of predict, the
prediction is made for the data set used to estimate the parameters in the linear
model: in this way the fitted y values are obtained.
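Equivalently (a small check, not in Verbeek), the fitted values are stored in the lm object and can be extracted with the function fitted; the comparison below should return TRUE.

> all.equal(yhat, fitted(regr3.1))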
Two ways exist to include additional terms into a formula defining a linear model:2

by invoking lm and including the specified terms in the formula:


> regr3.1RESET2 <- lm(log(price) ~ log(lotsize) + bedrooms +
bathrms + airco + I(yhat^2) + I(yhat^3), data = housing)

we can modify the formula in the basic model by using the function update.
In the sequel we will follow this latter solution.

The function update has three arguments. The first is the object to update, which is
an object of class lm, that is the result of a preceding linear model call. The second one
is the updating formula: the dot stands for the same elements, so .~. means both
members of the invoked model in their original form. The additional term squared
predicted values is then included in the formula. The third optional3 argument is the
data.frame the variables involved in the linear model refer to.
2 Remember to use the as-is operator I() to specify the powers of the regressors, since ^ is a symbol proper of the formula method, see Appendix A.4.
3 In this way it is possible to update an existing formula and apply it to a new data.frame


> regr3.1RESET2 <- update(regr3.1, . ~ . + I(yhat^2))


> summary(regr3.1RESET2)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco + I(yhat^2), data = housing)
Residuals:
     Min       1Q   Median       3Q      Max 
-0.81468 -0.15694  0.00836  0.16274  0.84243 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.00888    4.05883   1.234    0.218
log(lotsize) -0.13381    1.03870  -0.129    0.898
bedrooms     -0.02570    0.20157  -0.128    0.899
bathrms      -0.07774    0.57105  -0.136    0.892
airco        -0.07225    0.55235  -0.131    0.896
I(yhat^2)     0.06032    0.11724   0.515    0.607

Residual standard error: 0.2457 on 540 degrees of freedom
Multiple R-squared: 0.5676,    Adjusted R-squared: 0.5636
F-statistic: 141.8 on 5 and 540 DF, p-value: < 2.2e-16
From the p-value of I(yhat^2) one can observe that the coefficient of the squared predicted values is not significantly different from 0.
We have now to include also the third power of the predicted values: we update the
latter model regr3.1RESET2.
> regr3.1RESET3 <- update(regr3.1RESET2, . ~ . + I(yhat^3))
> summary(regr3.1RESET3)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco + I(yhat^2) + I(yhat^3), data = housing)
Residuals:
     Min       1Q   Median       3Q      Max 
-0.81241 -0.15526  0.00843  0.15948  0.84892 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -274.5008   300.6983  -0.913    0.362
log(lotsize)  -33.4090    35.8094  -0.933    0.351
bedrooms       -6.4829     6.9490  -0.933    0.351
bathrms       -18.0151    19.3038  -0.933    0.351
airco         -17.6684    18.9363  -0.933    0.351
I(yhat^2)       7.4812     7.9835   0.937    0.349
I(yhat^3)      -0.2207     0.2375  -0.930    0.353

Residual standard error: 0.2458 on 539 degrees of freedom
Multiple R-squared: 0.5683,    Adjusted R-squared: 0.5635
F-statistic: 118.3 on 6 and 539 DF, p-value: < 2.2e-16
To check if the joint effect of the second and third powers of the predicted values is
significant, one can perform an F test by comparing the original simpler linear model
with the current RESET model specification.
> anova(regr3.1, regr3.1RESET3)
Analysis of Variance Table
Model 1: log(price) ~ log(lotsize) + bedrooms + bathrms + airco
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
I(yhat^2) + I(yhat^3)
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    541 32.622                           
2    539 32.554  2  0.068179 0.5644  0.569
It is also possible to perform the Wald test, available in the package lmtest: the syntax is similar to the one used for anova, but with an additional argument specifying the kind of test to be used: the exact F test, assuming that the distribution of the errors is normal, or its asymptotic χ² version.
> library(lmtest)
> waldtest(regr3.1, regr3.1RESET3, test = "F")
Wald test
Model 1: log(price) ~ log(lotsize) + bedrooms + bathrms + airco
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
I(yhat^2) + I(yhat^3)
  Res.Df Df      F Pr(>F)
1    541                 
2    539  2 0.5644  0.569
> waldtest(regr3.1, regr3.1RESET3, test = "Chisq")
Wald test
Model 1: log(price) ~ log(lotsize) + bedrooms + bathrms + airco
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
I(yhat^2) + I(yhat^3)
  Res.Df Df  Chisq Pr(>Chisq)
1    541                     
2    539  2 1.1288     0.5687
The first call gives results analogous to anova.


3.1.2 Testing the functional form: A direct function to perform the RESET test

The package lmtest also makes available the resettest function, which performs the RESET test directly, by specifying the following arguments:

- the linear model to be tested,
- the powers of the additional terms to include in the RESET specification,
- the kind of terms to include (in our case the fitted values, that is the predicted values).

> library(lmtest)
> resettest(regr3.1, power = 2, type = "fitted")
RESET test
data: regr3.1
RESET = 0.2647, df1 = 1, df2 = 540, p-value = 0.6071
> resettest(regr3.1, power = 2:3, type = "fitted")
RESET test
data: regr3.1
RESET = 0.5644, df1 = 2, df2 = 539, p-value = 0.569
The RESET statistics correspond to the F statistics in an ANOVA test comparing the two models; in the first instance the RESET value is equal to the squared t statistic calculated above to test the significance of I(yhat^2), namely 0.5145² = 0.2647 (the significance level is the same for the two proposed tests, which are equivalent, since T_k² = F_{1,k}). The second RESET value coincides with the F statistic in the ANOVA analysis.

3.1.3 Testing the functional form: the RESET test for the extended model

Since prices may also depend on other characteristics, all the variables available in the
data set are included in the preceding model specification: we have to update model
regr3.1:
> regr3.2 <- update(regr3.1, . ~ . + driveway + recroom +
fullbase + gashw + garagepl + prefarea + stories)
> summary(regr3.2)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco + driveway + recroom + fullbase + gashw + garagepl +
prefarea + stories, data = housing)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.68355 -0.12247  0.00802  0.12780  0.67564 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.74509    0.21634  35.801  < 2e-16 ***
log(lotsize)  0.30313    0.02669  11.356  < 2e-16 ***
bedrooms      0.03440    0.01427   2.410 0.016294 *  
bathrms       0.16576    0.02033   8.154 2.52e-15 ***
airco         0.16642    0.02134   7.799 3.29e-14 ***
driveway      0.11020    0.02823   3.904 0.000107 ***
recroom       0.05797    0.02605   2.225 0.026482 *  
fullbase      0.10449    0.02169   4.817 1.90e-06 ***
gashw         0.17902    0.04389   4.079 5.22e-05 ***
garagepl      0.04795    0.01148   4.178 3.43e-05 ***
prefarea      0.13185    0.02267   5.816 1.04e-08 ***
stories       0.09169    0.01261   7.268 1.30e-12 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2104 on 534 degrees of freedom
Multiple R-squared: 0.6865,    Adjusted R-squared: 0.6801
F-statistic: 106.3 on 11 and 534 DF, p-value: < 2.2e-16
the estimate of sigma can be extracted as usual with:
> summary(regr3.2)$sigma
[1] 0.2103959
An F test is performed to compare the present model with the previous one:
> anova(regr3.1, regr3.2)
Analysis of Variance Table
Model 1: log(price) ~ log(lotsize) + bedrooms + bathrms + airco
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
    driveway + recroom + fullbase + gashw + garagepl + prefarea + stories
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1    541 32.622                                  
2    534 23.638  7    8.9839 28.993 < 2.2e-16 ***
---
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The following results show that, according to the RESET tests, the linear specification of the model should not be rejected.
> resettest(regr3.2, power = 2, type = "fitted")


RESET test
data: regr3.2
RESET = 0.0033, df1 = 1, df2 = 533, p-value = 0.9539
> resettest(regr3.2, power = 2:3, type = "fitted")
RESET test
data: regr3.2
RESET = 0.0391, df1 = 2, df2 = 532, p-value = 0.9616
The 0.0033 in the first RESET output is the square of the rounded value (0.06) reported by Verbeek at p. 75, which refers to the t-test formulation of the RESET test.

3.1.4 Testing the functional form: the interaction term

To specify in the formula the interaction term between prefarea and bedrooms we can follow two methods, see Appendix A.4:

- define the new term in the formula as the product of the involved variables: I(prefarea*bedrooms);
- define the new term in the formula by making use of the : operator, which in the formula algebra stands for interaction: prefarea:bedrooms.

We will use the second option.

> regr3.2interact <- update(regr3.2, . ~ . + prefarea:bedrooms)
> summary(regr3.2interact)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco + driveway + recroom + fullbase + gashw + garagepl +
prefarea + stories + bedrooms:prefarea, data = housing)
Residuals:
     Min       1Q   Median       3Q      Max 
-0.68341 -0.12302  0.00793  0.12909  0.67559 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        7.743235   0.216995  35.684  < 2e-16 ***
log(lotsize)       0.303096   0.026719  11.344  < 2e-16 ***
bedrooms           0.035086   0.015213   2.306 0.021477 *  
bathrms            0.166005   0.020429   8.126 3.12e-15 ***
airco              0.166688   0.021453   7.770 4.06e-14 ***
driveway           0.110282   0.028259   3.903 0.000107 ***
recroom            0.057990   0.026077   2.224 0.026581 *  
fullbase           0.104573   0.021721   4.814 1.93e-06 ***
gashw              0.178903   0.043943   4.071 5.39e-05 ***
garagepl           0.047961   0.011487   4.175 3.48e-05 ***
prefarea           0.146040   0.110285   1.324 0.186003    
stories            0.091473   0.012729   7.186 2.26e-12 ***
bedrooms:prefarea -0.004675   0.035556  -0.131 0.895454    
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2106 on 533 degrees of freedom
Multiple R-squared: 0.6866,    Adjusted R-squared: 0.6795
F-statistic: 97.29 on 12 and 533 DF, p-value: < 2.2e-16

3.1.5 Prediction

The expected log sale price for an arbitrary house in Windsor, with the characteristics
specified in Verbeek can be obtained as:
> predictregr3.2 <- predict(regr3.2, data.frame(lotsize = 10000,
bedrooms = 4, bathrms = 1, airco = 1, driveway = 1,
recroom = 1, fullbase = 1, gashw = 1, garagepl = 2,
prefarea = 1, stories = 2))
> predictregr3.2
1
11.86959
> exp(predictregr3.2)
1
142855.1
and the prediction corrected by considering the half variance factor:
> exp(predictregr3.2 + summary(regr3.2)$sigma^2/2)
1
146052.2
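predict can also return an interval around the prediction. As a sketch, on the log scale and for the same house, a 95% prediction interval is obtained by adding the arguments interval and level; exponentiating its bounds gives an interval for the price itself.

> predict(regr3.2, data.frame(lotsize = 10000, bedrooms = 4,
    bathrms = 1, airco = 1, driveway = 1, recroom = 1, fullbase = 1,
    gashw = 1, garagepl = 2, prefarea = 1, stories = 2),
    interval = "prediction", level = 0.95)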

3.1.6 Model with price instead of log(price) as dependent variable and lotsize instead of log(lotsize) among the predictors

Prices are then modelled instead of log prices:


> regr3.3 <- update(regr3.2, price ~ lotsize + . - log(lotsize))
> summary(regr3.3)
Call:
lm(formula = price ~ lotsize + bedrooms + bathrms + airco + driveway +
recroom + fullbase + gashw + garagepl + prefarea + stories,
data = housing)


Residuals:
   Min     1Q Median     3Q    Max 
-41389  -9307   -591   7353  74875 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -4038.3504  3409.4713  -1.184 0.236762    
lotsize         3.5463     0.3503  10.124  < 2e-16 ***
bedrooms     1832.0035  1047.0002   1.750 0.080733 .  
bathrms     14335.5585  1489.9209   9.622  < 2e-16 ***
airco       12632.8904  1555.0211   8.124 3.15e-15 ***
driveway     6687.7789  2045.2458   3.270 0.001145 ** 
recroom      4511.2838  1899.9577   2.374 0.017929 *  
fullbase     5452.3855  1588.0239   3.433 0.000642 ***
gashw       12831.4063  3217.5971   3.988 7.60e-05 ***
garagepl     4244.8290   840.5442   5.050 6.07e-07 ***
prefarea     9369.5132  1669.0907   5.614 3.19e-08 ***
stories      6556.9457   925.2899   7.086 4.37e-12 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 15420 on 534 degrees of freedom
Multiple R-squared: 0.6731,    Adjusted R-squared: 0.6664
F-statistic: 99.97 on 11 and 534 DF, p-value: < 2.2e-16

3.1.7 The PE test to compare a loglinear specification with the linear specification

To choose between a linear and a loglinear functional form Verbeek, p. 69, suggests
to make recourse to the PE test procedure4 .
We consider first the construction step by step of the test. We have first to obtain
the predictors in the linear and in the loglinear specifications.
> predlin <- predict(regr3.3)
> predloglin <- predict(regr3.2)
Then we have to consider the estimation of the augmented linear model by adding
the proper term, see Verbeek, and perform an ANOVA to compare the augmented
model with the initial one.
> linaugm <- update(regr3.3, . ~ . + I(log(predlin) - predloglin))
> anova(linaugm, regr3.3)
4 At Verbeek's pp. 67-68 the encompassing procedure is presented to compare two non-nested
linear models. This is implemented in the R function encomptest, available in the package lmtest.
See the help ?lmtest::encomptest for more information on this function.


Analysis of Variance Table


Model 1: price ~ lotsize + bedrooms + bathrms + airco + driveway +
recroom + fullbase + gashw + garagepl + prefarea + stories +
I(log(predlin) - predloglin)
Model 2: price ~ lotsize + bedrooms + bathrms + airco + driveway +
recroom + fullbase + gashw + garagepl + prefarea + stories
  Res.Df        RSS Df   Sum of Sq      F    Pr(>F)    
1    533 1.1849e+11                                    
2    534 1.2703e+11 -1 -8534724490 38.391 1.159e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Finally we have to estimate the augmented loglinear model by adding the proper
term, see again Verbeek, and perform an ANOVA to compare the augmented model
with the initial one.
> loglinaugm <- update(regr3.2, . ~ . + I(predlin - exp(predloglin)))
> anova(loglinaugm, regr3.2)
Analysis of Variance Table

Model 1: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
    driveway + recroom + fullbase + gashw + garagepl + prefarea +
    stories + I(predlin - exp(predloglin))
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
    driveway + recroom + fullbase + gashw + garagepl + prefarea +
    stories
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    533 23.624                           
2    534 23.638 -1 -0.014342 0.3236 0.5697

However, the package lmtest provides the function petest, which performs the two previous tests by adding the proper augmentation terms to the linear and loglinear models and returns the parameter estimates and the t statistics of the augmented terms in the augmented models.
> library(lmtest)
> petest(regr3.3, regr3.2)
PE test
Model 1: price ~ lotsize + bedrooms + bathrms + airco + driveway +
recroom + fullbase + gashw + garagepl + prefarea + stories
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
driveway + recroom + fullbase + gashw + garagepl + prefarea +
stories
                          Estimate Std. Error t value  Pr(>|t|)    
M1 + log(fit(M1))-fit(M2)   -74774      12068 -6.1961 1.159e-09 ***
M2 + fit(M1)-exp(fit(M2))        0          0 -0.5688    0.5697    
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Observe that the squares of the t values correspond to the F statistics obtained before.

3.2 Selection procedures: Predicting Stock Index Returns (Section 3.5)

In Section 3.2.2 Verbeek presents some criteria apt to perform regressor selection. In Section 3.5 Verbeek compares the models corresponding to the max R̄², the stepwise, the min AIC and the min BIC criteria with a general unrestricted model explaining the excess return on the S&P 500 index, EXRET, conditional on the full set of regressors consisting of:

- CS_1: credit spread (yield on Moody's Aaa minus Baa debt), lagged one month,
- DY_1: dividend yield S&P 500 index, lagged one month (in % per month),
- I12_1: 12-month interest rate, lagged one month,
- I12_2: 12-month interest rate, lagged two months,
- I3_1: 1-month interest rate, lagged one month,
- I3_2: 1-month interest rate, lagged two months,
- INF_2: inflation, lagged two months,
- IP_2: change in industrial production, lagged two months,
- MB_2: change in monetary base, lagged two months,
- PE_1: price-earnings ratio (S&P 500), lagged one month,
- TS_1: term spread, lagged one month,
- WINTER: dummy, 1 in November to April, 0 otherwise.

First we read the data set available in the file predictsp.dat.


> pred <- read.table(unzip("ch03.zip", "Chapter 3/predictsp.dat"),
header = T)
One, as usual, may use the instructions head(pred), tail(pred) and
summary(pred) to explore the data and check the main summary statistics.
To check the time extension of the data in the data.frame, we can derive from the
variable OBS, which has the following text form: yyyyMmm, (e.g. 1985M07) the variables
year and month by extracting respectively the substrings consisting of the first 4 and
the last 2 characters:
> year <- substr(pred$OBS, 1, 4)
> month <- substr(pred$OBS, 6, 7)


We next construct a contingency table month by year. By looking at this table possibly
missing time records may be found:
> table(year, month)
month
year
01 02 03 04 05 06 07 08 09 10 11 12
1966 1 1 1 1 1 1 1 1 1 1 1 1
1967 1 1 1 1 1 1 1 1 1 1 1 1
1968 1 1 1 1 1 1 1 1 1 1 1 1
omitted
month
year
01 02 03 04 05 06 07 08 09 10 11 12
2003 1 1 1 1 1 1 1 1 1 1 1 1
2004 1 1 1 1 1 1 1 1 1 1 1 1
2005 1 1 1 1 1 1 1 1 1 1 1 1
To check that the time series is complete (i.e. that there are no missing months) one may check whether any entry of the preceding table is zero.
> prod(table(year, month))
[1] 1
In our case the time series is complete since the product of the entries in the table is
different from 0.
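An equivalent check (sketch), which does not rely on the product being nonzero, is to ask directly whether any cell of the table is empty:

> # TRUE would indicate at least one missing year/month combination
> any(table(year, month) == 0)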
The regression analyses are proposed on the time window beginning at January 1966
and ending at December 1995, so it is possible to define a time series object and a
temporary variable to perform regressions.
> pred <- ts(data = pred, start = c(1966, 1), frequency = 12)
> predtmp <- window(pred, start = c(1966, 1), end = c(1995,
12))
To compare the goodness of the models, the out of sample forecasting performance
will also be evaluated on the time window starting at January 1996 and ending at
December 2005, see below Section 3.2.9.
> predctrl <- window(pred, start = c(1996, 1))
Pay attention to the definition of the variable EXRET, the excess return, which is
expressed as percentage; so in the following analyses it has to be divided by 100.
We now present the methods to implement the four main procedures described by
Verbeek to perform a model selection.

3.2.1 The full model

The full model consists simply in the estimation of the general unrestricted linear
model.


> regr3.4f <- lm(EXRET/100 ~ PE_1 + DY_1 + INF_2 +


IP_2 + I3_1 + I3_2 + I12_1 + I12_2 + MB_2 + CS_1 +
WINTER, data = predtmp)
> summary(regr3.4f)
Call:
lm(formula = EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 +
I3_2 + I12_1 + I12_2 + MB_2 + CS_1 + WINTER, data = predtmp)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.194908 -0.023106  0.001566  0.026069  0.138232 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.020743   0.040228   0.516  0.60644    
PE_1        -0.119712   0.129367  -0.925  0.35542    
DY_1         0.126504   0.082880   1.526  0.12783    
INF_2       -0.163318   0.076788  -2.127  0.03413 *  
IP_2        -0.059783   0.061097  -0.978  0.32851    
I3_1         0.268687   0.124505   2.158  0.03161 *  
I3_2        -0.222916   0.121074  -1.841  0.06645 .  
I12_1       -0.505236   0.123478  -4.092 5.33e-05 ***
I12_2        0.388662   0.127934   3.038  0.00256 ** 
MB_2        -0.043959   0.083836  -0.524  0.60037    
CS_1         0.175387   0.109343   1.604  0.10962    
WINTER       0.006249   0.004405   1.419  0.15693    
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.04022 on 348 degrees of freedom
Multiple R-squared: 0.1698,    Adjusted R-squared: 0.1435
F-statistic: 6.47 on 11 and 348 DF, p-value: 8.4e-10

3.2.2 The max R̄² criterion

The max R̄² criterion, denoted in R by all/best subsets regression, may be performed by invoking the command regsubsets available in the package leaps. The first argument is the model formula; in the present case EXRET/100 ~ . includes as regressors all the variables in the data set passed through the data argument; nvmax is the maximum subset size to consider and force.out an array specifying the columns to force out of the models.
The result is an object listing, for each subset size, the best model together with its adjusted R² value.
> library(leaps)
> regr3.4mr <- regsubsets(EXRET/100 ~ ., data = predtmp,
nvmax = 12, force.out = c(1, 12))


> summary(regr3.4mr)
Subset selection object
Call: regsubsets.formula(EXRET/100 ~ ., data = predtmp, nvmax = 12,
    force.out = c(1, 12))
13 Variables (and intercept)
       Forced in Forced out
CS_1       FALSE       TRUE
DY_1       FALSE      FALSE
I12_1      FALSE      FALSE
I12_2      FALSE      FALSE
I3_1       FALSE      FALSE
I3_2       FALSE      FALSE
INF_2      FALSE      FALSE
IP_2       FALSE      FALSE
MB_2       FALSE      FALSE
PE_1       FALSE      FALSE
WINTER     FALSE      FALSE
OBS        FALSE       TRUE
TS_1       FALSE      FALSE
1 subsets of each size up to 12
Selection Algorithm: exhaustive
   CS_1 DY_1 I12_1 I12_2 I3_1 I3_2 INF_2 IP_2 MB_2 PE_1 WINTER OBS TS_1
1  "*"  " "  " "   " "   " "  " "  " "   " "  " "  " "  " "    " " " " 
2  " "  " "  "*"   "*"   " "  " "  " "   " "  " "  " "  " "    " " " " 
3  "*"  " "  "*"   "*"   " "  " "  " "   " "  " "  " "  " "    " " " " 
4  "*"  "*"  "*"   "*"   " "  " "  " "   " "  " "  " "  " "    " " " " 
5  "*"  "*"  "*"   "*"   " "  " "  " "   " "  " "  " "  "*"    " " " " 
6  "*"  "*"  "*"   "*"   "*"  "*"  " "   " "  " "  " "  " "    " " " " 
7  "*"  "*"  "*"   "*"   "*"  "*"  "*"   " "  " "  " "  " "    " " " " 
8  "*"  "*"  "*"   "*"   "*"  "*"  "*"   " "  "*"  " "  " "    " " " " 
9  "*"  "*"  "*"   "*"   "*"  "*"  "*"   " "  "*"  " "  "*"    " " " " 
10 "*"  "*"  "*"   "*"   "*"  "*"  "*"   "*"  " "  "*"  "*"    " " " " 
11 "*"  "*"  "*"   "*"   "*"  "*"  "*"   "*"  "*"  "*"  "*"    " " " " 

With the following instructions we can print R̄², that is the adjusted R², for the selected models and obtain a graphical representation, see Fig. 3.1, representing the structure of the models which have been considered, with their R̄² values.
> cbind(model = 1:11, "Adjusted R-squared" = summary(regr3.4mr)$adjr2)
      model Adjusted R-squared
 [1,]     1         0.01899891
 [2,]     2         0.08149153
 [3,]     3         0.11350781
 [4,]     4         0.12806215
 [5,]     5         0.13212041
 [6,]     6         0.13601132
 [7,]     7         0.14031904
 [8,]     8         0.14310437
 [9,]     9         0.14496314
[10,]    10         0.14532321
[11,]    11         0.14354387
> plot(regr3.4mr, scale = "adjr2")

Figure 3.1 Adjusted R-squared values for subset regression models
We can extract the coefficients5 for the model with the maximum adjusted R²
> coef(regr3.4mr, 10)
 (Intercept)         CS_1         DY_1        I12_1        I12_2 
 0.031193369  0.158311342  0.102823200 -0.503855027  0.397589862 
        I3_1         I3_2        INF_2         IP_2         PE_1 
 0.258969875 -0.223074248 -0.156767570 -0.068582462 -0.157694683 
      WINTER 
 0.006415849 
5 This can also be done with the command coef(regr3.4mr, which.max(summary(regr3.4mr)$adjr2)).


To compare the parameter estimates of this model with those pertaining to the other selection criteria we have to obtain an lm object with the estimation results.
This may be obtained by performing the following procedure.
1. The names of the variables included as regressors in the selected model correspond to the names of the coefficients we have just obtained, excluding the first element:
> anames <- names(coef(regr3.4mr, 10))[-1]
> anames
 [1] "CS_1"   "DY_1"   "I12_1"  "I12_2"  "I3_1"   "I3_2"  
 [7] "INF_2"  "IP_2"   "PE_1"   "WINTER"

2. The function match returns a vector of the positions of (first) matches of its first
argument in its second.
We can use this function to get the column indices in the data.frame predtmp
(specified as second argument) that match to the vector of elements consisting of
the dependent variable name, EXRET, and the independent variable names, anames,
(first argument).
> a <- match(c("EXRET", anames), colnames(predtmp))
> a
[1] 4 2 3 5 6 7 8 9 10 12 14
3. Finally the data.frames necessary to perform the linear regression and the
out of sample performance evaluation can be defined by selecting the involved
columns/variables in predtmp and predctrl, and the regr3.4mr is obtained as an
lm object.
> predtmpmr <- data.frame(predtmp[, a])
> predctrlmr <- data.frame(predctrl[, a])
> regr3.4mr <- lm(EXRET/100 ~ ., data = predtmpmr)
> summary(regr3.4mr)
Call:
lm(formula = EXRET/100 ~ ., data = predtmpmr)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.195037 -0.023523  0.001951  0.026865  0.138763 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.031193   0.034907   0.894  0.37214    
CS_1         0.158311   0.104272   1.518  0.12986    
DY_1         0.102823   0.069422   1.481  0.13947    
I12_1       -0.503855   0.123322  -4.086 5.46e-05 ***
I12_2        0.397590   0.126664   3.139  0.00184 ** 
I3_1         0.258970   0.122990   2.106  0.03595 *  
I3_2        -0.223074   0.120947  -1.844  0.06597 .  
INF_2       -0.156768   0.075686  -2.071  0.03907 *  
IP_2        -0.068582   0.058686  -1.169  0.24335    
PE_1        -0.157695   0.107072  -1.473  0.14171    
WINTER       0.006416   0.004389   1.462  0.14470    
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.04018 on 349 degrees of freedom
Multiple R-squared: 0.1691,    Adjusted R-squared: 0.1453
F-statistic: 7.104 on 10 and 349 DF, p-value: 3.412e-10

3.2.3  Stepwise

In the package MASS the instruction dropterm is available, which returns some
statistics for testing the presence of every term appearing in a linear regression model.
An initial model, which we assume to be a general unrestricted model, may then be
recursively improved, by alternating the functions dropterm and update, until no
regressor needs to be excluded. In the next section we report an algorithm to perform
a stepwise backward selection procedure in an automatic way.
The syntax of dropterm consists of two arguments: the first one is an lm object, that is an object resulting from a linear model estimation; the second one is the test to be performed, in our case an ANOVA consisting of an F test. We recall that the function update has two main arguments: the first is the lm object to update; the second one is the updating formula.
We first present the sequence of steps for the model selection in the current case study. The variable with the lowest (non-significant) F statistic will be excluded at each step.
> library(MASS)
> dropterm(regr3.4f, test = "F")
Single term deletions

Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
    I12_2 + MB_2 + CS_1 + WINTER
       Df Sum of Sq     RSS     AIC F Value     Pr(F)
<none>              0.56287 -2301.9
PE_1    1 0.0013850 0.56425 -2303.0  0.8563  0.355416
DY_1    1 0.0037682 0.56663 -2301.5  2.3297  0.127832
INF_2   1 0.0073167 0.57018 -2299.2  4.5236  0.034133 *
IP_2    1 0.0015486 0.56441 -2302.9  0.9574  0.328514
I3_1    1 0.0075326 0.57040 -2299.1  4.6571  0.031608 *
I3_2    1 0.0054829 0.56835 -2300.4  3.3899  0.066449 .
I12_1   1 0.0270789 0.58994 -2287.0 16.7420 5.326e-05 ***
I12_2   1 0.0149278 0.57779 -2294.5  9.2293  0.002562 **
MB_2    1 0.0004447 0.56331 -2303.6  0.2749  0.600375
CS_1    1 0.0041614 0.56703 -2301.2  2.5729  0.109619
WINTER  1 0.0032546 0.56612 -2301.8  2.0122  0.156931
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_1 <- update(regr3.4f, . ~ . - MB_2)
> dropterm(step_1, test = "F")
Single term deletions

Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
    I12_2 + CS_1 + WINTER
       Df Sum of Sq     RSS     AIC F Value     Pr(F)
<none>              0.56331 -2303.6
PE_1    1 0.0035011 0.56681 -2303.4  2.1691   0.14171
DY_1    1 0.0035409 0.56685 -2303.3  2.1938   0.13947
INF_2   1 0.0069248 0.57024 -2301.2  4.2903   0.03907 *
IP_2    1 0.0022044 0.56551 -2304.2  1.3657   0.24335
I3_1    1 0.0071561 0.57047 -2301.1  4.4336   0.03595 *
I3_2    1 0.0054907 0.56880 -2302.1  3.4018   0.06597 .
I12_1   1 0.0269434 0.59025 -2288.8 16.6928 5.456e-05 ***
I12_2   1 0.0159032 0.57921 -2295.6  9.8528   0.00184 **
CS_1    1 0.0037206 0.56703 -2303.2  2.3051   0.12986
WINTER  1 0.0034489 0.56676 -2303.4  2.1368   0.14470
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_2 <- update(step_1, . ~ . - IP_2)
> dropterm(step_2, test = "F")
Single term deletions

Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 +
    CS_1 + WINTER
       Df Sum of Sq     RSS     AIC F Value     Pr(F)
<none>              0.56551 -2304.2
PE_1    1 0.0027553 0.56827 -2304.4  1.7053  0.192459
DY_1    1 0.0042573 0.56977 -2303.5  2.6349  0.105441
INF_2   1 0.0056055 0.57112 -2302.7  3.4693  0.063355 .
I3_1    1 0.0078989 0.57341 -2301.2  4.8886  0.027680 *
I3_2    1 0.0052475 0.57076 -2302.9  3.2477  0.072385 .
I12_1   1 0.0286570 0.59417 -2288.4 17.7360 3.231e-05 ***
I12_2   1 0.0153555 0.58087 -2296.6  9.5036  0.002213 **
CS_1    1 0.0117877 0.57730 -2298.8  7.2954  0.007249 **
WINTER  1 0.0029908 0.56851 -2304.3  1.8510  0.174540
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1


> step_3 <- update(step_2, . ~ . - PE_1)


> dropterm(step_3, test = "F")
Single term deletions
Model:
EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1 +
WINTER
       Df Sum of Sq     RSS     AIC F Value     Pr(F)
<none>              0.56827 -2304.4
DY_1    1 0.0146810 0.58295 -2297.3  9.0679  0.002790 **
INF_2   1 0.0037588 0.57203 -2304.1  2.3217  0.128485
I3_1    1 0.0086626 0.57693 -2301.0  5.3506  0.021293 *
I3_2    1 0.0060113 0.57428 -2302.7  3.7130  0.054798 .
I12_1   1 0.0295562 0.59783 -2288.2 18.2558 2.491e-05 ***
I12_2   1 0.0168036 0.58507 -2296.0 10.3790  0.001394 **
CS_1    1 0.0111500 0.57942 -2299.5  6.8870  0.009062 **
WINTER  1 0.0032091 0.57148 -2304.4  1.9822  0.160048
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_4 <- update(step_3, . ~ . - WINTER)
> dropterm(step_4, test = "F")
Single term deletions
Model:
EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1
       Df Sum of Sq     RSS     AIC F Value     Pr(F)
<none>             0.57148 -2304.4
DY_1    1  0.016450 0.58793 -2296.2 10.1323  0.001587 **
INF_2   1  0.004495 0.57597 -2303.6  2.7688  0.097007 .
I3_1    1  0.009860 0.58134 -2300.3  6.0733  0.014201 *
I3_2    1  0.005593 0.57707 -2302.9  3.4452  0.064270 .
I12_1   1  0.031646 0.60312 -2287.0 19.4919 1.346e-05 ***
I12_2   1  0.015937 0.58742 -2296.5  9.8162  0.001875 **
CS_1    1  0.012315 0.58379 -2298.8  7.5852  0.006191 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_5 <- update(step_4, . ~ . - INF_2)
> dropterm(step_5, test = "F")
Single term deletions

Model:
EXRET/100 ~ DY_1 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1
       Df Sum of Sq     RSS     AIC F Value     Pr(F)
<none>              0.57597 -2303.6
DY_1    1 0.0119841 0.58796 -2298.2  7.3447 0.0070543 **
I3_1    1 0.0083165 0.58429 -2300.4  5.0970 0.0245780 *
I3_2    1 0.0072565 0.58323 -2301.1  4.4473 0.0356604 *
I12_1   1 0.0303013 0.60628 -2287.2 18.5709 2.126e-05 ***
I12_2   1 0.0201211 0.59610 -2293.2 12.3317 0.0005031 ***
CS_1    1 0.0129920 0.58897 -2297.6  7.9624 0.0050461 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

The final model can be estimated by applying the function lm to the formula defined
at step 5.
> regr3.4sw <- lm(step_5)
> summary(regr3.4sw)
Call:
lm(formula = step_5)
Residuals:
      Min        1Q    Median        3Q       Max
-0.208944 -0.025168  0.000683  0.027499  0.128369

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.013180   0.009379  -1.405 0.160847
DY_1         0.130442   0.048132   2.710 0.007054 **
I3_1         0.273531   0.121158   2.258 0.024578 *
I3_2        -0.252307   0.119641  -2.109 0.035660 *
I12_1       -0.528130   0.122553  -4.309 2.13e-05 ***
I12_2        0.435283   0.123954   3.512 0.000503 ***
CS_1         0.239039   0.084712   2.822 0.005046 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.04039 on 353 degrees of freedom
Multiple R-squared: 0.1505,    Adjusted R-squared: 0.136
F-statistic: 10.42 on 6 and 353 DF, p-value: 1.213e-10

3.2.4  An algorithm to perform a stepwise backward elimination of regressors

> check <- 0
> step <- 0
> regr3.4sw <- regr3.4f
> while (check == 0) {
      step <- step + 1
      a <- dropterm(regr3.4sw, test = "F")
      ind <- which.min(as.numeric(a[[5]]))
      ifelse(a[[5]][ind] <= 1.96^2, {
          vartodrop <- row.names(a[5])[ind]
          print(paste("at step ", step, " variable ",
              vartodrop, " is excluded", collapse = ""))
          regr3.4sw <- update(regr3.4sw, as.formula(paste(".~.-",
              vartodrop, collapse = "")))
      }, check <- 1)
  }
[1] "at step 1 variable MB_2 is excluded"
[1] "at step 2 variable IP_2 is excluded"
[1] "at step 3 variable PE_1 is excluded"
[1] "at step 4 variable WINTER is excluded"
[1] "at step 5 variable INF_2 is excluded"
> summary(regr3.4sw)
Call:
lm(formula = EXRET/100 ~ DY_1 + I3_1 + I3_2 + I12_1 + I12_2 +
CS_1, data = predtmp)
Residuals:
      Min        1Q    Median        3Q       Max
-0.208944 -0.025168  0.000683  0.027499  0.128369

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.013180   0.009379  -1.405 0.160847
DY_1         0.130442   0.048132   2.710 0.007054 **
I3_1         0.273531   0.121158   2.258 0.024578 *
I3_2        -0.252307   0.119641  -2.109 0.035660 *
I12_1       -0.528130   0.122553  -4.309 2.13e-05 ***
I12_2        0.435283   0.123954   3.512 0.000503 ***
CS_1         0.239039   0.084712   2.822 0.005046 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.04039 on 353 degrees of freedom
Multiple R-squared: 0.1505,    Adjusted R-squared: 0.136
F-statistic: 10.42 on 6 and 353 DF, p-value: 1.213e-10

3.2.5  AIC

The command stepAIC, available in the package MASS, performs a stepwise model
selection by the Akaike Information Criterion. The first argument is the general
unrestricted linear model the procedure will be applied to; the option trace=0
suppresses the output of the procedure. See the help ?stepAIC for more information
on this function.


> library(MASS)
> regr3.4aic <- stepAIC(regr3.4f, trace = 0)
> regr3.4aic$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
I12_2 + MB_2 + CS_1 + WINTER
Final Model:
EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1 +
WINTER

     Step Df     Deviance Resid. Df Resid. Dev       AIC
1                               348  0.5628657 -2301.895
2  - MB_2  1 0.0004446865       349  0.5633103 -2303.610
3  - IP_2  1 0.0022043540       350  0.5655147 -2304.204
4  - PE_1  1 0.0027552908       351  0.5682700 -2304.455
> summary(regr3.4aic)
Call:
lm(formula = EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 +
I12_2 + CS_1 + WINTER, data = predtmp)
Residuals:
      Min        1Q    Median        3Q       Max
-0.202784 -0.023496  0.002058  0.026805  0.136163

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.021813   0.010225  -2.133  0.03360 *
DY_1         0.166210   0.055195   3.011  0.00279 **
INF_2       -0.106879   0.070144  -1.524  0.12848
I3_1         0.283111   0.122393   2.313  0.02129 *
I3_2        -0.232290   0.120551  -1.927  0.05480 .
I12_1       -0.524830   0.122834  -4.273 2.49e-05 ***
I12_2        0.406251   0.126100   3.222  0.00139 **
CS_1         0.222507   0.084787   2.624  0.00906 **
WINTER       0.006159   0.004375   1.408  0.16005
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.04024 on 351 degrees of freedom
Multiple R-squared: 0.1618,    Adjusted R-squared: 0.1427
F-statistic: 8.47 on 8 and 351 DF, p-value: 1.518e-10


3.2.6  BIC

By specifying in the command stepAIC, available in the package MASS, the argument k=log(n), the penalty parameter6, where n is the number of observations in the data set, it is possible to perform a stepwise model selection by the Bayesian Information Criterion (BIC) or Schwarz Bayesian Criterion (SBC). The first argument of stepAIC, we recall, is the linear model the procedure is applied to (the option trace=0 suppresses the output of the procedure).
See the help ?stepAIC for more information on this function.
> regr3.4bic <- stepAIC(regr3.4f, k = log(length(regr3.4f$res)),
trace = 0)
> regr3.4bic$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
I12_2 + MB_2 + CS_1 + WINTER
Final Model:
EXRET/100 ~ DY_1 + I12_1 + I12_2 + CS_1

      Step Df     Deviance Resid. Df Resid. Dev       AIC
1                                348  0.5628657 -2255.261
2   - MB_2  1 0.0004446865       349  0.5633103 -2260.863
3   - IP_2  1 0.0022043540       350  0.5655147 -2265.343
4   - PE_1  1 0.0027552908       351  0.5682700 -2269.480
5 - WINTER  1 0.0032091205       352  0.5714791 -2273.338
6  - INF_2  1 0.0044952452       353  0.5759744 -2276.404
7   - I3_2  1 0.0072564641       354  0.5832308 -2277.783
8   - I3_1  1 0.0013361484       355  0.5845670 -2282.845
> summary(regr3.4bic)
Call:
lm(formula = EXRET/100 ~ DY_1 + I12_1 + I12_2 + CS_1, data = predtmp)
Residuals:
      Min        1Q    Median        3Q       Max
-0.214771 -0.025708  0.001165  0.027578  0.132124

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01230    0.00941  -1.308  0.19187
DY_1         0.12378    0.04698   2.635  0.00879 **
I12_1       -0.27563    0.05083  -5.423 1.09e-07 ***
I12_2        0.20603    0.05298   3.889  0.00012 ***
CS_1         0.22539    0.08369   2.693  0.00742 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.04058 on 355 degrees of freedom
Multiple R-squared: 0.1378,    Adjusted R-squared: 0.1281
F-statistic: 14.18 on 4 and 355 DF, p-value: 9.511e-11

6 k is by default set equal to 2, giving the AIC criterion.

3.2.7  A better output to compare the results

To compare the parameter estimates in the previous models, we can make use of the
function mtable available7 in the package memisc.
> library(memisc)
> mtable3.4 <- mtable(full = regr3.4f, "max adj R2" = regr3.4mr,
stepwise = regr3.4sw, "min AIC" = regr3.4aic,
"min BIC" = regr3.4bic)
> mtable3.4 <- relabel(mtable3.4, "(Intercept)" = "constant",
PE_1 = "pe_{t-1}", DY_1 = "dy_{t-1}", INF_2 = "infl_{t-1}",
IP_2 = "ip_{t-2}", I3_1 = "i3_{t-1}", I3_2 = "i3_{t-2}",
I12_1 = "i12_{t-1}", I12_2 = "i12_{t-2}", MB_2 = "mb_{t-2}",
CS_1 = "cs_{t-1}", WINTER = "winter_t")
> mtable3.4
Calls:
full: lm(formula = EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 +
I3_2 + I12_1 + I12_2 + MB_2 + CS_1 + WINTER, data = predtmp)
max adj R2: lm(formula = EXRET/100 ~ ., data = predtmpmr)
stepwise: lm(formula = EXRET/100 ~ DY_1 + I3_1 + I3_2 + I12_1 + I12_2+
CS_1, data = predtmp)
min AIC: lm(formula = EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 +
I12_2 + CS_1 + WINTER, data = predtmp)
min BIC: lm(formula = EXRET/100 ~ DY_1+I12_1+I12_2+CS_1,data=predtmp)
======================================================================
                   full     max adj R2  stepwise    min AIC    min BIC
----------------------------------------------------------------------
constant          0.021      0.031      -0.013     -0.022*    -0.012
                 (0.040)    (0.035)     (0.009)    (0.010)    (0.009)
pe_{t-1}         -0.120     -0.158
                 (0.129)    (0.107)
dy_{t-1}          0.127      0.103       0.130**    0.166**    0.124**
                 (0.083)    (0.069)     (0.048)    (0.055)    (0.047)
infl_{t-1}       -0.163*    -0.157*                -0.107
                 (0.077)    (0.076)                (0.070)
ip_{t-2}         -0.060     -0.069
                 (0.061)    (0.059)
i3_{t-1}          0.269*     0.259*      0.274*     0.283*
                 (0.125)    (0.123)     (0.121)    (0.122)
i3_{t-2}         -0.223     -0.223      -0.252*    -0.232
                 (0.121)    (0.121)     (0.120)    (0.121)
i12_{t-1}        -0.505***  -0.504***   -0.528***  -0.525***  -0.276***
                 (0.123)    (0.123)     (0.123)    (0.123)    (0.051)
i12_{t-2}         0.389**    0.398**     0.435***   0.406**    0.206***
                 (0.128)    (0.127)     (0.124)    (0.126)    (0.053)
mb_{t-2}         -0.044
                 (0.084)
cs_{t-1}          0.175      0.158       0.239**    0.223**    0.225**
                 (0.109)    (0.104)     (0.085)    (0.085)    (0.084)
winter_t          0.006      0.006                  0.006
                 (0.004)    (0.004)                (0.004)
----------------------------------------------------------------------
R-squared         0.170      0.169       0.150      0.162      0.138
adj. R-squared    0.144      0.145       0.136      0.143      0.128
sigma             0.040      0.040       0.040      0.040      0.041
F                 6.470      7.104      10.419      8.470     14.182
p                 0.000      0.000       0.000      0.000      0.000
Log-likelihood  652.129    651.987     647.985    650.409    645.320
Deviance          0.563      0.563       0.576      0.568      0.585
AIC           -1278.259  -1279.975   -1279.971  -1280.819  -1278.640
BIC           -1227.740  -1233.341   -1248.882  -1241.958  -1255.323
N               360        360         360        360        360
======================================================================

7 Observe the use of the double quotes in the mtable call to specify the names of the lm objects in the output: they are needed only when spaces are present in the name assigned to the lm object.

3.2.8  Some remarks on the AIC and BIC values

The values of the AIC statistics given by R for the estimated models in section 3.2.5
differ from those reported in the mtable output and also from the ones in Verbeek.

In Verbeek formula (3.17) AIC is defined as:

    AIC = log( (1/N) Σ_{i=1}^{N} e_i² ) + 2k/N

where N is the dimension of the data set and k is the number of unknown parameters in the model.

In R the function extractAIC is available, which computes the AIC as:

    AIC = -2 log(L) + 2k    (3.1)

where L is the likelihood; in the case of a linear model with unknown scale parameter, if RSS denotes the residual sum of squares, extractAIC uses N log(RSS/N) for -2 log(L), so that:

    AIC = N log( (1/N) Σ_{i=1}^{N} e_i² ) + 2k    (3.2)

Thus (3.2) is N times the value obtained according to Verbeek's relationship (3.17).

In the report obtained with the function mtable the AIC and BIC are computed by using the function AIC, according to relationship (3.1), by considering the estimate of the log-likelihood given by the function logLik (k is multiplied by 2 in the case of AIC and by log(N) in the case of BIC).

Observe that when the maximum likelihood estimation method is applied to a linear regression model with k regression parameters, k is replaced by k + 1, since the error variance is also treated as a parameter to be estimated.
So with regard to the computation of AIC, e.g., for the full model we have that the
value given by mtable may be obtained as:
> -2 * logLik(regr3.4f) + 2 * (12 + 1)
'log Lik.' -1278.259 (df=13)
or, simply, as:
> AIC(regr3.4f)
[1] -1278.259
which divided by N = 360 is quite close to the value reported in Verbeek's Table 3.4 at page 73 (the exact correspondence is obtained by substituting 2 * 12 for 2 * (12 + 1))
> AIC(regr3.4f)/360
[1] -3.550719
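The correspondence mentioned in the parenthesis above may be checked directly (a minimal sketch using the log-likelihood of the full model):
> (-2 * logLik(regr3.4f) + 2 * 12)/360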
The value given by extractAIC, computed according to (3.2), is
> extractAIC(regr3.4f)
[1]    12.000 -2301.895
and is equivalent to
> 360 * log(sum(residuals(regr3.4f)^2)/360) + 2 * 12
[1] -2301.895
and dividing by 360 we obtain the AIC value according to Verbeek's formula (3.17)
> log(sum(residuals(regr3.4f)^2)/360) + 2 * 12/360
[1] -6.394152
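Analogously, the BIC reported by mtable for the full model can be checked by multiplying k by log(N) instead of 2 (a minimal sketch; BIC is the corresponding function in the stats package):
> -2 * logLik(regr3.4f) + log(360) * (12 + 1)
> BIC(regr3.4f)
Both commands should reproduce the value -1278.640... more precisely, the value -1227.740 reported for the full model in the BIC row of the mtable output of Section 3.2.7.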


3.2.9  Out of sample forecasting performance (Table 3.5)

Verbeek presents in Table 3.5 the results of the out of sample forecasting performance,
which may be obtained by applying proper functions, here coded in the following
function outsamplefit, to the actual excess returns and the predicted ones8 .
> actual <- predctrl[, "EXRET"]/100
> outsamplefit <- function(actual, predict, name = "") {
mad <- sum(abs(predict - actual))/length(predict)
mape <- sum(abs(predict - actual)/actual)/length(predict)
rmse <- (sum((predict - actual)^2)/length(predict))^0.5
r2os1 <- 1 - sum((predict - actual)^2)/sum((mean(predtmp[,
"EXRET"]/100) - actual)^2)
r2os2 <- (cor(predict, actual))^2
hit <- sum(sign(predict) == sign(actual))/length(predict)
output <- rbind(RMSE = rmse, MAD = mad, MAPE = mape,
r2os1 = r2os1, r2os2 = r2os2, hit = hit)
colnames(output) <- name
output
}
> pr_f <- predict(regr3.4f, predctrl)
> pr_mr <- predict(regr3.4mr, predctrlmr)
> pr_sw <- predict(regr3.4sw, predctrl)
> pr_aic <- predict(regr3.4aic, predctrl)
> pr_bic <- predict(regr3.4bic, predctrl)
> outofsamplefit <- cbind(full = outsamplefit(actual,
predict=pr_f,name="full"),"max adj R2"=outsamplefit(actual,
predict=pr_mr,name="max adj R2"),stepwise=outsamplefit(actual,
predict=pr_sw,name="stepwise"),"min AIC"=outsamplefit(actual,
predict=pr_aic,name="min AIC"),"min BIC"=outsamplefit(actual,
predict=pr_bic,name="min BIC"))
> outofsamplefit[c(1:2, 6), ] <- 100 * outofsamplefit[c(1:2,
6), ]
> round(outofsamplefit, 4)
         full max adj R2 stepwise min AIC min BIC
RMSE   4.8332     4.9362   4.8421  4.8843  4.7903
MAD    3.7913     3.8994   3.8040  3.8519  3.7480
MAPE   0.6998     0.6634   0.7736  0.9517  0.4830
r2os1 -0.1583    -0.2082  -0.1626 -0.1830 -0.1379
r2os2  0.0094     0.0105   0.0003  0.0003  0.0000
hit   50.0000    49.1667  48.3333 46.6667 47.5000
Pay attention to the different meaning of the values in the last output: results in the
1st, 2nd and 6th rows are expressed as percentages.
8 Observe that the predicted values for the max adjusted R² model are based on the data.frame predctrlmr, defined at step 3 of the procedure described in Section 3.2.2.

3.3  Explaining Individual Wages (Section 3.6)

Data may be read by means of the function read.table, having extracted the file
bwages.dat from the compressed file ch03.zip.
The summary statistics mean and standard deviation can be obtained for a single
variable, say WAGE, EDUC or EXPER, conditional on the levels of the variable MALE by
using the function tapply. The first argument is the variable we want to study (a
column of a data.frame); the second argument is a conditioning variable; the third
argument is the function used to study the variable in the first argument.
> indwages <- read.table(unzip("ch03.zip", "Chapter 3/bwages.dat"),
header = T)
> tapply(indwages$WAGE, indwages$MALE, mean)
0
1
10.26154 11.56223
We can use sapply to choose the variables for which the means and standard deviations are computed:
> means <- sapply(c(1, 3, 4), function(i) tapply(indwages[,
i], indwages$MALE, mean))
> stdevs <- sapply(c(1, 3, 4), function(i) tapply(indwages[,
i], indwages$MALE, sd))
> meanandstd <- array(c(means = means, stdevs = stdevs),
c(2, 3, 2))
> dimnames(meanandstd) <- list(c("females", "males"),
names(indwages)[c(1, 3, 4)], c("means", "stdevs"))
> meanandstd
, , means

            WAGE     EDUC    EXPER
females 10.26154 3.587219 15.20380
males   11.56223 3.243001 18.52296

, , stdevs

            WAGE     EDUC     EXPER
females 3.808585 1.086521  9.704987
males   4.753789 1.257386 10.251041
We have omitted from the summary analysis the variables MALE, LNWAGE, LNEXPER
and LNEDUC.

3.3.1  Linear Models (Section 3.6.1)

We estimate the parameters in the linear model of Table 3.7 in Verbeek:

    WAGE = β1 + β2 MALE + β3 EDUC + β4 EXPER + ERROR


> indwages3.7 <- lm(WAGE ~ MALE + EDUC + EXPER, data = indwages)


> summary(indwages3.7)
Call:
lm(formula = WAGE ~ MALE + EDUC + EXPER, data = indwages)
Residuals:
     Min       1Q   Median       3Q      Max
-13.5294  -1.9686  -0.3124   1.5679  30.7015

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.213692   0.386895   0.552    0.581
MALE        1.346144   0.192736   6.984 4.32e-12 ***
EDUC        1.986090   0.080640  24.629  < 2e-16 ***
EXPER       0.192275   0.009583  20.064  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.548 on 1468 degrees of freedom
Multiple R-squared: 0.3656,    Adjusted R-squared: 0.3643
F-statistic: 282 on 3 and 1468 DF, p-value: < 2.2e-16
To include the effect of the squared number of years of experience, see Table 3.8:

    WAGE = β1 + β2 MALE + β3 EDUC + β4 EXPER + β5 EXPER² + ERROR

use
> indwages3.8 <- lm(WAGE ~ MALE + EDUC + EXPER + I(EXPER^2),
data = indwages)
> summary(indwages3.8)
Call:
lm(formula = WAGE ~ MALE + EDUC + EXPER + I(EXPER^2), data = indwages)
Residuals:
     Min       1Q   Median       3Q      Max
-12.7246  -1.9519  -0.3107   1.5117  30.5951

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8924849  0.4329127  -2.062   0.0394 *
MALE         1.3336935  0.1908668   6.988 4.23e-12 ***
EDUC         1.9881267  0.0798526  24.897  < 2e-16 ***
EXPER        0.3579993  0.0316566  11.309  < 2e-16 ***
I(EXPER^2)  -0.0043692  0.0007962  -5.487 4.80e-08 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.514 on 1467 degrees of freedom
Multiple R-squared: 0.3783,    Adjusted R-squared: 0.3766
F-statistic: 223.2 on 4 and 1467 DF, p-value: < 2.2e-16
Given a linear model object (lm) we can automatically produce 4 plots, see Figure 3.2, by means of the function plot, using as first argument the object resulting from a linear model and specifying as second argument, which, a number from 1 to 4; see Faraway (2002) Chapter 7:
1. if which = 1 we obtain the plot of residuals versus fitted values;
2. if which = 2 we obtain the normal Q-Q plot of the standardized residuals;
3. if which = 3 we obtain the plot of the square root of absolute standardized
residuals versus fitted values;
4. if which = 4 we obtain the plot of the Cook statistic to identify influential
observations, see Faraway (2002), p. 78;
The function layout(matrix) creates a multifigure environment; the numbers in the matrix (in our instance a 2 × 2 matrix) define the pointer sequence specifying the order in which the different graphs will appear.
> a <- matrix(1:4, 2, 2)
> a
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> layout(a)
> for (i in 1:4) plot(indwages3.8, which = i)

3.3.2  Loglinear Models (Section 3.6.2)

We have to estimate the model:

    log(WAGE) = β1 + β2 MALE + β3 log(EDUC) + β4 log(EXPER) + β5 (log(EXPER))² + ERROR
Observe that when EXPER equals zero the corresponding value of log(EXPER) is not defined: Verbeek has constructed in the dataset wages in Belgium the variable LNEXPER as log(EXPER + 1) to avoid this situation.
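A quick consistency check of this construction (a sketch, using the column names as read from bwages.dat) is:
> all.equal(indwages$LNEXPER, log(indwages$EXPER + 1))
which should return TRUE if LNEXPER was indeed built in this way.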
> indwages3.9 <- lm(LNWAGE ~ MALE + LNEDUC + LNEXPER +
I(LNEXPER^2), data = indwages)
> summary(indwages3.9)
Call:
lm(formula = LNWAGE ~ MALE + LNEDUC + LNEXPER + I(LNEXPER^2),
data = indwages)


Figure 3.2  Graphs that can be obtained directly from the lm object indwages3.8 related to the linear model (Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)

Residuals:
     Min       1Q   Median       3Q      Max
-1.75085 -0.15921  0.00618  0.17145  1.10533

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.26271    0.06634  19.033  < 2e-16 ***
MALE          0.11794    0.01557   7.574 6.35e-14 ***
LNEDUC        0.44218    0.01819  24.306  < 2e-16 ***
LNEXPER       0.10982    0.05438   2.019   0.0436 *
I(LNEXPER^2)  0.02601    0.01148   2.266   0.0236 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2862 on 1467 degrees of freedom
Multiple R-squared: 0.3783,    Adjusted R-squared: 0.3766
F-statistic: 223.1 on 4 and 1467 DF, p-value: < 2.2e-16

Figure 3.3  Residuals against fitted values, loglinear model

Figure 3.3 shows that heteroscedasticity is much less pronounced than for the additive
model. It may be obtained by means of the following instruction.
> plot(indwages3.9, which = 1)
Model without the effect of LNEXPER and of LNEXPER²

To check the joint effect of log(EXPER) and (log(EXPER))², we consider the parameter estimation of the model:

    log(WAGE) = β1 + β2 MALE + β3 log(EDUC) + ERROR

> indwages3.9restr <- update(indwages3.9, . ~ . - LNEXPER - I(LNEXPER^2))


> anova(indwages3.9restr, indwages3.9)
Analysis of Variance Table

Model 1: LNWAGE ~ MALE + LNEDUC
Model 2: LNWAGE ~ MALE + LNEDUC + LNEXPER + I(LNEXPER^2)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1   1469 158.57
2   1467 120.20  2    38.362 234.09 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Model not including the effect of LNEXPER²

We consider the parameter estimation of the model:

    log(WAGE) = β1 + β2 MALE + β3 log(EDUC) + β4 log(EXPER) + ERROR

> indwages3.10 <- update(indwages3.9, . ~ . - I(LNEXPER^2))


> summary(indwages3.10)
Call:
lm(formula = LNWAGE ~ MALE + LNEDUC + LNEXPER, data = indwages)
Residuals:
     Min       1Q   Median       3Q      Max
-1.70476 -0.15862  0.00485  0.17366  1.11815

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.14473    0.04118  27.798  < 2e-16 ***
MALE         0.12008    0.01556   7.715 2.22e-14 ***
LNEDUC       0.43662    0.01805  24.188  < 2e-16 ***
LNEXPER      0.23065    0.01073  21.488  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2867 on 1468 degrees of freedom
Multiple R-squared: 0.3761,    Adjusted R-squared: 0.3748
F-statistic: 295 on 3 and 1468 DF, p-value: < 2.2e-16
Model with education considered as a factor
We can now consider the education as a factor to study the effects of the different
education levels against the first level of education.
The model matrix (hereafter p1), that is the matrix X containing the values of the
regressors, some of which are dummy variables recoding the factor education, can
be obtained by using the function model.matrix whose first argument is the second
member of the formula defining the linear model, while its second argument is the
data.frame containing the variables involved in the linear model (some rows of p1
are reported).


> indwages$EDUC <- as.factor(indwages$EDUC)


> p1 <- model.matrix(~MALE + EDUC + LNEXPER, data = indwages)
> p1[c(1:2, 100:101, 365:366, 785:786), ]
    (Intercept) MALE EDUC2 EDUC3 EDUC4 EDUC5  LNEXPER
1             1    1     0     0     0     0 3.178054
2             1    0     0     0     0     0 2.772589
100           1    1     1     0     0     0 2.890372
101           1    1     1     0     0     0 3.218876
365           1    1     0     1     0     0 3.663562
366           1    1     0     1     0     0 1.791759
785           1    1     0     0     1     0 2.772589
786           1    0     0     0     1     0 1.609438

In defining the formula for the linear model we can express log(indwages$WAGE) as a
function of the model matrix p1 we have just obtained; we have to remember to drop
the constant since it already appears in the model matrix p1.
> indwages3.11 <- lm(log(indwages$WAGE) ~ -1 + p1)
> summary(indwages3.11)
Call:
lm(formula = log(indwages$WAGE) ~ -1 + p1)
Residuals:
     Min       1Q   Median       3Q      Max
-1.65548 -0.15138  0.01324  0.16998  1.11684

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
p1(Intercept)  1.27189    0.04483  28.369  < 2e-16 ***
p1MALE         0.11762    0.01546   7.610 4.88e-14 ***
p1EDUC2        0.14364    0.03336   4.306 1.77e-05 ***
p1EDUC3        0.30487    0.03202   9.521  < 2e-16 ***
p1EDUC4        0.47428    0.03301  14.366  < 2e-16 ***
p1EDUC5        0.63910    0.03322  19.237  < 2e-16 ***
p1LNEXPER      0.23022    0.01056  21.804  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.282 on 1465 degrees of freedom
Multiple R-squared: 0.9858,    Adjusted R-squared: 0.9858
F-statistic: 1.455e+04 on 7 and 1465 DF, p-value: < 2.2e-16
Note that since EDUC was recoded as a factor the above result may be also obtained
with
> indwages3.11 <- lm(log(indwages$WAGE) ~ MALE + EDUC +
LNEXPER, data = indwages)
> summary(indwages3.11)


Call:
lm(formula = log(indwages$WAGE)~MALE+EDUC+LNEXPER, data=indwages)
Residuals:
     Min       1Q   Median       3Q      Max
-1.65548 -0.15138  0.01324  0.16998  1.11684

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.27189    0.04483  28.369  < 2e-16 ***
MALE         0.11762    0.01546   7.610 4.88e-14 ***
EDUC2        0.14364    0.03336   4.306 1.77e-05 ***
EDUC3        0.30487    0.03202   9.521  < 2e-16 ***
EDUC4        0.47428    0.03301  14.366  < 2e-16 ***
EDUC5        0.63910    0.03322  19.237  < 2e-16 ***
LNEXPER      0.23022    0.01056  21.804  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.282 on 1465 degrees of freedom
Multiple R-squared: 0.3976,    Adjusted R-squared: 0.3951
F-statistic: 161.1 on 6 and 1465 DF, p-value: < 2.2e-16
without having to drop the constant. The recourse to the model.matrix function
allows different parametrizations for the factor variable to be considered (via the
contrasts argument).
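For instance, a sum-to-zero parametrization of the education factor may be obtained as follows (a minimal sketch; p2 is just an illustrative name and contr.sum one of the standard contrast functions, passed through the contrasts.arg argument of model.matrix):
> p2 <- model.matrix(~MALE + EDUC + LNEXPER, data = indwages,
      contrasts.arg = list(EDUC = "contr.sum"))
> head(p2)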

3.3.3  The Effects of Gender (Section 3.6.3)

To study the effects of gender we have to include the interactions MALE:EDUC and MALE:LNEXPER. Remember that the * operator includes both the direct effects of the variables it applies to and their interactions (which alone are usually defined by the : operator). So we have two ways for defining the model matrix:

model.matrix(~MALE+EDUC+LNEXPER+MALE:EDUC+MALE:LNEXPER,indwages)

model.matrix(~MALE*EDUC+MALE*LNEXPER,indwages)

We will use the second method, which is more compact, directly within the lm formula.
> indwages3.12 <- lm(log(indwages$WAGE) ~ MALE * EDUC +
MALE * LNEXPER, data = indwages)
> summary(indwages3.12)
Call:
lm(formula = log(indwages$WAGE) ~ MALE * EDUC + MALE * LNEXPER,
data = indwages)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63955 -0.15328  0.01225  0.16647  1.11698

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.21584    0.07768  15.652  < 2e-16 ***
MALE          0.15375    0.09522   1.615 0.106595
EDUC2         0.22411    0.06758   3.316 0.000935 ***
EDUC3         0.43319    0.06323   6.851 1.08e-11 ***
EDUC4         0.60191    0.06280   9.585  < 2e-16 ***
EDUC5         0.75491    0.06467  11.673  < 2e-16 ***
LNEXPER       0.20744    0.01655  12.535  < 2e-16 ***
MALE:EDUC2   -0.09651    0.07770  -1.242 0.214381
MALE:EDUC3   -0.16677    0.07340  -2.272 0.023215 *
MALE:EDUC4   -0.17236    0.07440  -2.317 0.020663 *
MALE:EDUC5   -0.14616    0.07551  -1.935 0.053123 .
MALE:LNEXPER  0.04063    0.02149   1.891 0.058875 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2811 on 1460 degrees of freedom
Multiple R-squared: 0.4032,    Adjusted R-squared: 0.3988
F-statistic: 89.69 on 11 and 1460 DF, p-value: < 2.2e-16
We can perform an ANOVA analysis to evaluate if the joint effect of the interactions
MALE:EDUC and MALE:LNEXPER is significant.
> anova(indwages3.11, indwages3.12)
Analysis of Variance Table

Model 1: log(indwages$WAGE) ~ MALE + EDUC + LNEXPER
Model 2: log(indwages$WAGE) ~ MALE * EDUC + MALE * LNEXPER
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1   1465 116.47
2   1460 115.37  5    1.0957 2.7732 0.01683 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

A model without the interaction between male and education

As before, we can specify the proper formula, which will be used in the linear model call.
> indwages3.13 <- lm(log(indwages$WAGE) ~ MALE + EDUC *
LNEXPER, data = indwages)
> summary(indwages3.13)
Call:
lm(formula = log(indwages$WAGE) ~ MALE + EDUC * LNEXPER, data = indwages)


Residuals:
     Min       1Q   Median       3Q      Max
-1.63623 -0.15046  0.00831  0.16713  1.12415

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.48891    0.21203   7.022 3.34e-12 ***
MALE           0.11597    0.01548   7.493 1.16e-13 ***
EDUC2          0.06727    0.22628   0.297   0.7663
EDUC3          0.13525    0.21889   0.618   0.5367
EDUC4          0.20495    0.21946   0.934   0.3505
EDUC5          0.34130    0.21808   1.565   0.1178
LNEXPER        0.16312    0.06539   2.494   0.0127 *
EDUC2:LNEXPER  0.01933    0.07049   0.274   0.7839
EDUC3:LNEXPER  0.04988    0.06821   0.731   0.4647
EDUC4:LNEXPER  0.08784    0.06877   1.277   0.2017
EDUC5:LNEXPER  0.09996    0.06822   1.465   0.1430
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2815 on 1461 degrees of freedom
Multiple R-squared: 0.4012,    Adjusted R-squared: 0.3971
F-statistic: 97.9 on 10 and 1461 DF, p-value: < 2.2e-16

4  Heteroscedasticity and Autocorrelation

4.1  Explaining Labour Demand (Section 4.5)

We import data from the file labour2.wf1, which is a work file of EViews.
We have first to invoke the package hexView and next the command readEViews.
The function unzip extracts a file from a compressed archive.
> library(hexView)
> labour <- readEViews(unzip("ch04.zip", "Chapter 4/labour2.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
Recall that by using the functions head(labour), tail(labour) and summary(labour) it is possible to explore the beginning and the end of the data and to obtain summary statistics for all the variables included in the data.frame.

4.1.1  Linear Model

In Verbeek, see Table 4.1, the estimation of the following linear model is first proposed:

    LABOR = β1 + β2 WAGE + β3 OUTPUT + β4 CAPITAL + ERROR

This can be made by using the function lm, see Appendix A.5.
> labour4.1 <- lm(LABOR ~ WAGE + OUTPUT + CAPITAL,
data = labour)
> summary(labour4.1)
Call:
lm(formula = LABOR ~ WAGE + OUTPUT + CAPITAL, data = labour)
Residuals:
     Min       1Q   Median       3Q      Max
-1267.04   -54.11   -14.02    37.20  1560.48

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 287.7186    19.6418   14.65   <2e-16 ***
WAGE         -6.7419     0.5014  -13.45   <2e-16 ***
OUTPUT       15.4005     0.3556   43.30   <2e-16 ***
CAPITAL      -4.5905     0.2690  -17.07   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 156.3 on 565 degrees of freedom
Multiple R-squared: 0.9352,    Adjusted R-squared: 0.9348
F-statistic: 2716 on 3 and 565 DF, p-value: < 2.2e-16

4.1.2  Breusch-Pagan test - construction

The Breusch-Pagan test (Table 4.2) can be used to check for the presence of heteroscedasticity. To perform the test we have first to regress the squared residuals of the preceding regression on the predictors present in the model:

    RES² = α1 + α2 WAGE + α3 OUTPUT + α4 CAPITAL + ERROR

The residuals can be extracted from the object labour4.1 by means of the instruction labour4.1$res. Observe that to define the squared residuals on the left-hand side of the model formula we do not have to use the as-is function I(), which instead must necessarily be invoked when a squared variable is included as a regressor, see Appendix A.4.
> labour4.2 <- lm(labour4.1$res^2 ~ WAGE + OUTPUT +
CAPITAL, data = labour)
> summary(labour4.2)
Call:
lm(formula = labour4.1$res^2 ~ WAGE + OUTPUT + CAPITAL, data = labour)
Residuals:
    Min      1Q  Median      3Q     Max
-500023  -12448    2722   13354 1193685

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -22719.5    11838.9  -1.919   0.0555 .
WAGE           228.9      302.2   0.757   0.4492
OUTPUT        5362.2      214.4  25.016   <2e-16 ***
CAPITAL      -3543.5      162.1 -21.858   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 94180 on 565 degrees of freedom
Multiple R-squared: 0.5818,    Adjusted R-squared: 0.5796
F-statistic: 262 on 3 and 565 DF, p-value: < 2.2e-16
We can perform the Breusch-Pagan test by multiplying the R² of the preceding regression, available by using the code summary(labour4.2)$r.squared, by N = 569, the number of observations, which may be obtained with length(labour4.2$res). Under the null hypothesis of homoscedasticity the distribution of the statistic is a χ² with 3 degrees of freedom. The number of d.o.f. is equal to the one in the numerator of the F statistic, which appears in the regression output; it may be automatically obtained by extracting the second element from the object summary(labour4.2)$fstatistic.
> bpstat <- length(labour4.2$res) * summary(labour4.2)$r.squared
> df <- summary(labour4.2)$fstatistic[2]
> paste("Breusch-Pagan stat = ", round(bpstat, 4),
" df = ", df, " p-value = ", 1 - pchisq(bpstat,
df), collapse = "")
[1] "Breusch-Pagan stat = 331.0653
df = 3
p-value = 0"

4.1.3  Breusch-Pagan test - direct function

We can also obtain the Breusch-Pagan result, without performing the preceding
regression, by directly invoking the function bptest available in the package lmtest.
> library(lmtest)
> bptest(labour4.1)
studentized Breusch-Pagan test
data: labour4.1
BP = 331.0653, df = 3, p-value < 2.2e-16

4.1.4  Loglinear model

Verbeek's Table 4.3 reports the OLS estimation results for the loglinear model:

    log(LABOR) = β1 + β2 log(WAGE) + β3 log(OUTPUT) + β4 log(CAPITAL) + ERROR    (4.1)
which may be obtained as:
> labour4.3 <- lm(log(LABOR) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL), data = labour)
> summary(labour4.3)


Call:
lm(formula = log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL),
data = labour)
Residuals:
    Min      1Q  Median      3Q     Max
-3.9744 -0.1641  0.1079  0.2595  1.9466

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.177290   0.246211  25.089   <2e-16 ***
log(WAGE)    -0.927764   0.071405 -12.993   <2e-16 ***
log(OUTPUT)   0.990047   0.026410  37.487   <2e-16 ***
log(CAPITAL) -0.003697   0.018770  -0.197    0.844
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.4653 on 565 degrees of freedom
Multiple R-squared: 0.843,    Adjusted R-squared: 0.8421
F-statistic: 1011 on 3 and 565 DF, p-value: < 2.2e-16
The corresponding Breusch-Pagan statistic is:
> bptest(labour4.3)
studentized Breusch-Pagan test
data: labour4.3
BP = 7.7269, df = 3, p-value = 0.05201

4.1.5  White Heteroscedasticity test

To perform the White Heteroscedasticity test and obtain the results in Verbeek's Table 4.4 we have to consider the estimation of the following linear model:

    RES² = α1 + α2 log(WAGE) + α3 log(OUTPUT) + α4 log(CAPITAL) +
           α22 (log(WAGE))² + α33 (log(OUTPUT))² + α44 (log(CAPITAL))² +
           α23 log(WAGE) log(OUTPUT) + α24 log(WAGE) log(CAPITAL) +
           α34 log(OUTPUT) log(CAPITAL) + ERROR    (4.2)
To write this formula, first check which variables result as regressors by applying the following regression statement1; see A.4 for the instruction lm.
1 The function coeftest in the package lmtest performs the t tests on the estimated coefficients.
In the present situation it is used to have a look at which variables are present in the model.


> temp <- lm(labour4.3$res^2 ~ (log(WAGE) + log(OUTPUT) +


log(CAPITAL))^2, data = labour)
> library(lmtest)
> coeftest(temp)
t test of coefficients:

                          Estimate Std. Error t value Pr(>|t|)
(Intercept)               0.416498   0.816906  0.5098  0.61036
log(WAGE)                -0.039661   0.230421 -0.1721  0.86340
log(OUTPUT)              -0.916128   0.476087 -1.9243  0.05482 .
log(CAPITAL)              0.777534   0.368707  2.1088  0.03540 *
log(WAGE):log(OUTPUT)     0.247144   0.130601  1.8924  0.05896 .
log(WAGE):log(CAPITAL)   -0.239506   0.101826 -2.3521  0.01901 *
log(OUTPUT):log(CAPITAL)  0.019062   0.013475  1.4146  0.15774
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
We observe that the interaction terms appear in the model specification (4.2), but
the squared predictors are not included; so we have to adjust the regression formula
in the following way:
> labour4.4 <- lm(labour4.3$res^2 ~ (log(WAGE) + log(OUTPUT) +
log(CAPITAL))^2 + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
I(log(CAPITAL)^2), data = labour)
> summary(labour4.4)
Call:
lm(formula = labour4.3$res^2~(log(WAGE)+log(OUTPUT)+log(CAPITAL))^2+
I(log(WAGE)^2)+I(log(OUTPUT)^2)+I(log(CAPITAL)^2),data=labour)
Residuals:
    Min      1Q  Median      3Q     Max
-2.2664 -0.1650 -0.0724  0.0212 15.2247

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               2.54460    3.00278   0.847 0.397126
log(WAGE)                -1.29900    1.75274  -0.741 0.458929
log(OUTPUT)              -0.90372    0.55985  -1.614 0.107045
log(CAPITAL)              1.14205    0.37582   3.039 0.002486 **
I(log(WAGE)^2)            0.19274    0.25895   0.744 0.457003
I(log(OUTPUT)^2)          0.13820    0.03565   3.877 0.000118 ***
I(log(CAPITAL)^2)         0.08954    0.01399   6.401 3.27e-10 ***
log(WAGE):log(OUTPUT)     0.13804    0.16256   0.849 0.396168
log(WAGE):log(CAPITAL)   -0.25178    0.10497  -2.399 0.016782 *
log(OUTPUT):log(CAPITAL) -0.19160    0.03687  -5.197 2.84e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.8507 on 559 degrees of freedom
Multiple R-squared: 0.1029,    Adjusted R-squared: 0.08845
F-statistic: 7.124 on 9 and 559 DF, p-value: 8.334e-10
The White statistic is defined by N·R², that is N times the Multiple R-squared, and the corresponding p-value may be obtained by considering the probability that a χ² random variable with 9 degrees of freedom (the number of predictors in the preceding regression) exceeds the White statistic.
> whitestat <- length(labour4.2$res) * summary(labour4.4)$r.squared
> df <- summary(labour4.4)$fstatistic[2]
> paste("White stat = ", round(whitestat, 4), " df = ",
df, " p-value = ", round(1 - pchisq(whitestat,
df), 4), collapse = "")
[1] "White stat = 58.5443
df = 9
p-value = 0"
The hypothesis of homoscedasticity is rejected.
The result can also be obtained by means of the function bptest; we have to specify
the auxiliary regression statement in the varformula argument, and the data.frame
in the data argument.
> length(labour4.2$res) * summary(labour4.4)$r.squared
[1] 58.54435
> bptest(labour4.3, varformula = ~(log(WAGE) + log(OUTPUT) +
log(CAPITAL))^2 + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
I(log(CAPITAL)^2), data = labour)
studentized Breusch-Pagan test
data: labour4.3
BP = 58.5443, df = 9, p-value = 2.555e-09

4.1.6  Heteroscedasticity consistent covariance matrix

We can obtain the Heteroscedasticity consistent covariance matrix2 of the parameter estimates by invoking the command vcovHC(x, type) in the package sandwich. The first argument is a fitted model object, the second one specifies the estimation type for the covariance matrix; with "HC1" the corrected White estimate is obtained.
> library(sandwich)
> round(vcovHC(labour4.3, type = "HC1"), 4)
2 The command vcovHC provides 5 estimators of the covariance matrix, denoted in the literature
as HC0 to HC4, see next Section 4.1.8 and Zeileis (2004).

             (Intercept) log(WAGE) log(OUTPUT) log(CAPITAL)
(Intercept)       0.0864   -0.0250      0.0004       0.0025
log(WAGE)        -0.0250    0.0075     -0.0008      -0.0004
log(OUTPUT)       0.0004   -0.0008      0.0022      -0.0015
log(CAPITAL)      0.0025   -0.0004     -0.0015       0.0014

We may then perform heteroscedasticity consistent parameter inference using the function coeftest, whose first argument is the lm object; by using the second argument it is possible to specify the type of heteroscedasticity consistent matrix to be considered in computing the parameter standard errors.
> library(lmtest)
> labour4.5 <- coeftest(labour4.3, vcov = vcovHC(labour4.3,
type = "HC1"))
> labour4.5
t test of coefficients:

               Estimate Std. Error  t value Pr(>|t|)
(Intercept)   6.1772896  0.2938869  21.0193   <2e-16 ***
log(WAGE)    -0.9277642  0.0866604 -10.7057   <2e-16 ***
log(OUTPUT)   0.9900474  0.0467902  21.1593   <2e-16 ***
log(CAPITAL) -0.0036975  0.0378770  -0.0976   0.9223
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

The standard error of the regression (residual standard error), R² and the adjusted R² assume the same values as in the preceding regression, see Section 4.1.5. To obtain the value of the F test adjusted for the presence of heteroscedasticity it is possible to use the function waldtest, available in the package lmtest, by specifying as first argument the lm object labour4.3 resulting from the preceding linear model call, and as second argument the object containing the estimate of the White covariance matrix.
> waldtest(labour4.3, vcov = vcovHC(labour4.3, type = "HC1"))
Wald test

Model 1: log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL)
Model 2: log(LABOR) ~ 1
  Res.Df Df      F    Pr(>F)
1    565
2    568 -3 544.73 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

4.1.7  Estimated Generalized Least Squares

When some assumptions on the form of heteroscedasticity can be made, it is possible to have recourse to Estimated Generalized Least Squares (EGLS). In Verbeek's (4.36) the multiplicative form is considered (with z_i = x_i). With regard to the present example and to the multiplicative heteroscedasticity hypothesis, we have to estimate the parameters in the following model:

    log(e_i²) = α0 + α1 log(WAGE) + α2 log(OUTPUT) + α3 log(CAPITAL) + ERROR    (4.3)

where e_i are the residuals from model (4.1).


> labour4.6 <- lm(log(labour4.3$res^2) ~ log(WAGE) +
log(OUTPUT) + log(CAPITAL), data = labour)
> summary(labour4.6)
Call:
lm(formula = log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL), data = labour)
Residuals:
     Min       1Q   Median       3Q      Max
-11.7445  -0.7645   0.3281   1.1430   6.7871

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.25382    1.18545  -2.745 0.006247 **
log(WAGE)    -0.06105    0.34380  -0.178 0.859112
log(OUTPUT)   0.26695    0.12716   2.099 0.036231 *
log(CAPITAL) -0.33069    0.09037  -3.659 0.000277 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.24 on 565 degrees of freedom
Multiple R-squared: 0.02449,    Adjusted R-squared: 0.01931
F-statistic: 4.728 on 3 and 565 DF, p-value: 0.002876

Verbeek checks whether the preceding form of heteroscedasticity is not too restrictive by estimating a version of the model including also the three squared terms:

    log(e_i²) = α0 + α1 log(WAGE) + α2 log(OUTPUT) + α3 log(CAPITAL)
              + α4 (log(WAGE))² + α5 (log(OUTPUT))² + α6 (log(CAPITAL))² + ERROR

> labour4.6sq <- update(labour4.6, . ~ . + I(log(WAGE)^2) +


I(log(OUTPUT)^2) + I(log(CAPITAL)^2))
> summary(labour4.6sq)
Call:
lm(formula = log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL) + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
I(log(CAPITAL)^2), data = labour)

Residuals:
     Min       1Q   Median       3Q      Max
-11.6861  -0.8002   0.3633   1.1849   6.6993

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        5.819683   6.530195   0.891 0.373205
log(WAGE)         -4.942304   3.561094  -1.388 0.165729
log(OUTPUT)        0.187647   0.188814   0.994 0.320738
log(CAPITAL)      -0.331626   0.090318  -3.672 0.000264 ***
I(log(WAGE)^2)     0.653428   0.486332   1.344 0.179625
I(log(OUTPUT)^2)   0.001372   0.047232   0.029 0.976834
I(log(CAPITAL)^2)  0.030694   0.026799   1.145 0.252569
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.235 on 562 degrees of freedom
Multiple R-squared: 0.03404,    Adjusted R-squared: 0.02372
F-statistic: 3.301 on 6 and 562 DF, p-value: 0.003355

An analysis of variance confirms that the initial model for heteroscedasticity, see (4.3),
cannot be rejected.
> anova(labour4.6, labour4.6sq)
Analysis of Variance Table
Model 1: log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL)
Model 2: log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL) + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
I(log(CAPITAL)^2)
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    565 2836.1
2    562 2808.3  3    27.756 1.8515 0.1367
To obtain EGLS we have to transform the variables, by considering also the constant,
and perform the initial regression on the transformed variables. This can be made
directly in the linear model formula.
> hhat <- exp(fitted(labour4.6))
Observe that hhat contains a possible estimate of the variance of the error related to each statistical unit, Var(ε_i | x_i), see Verbeek's relationship (4.36); that is, of the elements on the diagonal of σ²Ψ, the covariance matrix of the errors. Ψ appears in the generalized least squares (GLS) estimator of β, see Verbeek's relationship (4.9):

    β̂ = (X'Ψ⁻¹X)⁻¹ X'Ψ⁻¹ y.


In the present case Ψ is assumed to be a diagonal matrix, that is, the errors are assumed to be uncorrelated.
> labour4.7 <- lm(log(LABOR)/hhat^0.5 ~ -1 + I(1/hhat^0.5) +
I(log(WAGE)/hhat^0.5) + I(log(OUTPUT)/hhat^0.5) +
I(log(CAPITAL)/hhat^0.5), data = labour)
> summary(labour4.7)
Call:
lm(formula = log(LABOR)/hhat^0.5 ~ -1 + I(1/hhat^0.5) +
I(log(WAGE)/hhat^0.5) + I(log(OUTPUT)/hhat^0.5) +
I(log(CAPITAL)/hhat^0.5), data = labour)
Residuals:
     Min       1Q   Median       3Q      Max
-29.1086  -0.7875   0.4852   1.2394  10.4219

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
I(1/hhat^0.5)             5.89536    0.24764  23.806  < 2e-16 ***
I(log(WAGE)/hhat^0.5)    -0.85558    0.07188 -11.903  < 2e-16 ***
I(log(OUTPUT)/hhat^0.5)   1.03461    0.02731  37.890  < 2e-16 ***
I(log(CAPITAL)/hhat^0.5) -0.05686    0.02158  -2.636  0.00863 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.509 on 565 degrees of freedom
Multiple R-squared: 0.9903,    Adjusted R-squared: 0.9902
F-statistic: 1.44e+04 on 4 and 565 DF, p-value: < 2.2e-16
which reproduces Verbeek's Table 4.7. Note that it is also possible to specify the latter model in a simpler way, by including the weights option in the function lm, thus performing weighted least squares.
> labour4.7 <- lm(log(LABOR) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL), data = labour, weights = hhat^-1)
> summary(labour4.7)
Call:
lm(formula = log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL),
data = labour, weights = hhat^-1)
Weighted Residuals:
     Min       1Q   Median       3Q      Max
-29.1086  -0.7875   0.4852   1.2394  10.4219

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.89536    0.24764  23.806  < 2e-16 ***
log(WAGE)    -0.85558    0.07188 -11.903  < 2e-16 ***
log(OUTPUT)   1.03461    0.02731  37.890  < 2e-16 ***
log(CAPITAL) -0.05686    0.02158  -2.636  0.00863 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.509 on 565 degrees of freedom
Multiple R-squared: 0.8509,    Adjusted R-squared: 0.8501
F-statistic: 1074 on 3 and 565 DF, p-value: < 2.2e-16
The goodness of fit statistics reported in the last output differ from those in the preceding one. To obtain, as suggested by Verbeek, R² = corr²(y_i, ŷ_i), use the code
> cor(log(labour$LABOR), fitted(labour4.7))^2
[1] 0.8404098

4.1.8  Types of Heteroscedasticity consistent covariance matrices

The command vcovHC(x, type) provides 5 estimators3 for the Heteroscedasticity consistent covariance matrix to make consistent inference on the parameters of a linear model summarized in an lm object x. The types are denoted in the literature as HC0 to HC4, see Zeileis (2004), and correspond to substituting e_i² in Verbeek's relationship (4.30) with:

    e_i²                   with HC or HC0,
    n/(n-k) e_i²           with HC1,
    e_i²/(1-h_i)           with HC2,
    e_i²/(1-h_i)²          with HC3,
    e_i²/(1-h_i)^{δ_i}     with HC4,

where h_i is the i-th element on the main diagonal of the so-called hat matrix H = X(X'X)⁻¹X'. The estimator HC0 was proposed by White (1980), HC1-HC3 by MacKinnon and White (1985) to improve the performance in small samples. Long and Ervin (2000) suggested HC3. The estimator HC4 by Cribari-Neto (2004) should improve the small sample performance, especially in the presence of influential observations. An observation is defined influential when its presence considerably alters the parameter estimates.
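A compact way to compare the standard errors implied by the different types (a sketch, assuming the package sandwich is loaded and using the loglinear labour model labour4.3 estimated above) is:
> sapply(c("const", "HC0", "HC1", "HC2", "HC3", "HC4"),
      function(type) sqrt(diag(vcovHC(labour4.3, type = type))))
which returns a matrix with one column of standard errors per estimator type.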
Let us consider the estimation of the linear model

    y = β1 + β2 x + ε

in the presence of an artificial dataset.
3 By setting type="const" the usual homoscedastic estimator for the covariance matrix of the
parameter estimates is selected.


> set.seed(123456)
> x <- runif(49, 10, 20)
> y <- 10 + 3 * x + rnorm(49)
The data have been simulated by considering for the variable x a sample, of size 49, from a uniform distribution on the interval (10, 20); by setting β1 = 10, β2 = 3; and by considering for the errors realizations of a standard Normal random variable.
The OLS estimator may be obtained as:
> lm49 <- lm(y ~ x)
> summary(lm49)
Call:
lm(formula = y ~ x)
Residuals:
     Min       1Q   Median       3Q      Max
-2.69955 -0.70519 -0.04837  0.63405  2.19068

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.06620    0.83438   12.06 5.36e-16 ***
x            2.98002    0.05264   56.61  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 1.1 on 47 degrees of freedom
Multiple R-squared: 0.9855,    Adjusted R-squared: 0.9852
F-statistic: 3205 on 1 and 47 DF, p-value: < 2.2e-16

The parameter estimates are quite close to their theoretical values.


Let us now assume that a new observation is present in our data, with values 26 for x and 60 for y. See the scatterplot in Fig. 4.1: the new observation is the isolated point on the right side.
> x <- c(x, 26)
> y <- c(y, 60)
The parameters of the linear model can be re-estimated in the presence of the new observation; the p-values, computed without taking into account the presence of any type of heteroscedasticity, give evidence that the estimates are significantly different from 0.
> lm50 = lm(y ~ x)
> summary(lm50)
Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-21.6949  -1.4477   0.9419   1.8599   4.2664

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  17.6932     2.5530    6.93  9.4e-09 ***
x             2.4616     0.1584   15.54  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.693 on 48 degrees of freedom
Multiple R-squared: 0.8342,    Adjusted R-squared: 0.8307
F-statistic: 241.5 on 1 and 48 DF, p-value: < 2.2e-16

We cannot fully trust the previous results, due to the presence of the anomalous case we added to the initial data: this observation influences the parameter estimates, both the intercept and the slope of the linear model. Figure 4.1 shows the data and the regression lines we have estimated; the dotted line refers to the 50 data points (with the anomalous observation included).
> plot(y ~ x)
> abline(lm49)
> abline(lm50, lty = 2)
The presence of influential data may be detected by performing a Heteroscedasticity
consistent covariance inference by setting the option type="HC4". We have:
> library(sandwich)
> library(lmtest)
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC4"))

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)  17.6932     9.7824  1.8087 0.0767688 .
x             2.4616     0.6639  3.7078 0.0005416 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
In this case the t value for the intercept has changed, and the intercept is no longer significantly different from 0. It is possible to analyze the data set to detect which observations are influential: we can check the values on the diagonal of the so-called hat matrix X(X'X)⁻¹X'. The matrix X of the regressors may be extracted by applying the function model.matrix to the lm object lm50, and the diagonal of the hat matrix may be obtained by using the function hat.

Figure 4.1  Scatter plot of the data and regression lines without considering the anomalous case (plain line) and considering the anomalous case (dotted line)

> X = model.matrix(lm50)
> round(hat(X), 3)
 [1] 0.029 0.026 0.026 0.030 0.029 0.046 0.020 0.063 0.051 0.051
[11] 0.029 0.020 0.040 0.037 0.052 0.039 0.037 0.047 0.031 0.027
[21] 0.052 0.065 0.056 0.050 0.022 0.023 0.037 0.037 0.034 0.051
[31] 0.021 0.036 0.042 0.059 0.031 0.066 0.046 0.036 0.023 0.020
[41] 0.043 0.025 0.039 0.035 0.021 0.024 0.020 0.025 0.022 0.212

The values are plotted in Fig. 4.2.

> plot(hat(X))

The last value gives evidence that the new observation is influential with respect to the other data. Observe that the problem, due to the presence of the anomalous datum, cannot be detected as clearly by using the other types of heteroscedasticity treatment available with the instruction vcovHC.
Figure 4.2 Influential observations detection for the artificial data example

> coeftest(lm50, vcov = vcovHC(lm50, type = "HC0"))
t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 17.69316    6.15375  2.8752 0.006003 ** 
x            2.46161    0.41681  5.9059 3.49e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC1"))

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  17.6932     6.2806  2.8171  0.007015 ** 
x             2.4616     0.4254  5.7865 5.303e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC2"))

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 17.69316    6.90622  2.5619    0.0136 *  
x            2.46161    0.46802  5.2596 3.312e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC3"))

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 17.69316    7.75586  2.2813   0.02701 *  
x            2.46161    0.52584  4.6813 2.365e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Also with reference to the Labour Demand Example it is possible to detect the
presence of anomalous data. See Fig. 4.3.
> X = model.matrix(labour4.3)
> plot(hat(X))
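If one prefers to list the most extreme cases rather than read them off the plot, a small sketch, assuming the object X just created and using the common rule of thumb that flags leverages larger than twice their average:

> sort(hat(X), decreasing = TRUE)[1:5]     # the five largest leverage values
> which(hat(X) > 2 * mean(hat(X)))         # rule of thumb: leverage above twice the average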

4.2 The Demand for Ice Cream (Section 4.8)

Data can be read by means of the function read.table, having extracted the file icecream.dat from the compressed archive ch04.zip. As usual it is possible to check the consistency of the data with the information contained in the file icecream.txt available in the zip file by means of the functions summary, head and tail.
> icecream <- read.table(unzip("ch04.zip", "Chapter 4/icecream.dat"),
header = TRUE)
The following variables, with four-weekly observations from March 18, 1951 to July 11, 1953 (30 observations), are present:

cons: consumption of ice cream per head (in pints)4;
income: average family income per week (in US Dollars);
price: price of ice cream (per pint);
temp: average temperature (in Fahrenheit);
time: index from 1 to 30.

Figure 4.4 shows the evolution of the time series Consumption, Temperature/100 and Price (cf. Verbeek's Fig. 4.3).

4 Expenditures on ice cream, not actual consumption, are reported.

Figure 4.3 Influential observations detection for Verbeek firm example

The graph may be obtained by first transforming the data.frame icecream into the multiple time series icecream1. One then has to rescale the temperature by 0.01. Remember that a time series object is not a data.frame, so the values of the temperature cannot be extracted with the command icecream1$temp; the code icecream1[, 4] will return the temperature values, since the variable temp is the fourth time series in icecream1 (the variable is stored in the fourth column of the object). A time series object can thus be treated like a matrix.
> icecream1 <- ts(icecream)
> icecream1[, 4] <- icecream1[, 4]/100
To assign proper names to the time series in icecream1, first check the structure of the object by means of the function str.
> str(icecream1)
 mts [1:30, 1:5] 0.386 0.374 0.393 0.425 0.406 0.344 0.327 0.288 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "cons" "income" "price" "temp" ...
 - attr(*, "tsp")= num [1:3] 1 30 1
 - attr(*, "class")= chr [1:3] "mts" "ts" "matrix"
The names of the time series are stored as the second element of the attribute
dimnames of the mts object icecream1. Since dimnames is a list it is possible to
make use of the following code to change the names:
> dimnames(icecream1)[[2]] <- c("Consumption", "Income",
"Price", "Temperature", "time")
The graph is finally produced by using the function xyplot of the package lattice. The first argument is the multiple time series object containing the series to be plotted; the option superpose allows one to plot the time series in a single graph (TRUE) or in separate subgraphs (FALSE, which is the default); the graphical parameters type and pch define respectively the type of line and the symbols to be used: in the present case, with type="o" a solid line is produced for all the series and the lines are decorated with the symbols specified in the list associated to pch, see the help on ?par and ?lattice::xyplot for more details.
With the auto.key=list(points=TRUE,lines=FALSE) option it is possible to insert a legend in the graph associating a symbol to the name of each series5.
> library(lattice)
> xyplot(icecream1[, c(1, 3, 4)], superpose = TRUE,
type = "o", pch = list(21, 24, 22), auto.key=list(points=TRUE,
lines = FALSE))
In Verbeek's Table 4.9 the OLS estimation results for the following model are reported:

cons_t = \beta_1 + \beta_2 price_t + \beta_3 income_t + \beta_4 temp_t + \varepsilon_t    (4.4)

Observe that no lagged variables are present among the regressors.

The parameter estimates may be obtained by using the function lm.
> icecream4.9 <- lm(cons ~ price + income + temp, data = icecream)
> summary(icecream4.9)
Call:
lm(formula = cons ~ price + income + temp, data = icecream)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.065302 -0.011873  0.002737  0.015953  0.078986 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.1973151  0.2702162   0.730  0.47179    
price       -1.0444140  0.8343573  -1.252  0.22180    
income       0.0033078  0.0011714   2.824  0.00899 ** 
temp         0.0034584  0.0004455   7.762  3.1e-08 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03683 on 26 degrees of freedom
Multiple R-squared: 0.719,   Adjusted R-squared: 0.6866
F-statistic: 22.17 on 3 and 26 DF,  p-value: 2.451e-07

5 With the option auto.key=list(points=FALSE,lines=TRUE), which defines the default label and can thus be omitted, each series is identified by a coloured segment.

Figure 4.4 Evolution of the time series Consumption, Temperature/100 and Price

4.2.1 The Durbin-Watson statistic - construction

The Durbin-Watson statistic to test for the presence of first-order autocorrelation in the residual series may be obtained by implementing Verbeek's relationship (4.51):

dW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}.

> dwstat <- sum(diff(icecream4.9$res)^2)/sum(icecream4.9$res^2)


> dwstat
[1] 1.02117
Note that the function diff applied to a vector x = [x_1, x_2, \ldots, x_n] of length n results in the vector [x_2 - x_1, \ldots, x_n - x_{n-1}] of length n - 1.
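For instance, as a quick check of this behaviour:

> diff(c(1, 4, 9, 16))
[1] 3 5 7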

4.2.2 The Durbin-Watson statistic - direct function

The Durbin-Watson statistic can be also directly obtained by making use of the
function dwtest available in the package lmtest which produces also the significance
level of the test. In the present case the hypothesis of no first-order autocorrelation
of the errors has to be rejected.
> library(lmtest)
> dwtest(icecream4.9, alternative = "greater")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.0003024
alternative hypothesis: true autocorrelation is greater than 0
By specifying the argument alternative in the function dwtest it is possible to define the direction of the test; by default alternative is set to "greater", that is, the alternative hypothesis is that the first-order autocorrelation is greater than 0.
> dwtest(icecream4.9, alternative = "two.sided")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.0006048
alternative hypothesis: true autocorrelation is not 0
> dwtest(icecream4.9, alternative = "less")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.9997
alternative hypothesis: true autocorrelation is less than 0
Figure 4.5 shows the actual and fitted values for the ice cream consumption giving
evidence of the presence of a pattern in the residual behaviour. To obtain this graph
we define first a time series object icecream1 containing the consumption fitted values
from the regression model (4.4) and the real ice cream consumption values. The fitted
values may be derived by applying the function fitted to the lm object icecream4.9.
> icecream1 <- ts(cbind("Fitted Consumption" = fitted(icecream4.9),
Consumption = icecream$cons))

Figure 4.5 Actual and fitted values for the ice cream consumption

We may then apply the function xyplot to the multiple time series
> xyplot(icecream1, type = list("l", "p"), pch = 19,
xlab = "Time", ylab = "Consumption", superpose = TRUE,
auto.key = FALSE)
The list("l","p") in the argument type specifies that the first time series is plotted
with a line "l", while the second one with points "p"; use pch=19 for plain bullets.
xlab and ylab specify the axis labels and auto.key=FALSE suppresses the legend.

4.2.3 Estimation of the first-order autocorrelation coefficient

The first-order autocorrelation coefficient of the residuals, corresponding to the parameter \rho in the linear model (without intercept)

\varepsilon_t = \rho \varepsilon_{t-1} + v_t,

may be estimated by applying the function lm to the series of the residuals. Two different options may be adopted.

- A time window starting at time 2 and ending at time 30 (the length of the residual series) can be considered: in this way the dependent variable in the linear model is given by the series of the residuals without its first element, while the independent one is the series of the residuals without its last element.

- The alternative is to consider the complete series of the residuals as dependent variable and the lagged series of the residuals, completed by setting to 0 the only missing pre-sample value, as the independent one.

Let resautocorr and resautocorr0 denote the lm objects obtained according to the
first and second option respectively.
> resautocorr <- lm(icecream4.9$res[-1] ~ -1 + icecream4.9$res[-30])
> resautocorr0 <- lm(icecream4.9$res ~ -1 + c(0, icecream4.9$res[-30]))
> summary(resautocorr)
Call:
lm(formula = icecream4.9$res[-1] ~ -1 + icecream4.9$res[-30])

Residuals:
      Min        1Q    Median        3Q       Max 
-0.063581 -0.014006 -0.000714  0.009123  0.080090 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)  
icecream4.9$res[-30]   0.4006     0.1774   2.258   0.0319 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03023 on 28 degrees of freedom
Multiple R-squared: 0.1541,   Adjusted R-squared: 0.1238
F-statistic: 5.099 on 1 and 28 DF,  p-value: 0.03192
> summary(resautocorr0)
Call:
lm(formula = icecream4.9$res ~ -1 + c(0, icecream4.9$res[-30]))

Residuals:
      Min        1Q    Median        3Q       Max 
-0.063581 -0.013547 -0.000351  0.012530  0.080090 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)  
c(0, icecream4.9$res[-30])   0.4006     0.1907   2.101   0.0444 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03249 on 29 degrees of freedom
Multiple R-squared: 0.1321,   Adjusted R-squared: 0.1022
F-statistic: 4.415 on 1 and 29 DF,  p-value: 0.04444

The autocorrelation estimate is the same for the two models. In both outputs the
Multiple R-squared is produced as uncentered R-squared since an intercept term is
not present in the proposed models.
> 1 - sum(resautocorr$res^2)/sum(residuals(icecream4.9)[-1]^2)
[1] 0.1540577
> 1 - sum(resautocorr0$res^2)/sum(icecream4.9$res^2)
[1] 0.1321176

For the resautocorr0 model the centered and uncentered R-squared coincide, since we have considered as dependent variable the complete series of residuals from equation (4.4), which has zero mean. In Verbeek the first version of the model (resautocorr) is considered, but the centered R-squared has been computed. The difference with the value reported here is due to the fact that the residual series without its first element does not have zero mean.
> (VerRsq <- 1 - sum(resautocorr$res^2)/sum((residuals(icecream4.9)[-1] -
      mean(residuals(icecream4.9)[-1]))^2))
[1] 0.1491856
or
> (VerRsq <- 1 - sum(resautocorr$res^2)/(length(residuals(icecream4.9)[-1]) -
      1)/var(residuals(icecream4.9)[-1]))
[1] 0.1491856
The asymptotic test on \rho (the statistic \sqrt{T}\hat{\rho} is asymptotically standard Normal under the null) suggests rejecting the hypothesis of no first-order autocorrelation:

> (test <- length(icecream4.9$res)^0.5 * as.numeric(coef(resautocorr)))
[1] 2.194355
> (pvalue <- 1 - pnorm(test))
[1] 0.01410495
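As a cross-check, the estimate of \rho obtained with the first option coincides with the explicit formula for the slope of a regression through the origin; a minimal sketch, assuming the objects defined above:

> e <- residuals(icecream4.9)
> sum(e[-1] * e[-30]) / sum(e[-30]^2)   # same value as coef(resautocorr), about 0.40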

4.2.4 The Breusch-Godfrey test to test the presence of autocorrelation - construction

The assumption used in the second model formulation (resautocorr0) of the preceding section, namely considering the complete residual series as dependent variable and setting to zero any presample values of the residuals, is the one usually adopted for defining the auxiliary regression necessary to compute the Breusch-Godfrey statistic, as proposed by Davidson and MacKinnon (1993).

The Breusch-Godfrey statistic (for testing first-order correlation) is obtained by multiplying the length of the time series by the Multiple R-squared of the following auxiliary regression:

e_t = x_t'\gamma + \rho e_{t-1} + v_t = \gamma_1 + \gamma_2 price_t + \gamma_3 income_t + \gamma_4 temp_t + \rho e_{t-1} + v_t,

where e_t are the residuals from the linear model y_t = x_t'\beta + \varepsilon_t, in our case model (4.4). The auxiliary regression thus contains the regressors of the reference model and the lagged residual series (terms corresponding to the residual series lagged up to order p also appear as regressors in the auxiliary regression when autocorrelations of order 1 to p are jointly tested).
The auxiliary regression also takes into account the fact that e_t may be correlated with some lagged regressors, possibly also the lagged y_t.
> shift.res <- c(0, residuals(icecream4.9)[-30])
> resautocorrint <- update(icecream4.9, icecream4.9$res ~
. + shift.res)
> summary(resautocorrint)
Call:
lm(formula = icecream4.9$res ~ price + income + temp + shift.res,
    data = icecream)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.064280 -0.014474 -0.002083  0.008912  0.081859 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.0615530  0.2571651   0.239   0.8128  
price       -0.1476412  0.7918621  -0.186   0.8536  
income      -0.0001158  0.0011085  -0.104   0.9176  
temp        -0.0002033  0.0004328  -0.470   0.6426  
shift.res    0.4282815  0.2112149   2.028   0.0534 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03481 on 25 degrees of freedom
Multiple R-squared: 0.1412,   Adjusted R-squared: 0.003833
F-statistic: 1.028 on 4 and 25 DF,  p-value: 0.4123
> length(icecream4.9$res) * summary(resautocorrint)$r.squared
[1] 4.237064
The test statistic must be compared with the quantile of a Chi-squared random variable with p degrees of freedom. In the present example, since only the presence of first-order autocorrelation is tested, we have p = 1 and \chi^2_{1, 0.95} = 3.84.
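The corresponding critical value and p-value may also be obtained directly in R; a small sketch:

> qchisq(0.95, df = 1)            # 0.95 quantile of the Chi-squared(1) distribution, about 3.84
> 1 - pchisq(4.237064, df = 1)    # p-value of the statistic computed above (close to the bgtest output below)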

4.2.5 The Breusch-Godfrey test to test the presence of autocorrelation - direct function

The Breusch-Godfrey test with its significance level may be directly obtained by
having recourse to the function bgtest available in the package lmtest.
> library(lmtest)
> bgtest(icecream4.9)
Breusch-Godfrey test for serial correlation of order up to 1
data: icecream4.9
LM test = 4.2371, df = 1, p-value = 0.03955
Observe that by applying the function coeftest to the object resulting from bgtest
the coefficients from the auxiliary regression including lagged residuals may be
obtained.
> coeftest(bgtest(icecream4.9))
z test of coefficients:

                 Estimate  Std. Error z value Pr(>|z|)  
(Intercept)    0.06155297  0.25716506  0.2394  0.81083  
price         -0.14764118  0.79186210 -0.1864  0.85209  
income        -0.00011579  0.00110852 -0.1045  0.91681  
temp          -0.00020333  0.00043284 -0.4698  0.63852  
lag(resid)_1   0.42828155  0.21121490  2.0277  0.04259 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
You will observe that inference on the parameter estimates is made through the Normal distribution (z values and not t values are reported). This is because the OLS assumptions are not satisfied and only a normal limiting distribution may be derived for the OLS parameter estimators, see Johnston and Di Nardo (1997) and Mann and Wald (1943).

4.2.6 Some remarks on the procedure presented by Verbeek on page 113

Observe that there are no lagged regressors in equation (4.4) and the residuals have been assumed to be uncorrelated with the regressors in the model, so Verbeek defines the auxiliary regression for computing the Breusch-Godfrey statistic by including only the intercept term and the lagged values of the residuals, see Verbeek p. 120, omitting any other regressors present in equation (4.4).
As a consequence of this particular formulation of the model, the value of the statistic will be different from the one (usually provided by standard software) which we reported above and which takes into account the possible presence of autocorrelation between y_t and the lagged regressor components of x_t, see Johnston and Di Nardo (1997) p. 191 (6.54).
In Verbeek the estimation of the following auxiliary regression has been considered:

e_t = const. + \rho e_{t-1} + v_t

We have:
> resautocorrint <- lm(icecream4.9$res ~ c(0, icecream4.9$res[-30]))
> length(icecream4.9$res) * summary(resautocorrint)$r.squared
[1] 3.992121
which is equivalent to invoking the function bgtest for a linear model where the series e_t of the residuals depends only on the constant term.
> bgtest(lm(icecream4.9$res ~ 1))
Breusch-Godfrey test for serial correlation of order up to 1
data: lm(icecream4.9$res ~ 1)
LM test = 3.9921, df = 1, p-value = 0.04571
In Verbeek the statistic is actually computed by multiplying T - 1 by the centered R-squared of the resautocorr model (the quantity VerRsq computed above):
> (length(icecream4.9$res) - 1) * VerRsq
[1] 4.326382
Both formulations of the Breusch-Godfrey statistic reject the hypothesis of no first-order autocorrelation, since the 0.95 quantile of a Chi-squared distribution with 1 degree of freedom is 3.84.

4.2.7 The EGLS (iterative Cochrane-Orcutt) procedure

The estimation by means of the iterative Cochrane-Orcutt procedure may be obtained by applying the following algorithm:

> temp <- lm(cons ~ price + income + temp, data = icecream)
> pred <- model.matrix(temp)
> temp0 <- temp$coef
> resid <- temp$residuals
> T <- length(resid)
> k <- 0
> while (k == 0) {
      rhoest <- lm(resid ~ -1 + c(0, resid[-T]))
      rho <- as.numeric(rhoest$coef)
      pred1 <- pred[-1, ] - rho * pred[-T, ]
      temp1 <- lm(icecream$cons[-1] - rho * icecream$cons[-T] ~
          -1 + pred1)
      ctrl <- sum((temp1$coef - temp0)^2)
      resid <- icecream$cons - model.matrix(temp) %*% temp1$coef
      temp0 <- temp1$coef
      if (ctrl < 1e-09)
          k <- 1
  }
> summary(temp1)

Call:
lm(formula = icecream$cons[-1] - rho * icecream$cons[-T] ~ -1 +
    pred1)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.061510 -0.013400 -0.000524  0.013603  0.082052 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
pred1(Intercept)  0.1571429  0.2896285   0.543   0.5922    
pred1price       -0.8923919  0.8108501  -1.101   0.2816    
pred1income       0.0032028  0.0015460   2.072   0.0488 *  
pred1temp         0.0035584  0.0005547   6.415 1.02e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03191 on 25 degrees of freedom
Multiple R-squared: 0.9823,   Adjusted R-squared: 0.9795
F-statistic: 347 on 4 and 25 DF,  p-value: < 2.2e-16
> summary(rhoest)
Call:
lm(formula = resid ~ -1 + c(0, resid[-T]))

Residuals:
      Min        1Q    Median        3Q       Max 
-0.061510 -0.013163  0.001124  0.014793  0.082052 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)  
c(0, resid[-T])   0.4009     0.1923   2.085    0.046 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03266 on 29 degrees of freedom
Multiple R-squared: 0.1304,   Adjusted R-squared: 0.1004
F-statistic: 4.348 on 1 and 29 DF,  p-value: 0.04597

The results are reported in Verbeek's Table 4.10. We observe some differences with Verbeek's standard errors. In this case, as remarked by Verbeek, the Durbin-Watson statistic is not appropriate since it would refer to the transformed model.

4.2.8 The model with the lagged temperature

Verbeek's Table 4.11 reports the estimation results for the following model, including the lagged value of the temperature, and the corresponding Durbin-Watson statistic:

cons_t = \beta_1 + \beta_2 price_t + \beta_3 income_t + \beta_4 temp_t + \beta_5 temp_{t-1} + \varepsilon_t    (4.5)

The parameter estimates may be obtained by applying the functions lm and then the
function dwtest to the lm object resulting from the following instruction.
> icecream4.11 <- lm(cons[-1] ~ price[-1] + income[-1] + temp[-1] +
temp[-T], data = icecream)
> summary(icecream4.11)
Call:
lm(formula = cons[-1] ~ price[-1] + income[-1] + temp[-1] + temp[-T],
    data = icecream)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.049070 -0.015391 -0.006745  0.014766  0.080892 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.1894822  0.2323170   0.816  0.42274    
price[-1]   -0.8383023  0.6880209  -1.218  0.23490    
income[-1]   0.0028673  0.0010533   2.722  0.01189 *  
temp[-1]     0.0053321  0.0006704   7.953  3.5e-08 ***
temp[-T]    -0.0022039  0.0007307  -3.016  0.00597 ** 
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.02987 on 24 degrees of freedom
Multiple R-squared: 0.8285,   Adjusted R-squared: 0.7999
F-statistic: 28.98 on 4 and 24 DF,  p-value: 7.1e-09
We can check for the sign of a possible first-order autocorrelation, see Section 4.2.3:
> signcheck <- lm(icecream4.11$res ~ -1 + c(0, icecream4.11$res[-29]))
> summary(signcheck)
Call:
lm(formula = icecream4.11$res ~ -1 + c(0, icecream4.11$res[-29]))

Residuals:
      Min        1Q    Median        3Q       Max 
-0.045799 -0.014388 -0.007997  0.014110  0.082036 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
c(0, icecream4.11$res[-29])  0.07432    0.22636   0.328    0.745

Residual standard error: 0.0276 on 28 degrees of freedom
Multiple R-squared: 0.003835,   Adjusted R-squared: -0.03174
F-statistic: 0.1078 on 1 and 28 DF,  p-value: 0.7451

The asymptotic test on \rho suggests accepting the hypothesis of no first-order autocorrelation.

> (test <- length(icecream4.11$res)^0.5 * as.numeric(coef(signcheck)))
[1] 0.4002376
> (pvalue <- 1 - pnorm(test))
[1] 0.3444907
So we can perform a two-sided Durbin-Watson test
> dwtest(icecream4.11, alternative = "two.sided")
Durbin-Watson test
data: icecream4.11
DW = 1.5822, p-value = 0.05751
alternative hypothesis: true autocorrelation is not 0
and conclude that autocorrelation is absent. In Verbeek it is observed that the value of the Durbin-Watson statistic is in the inconclusive region; according to its p-value we conclude that the statistic is on the boundary of the acceptance region.
Observe that the last model can be formulated in an easier manner by having recourse to the operators available in the package dynlm (Dynamic Linear Regression): in particular, by means of the function L(x,k) (corresponding to lag(x, lag = -k) in the base system) the series x is lagged by k time units (by default k is set to 1).
To estimate the linear model parameters we then have to invoke the function dynlm, which has the same structure as the function lm used in the preceding sections for the estimation of linear models. Note that the data set is required to have the structure of a multiple time series: the function as.ts is used here to convert the data.frame into an undated time series.
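A tiny illustration of this correspondence on an artificial series (a sketch; z is a hypothetical example series, not part of the data set):

> z <- ts(1:5)
> cbind(z, lag(z, -1))   # the second column holds z_{t-1}, which is what L(z, 1) supplies in dynlm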
> library(dynlm)
> icecream4.11 <- dynlm(cons ~ price + income + temp +
L(temp), data = as.ts(icecream))
> summary(icecream4.11)
Time series regression with "ts" data:
Start = 2, End = 30
Call:
dynlm(formula=cons~price+income+temp+L(temp),data=as.ts(icecream))

Residuals:
      Min        1Q    Median        3Q       Max 
-0.049070 -0.015391 -0.006745  0.014766  0.080892 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.1894822  0.2323170   0.816  0.42274    
price       -0.8383023  0.6880209  -1.218  0.23490    
income       0.0028673  0.0010533   2.722  0.01189 *  
temp         0.0053321  0.0006704   7.953  3.5e-08 ***
L(temp)     -0.0022039  0.0007307  -3.016  0.00597 ** 
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.02987 on 24 degrees of freedom
Multiple R-squared: 0.8285,   Adjusted R-squared: 0.7999
F-statistic: 28.98 on 4 and 24 DF,  p-value: 7.1e-09

4.3 Risk Premia in Foreign Exchange Markets (Section 4.11)

Data can be read by means of the function read.table, having extracted the file forward2c.dat from the compressed archive ch04.zip. Use the functions summary, head and tail to check the consistency of the imported data with the information contained in the file forward2c.txt available in the zip archive.
> riskpremia <- read.table(unzip("ch04.zip", "Chapter 4/forward2.dat"),
header = TRUE)
The data.frame contains 276 observations from January 1979 to December 2001
taken from DATASTREAM on the following variables6 .

EXUSBP: exchange rate USDollar/British Pound Sterling

EXUSEUR: exchange rate USDollar/Euro

EXEURBP: exchange rate Euro/Pound

F1USBP: 1 month forward rate USD/Pound

F1USEUR: 1 month forward rate USD/Euro

F1EURBP: 1 month forward rate Euro/Pound

F3USBP: 3 month forward rate USD/Pound

F3USEUR: 3 month forward rate USD/Euro

F3EURBP: 3 month forward rate Euro/Pound.

6 Verbeek observes that none of the variables is expressed in logs and that pre-Euro rates are
based on exchange rates against the German mark.

Figure 4.6 Evolution of the US$/BP and US$/EUR exchange rates

A multiple time series object may be defined by using the information that data were collected with a monthly frequency starting from January 1979.
> riskpremia <- ts(data = riskpremia, start = c(1979,
1), frequency = 12)
Graphical representations may be obtained by using the function xyplot available in the package lattice (cf. ?lattice::lattice and Longhow Lam (2010) for more information), with separate subgraphs or a single graph containing all the time series, see Figures 4.6 and 4.7.
> library(lattice)
> xyplot(riskpremia[, 1:2])
> xyplot(riskpremia[, 1:2], superpose = TRUE)
Figure 4.8 shows the evolution of the forward discounts obtained as the difference
between the logarithms of the spot rates and of the forward rates (1 month): the
computed series are first combined in the matrix rp, which preserves the multiple
time series, mts, attribute, and the column names are also specified. The matrix is
then plotted by means of the function xyplot.

Figure 4.7 Evolution of the US$/BP and US$/EUR exchange rates

> rp <- cbind("US$/GBP" = log(riskpremia[, 1]) - log(riskpremia[, 4]),
      "US$/EUR" = log(riskpremia[, 2]) - log(riskpremia[, 5]))
> xyplot(rp, superpose = TRUE)

4.3.1 Tests for Risk Premia in the 1 month Market

We have to estimate the parameters in the following model, cf. equation (4.70) in Verbeek:

s_t - f_{t-1} = \beta_1 + \beta_2 (s_{t-1} - f_{t-1}) + e_t,

that is,

log(EXUSBP_t) - log(F1USBP_{t-1}) = \beta_1 + \beta_2 [log(EXUSBP_{t-1}) - log(F1USBP_{t-1})] + e_t.    (4.6)

Figure 4.8 Forward discount, US$/BP and US$/EUR

To define lagged variables within the formula of a linear model it is possible to make use of the operators available in the package dynlm (Dynamic Linear Regression): in particular, by means of the function L(x,k) (corresponding to lag(x, lag = -k) in the base system) the series x is lagged by k time units; by default k is set equal to 1.
To estimate the linear model parameters we then have to invoke the function dynlm, which has the same structure as the function lm used in the preceding sections for the estimation of linear models.
> library(dynlm)
> riskpremia4.12 <- dynlm(log(EXUSBP) - log(L(F1USBP)) ~
L(log(EXUSBP) - log(F1USBP)), data = riskpremia)
> summary(riskpremia4.12)
Time series regression with "ts" data:
Start = 1979(2), End = 2001(12)

Call:
dynlm(formula = log(EXUSBP) - log(L(F1USBP)) ~ L(log(EXUSBP) -
    log(F1USBP)), data = riskpremia)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.14766 -0.01909  0.00073  0.02082  0.12527 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  -0.005112   0.002365  -2.162 0.031514 *  
L(log(EXUSBP) - log(F1USBP))  3.212170   0.817474   3.929 0.000108 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03154 on 273 degrees of freedom
Multiple R-squared: 0.05353,   Adjusted R-squared: 0.05006
F-statistic: 15.44 on 1 and 273 DF,  p-value: 0.000108
The Breusch-Godfrey statistic may be computed with regard to the two tests for the presence of first-order and of up to twelfth-order autocorrelation. We can invoke the function bgtest, which we presented in Section 4.2.5.
> library(lmtest)
> bgtest(riskpremia4.12)

Breusch-Godfrey test for serial correlation of order up to 1

data: riskpremia4.12
LM test = 0.2179, df = 1, p-value = 0.6406

> bgtest(riskpremia4.12, order = 12)

Breusch-Godfrey test for serial correlation of order up to 12

data: riskpremia4.12
LM test = 10.2603, df = 12, p-value = 0.5931
Both tests do not reject the null hypothesis of no serial correlation of the residuals. We recall that the 0.95 quantiles of the two Chi-squared distributions with 1 and 12 degrees of freedom may be obtained as:
> qchisq(0.95, c(1, 12))
[1] 3.841459 21.026070
An ANOVA test may be performed to verify whether the intercept and the slope coefficient in the linear model (4.6) may jointly be assumed equal to 0. It is necessary to produce an lm object related to the simpler model, the one under the null hypothesis that \beta_1 = \beta_2 = 0. This model contains no regressors, so in defining the formula we have to exclude the intercept.
> riskpremia4.12anov <- dynlm(log(EXUSBP) - log(L(F1USBP)) ~
-1, data = riskpremia)
> anova(riskpremia4.12anov, riskpremia4.12)

Analysis of Variance Table

Model 1: log(EXUSBP) - log(L(F1USBP)) ~ -1
Model 2: log(EXUSBP) - log(L(F1USBP)) ~ L(log(EXUSBP) - log(F1USBP))
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
1    275 0.28699                                  
2    273 0.27159  2  0.015406 7.7433 0.0005359 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

The F statistic gives evidence that the hypothesis \beta_1 = \beta_2 = 0, equivalent to no risk premium, has to be rejected.
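The F statistic in the table can also be reproduced by hand from the two residual sums of squares reported above (a quick check; the small discrepancy is only due to rounding):

> RSS0 <- 0.28699; RSS1 <- 0.27159      # restricted and unrestricted residual sums of squares
> ((RSS0 - RSS1)/2) / (RSS1/273)        # ((RSS0 - RSS1)/q) / (RSS1/(T - k)), approximately 7.74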
The t tests presented in the regression output are asymptotically valid only if the errors \varepsilon_t are not serially correlated and heteroscedasticity is not present. To check the latter hypothesis the Breusch-Pagan test can be performed, which rejects the hypothesis of homoscedastic errors.
> bptest(riskpremia4.12)
studentized Breusch-Pagan test
data: riskpremia4.12
BP = 7.2569, df = 1, p-value = 0.007063
The significance of the coefficients has thus to be checked by making recourse to a
heteroscedasticity consistent parameter inference.
> coeftest(riskpremia4.12, vcov = vcovHC(riskpremia4.12, type = "HC1"))
t test of coefficients:

                               Estimate Std. Error t value Pr(>|t|)   
(Intercept)                  -0.0051118  0.0021386 -2.3903 0.017513 * 
L(log(EXUSBP) - log(F1USBP))  3.2121699  0.9826770  3.2688 0.001218 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

USD/Euro currency ratio

The above analysis is now repeated for the USD/Euro currency ratio.
> riskpremiauseur <- dynlm(log(EXUSEUR) - log(L(F1USEUR)) ~
L(log(EXUSEUR) - log(F1USEUR)), data = riskpremia)
> summary(riskpremiauseur)
Time series regression with "ts" data:
Start = 1979(2), End = 2001(12)

Call:
dynlm(formula = log(EXUSEUR) - log(L(F1USEUR)) ~ L(log(EXUSEUR) -
    log(F1USEUR)), data = riskpremia)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.103024 -0.021487 -0.000015  0.020975  0.088699 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                    -0.002280   0.003149  -0.724    0.470
L(log(EXUSEUR) - log(F1USEUR))  0.484791   0.766435   0.633    0.528

Residual standard error: 0.03368 on 273 degrees of freedom
Multiple R-squared: 0.001463,   Adjusted R-squared: -0.002194
F-statistic: 0.4001 on 1 and 273 DF,  p-value: 0.5276

> bgtest(riskpremiauseur)

Breusch-Godfrey test for serial correlation of order up to 1

data: riskpremiauseur
LM test = 0.1176, df = 1, p-value = 0.7316

> bgtest(riskpremiauseur, order = 12)

Breusch-Godfrey test for serial correlation of order up to 12

data: riskpremiauseur
LM test = 14.1237, df = 12, p-value = 0.2929
As observed by Verbeek, no risk premium is found for the USD/Euro rate, namely both regression coefficients are not significantly different from zero; furthermore, the hypotheses of no first-order and of no up to twelfth-order autocorrelation are not rejected. The Breusch-Pagan test gives evidence of the presence of heteroscedasticity, but the inference based upon heteroscedasticity consistent standard errors confirms the previous conclusions.
> bptest(riskpremiauseur)
studentized Breusch-Pagan test
data: riskpremiauseur
BP = 3.965, df = 1, p-value = 0.04646

> coeftest(riskpremiauseur, vcov = vcovHC(riskpremiauseur, type = "HC1"))

t test of coefficients:

                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                    -0.0022795  0.0030643 -0.7439   0.4576
L(log(EXUSEUR) - log(F1USEUR))  0.4847906  0.8420818  0.5757   0.5653

4.3.2 Tests for Risk Premia using Overlapping Samples

The parameters in the following model, cf. Verbeek equation (4.72), have to be estimated:

s_t - f^3_{t-3} = \beta_1 + \beta_2 (s_{t-3} - f^3_{t-3}) + e_t,

that is,

log(EXUSBP_t) - log(F3USBP_{t-3}) = \beta_1 + \beta_2 [log(EXUSBP_{t-3}) - log(F3USBP_{t-3})] + e_t.    (4.7)

The function dynlm may be invoked to estimate a linear model in the presence of lagged variables. Remember that lagged variables can be introduced in the model formula with the operator L(.).
> library(dynlm)
> riskpremiaoverlUSBP <- dynlm(log(EXUSBP) - log(L(F3USBP,
3)) ~ L(log(EXUSBP) - log(F3USBP), 3), data = riskpremia)
> summary(riskpremiaoverlUSBP)
Time series regression with "ts" data:
Start = 1979(4), End = 2001(12)

Call:
dynlm(formula = log(EXUSBP) - log(L(F3USBP, 3)) ~ L(log(EXUSBP) -
    log(F3USBP), 3), data = riskpremia)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.285511 -0.025561  0.001782  0.029698  0.176615 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     -0.013566   0.004216  -3.218  0.00145 ** 
L(log(EXUSBP) - log(F3USBP), 3)  3.135215   0.529277   5.924 9.53e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.05647 on 271 degrees of freedom
Multiple R-squared: 0.1146,   Adjusted R-squared: 0.1114
F-statistic: 35.09 on 1 and 271 DF,  p-value: 9.525e-09
The Breusch-Godfrey statistic is then computed to check for the presence of serially correlated errors; in particular, there is evidence of strong autocorrelation with reference to the first and the twelfth order, but as Verbeek observes the conclusions are incorrect, due to the fact that monthly data for 3-month contracts are considered and, although (see Verbeek's relationship (4.73)) \varepsilon_t may be assumed to be uncorrelated with x_{t-3}, \varepsilon_t may possibly be correlated with \varepsilon_{t-1} and with \varepsilon_{t-2}.
> bgtest(riskpremiaoverlUSBP)
Breusch-Godfrey test for serial correlation of order up to 1
data: riskpremiaoverlUSBP
LM test = 119.6924, df = 1, p-value < 2.2e-16
> bgtest(riskpremiaoverlUSBP, order = 12)
Breusch-Godfrey test for serial correlation of order up to 12
data: riskpremiaoverlUSBP
LM test = 173.672, df = 12, p-value < 2.2e-16
The Breusch-Godfrey statistic must then be computed only with reference to the autocorrelations of order 3, 4, ..., 12. The auxiliary regression referred to (4.7) is:

e_t = \gamma_1 + \gamma_2 (s_{t-3} - f^3_{t-3}) + \gamma_3 e_{t-3} + \dots + \gamma_{12} e_{t-12} + v_t.

According to what is suggested by Davidson and MacKinnon (1993), we have to set to 0 any presample values of the residuals to compute the Breusch-Godfrey statistic. This can be obtained in the following way, by creating a proper matrix for the regressors.
> reUSBP <- residuals(riskpremiaoverlUSBP)
> re <- lag(reUSBP)
> for (o in 3:12) re <- cbind(re, lag(reUSBP, -o))
> re <- window(re, start = c(1979, 4), end = c(2001, 12))
> re[is.na(re)] <- 0
> re <- re[, -1]
> dimnames(re)[[2]] <- paste("USBPlag", 3:12, sep = "")

The matrix re is an mts (multiple time series) object obtained by binding the time series of the residuals and its lagged versions; it preserves the initial start of the series but adds information at the end of the series (have a look at the object re once you have created it). The correct time window has then to be considered, and this is obtained by means of the function window. Any presample value (identified as an NA, not available, case in the matrix) is set to zero, and the series of the order-1 lagged errors is dropped. Proper names are finally assigned to the elements (columns) of the multiple time series object.
> check <- dynlm(reUSBP ~ L(log(EXUSBP) - log(F3USBP), 3) + re,
data = riskpremia)
The results of the auxiliary regression estimation are reported here only for
completeness, thus the following instruction may be dropped.
> summary(check)

Time series regression with "ts" data:
Start = 1979(4), End = 2001(12)

Call:
dynlm(formula = reUSBP ~ L(log(EXUSBP) - log(F3USBP), 3) + re,
    data = riskpremia)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.277077 -0.027074  0.003567  0.033847  0.162478 

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)  
(Intercept)                     -0.0003768  0.0042542  -0.089   0.9295  
L(log(EXUSBP) - log(F3USBP), 3)  0.0680321  0.5378860   0.126   0.8994  
reUSBPlag3                      -0.0372339  0.0970268  -0.384   0.7015  
reUSBPlag4                      -0.0582779  0.1313641  -0.444   0.6577  
reUSBPlag5                       0.0615611  0.1312624   0.469   0.6395  
reUSBPlag6                      -0.1456368  0.1402550  -1.038   0.3001  
reUSBPlag7                      -0.0228997  0.1470880  -0.156   0.8764  
reUSBPlag8                       0.1280666  0.1471511   0.870   0.3849  
reUSBPlag9                      -0.0768684  0.1408519  -0.546   0.5857  
reUSBPlag10                     -0.0840098  0.1323110  -0.635   0.5260  
reUSBPlag11                      0.2226356  0.1325330   1.680   0.0942 .
reUSBPlag12                     -0.1622903  0.0973472  -1.667   0.0967 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.05678 on 261 degrees of freedom
Multiple R-squared: 0.02627,   Adjusted R-squared: -0.01477
F-statistic: 0.6401 on 11 and 261 DF,  p-value: 0.7937
The Breusch-Godfrey statistic can finally be computed by multiplying the multiple
R-squared by the number of observations in the auxiliary regression.
> T <- length(reUSBP)
> summary(check)$r.squared * T
[1] 7.170992
> qchisq(0.95, 10)
[1] 18.30704
To obtain the results of Verbeek the above assumption (consisting in setting to 0 any presample residual values) must not be made, and T must be substituted by T - 12: this yields a slightly different formulation, though with the same final conclusion.

> reUSBP <- residuals(riskpremiaoverlUSBP)
> check <- dynlm(reUSBP ~ L(log(EXUSBP) - log(F3USBP), 3) +
      L(reUSBP, 3:12), data = riskpremia)
The results of the auxiliary regression estimation are reported only for completeness, thus the following instruction may be dropped. Observe that the starting and final times for the following regression are different from the ones appearing in the preceding output.
> summary(check)
Time series regression with "ts" data:
Start = 1980(4), End = 2001(12)

Call:
dynlm(formula = reUSBP ~ L(log(EXUSBP) - log(F3USBP), 3) + L(reUSBP,
    3:12), data = riskpremia)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.276828 -0.027926  0.004025  0.033975  0.164022 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)  
(Intercept)                     -0.002471   0.004304  -0.574   0.5663  
L(log(EXUSBP) - log(F3USBP), 3)  0.166460   0.535840   0.311   0.7563  
L(reUSBP, 3:12)3                -0.016621   0.098831  -0.168   0.8666  
L(reUSBP, 3:12)4                -0.060702   0.133137  -0.456   0.6488  
L(reUSBP, 3:12)5                 0.037894   0.131270   0.289   0.7731  
L(reUSBP, 3:12)6                -0.141032   0.141052  -1.000   0.3183  
L(reUSBP, 3:12)7                -0.023177   0.146965  -0.158   0.8748  
L(reUSBP, 3:12)8                 0.122700   0.146858   0.835   0.4042  
L(reUSBP, 3:12)9                -0.079221   0.140486  -0.564   0.5733  
L(reUSBP, 3:12)10               -0.085884   0.131756  -0.652   0.5151  
L(reUSBP, 3:12)11                0.223343   0.131967   1.692   0.0918 .
L(reUSBP, 3:12)12               -0.163520   0.096852  -1.688   0.0926 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.05642 on 249 degrees of freedom
Multiple R-squared: 0.03009,   Adjusted R-squared: -0.01276
F-statistic: 0.7022 on 11 and 249 DF,  p-value: 0.7361
The value of the Breusch-Godfrey statistic then follows:
> T <- length(reUSBP)
> summary(check)$r.squared * (T - 12)
[1] 7.852361

Heteroskedasticity and Autocorrelation Consistent (HAC) Covariance Matrix Estimation (Newey-West)

To take into account the presence of autocorrelation in the parameter inference it is possible to make recourse to OLS, but computing the corrected (Newey-West) standard errors, that is the Heteroskedasticity and Autocorrelation Consistent (HAC) standard errors.
> library(sandwich)
Verbeek suggests using H = 3 in relationships (4.62) and (4.63). Pay attention: with regard to the instruction NeweyWest, see also Zeileis (2004), the argument lag is the maximum lag with positive weight, so here lag=H-1; Verbeek does not suggest using the adjusted estimates for finite samples, so adjust=FALSE. The argument prewhite must be set to FALSE to obtain Newey-West standard errors according to (4.62) and (4.63).
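NeweyWest() simply returns the HAC covariance matrix, so the standard errors used by coeftest below can also be inspected directly; a minimal sketch:

> V <- NeweyWest(riskpremiaoverlUSBP, lag = 2, adjust = FALSE,
      sandwich = TRUE, prewhite = FALSE)
> sqrt(diag(V))     # Newey-West standard errors of the intercept and the slope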
> coeftest(riskpremiaoverlUSBP, vcov = NeweyWest(riskpremiaoverlUSBP,
      lag = 2, adjust = FALSE, sandwich = TRUE, prewhite = FALSE))

t test of coefficients:

                                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)                     -0.0135664  0.0053729 -2.5250 0.012142 * 
L(log(EXUSBP) - log(F3USBP), 3)  3.1352149  1.0560150  2.9689 0.003256 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

There seem to be some misprints in the standard error values reported on Verbeek's p. 133; according to Verbeek's formulae (4.62) and (4.63) we perform the following check.
> a <- ts(cbind(model.matrix(riskpremiaoverlUSBP),
      riskpremiaoverlUSBP$res), start = c(1979, 4),
      frequency = 12)
> dimnames(a)[[2]] <- c("int", "x1", "e")
> first.term.4.70 <- t(a[, 1:2]) %*% (a[, 1:2] * a[, 3]^2)/T
> H <- 3
> cum <- 0
> for (j in 1:(H - 1)) {
      w_j <- (1 - j/H)
      as <- window(a, start = c(1979, 4 + j), end = c(2001, 12))
      aj <- window(lag(a, -j), start = c(1979, 4 + j), end = c(2001, 12))
      cum <- cum + w_j * (t(as[, 1:2] * as[, 3]) %*%
          (aj[, 1:2] * aj[, 3]) + t(aj[, 1:2] * aj[, 3]) %*%
          (as[, 1:2] * as[, 3]))
  }
> second.term.4.70 <- cum/T
> Sstar <- first.term.4.70 + second.term.4.70
> ext.term.4.69 <- solve(t(a[, 1:2]) %*% a[, 1:2])
> vcovbeta <- ext.term.4.69 %*% (T * Sstar) %*% ext.term.4.69
> diag(vcovbeta)^0.5
        int          x1 
0.005372888 1.056015009 

USD/Euro currency ratio


For the USD/Euro currency ratio we have analogous code:
> riskpremiaoverlUSEUR <- dynlm(log(EXUSEUR) - log(L(F3USEUR, 3)) ~
L(log(EXUSEUR) - log(F3USEUR), 3), data = riskpremia)
> summary(riskpremiaoverlUSEUR)
Time series regression with "ts" data:
Start = 1979(4), End = 2001(12)

Call:
dynlm(formula = log(EXUSEUR) - log(L(F3USEUR, 3)) ~ L(log(EXUSEUR) -
    log(F3USEUR), 3), data = riskpremia)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.15097 -0.04598 -0.00358  0.04268  0.15541 

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)  
(Intercept)                       -0.010506   0.005983  -1.756   0.0802 .
L(log(EXUSEUR) - log(F3USEUR), 3)  0.006050   0.534784   0.011   0.9910  
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.06059 on 271 degrees of freedom
Multiple R-squared: 4.722e-07,   Adjusted R-squared: -0.00369
F-statistic: 0.000128 on 1 and 271 DF,  p-value: 0.991

> bgtest(riskpremiaoverlUSEUR)

Breusch-Godfrey test for serial correlation of order up to 1

data: riskpremiaoverlUSEUR
LM test = 130.1647, df = 1, p-value < 2.2e-16

> bgtest(riskpremiaoverlUSEUR, order = 12)

Breusch-Godfrey test for serial correlation of order up to 12

data: riskpremiaoverlUSEUR
LM test = 177.7619, df = 12, p-value < 2.2e-16

> qchisq(0.95, c(1, 12))
[1]  3.841459 21.026070
> reUSEUR <- residuals(riskpremiaoverlUSEUR)
> check <- dynlm(reUSEUR ~ L(log(EXUSEUR) - log(F3USEUR),
3) + L(reUSEUR, 3:12), data = riskpremia)
> summary(check)$r.squared * (T - 12)
[1] 9.040125
> coeftest(riskpremiaoverlUSEUR, vcov = NeweyWest(riskpremiaoverlUSEUR,
      lag = 2, adjust = FALSE, sandwich = TRUE, prewhite = FALSE))

t test of coefficients:

                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                       -0.0105060  0.0082893 -1.2674   0.2061
L(log(EXUSEUR) - log(F3USEUR), 3)  0.0060495  0.7667389  0.0079   0.9937

5 Endogeneity, Instrumental Variables and GMM

5.1 Estimating the Returns to Schooling (Section 5.4)

Data are available in the file schooling.wf1, which is a work file of EViews.
First invoke the package hexView and next the command readEViews to read data.
The function unzip extracts the file from the compressed archive ch05.zip.
> library(hexView)
> schooling <- readEViews(unzip("ch05.zip", "Chapter 5/schooling.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
Recall that by using the functions head(), tail() and summary() it is possible to
explore the beginning, the final section and to obtain summary statistics for all the
variables included in the data-frame.
The schooling data set contains data taken from the National Longitudinal Survey of Young Men (NLSYM) concerning the United States. The analysis focuses on 1976 but uses some variables that date back to earlier years. The following variables (many are dummy variables) are present:

SMSA66 1 if lived in smsa in 1966

SMSA76 1 if lived in smsa in 1976

NEARC2 grew up near 2-year college

NEARC4 grew up near 4-year college

NEARC4A grew up near 4-year public college

NEARC4B grew up near 4-year private college

ED76 education in 1976

ED66 education in 1966

AGE76 age in 1976

DADED dad's education (imputed avg if missing)

NODADED 1 if dad's education imputed

MOMED mother's education

NOMOMED 1 if mom's education imputed

MOMDAD14 1 if lived with mom and dad at age 14

SINMOM14 1 if single mom at age 14

STEP14 1 if step parent at age 14

SOUTH66 1 if lived in south in 1966

SOUTH76 1 if lived in south in 1976

LWAGE76 log wage in 1976 (outliers trimmed)

FAMED mom-dad education class (1-9)

BLACK 1 if black

WAGE76 wage in 1976 (raw, cents per hour)

ENROLL76 1 if enrolled in 1976

KWW the kww score

IQSCORE a normed IQ score

MAR76 marital status in 1976 (1 if married)

LIBCRD14 1 if library card in home at age 14

EXP76 experience in 1976

EXP762 EXP76 squared

The estimation of a linear model explaining the log wage in 1976 is proposed:

LWAGE76 = \beta_1 + \beta_2 ED76 + \beta_3 EXP76 + \beta_4 EXP762 + \beta_5 BLACK + \beta_6 SMSA76 + \beta_7 SOUTH76 + ERROR
The parameter estimates appearing in Verbeek's Table 5.1 may be obtained by using the function lm.
> schooling5.1 <- lm(LWAGE76 ~ ED76 + EXP76 + EXP762 +
BLACK + SMSA76 + SOUTH76, data = schooling)
> summary(schooling5.1)
Call:
lm(formula = LWAGE76 ~ ED76 + EXP76 + EXP762 + BLACK + SMSA76 +
    SOUTH76, data = schooling)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.59297 -0.22315  0.01893  0.24223  1.33190 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.7336644  0.0676026  70.022  < 2e-16 ***
ED76         0.0740090  0.0035054  21.113  < 2e-16 ***
EXP76        0.0835958  0.0066478  12.575  < 2e-16 ***
EXP762      -0.0022409  0.0003178  -7.050 2.21e-12 ***
BLACK       -0.1896315  0.0176266 -10.758  < 2e-16 ***
SMSA76       0.1614230  0.0155733  10.365  < 2e-16 ***
SOUTH76     -0.1248615  0.0151182  -8.259  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.3742 on 3003 degrees of freedom
Multiple R-squared: 0.2905,   Adjusted R-squared: 0.2891
F-statistic: 204.9 on 6 and 3003 DF,  p-value: < 2.2e-16
The average return of schooling on the wage is approximately equal to 7.4% per year of instruction. As Verbeek observes, since schooling is endogenous, also experience and its square have to be considered endogenous; so we need at least three instruments. Age and its square are chosen as instruments for the corresponding experience variables; living near a college may be chosen as an instrument for schooling if this variable affects schooling conditional on the other variables in the initial model (Verbeek underlines this is a necessary but not sufficient condition for the variable to be a valid instrument).
The estimation of the following reduced form model is then performed to check for the relevance of the proposed instruments:

ED76 = \pi_0 + \pi_1 AGE76 + \pi_2 AGE76^2 + \pi_3 BLACK + \pi_4 SMSA76 + \pi_5 SOUTH76 + \pi_6 NEARC4 + ERROR

> schooling5.2 <- lm(ED76 ~ AGE76 + I(AGE76^2) + BLACK +
      SMSA76 + SOUTH76 + NEARC4, data = schooling)
> summary(schooling5.2)
Call:
lm(formula = ED76 ~ AGE76 + I(AGE76^2) + BLACK + SMSA76 + SOUTH76 +
    NEARC4, data = schooling)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.511  -1.722  -0.296   1.876   7.199 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.869524   4.298357  -0.435 0.663638    
AGE76        1.061441   0.301398   3.522 0.000435 ***
I(AGE76^2)  -0.018760   0.005231  -3.586 0.000341 ***
BLACK       -1.468367   0.115443 -12.719  < 2e-16 ***
SMSA76       0.835403   0.109252   7.647 2.76e-14 ***
SOUTH76     -0.459700   0.102434  -4.488 7.47e-06 ***
NEARC4       0.347105   0.106997   3.244 0.001191 ** 
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.516 on 3003 degrees of freedom
Multiple R-squared: 0.1185,   Adjusted R-squared: 0.1168
F-statistic: 67.29 on 6 and 3003 DF,  p-value: < 2.2e-16
We can now obtain instrumental variables (IV) estimates by invoking the function
tsls, available in the package sem, which performs the two stage least squares
estimation. The function tsls has three arguments: the first is the linear model
formula, the second one is the same formula with the endogenous variables replaced
by their respective instruments. Observe that both formulae are expressed according
to the lm convention, see Appendix A.4. The third argument is the data.frame where
the involved variables are present.
> library(sem)
> schooling5.3 <- tsls(LWAGE76 ~ ED76 + EXP76 + EXP762 +
BLACK + SMSA76 + SOUTH76, ~NEARC4 + AGE76 + I(AGE76^2) +
BLACK + SMSA76 + SOUTH76, data = schooling)
> summary(schooling5.3)
 2SLS Estimates

Model Formula: LWAGE76 ~ ED76 + EXP76 + EXP762 + BLACK + SMSA76 + SOUTH76

Instruments: ~NEARC4 + AGE76 + I(AGE76^2) + BLACK + SMSA76 + SOUTH76

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.82400 -0.25250  0.02286  0.00000  0.26350  1.31600 

                  Estimate    Std. Error  t value   Pr(>|t|)    
(Intercept)   4.0656681454  0.6084960487  6.68150 2.8078e-11 ***
ED76          0.1329471988  0.0513793955  2.58756 0.00971243 ** 
EXP76         0.0559613854  0.0259944248  2.15282 0.03141204 *  
EXP762       -0.0007956595  0.0013403005 -0.59364 0.55279589    
BLACK        -0.1031403726  0.0773729097 -1.33303 0.18262324    
SMSA76        0.1079848759  0.0497398928  2.17099 0.03000991 *  
SOUTH76      -0.0981751843  0.0287645065 -3.41307 0.00065087 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.4031655 on 3003 degrees of freedom
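As an alternative to tsls, the same 2SLS estimates can be obtained with the function ivreg of the AER package (not used elsewhere in these notes); a minimal sketch, assuming the package is installed:

> library(AER)
> schooling5.3b <- ivreg(LWAGE76 ~ ED76 + EXP76 + EXP762 + BLACK + SMSA76 + SOUTH76 |
      NEARC4 + AGE76 + I(AGE76^2) + BLACK + SMSA76 + SOUTH76, data = schooling)
> summary(schooling5.3b)   # the coefficients should match the tsls output above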


The same result may also be obtained by implementing the formula

\hat{\beta}_{IV} = (Z'X)^{-1} Z'Y,

see Verbeek's Section 5.3.4.

> Y <- schooling$LWAGE76
> X <- model.matrix(schooling5.1)
> Z <- X
> Z[, 2:4] <- cbind(schooling$NEARC4, schooling$AGE76,
      schooling$AGE76^2)
> solve(t(Z) %*% X) %*% t(Z) %*% Y
                     [,1]
(Intercept)  4.0656681823
ED76         0.1329471956
EXP76        0.0559613864
EXP762      -0.0007956596
BLACK       -0.1031403771
SMSA76       0.1079848787
SOUTH76     -0.0981751857
model.matrix(schooling5.1) returns the matrix of the regressors in the first model.
Z[, 2:4] <- cbind(schooling$NEARC4, schooling$AGE76, schooling$AGE76^2) replaces the values of the endogenous variables in the matrix Z with the values of their instruments (note that the matrix Z was initially set equal to X).
%*% is the operator performing matrix multiplication.
solve() returns the inverse of a matrix, when applied to a single matrix argument, or, in the following form, gives the solution of a linear system of equations, here (Z'X)\beta = Z'Y:
> solve(t(Z) %*% X, t(Z) %*% Y)
                     [,1]
(Intercept)  4.0656681824
ED76         0.1329471956
EXP76        0.0559613864
EXP762      -0.0007956596
BLACK       -0.1031403771
SMSA76       0.1079848787
SOUTH76     -0.0981751857
Observe that the latter code is computationally more efficient than the former for
solving a linear system of equations.
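The difference between the two forms of solve() can be seen on a small artificial system (a toy sketch; A and b are hypothetical):

> A <- matrix(c(2, 1, 1, 3), 2, 2)
> b <- c(1, 2)
> solve(A) %*% b    # invert A, then multiply: two numerical steps
> solve(A, b)       # solve A x = b directly: fewer operations and numerically preferable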
The function tsls may also be invoked by using the following four arguments: the
response, the matrix of independent variables, the matrix containing the instruments,
and a vector of weights to be used in the fitting process. Here we consider unitary
weights
> a <- tsls(Y, X, Z, w = 1)
The coefficients and their standard errors may be obtained by applying the function summary to the object a, or by extracting from a the element coefficients and the covariance matrix V (see the structure of a: str(a)):
> a$coefficients
[1]  4.0656681454  0.1329471988  0.0559613854 -0.0007956595
[5] -0.1031403726  0.1079848759 -0.0981751843
> diag(a$V)^0.5
[1] 0.608496049 0.051379396 0.025994425 0.001340301 0.077372910
[6] 0.049739893 0.028764507

5.2 Example of an application of the Generalized Method of Moments

The following example by Dieter Rozenich is taken from the R help system of the function gmm (Chausse 2010).
For the two parameters of a normal distribution N(\mu, \sigma^2) we have the following three moment conditions:

E(X) - \mu = 0
E[(X - \mu)^2] - \sigma^2 = 0
E(X^3) - \mu(\mu^2 + 3\sigma^2) = 0

The first two moment conditions are directly obtained from the definition of N(\mu, \sigma^2). The third moment condition may be derived from the third derivative of the moment generating function (MGF)

M_X(t) = E[\exp(tX)] = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right),

evaluated at t = 0.
Note that, as is usual in GMM, we have more equations (3) than unknown parameters (2).
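The third moment condition can be checked numerically on a large simulated sample (a small sketch; the sample size 10^6 is arbitrary):

> set.seed(1)
> z <- rnorm(1e6, mean = 3, sd = 5)
> mean(z^3)          # close to mu^3 + 3*mu*sigma^2 = 27 + 225 = 252
> 3^3 + 3 * 3 * 25   # theoretical third raw moment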
A function, say g, is first defined in order to establish the moment conditions, which depend on the unknown parameters, collected in a vector, say theta = [\theta_1 = \mu, \theta_2 = \sigma^2], and, of course, on the data x = [x_1, x_2, \ldots, x_n].
> g <- function(theta, x) {
m1 <- x - theta[1]
m2 <- (x - theta[1])^2 - theta[2]
m3 <- x^3 - theta[1] * (theta[1]^2 + 3 * theta[2])
f <- cbind(m1, m2, m3)
return(f)
}
In presence of a vector of observations x = [x_1, x_2, \ldots, x_n]:

- m1 results in an n x 1 vector with generic element m1_i = x_i - \theta_1, i = 1, 2, \ldots, n. The condition

  \frac{1}{n} 1_n' m1 = \frac{1}{n} \sum_{i=1}^{n} m1_i = \frac{1}{n} \sum_{i=1}^{n} (x_i - \theta_1) = 0,

  where 1_n is the n x 1 unitary vector, corresponds to the first moment condition.

- The generic element of the vector m2 is m2_i = (x_i - \theta_1)^2 - \theta_2, i = 1, 2, \ldots, n;

  \frac{1}{n} 1_n' m2 = \frac{1}{n} \sum_{i=1}^{n} m2_i = \frac{1}{n} \sum_{i=1}^{n} [(x_i - \theta_1)^2 - \theta_2] = 0

  corresponds to the second moment condition.

- Analogously for m3,

  \frac{1}{n} 1_n' m3 = \frac{1}{n} \sum_{i=1}^{n} m3_i = \frac{1}{n} \sum_{i=1}^{n} [x_i^3 - \theta_1(\theta_1^2 + 3\theta_2)] = 0

  corresponds to the third moment condition.

- f = [m1, m2, m3] is an n x 3 matrix resuming the 3 moment conditions.

Let us generate a vector x of 100 pseudo-random numbers, distributed according to a Normal random variable with mean \mu = 3 and variance \sigma^2 = 25.
> set.seed(123)
> x <- rnorm(100, mean = 3, sd = 5)
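At this point the sample counterparts of the three moment conditions are simply the column means of the matrix returned by g; evaluated near the true parameter values they should be close to zero (a quick check):

> colMeans(g(c(3, 25), x))   # sample moment conditions evaluated at the true (mu, sigma^2)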
We now invoke the function gmm, available in the package gmm, whose main arguments
are the function, g, defining the moment conditions1 , the data and the starting values
to be passed to the numerical procedure for obtaining the parameter estimates.
> library(gmm)
> gmm(g, x, c(10, 10))
Method
 twoStep 

Objective function value:  0.0009022213 

Theta[1]  Theta[2]  
  3.4615   20.7162  

Convergence code = 

The Jacobian related to the moment conditions can also be passed to the function gmm to define the gradient, possibly improving the efficiency of the minimization algorithm used to solve the GMM problem. In the present case the Jacobian is:

J = \begin{pmatrix} -1 & 0 \\ 2\theta_1 - 2E(X) & -1 \\ -3\theta_1^2 - 3\theta_2 & -3\theta_1 \end{pmatrix}
The function Dg is created to define the Jacobian.
1 The function g can also correspond to a formula when the model is linear (see the R-help
?gmm::gmm).


> Dg <- function(theta, x) {
      jacobian <- matrix(c(-1, 2 * (theta[1] - mean(x)),
          -3 * theta[1]^2 - 3 * theta[2], 0, -1, -3 *
          theta[1]), 3, 2)
  }
Pay attention: the Jacobian is the expected value of the Jacobian matrix, so in writing
its second row, first column, we had to specify mean(x) and not x, as we did in the
code defining the moment conditions by a matrix with typical element g_j(θ, xᵢ).
The function gmm can now be invoked with the additional argument grad=Dg in order
to specify the Jacobian.
> (estimation <- gmm(g, x, c(10, 10), grad = Dg))
Method
 twoStep

Objective function value:  0.0009022213

Theta[1]  Theta[2]
  3.4615   20.7162

Convergence code =

The covariance matrix of the parameter estimates can be obtained by means of the
function vcov.gmm.
> vcov.gmm(estimation)
           Theta[1]   Theta[2]
Theta[1] 0.20798058 0.05594737
Theta[2] 0.05594737 7.88828621
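A minimal sketch, using only objects created above, of how this covariance matrix can be turned into standard errors and rough z-statistics for the two parameters (the object names se and est are introduced here only for illustration):

> est <- estimation$coefficients
> se <- sqrt(diag(vcov.gmm(estimation)))
> cbind(estimate = est, se = se, z = est/se)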

5.3  Estimating Intertemporal Asset Pricing Models (Section 5.7)

The GMM framework is used to estimate an asset pricing model.


Data are available in the file pricing.dat, which is a text file, and may be read by
using the function read.table.
The function unzip extracts the file pricing.dat from the compressed archive
ch05.zip.
> pricing <- read.table(unzip("ch05.zip", "Chapter 5/pricing.dat"),
header = TRUE)
The data set contains monthly observations from February 1959 to November 1993
(T=418) on the returns on 10 size-based portfolios, the risk free rate and the
consumption growth.
The following variables are present:


- r1: monthly return on portfolio 1 (small firms)
- r2: monthly return on portfolio 2
- ...
- r10: monthly return on portfolio 10 (large firms)
- rf: risk free rate (return on 3-month T-bill)
- cons: real per capita consumption growth based on total US personal
  consumption expenditures (nondurables and services)

The portfolios are constructed by the Center for Research in Security Prices (CRSP)
and contain stocks listed on the NYSE, divided into 10 size-based deciles. For instance,
portfolio 1 contains the 10% smallest firms listed on the NYSE.
Observe that the values of cons are relative values; that is, they are obtained as
the ratio of total US personal consumption expenditures at times t and t − 1.
We can transform the data.frame in a multiple time series. This is useful since it
allows us to work with data collected in a matrix object.
> pricing <- ts(data = pricing, start = c(1959, 2),
frequency = 12)
To apply GMM estimation we have first to recall the moment conditions, see Verbeek's
relationships (5.78) and (5.79):

E[ δ (Cₜ₊₁/Cₜ)^(−γ) (1 + r_f,t+1) ] − 1 = 0
E[ δ (Cₜ₊₁/Cₜ)^(−γ) (r_j,t+1 − r_f,t+1) ] = 0,   j = 1, . . . , J.

We define g, a function of the parameters, collected in the vector theta = [δ, γ], and
of the data, here represented by the matrix x. The function g returns an n × q matrix
with typical element gᵢ(θ, xₜ) for i = 1, . . . , q and t = 1, . . . , n. The columns of this
matrix are then used to build the q sample moment conditions.
> g <- function(theta, x) {
      e1 <- theta[1] * x[, 12]^(-theta[2]) * (1 + x[, 11]) - 1
      e2 <- theta[1] * x[, 12]^(-theta[2]) * (x[, 1:10] - x[, 11])
      f <- cbind(e1, e2)
      return(f)
  }

e1 contains the elements necessary to the function gmm to define the first moment
condition; the twelfth column, x[,12], of x is assumed to contain the ratio
values of US consumption expenditures (indeed they are stored as the twelfth
variable in the data.frame pricing); the eleventh column, x[,11], of x is

146

Endogeneity, Instrumental Variables and GMM

assumed to contain the risk free rates.


It consists of a vector with generic element:
2
1 x
12(i) (1 + x11(i) ) 1

Note that

1X
2
1 x
12(i) (1 + x11(i) ) 1 = 0
n i=1
is the empirical counterpart of the first moment condition.

e2 defines a matrix whose columns contain the elements necessary for defining
the other moment conditions; the columns of x[,1:10]-x[,11] contain the
differences between the monthly returns on portfolios 1-10 and the risk free rate
(Note that the recycling rule for vector differences has been applied).
So the elements in the generic jth column of e2 are of the type:
2
1 x
12(i) (xj(i) x11(i) ),

j = 1, . . . , 10;

by taking the sample averages we can obtain the empirical counterparts of the
remaining 10 moment conditions, j = 1, . . . , 10.
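To convince oneself that the recycling rule indeed subtracts the risk free rate from each of the ten return columns, one can compare the result with an explicit row-wise subtraction via sweep; this is only a small check, not needed for the estimation.

> excess1 <- pricing[, 1:10] - pricing[, 11]          # recycling rule
> excess2 <- sweep(pricing[, 1:10], 1, pricing[, 11]) # explicit subtraction by row
> all.equal(as.numeric(excess1), as.numeric(excess2)) # should be TRUE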
We now invoke the function gmm to estimate the two parameters using the GMM.
Two-step GMM
> library(gmm)
> pricing5.4_two <- gmm(g, pricing, c(0, 0), type = "twoStep",
wmatrix = "ident")
> summary(pricing5.4_two)

Call:
gmm(g = g, x = pricing, t0 = c(0, 0), type = "twoStep", wmatrix = "ident")

Method:  twoStep

Kernel:  Quadratic Spectral

Coefficients:
           Estimate    Std. Error  t value     Pr(>|t|)
Theta[1]   7.0043e-01  1.4694e-01  4.7666e+00  1.8732e-06
Theta[2]   9.1209e+01  3.9654e+01  2.3001e+00  2.1442e-02

J-Test: degrees of freedom is 9
               J-test   P-value
Test E(g)=0:  7.21050  0.61521

#############
Information related to the numerical optimization
Convergence code = 0
Function eval. = 129
Gradian eval. = NA
Iterated GMM
> pricing5.4_iter <- gmm(g, pricing, c(0, 0), type = "iterative",
vcov = "iid")
> summary(pricing5.4_iter)

Call:
gmm(g = g, x = pricing, t0 = c(0, 0), type = "iterative", vcov = "iid")

Method:  iterative

Kernel:  Quadratic Spectral

Coefficients:
           Estimate    Std. Error  t value     Pr(>|t|)
Theta[1]   8.2736e-01  1.1616e-01  7.1228e+00  1.0576e-12
Theta[2]   5.7394e+01  3.4221e+01  1.6772e+00  9.3508e-02

J-Test: degrees of freedom is 9
               J-test   P-value
Test E(g)=0:  5.76334  0.76335

Initial values of the coefficients
Theta[1]  Theta[2]
0.700429  91.209310

#############
Information related to the numerical optimization
Convergence code = 0
Function eval. = 63
Gradian eval. = NA
One-step GMM  The code to obtain the one-step estimates for general cases seems
not to be present in the package gmm, that method being available only for linear
models. However it is not difficult to implement it. We have to search for the values
of the parameters δ and γ minimizing the following function

  [ (1/n) Σₜ₌₁ⁿ ( δ (Cₜ₊₁/Cₜ)^(−γ) (1 + r_f,t+1) − 1 ) ]²
    + Σⱼ₌₁¹⁰ [ (1/n) Σₜ₌₁ⁿ δ (Cₜ₊₁/Cₜ)^(−γ) (r_j,t+1 − r_f,t+1) ]².
Let us first evaluate the moment conditions for a trial guess of the parameters:
> colMeans(g(c(0.5, 10), x = pricing))
e1                                      -0.485104512
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r1    0.004238430
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r2    0.003682385
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r3    0.003324212
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r4    0.003350064
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r5    0.002922254
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r6    0.003062154
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r7    0.002735428
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r8    0.002861830
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r9    0.002327386
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r10   0.001604280
The objective function to be minimized is
> obj.f <- function(theta) sum(colMeans(g(theta, x = pricing))^2)
We can now call the R optimizer optim to search for the values minimizing the
objective function.
> output <- optim(c(0.5, 10), obj.f)
The main arguments of the function optim are:

- par: the starting values for the parameters to be optimized over.
- fn: the function to be minimized (or maximized), with first argument the vector of
  parameters over which minimization is to take place. It should return a scalar result.
- gr: a function to return the gradient for the "BFGS", "CG" and "L-BFGS-B" methods.
  If it is NULL, a finite-difference approximation will be used.
- method: the method to be used for minimization.
- lower, upper: bounds on the variables for the "L-BFGS-B" method, or bounds in which
  to search for method "Brent".
- control: a list of control parameters.
- hessian: a logical value specifying if a numerically differentiated Hessian matrix
  should be returned.

See the R help ?optim for more information on this function.
The numerical result of the optimization problem can be obtained with
> output$par
[1] 0.6993077 91.4828214
Verbeek's Figure 5.1, see Fig. 5.1, may be obtained by using the following code:
> it_mrs <- output$par[1] * pricing[, 12]^(-output$par[2])
> pred.mer <- as.numeric(-cov(it_mrs, pricing[, 1:10] -
      pricing[, 11])/mean(it_mrs))
> mer <- colMeans(pricing[, 1:10] - pricing[, 11])
> pred.mer <- exp(12 * pred.mer) - 1
> mer <- exp(12 * mer) - 1
> plot(mer ~ pred.mer, xlim = c(0, 0.14), ylim = c(0,
      0.14), pch = 17, xlab = "Predicted mean excess return",
      ylab = "Mean excess return")
> abline(0, 1)

Figure 5.1  Actual versus predicted mean excess returns of size-based portfolios

6
Maximum Likelihood Estimation
and Specification Tests
In this Chapter the Maximum Likelihood method is applied to obtain the parameter
estimates characterizing some statistical distributions: we will take into consideration
the Normal, the Bernoulli, the Exponential and the Poisson ones. The method will
then be applied to estimate the parameters of a linear model with Gaussian errors.
Let (x₁, . . . , xₙ) be a sample from a random variable X, discrete or continuous,
with probability density function

f(x; θ).                                                              (6.1)

The n-dimensional random variable (X₁, . . . , Xₙ) associated to (x₁, . . . , xₙ), when
the hypothesis of independence and identical distribution of the components of
(X₁, . . . , Xₙ) can be assumed, has the following distribution function:

L(x₁, . . . , xₙ; θ) = f_{X₁,...,Xₙ}(x₁, . . . , xₙ; θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ).

By definition the Maximum Likelihood estimate of θ is given, in presence of a sample
(x₁, . . . , xₙ), by the value of θ which maximizes

L(θ|x₁, . . . , xₙ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ).

Usually one works with the log-Likelihood function:

l(x₁, . . . , xₙ; θ) = log [L(x₁, . . . , xₙ; θ)] = Σᵢ₌₁ⁿ log[f(xᵢ; θ)],

since

argmax_θ L(x₁, . . . , xₙ; θ) = argmax_θ log [L(x₁, . . . , xₙ; θ)].

Sometimes the solution to the above maximization problem cannot be derived
algebraically, so one has to rely on numerical solutions. Observe that different
optimization algorithms may give non-identical numerical solutions to the problem,
possibly depending also on the starting values assigned to the numerical procedure,
especially when the problem is ill-posed from a computational point of view.
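As a small self-contained illustration of this point (with toy data and object names introduced only for this sketch), the same negative log-likelihood can be minimized with two different routines and two different starting values; in this well-behaved Exponential example the solutions essentially coincide, but in ill-posed problems they need not.

> set.seed(123)
> y <- rexp(200, rate = 2)
> negll <- function(lambda) -sum(dexp(y, rate = lambda, log = TRUE))
> nlm(negll, p = 0.5)$estimate                                   # Newton-type algorithm
> optim(0.5, negll, method = "Brent", lower = 0.001, upper = 100)$par  # Brent's method
> nlm(negll, p = 10)$estimate                                    # different starting value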

Figure 6.1  Density function for X ~ N(μ, σ²), μ ∈ {−5, 0, 5}, σ² = 1

6.1  Normal distribution

The density function of a Normal random variable is

f(x; μ, σ²) = (1 / (2πσ²)^(1/2)) exp( −(x − μ)² / (2σ²) )

and has a bell-shaped behaviour. μ and σ are respectively location and scale
parameters corresponding to the mean and the standard deviation of the distribution.
Figures 6.1 and 6.2 show the behaviour of f for various μ and σ.
> plot(function(x) dnorm(x), xlim = c(-10, 10), ylab = "density")
> sapply(c(-5, 5), function(i) curve(dnorm(x, mean = i,
sd = 1), add = TRUE))
> plot(function(x) dnorm(x), xlim = c(-10, 10), ylab = "density")
> sapply(1:5, function(i) curve(dnorm(x, sd = i), add = TRUE))

Figure 6.2  Density function for X ~ N(μ, σ²), μ = 0, σ² ∈ {1, 4, 9, 16, 25}

Let's now generate a sample of n = 100 pseudo-random numbers from a Normal
random variable X ~ N(μ = 4, σ² = 9).
By using the function set.seed it is possible to replicate the same realization of the
sample and the reader can reproduce the same results presented below.
The function rnorm(n,mean,sd) generates n pseudo-random numbers from a Normal
random variable with μ = mean and σ = sd.
> set.seed(1000)
> n <- 100
> mean <- 4
> sd <- 3
> x <- rnorm(n, mean = mean, sd = sd)

We now construct

−log(L(x₁, . . . , xₙ; θ)) = − Σᵢ₌₁ⁿ ln [ (1 / (2πσ²)^(1/2)) exp( −(xᵢ − μ)² / (2σ²) ) ]

that is the opposite of the log-Likelihood function1, under the assumption that the
observations in the sample are i.i.d. X ~ N(μ, σ²) with μ and σ² unknown parameters.
With the function dnorm(x,mean,sd,log) we can obtain, for the value x, the density
function of the Normal distribution with μ = mean and σ = sd when log=FALSE and
the log-density when log=TRUE. Observe that by default the argument log is FALSE.
> ll <- function(theta) -sum(dnorm(x, mean = theta[1],
sd = theta[2]^0.5, log = TRUE))
Here theta is a vector with 2 elements: respectively the mean and the variance of a
Normal distribution (the standard deviation passed to dnorm is theta[2]^0.5).
To invoke the minimization algorithm we need to specify the starting values for the
parameters upon which the objective function depends. Observe that in ill-posed
problems the solution of the minimization algorithm might depend highly on the
choice of the starting parameters.
We can use a Newton-type minimization algorithm, which is available in the function
nlm. The main arguments of nlm are the function to be minimized and the starting
values. One can also request the hessian, which will be used to construct I(μ, σ²), the
Fisher Information Matrix, and the covariance matrix of the parameter estimates as the
inverse of I(μ, σ²). (See the help ?nlm for more information on the function nlm.)
Usually one has to try different starting values to evaluate the sensitivity of the
solution. Here we propose the following two options:

- the median and one half of the interquartile range for the location and the scale
  parameters, respectively;
> theta.start <- c(median(x), IQR(x)/2)
> out <- nlm(ll, theta.start, hessian = TRUE)
> theta.hat <- out$estimate
> theta.hat
[1] 4.049135 9.027908


> fish <- out$hessian
> fish
              [,1]         [,2]
[1,] 11.076763955 -0.000245457
[2,] -0.000245457  0.613226159
> solve(fish)
             [,1]         [,2]
[1,] 9.027908e-02 3.613614e-05
[2,] 3.613614e-05 1.630720e+00
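Since the inverse of the numerical Hessian approximates the covariance matrix of the estimates, approximate standard errors follow as the square roots of its diagonal; a one-line sketch using the objects above:

> sqrt(diag(solve(fish)))   # approximate standard errors of the two estimates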

- the values 0 and 1 respectively for μ and σ².

1 This is because R's internal optimization routines solve minimization problems.

> theta.start <- c(0, 1)
> out <- nlm(ll, theta.start, hessian = TRUE)
> theta.hat <- out$estimate
> theta.hat
[1] 4.049131 9.027900
> fish <- out$hessian
> fish
               [,1]          [,2]
[1,] 11.0767726456 -0.0002398594
[2,] -0.0002398594  0.6132280859
> solve(fish)
             [,1]         [,2]
[1,] 9.027900e-02 3.531194e-05
[2,] 3.531194e-05 1.630715e+00
In this case the problem is well posed, so the solution does not depend on the initial
starting values. There are only negligible differences.
The behaviour of the logLikelihood function is shown in Figures 6.3 and 6.4, which can
be obtained with the code:
> xx <- seq(-4, 15, l = 50)
> yy <- seq(1, 20, l = 50)
> grid <- expand.grid(xx, yy)
> z <- sapply(1:nrow(grid), function(i) -ll(c(grid[i,
      1], grid[i, 2])))
> z <- matrix(z, nrow = length(xx), ncol = length(yy))
> persp(xx, yy, z, theta = 45, phi = 45, shade = 0.45,
      xlab = expression(mu), ylab = expression(sigma^2),
      zlab = "logLikelihood")
> contour(xx, yy, z, nlevels = 250, xlab = expression(mu),
      ylab = expression(sigma^2))
To define the surface of the graph we could not have recourse to the function outer,
see the help ?persp or Longhow Lam (2010), p. 91, since mean and sd are scalar
arguments for the function dnorm.
The object grid obtained by means of the function expand.grid contains the
elements of the cartesian product between the two sets xx and yy defining respectively
the supports for the mean and the variance.
The values, z, of the log-likelihood function are then evaluated for each element in the
product set xx × yy by using sapply.
z is finally re-arranged as a matrix defining the surface to be plotted by the functions
persp and contour.

Figure 6.3  Estimating the parameters of a Normal distribution.
Loglikelihood function l(μ, σ²|x₁, . . . , xₙ): perspective plot


From the analysis of the contour plot it is evident that the estimate for the mean will
be more precise than that for the variance.
The packages stats4 and bbmle have functions devoted to maximum likelihood
estimation, which automatically produce the standard errors for the parameter
estimates. The function mle is present in stats4; the function mle2 in bbmle. In
particular mle2 also returns asymptotic tests.
In both cases the likelihood function must be defined by specifying the arguments as
scalars, without collecting them in an array as done above with theta for nlm. So the
starting values have to be collected in a list.
> ll <- function(mi1, s2) -sum(dnorm(x, mean = mi1,
sd = s2^0.5, log = TRUE))
> theta.start <- list(mi1 = 0, s2 = 1)
> library(stats4)
> out <- mle(ll, start = theta.start)

Figure 6.4  Estimating the parameters of a Normal distribution.
Loglikelihood function l(μ, σ²|x₁, . . . , xₙ): contour plot

> summary(out)
Maximum likelihood estimation
Call:
mle(minuslogl = ll, start = theta.start)
Coefficients:
Estimate Std. Error
mi1 4.048892 0.3004372
s2 9.026253 1.2762730
-2 log L: 503.8196
> library(bbmle)
> out <- mle2(ll, start = theta.start)
> summary(out)

Maximum likelihood estimation

Call:
mle2(minuslogl = ll, start = theta.start)

Coefficients:
    Estimate Std. Error z value     Pr(z)
mi1  4.04889    0.30044 13.4767 < 2.2e-16 ***
s2   9.02625    1.27627  7.0724 1.523e-12 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 503.8196
By plotting the result of the function profile applied to an mle2 object it is possible
to investigate the behaviour of the objective function near the solution and obtain
graphical confidence intervals, see Figure 6.5.
> plot(profile(out))
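Besides the graphical profiles, numerical profile-based confidence intervals can also be computed, assuming the confint method documented in the bbmle package is available for mle2 objects:

> confint(out)               # 95% profile likelihood confidence intervals
> confint(out, level = 0.9)  # 90% intervals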
We now consider the behaviour of the Likelihood function

L(μ, σ²|x₁, . . . , xₙ) = ∏ᵢ₌₁ⁿ (1 / (2πσ²)^(1/2)) exp( −(xᵢ − μ)² / (2σ²) )

for different sample sizes. Let us generate a sample x of length 250 from
X ~ N(μ = 4, σ² = 9) and define the function llplot to obtain the perspective and
contour graphs of the Likelihood function for the n initial elements of x.
The function pdf redirects graphical output to a pdf (file) device.
dev.cur() returns the name of the device R is currently working on.
dev.set() tells R which device to work on.
dev.off() closes the current open device.
Before invoking the following function llplot, two devices are opened, to which the
perspective and contour plots are respectively redirected.
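A minimal self-contained sketch of this device bookkeeping (the file names below are hypothetical and used only for illustration):

> pdf("device-one.pdf")   # open a first pdf device
> dev1 <- dev.cur()       # remember its identifier
> pdf("device-two.pdf")   # open a second pdf device
> dev2 <- dev.cur()
> dev.set(dev1)           # make the first device active again
> plot(1:10)              # this plot goes to device-one.pdf
> dev.set(dev2)
> plot(10:1)              # this plot goes to device-two.pdf
> dev.off(dev1)           # close both devices
> dev.off(dev2)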
Figure 6.5  Estimating the parameters of a Normal distribution.
Likelihood function profiles

> set.seed(1000)
> x <- rnorm(250, mean = 4, sd = 3)
> llplot <- function(n) {
      x <- x[1:n]
      xx <- seq(3, 5, l = 50)
      yy <- seq(6, 12, l = 50)
      grid <- expand.grid(xx, yy)
      ll <- function(theta) prod(dnorm(x, mean = theta[1],
          sd = theta[2]^0.5))
      z <- sapply(1:nrow(grid), function(i) ll(c(grid[i,
          1], grid[i, 2])))
      z <- matrix(z, nrow = length(xx), ncol = length(yy))
      dev.set(dev1)
      persp(xx, yy, z, theta = 45, phi = 45, shade = 0.15,
          xlab = expression(mu), ylab = expression(sigma^2),
          zlab = "Likelihood", sub = paste("n: ", n,
              ", Sample mean: ", round(mean(x), 2),
              ", Sample variance: ", round(var(x), 2), sep = ""))
      dev.set(dev2)
      contour(xx, yy, z, nlevels = 25, xlab = expression(mu),
          ylab = expression(sigma^2), sub = paste("n: ",
              n, ", Sample mean: ", round(mean(x), 2),
              ", Sample variance: ", round(var(x), 2), sep = ""))
  }
> pdf("Chapter06-normallikelihoodpersp.pdf")
> dev1 <- dev.cur()
> layout(matrix(1:4, 2, 2, byrow = TRUE))

Figure 6.6  Estimating the parameters of a Normal distribution.
Likelihood function L(μ, σ²|x₁, . . . , xₙ), perspectives (n = 100, 150, 200, 250)

> pdf("Chapter06-normallikelihoodcontour.pdf")
> dev2 <- dev.cur()
> layout(matrix(1:4, 2, 2, byrow = TRUE))
> sapply(c(100, 150, 200, 250), function(i) llplot(i))
> dev.off(dev1)
> dev.off(dev2)

Figures 6.6 and 6.7 report respectively the Likelihood function behaviour via
perspective and contour plots of the surface for μ in the interval [3, 5] and σ² in
the interval [6, 12]. Observe that perspective plots do not have the same scale for the
Likelihood. It is evident that the Likelihood gets more and more concentrated on the
true values μ = 4, σ² = 9 when the sample size increases. There is a larger uncertainty
in estimating the variance than the mean.
Observe that with R version 2.15.1 64-bit, run on a Windows 7 system, contour plots
are not produced for n ≥ 150. The code works with the 32-bit version.

Figure 6.7  Estimating the parameters of a Normal distribution.
Likelihood function L(μ, σ²|x₁, . . . , xₙ), contour plots (n = 100, 150, 200, 250)

For a dynamic graph representation allowing a three-dimensional graph to be


examined from different perspectives, by changing the theta angle, use2 :
> set.seed(1000)
> x <- rnorm(250, mean = 4, sd = 3)
> xx <- seq(3, 5, l = 50)
> yy <- seq(6, 12, l = 50)
> grid <- expand.grid(xx, yy)
> ll <- function(theta) prod(dnorm(x, mean = theta[1],
      sd = theta[2]^0.5))
> z <- sapply(1:nrow(grid), function(i) ll(c(grid[i,
      1], grid[i, 2])))
> z <- matrix(z, nrow = length(xx), ncol = length(yy))
> sapply(1:200, function(i) persp(xx, yy, z, theta = i,
      phi = 45, shade = 0.15, xlab = expression(mu),
      ylab = expression(sigma^2), zlab = "Likelihood"))

6.2  Bernoulli distribution

Let's generate a sample of n = 100 pseudo random numbers from a Bernoulli
distribution with p = 0.7.
Recall that the Bernoulli distribution has the following probability distribution
function

P(X = x) = pˣ (1 − p)^(1−x),   x = 0, 1,

and may be considered a special case of the Binomial distribution, whose probability
distribution function is:

P(X = x) = (n choose x) pˣ (1 − p)^(n−x),   x = 0, 1, . . . , n.

The Bernoulli is obtained for n = 1.
Use the function rbinom to generate a sample of a specified dimension from a
binomial distribution with parameters size = n and prob = p; the arguments of the
function are the dimension, size and probability.
> set.seed(1000)
> n <- 100
> p <- 0.7
> x <- rbinom(n, size = 1, prob = p)

The opposite of the log-Likelihood function

−log[L(x₁, . . . , xₙ; p)] = − Σᵢ₌₁ⁿ ln [ p^xᵢ (1 − p)^(1−xᵢ) ]

can be obtained by using the function dbinom, by specifying log=TRUE.


2 To stop the sequence of pictures, press the Escape key.

> ll <- function(theta) -sum(dbinom(x, size = 1, prob = theta,
      log = TRUE))
To start the minimization algorithm, available by invoking nlm, we consider here the
situation of complete uncertainty about p by establishing the starting value p = 0.5.
> theta.start <- 0.5
> out <- nlm(ll, theta.start, hessian = TRUE)
> theta.hat <- out$estimate
> theta.hat
[1] 0.72
> fish <- out$hessian
> fish
[,1]
[1,] 496.2484
> solve(fish)
[,1]
[1,] 0.00201512
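For the Bernoulli model the Maximum Likelihood estimate also has a simple closed form, p̂ = x̄, so the numerical solution can be checked directly against the sample mean:

> mean(x)   # closed-form ML estimate of p; should coincide with theta.hat above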
The behaviour of the logLikelihood function is shown in Figure 6.8 that can be
obtained with the code:
> xx <- seq(0, 1, l = 500)
> yy <- sapply(1:length(xx), function(i) -ll(xx[i]))
> plot(xx, yy, type = "l", xlab = "p", ylab = "logLikelihood")
The estimation may also be performed by having recourse to the packages stats4 or
bbmle. For the sake of clarity we change the name of the parameter to be estimated,
to stress that the starting value has to be specified as an element of a list.
> ll <- function(phat) -sum(dbinom(x, size = 1, prob = phat,
log = TRUE))
> theta.start <- list(phat = 0.5)
> library(stats4)
> out <- mle(ll, start = theta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle(minuslogl = ll, start = theta.start)
Coefficients:
Estimate Std. Error
phat 0.7200001 0.04489945
-2 log L: 118.5907

Figure 6.8  Estimating the parameter of a Bernoulli distribution.
Loglikelihood function l(p|x₁, . . . , xₙ)

> library(bbmle)
> out <- mle2(ll, start = theta.start)
> summary(out)
Maximum likelihood estimation

Call:
mle2(minuslogl = ll, start = theta.start)

Coefficients:
     Estimate Std. Error z value     Pr(z)
phat   0.7200     0.0449  16.036 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 118.5907
Here we did not encounter any convergence problem, even though the parameter space
is a subset of the real line. In case a constrained optimization is necessary, one can
have recourse to the functions nlminb, optim or constrOptim (see the R help system
for more information).
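For completeness, a sketch of how the same Bernoulli problem could be handled with box constraints keeping p inside (0, 1); the function name llb and the bounds slightly inside the unit interval are assumptions of this example.

> llb <- function(p) -sum(dbinom(x, size = 1, prob = p, log = TRUE))
> nlminb(start = 0.5, objective = llb, lower = 1e-06, upper = 1 - 1e-06)$par
> optim(0.5, llb, method = "L-BFGS-B", lower = 1e-06, upper = 1 - 1e-06)$par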
By plotting the result of the function profile applied to an mle2 object it is
possible to investigate the behaviour of the objective function near the solution and
obtain graphical confidence intervals, see Figure 6.9.
> plot(profile(out))
We now consider the behaviour of the Likelihood function

L(p|x₁, . . . , xₙ) = ∏ᵢ₌₁ⁿ p^xᵢ (1 − p)^(1−xᵢ)

for different sample sizes. Let us generate a sample x of length 250 from X ~ Be(p = 0.7)
and define the function llplot to obtain the graph of the Likelihood function for the
n initial elements of x.
> set.seed(1000)
> x <- rbinom(250, size = 1, prob = 0.7)
> llplot <- function(n) {
x <- x[1:n]
xx <- seq(0, 1, l = 500)
ll <- function(theta) prod(dbinom(x, size = 1,
prob = theta))
yy <- sapply(1:length(xx), function(i) ll(xx[i]))
plot(xx, yy, type = "l", xlab = "p", ylab = "Likelihood",
sub = paste("n: ", n, ", Sample mean: ",
round(mean(x), 2), sep = ""))
}
> pdf("Chapter06-bernoullilikelihood.pdf")
> layout(matrix(1:4, 2, 2, byrow = TRUE))

> sapply(c(100, 150, 200, 250), function(i) llplot(i))
> dev.off()

Figure 6.9  Estimating the parameter of a Bernoulli distribution.
Profile of the Likelihood function
Figure 6.10 reports the Likelihood behaviour for p ∈ (0, 1). Observe that the graphs
do not have the same scale for the Likelihood. The Likelihood gets more and more
concentrated as the sample size increases.

6.3  Exponential distribution

The density function of an Exponential distribution is:

f(x; λ) = λ e^(−λx),   x ≥ 0.

We have E(X) = 1/λ and Var(X) = 1/λ².

Figure 6.11 shows the behaviour of f for various λ.
> plot(function(x) dexp(x, rate = 3), xlim = c(0, 10),
      ylab = "density")
> sapply(1/c(2, 4, 5), function(i) curve(dexp(x, rate = i),
      add = TRUE))

Figure 6.10  Estimating the parameter of a Bernoulli distribution.
Likelihood function L(p|x₁, . . . , xₙ), perspective plots (n = 100, 150, 200, 250)
Let's generate a sample of n = 100 pseudo random numbers from an Exponential
distribution with λ = 4. The function rexp may be invoked.
> set.seed(1000)
> n <- 100
> lambda <- 4
> x <- rexp(n, rate = lambda)

The opposite of the log-Likelihood function

−log[L(x₁, . . . , xₙ; λ)] = − Σᵢ₌₁ⁿ ln [ λ e^(−λxᵢ) ]
can be obtained by using the function dexp with the argument log=TRUE.

Figure 6.11  Density function for X ~ Exp(λ), λ ∈ {1, 0.5, 0.25, 0.2}

> ll <- function(theta) -sum(dexp(x, rate = theta,
      log = TRUE))
As starting value we use
> theta.start <- 0.5
and invoke nlm to obtain the parameter estimates
> out <- nlm(ll, theta.start, hessian = TRUE)
> theta.hat <- out$estimate
> theta.hat
[1] 4.250685
> fish <- out$hessian
> fish
[,1]
[1,] 5.53344
> solve(fish)

          [,1]
[1,] 0.1807194

Figure 6.12  Estimating the parameter of an Exponential distribution.
Loglikelihood function l(λ|x₁, . . . , xₙ)
The behaviour of the logLikelihood function is shown in Figure 6.12 that can be
obtained with the code:
> xx <- seq(0, 10, l = 500)
> yy <- sapply(1:length(xx), function(i) -ll(xx[i]))
> plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "logLikelihood")
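Also in the Exponential case the Maximum Likelihood estimator has a closed form, λ̂ = 1/x̄, which provides a direct check of the numerical solution:

> 1/mean(x)   # closed-form ML estimate of lambda; should match theta.hat above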
The estimation may also be performed by having recourse to the packages stats4 or
bbmle. For the sake of clarity we change the name of the parameter to be estimated,
to stress that the starting value has to be specified as an element of a list.
> ll <- function(lambdahat) -sum(dexp(x, rate = lambdahat,
log = TRUE))


> theta.start <- list(lambdahat = 0.5)


> library(stats4)
> out <- mle(ll, start = theta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle(minuslogl = ll, start = theta.start)
Coefficients:
Estimate Std. Error
lambdahat 4.250685 0.4250685
-2 log L: -89.41605
> library(bbmle)
> out <- mle2(ll, start = theta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle2(minuslogl = ll, start = theta.start)
Coefficients:
          Estimate Std. Error z value     Pr(z)
lambdahat  4.25069    0.42507      10 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: -89.41605
By plotting the result of the function profile applied to an mle2 object it is possible
to investigate the behaviour of the objective function near the solution and obtain
graphical confidence intervals, see Figure 6.13.
> plot(profile(out))
We now consider the behaviour of the Likelihood function

L(λ|x₁, . . . , xₙ) = ∏ᵢ₌₁ⁿ λ e^(−λxᵢ)

for different sample sizes. Let us generate a sample x of length 250 from X ~ Exp(λ = 4)
and define the function llplot to obtain the graph of the Likelihood function for the
n initial elements of x. The function pdf opens a pdf file as graphical device to save
the graph.
> set.seed(1000)
> x <- rexp(250, rate = 4)

Figure 6.13  Estimating the parameter of an Exponential distribution.
Profile of the Likelihood function

> llplot <- function(n) {
      x <- x[1:n]
      xx <- seq(0, 10, l = 500)
      ll <- function(theta) prod(dexp(x, rate = theta))
      yy <- sapply(1:length(xx), function(i) ll(xx[i]))
      plot(xx, yy, type = "l", xlab = expression(lambda),
          ylab = "Likelihood", sub = paste("n: ", n,
              ", Sample mean: ", round(mean(x), 2),
              sep = ""))
  }
> pdf("Chapter06-exponentiallikelihood.pdf")
> layout(matrix(1:4, 2, 2, byrow = TRUE))

> sapply(c(100, 150, 200, 250), function(i) llplot(i))


> dev.off()

Figure 6.14  Estimating the parameter of an Exponential distribution.
Likelihood function L(λ|x₁, . . . , xₙ), perspective plots (n = 100, 150, 200, 250)

Figure 6.14 reports the Likelihood behaviour for λ ∈ (0, 10). Observe that the graphs
do not have the same scale for the Likelihood. The Likelihood gets more and more
concentrated as the sample size increases.

6.4  Poisson distribution

The distribution function of a Poisson distribution is:

p(x; λ) = λˣ e^(−λ) / x!,   x = 0, 1, 2, . . .

We have E(X) = Var(X) = λ.

Figure 6.15 shows the behaviour of p(x; λ) for various λ.
> layout(matrix(1:4, 2, 2))
> sapply(1:4, function(i) plot(0:15, dpois(0:15, lambda = i),
      ylim = c(0, 0.4), type = "h", xlab = "x",
      ylab = expression(paste(lambda, "=", i, sep = "")), lwd = 2))

Figure 6.15  Probability distribution function for X ~ Poisson(λ), λ ∈ {1, 2, 3, 4}

Let's generate a sample of n = 100 pseudo random numbers from a Poisson
distribution with λ = 4. The function rpois may be invoked.
> set.seed(1000)
> n <- 100
> lambda <- 4
> x <- rpois(n, lambda = lambda)

The opposite of the log-Likelihood function

−log[L(x₁, . . . , xₙ; λ)] = − Σᵢ₌₁ⁿ ln [ λ^xᵢ e^(−λ) / xᵢ! ]

can be obtained by using the function dpois with the argument log=TRUE.
> ll <- function(theta) -sum(dpois(x, lambda = theta,
log = TRUE))
As starting value we use


> theta.start <- 0.5


and invoke nlm to obtain the parameter estimates
> out <- nlm(ll, theta.start, hessian = TRUE)
> theta.hat <- out$estimate
> theta.hat
[1] 3.789998
> fish <- out$hessian
> fish
[,1]
[1,] 26.37997
> solve(fish)
[,1]
[1,] 0.03790754
The behaviour of the logLikelihood function is shown in Figure 6.16 that can be
obtained with the code:
> xx <- seq(0, 10, l = 500)
> yy <- sapply(1:length(xx), function(i) -ll(xx[i]))
> plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "logLikelihood")
The estimation may also be performed by having recourse to the packages stats4 or
bbmle. For the sake of clarity we change the name of the parameter to be estimated,
to stress that the starting value has to be specified as an element of a list.
> ll <- function(lambdahat) -sum(dpois(x, lambda = lambdahat,
log = TRUE))
> theta.start <- list(lambdahat = 0.5)
> library(stats4)
> out <- mle(ll, start = theta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle(minuslogl = ll, start = theta.start)
Coefficients:
          Estimate Std. Error
lambdahat     3.79  0.1946792
-2 log L: 405.4604
> library(bbmle)
> out <- mle2(ll, start = theta.start)
> summary(out)

Figure 6.16  Estimating the parameter of a Poisson distribution.
Loglikelihood function l(λ|x₁, . . . , xₙ)

Maximum likelihood estimation

Call:
mle2(minuslogl = ll, start = theta.start)

Coefficients:
          Estimate Std. Error z value     Pr(z)
lambdahat  3.79000    0.19468  19.468 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 405.4604
By plotting the result of the function profile applied to an mle2 object it is possible
to investigate the behaviour of the objective function near the solution and obtain
graphical confidence intervals, see Figure 6.17.
> plot(profile(out))
We now consider the behaviour of the Likelihood function

L(λ|x₁, . . . , xₙ) = ∏ᵢ₌₁ⁿ λ^xᵢ e^(−λ) / xᵢ!

for different sample sizes. Let us generate a sample x of length 250 from
X ~ Poisson(λ = 4) and define the function llplot to obtain the graph of the Likelihood
function for the n initial elements of x.
> set.seed(1000)
> x <- rpois(250, lambda = 4)
> llplot <- function(n) {
x <- x[1:n]
xx <- seq(0, 10, l = 500)
ll <- function(theta) prod(dpois(x, lambda = theta))
yy <- sapply(1:length(xx), function(i) ll(xx[i]))
plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "Likelihood", sub = paste("n: ", n,
", Sample mean: ", round(mean(x), 2),
sep = ""))
}
> pdf("Chapter06-poissonlikelihood.pdf")
> layout(matrix(1:4, 2, 2, byrow = TRUE))
> sapply(c(100, 150, 200, 250), function(i) llplot(i))
> dev.off()
Figure 6.18 reports the Likelihood behaviour for λ ∈ (0, 10). Observe that the graphs
do not have the same scale for the Likelihood. The Likelihood gets more and more
concentrated as the sample size increases.

Figure 6.17  Estimating the parameter of a Poisson distribution.
Profile of the Likelihood function

6.5  Linear model

Let's generate the data according to a linear model formulation

yᵢ = β₁ + β₂ x₁ᵢ + β₃ x₂ᵢ + εᵢ

with εᵢ iid realizations of a Normal random variable with μ = 0 and σ² = 4.
The elements of X₁ and X₂ are obtained as iid realizations from a Normal random
variable with μ = 2 and σ² = 1.
The parameters are set to β₁ = 10, β₂ = 2 and β₃ = 3.
> set.seed(1000)
> n <- 100
> beta <- c(10, 2:3)
> E <- rnorm(n, mean = 0, sd = 2)
> W <- matrix(rnorm(n * (length(beta) - 1), mean = 2,
      sd = 1), nrow = n, byrow = TRUE)
> X <- cbind(1, W)
> Y <- X %*% beta + E

Figure 6.18  Estimating the parameter of a Poisson distribution.
Likelihood function L(λ|x₁, . . . , xₙ), perspective plots (n = 100, 150, 200, 250)

The log-Likelihood function, which depends on the parameters β₁, β₂, β₃ and σ²,
collected in the vector θ = [θ₁, θ₂, θ₃, θ₄], is

log(L(y₁, . . . , yₙ; θ)) = Σᵢ₌₁ⁿ ln [ (1 / (2πθ₄)^(1/2)) exp( −(yᵢ − (θ₁ + θ₂ x₁ᵢ + θ₃ x₂ᵢ))² / (2θ₄) ) ]

and its opposite can be formalized in the following way

> ll <- function(theta) -sum(dnorm(Y - X %*% theta[1:length(beta)],
      mean = 0, sd = theta[length(beta) + 1]^0.5, log = TRUE))
Starting values for the minimization algorithm, via nlm, are defined randomly for the
linear model parameters, by only ensuring that the starting value pertaining to the
variance of the error is positive.
> beta.start <- c(rnorm(length(beta)), runif(1) * 10)
> out <- nlm(ll, beta.start, hessian = TRUE)
> beta.hat <- out$estimate
> beta.hat
[1] 9.958192 1.841883 3.207483 3.953898
> fish <- out$hessian
> fish
            [,1]          [,2]          [,3]         [,4]
[1,] 25.29149416  52.850482554  49.364456806 -0.003152660
[2,] 52.85048255 133.812630879 103.181750523 -0.003085456
[3,] 49.36445681 103.181750523 117.183959136 -0.004704955
[4,] -0.00315266  -0.003085456  -0.004704955  3.197022477
> solve(fish)
              [,1]          [,2]          [,3]          [,4]
[1,]  0.4087678112 -8.929471e-02 -9.357096e-02  1.792117e-04
[2,] -0.0892947085  4.278377e-02 -5.563977e-05 -4.684676e-05
[3,] -0.0935709619 -5.563977e-05  4.799992e-02 -2.168630e-05
[4,]  0.0001792117 -4.684676e-05 -2.168630e-05  3.127911e-01
The estimation may also be performed by having recourse to the packages stats4 or
bbmle. Observe that the minus log-likelihood must be expressed as a function of the
single parameters, which are collected in a list to define the starting values.
> ll <- function(beta1, beta2, beta3, sigma2) -sum(dnorm(Y -
      X %*% c(beta1, beta2, beta3), mean = 0, sd = sigma2^0.5,
      log = TRUE))
> beta.start <- list(beta1 = rnorm(1), beta2 = rnorm(1),
beta3 = rnorm(1), sigma2 = runif(1) * 10)
> library(stats4)
> out <- mle(ll, start = beta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle(minuslogl = ll, start = beta.start)
Coefficients:
Estimate Std. Error
beta1 9.957560 0.6394234
beta2 1.841974 0.2068663
beta3 3.207276 0.2191142
sigma2 3.954813 0.5594242
-2 log L: 421.258
> library(bbmle)
> out <- mle2(ll, start = beta.start)
> summary(out)
Maximum likelihood estimation

Call:
mle2(minuslogl = ll, start = beta.start)

Coefficients:
       Estimate Std. Error z value     Pr(z)
beta1   9.95756    0.63942 15.5727 < 2.2e-16 ***
beta2   1.84197    0.20687  8.9042 < 2.2e-16 ***
beta3   3.20728    0.21911 14.6375 < 2.2e-16 ***
sigma2  3.95481    0.55942  7.0694 1.556e-12 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 421.258
The parameter estimation of the linear model can be obtained also via OLS, by means
of the function lm.
> summary(lm(Y ~ W))

Call:
lm(formula = Y ~ W)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5902 -1.2625  0.1035  1.0177  5.2219

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.9582     0.6492   15.34  < 2e-16 ***
W1            1.8419     0.2100    8.77 6.04e-14 ***
W2            3.2075     0.2225   14.42  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.019 on 97 degrees of freedom
Multiple R-squared: 0.7462,    Adjusted R-squared: 0.7409
F-statistic: 142.6 on 2 and 97 DF, p-value: < 2.2e-16

Observe that the parameter standard errors and their p-values provided by Maximum
Likelihood are based on the asymptotic covariance matrix and on the normality
assumption for the asymptotic distributions of the parameter estimators, while those
obtained in the linear model via OLS are based on an unbiased estimate for the
variance of the residuals and on the t distribution (always under the assumption of
normality of the errors).
The estimator for the residual standard error can be obtained, for Maximum
Likelihood, as:
> out@coef[length(beta) + 1]^0.5
sigma2
1.988671

Figure 6.19  Likelihood function profiles

This is a biased estimate; the unbiased estimate that coincides with the OLS one is
> (out@coef[length(beta) + 1] * n/(n - length(beta)))^0.5
sigma2
2.01919
Observe that the function mle2 in the package bbmle produces as a result an object
of class S4. By executing the instruction str(out), you will notice that it is not a
traditional list, but some of its values are identified with the symbol @, which was
also used above to extract the coefficients.
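A short sketch of how such an S4 object can be inspected; the slot names are assumed to mirror those documented for stats4/bbmle fits, and can always be listed first:

> slotNames(out)          # list all slots of the S4 object
> out@coef                # parameter estimates, extracted with @
> sqrt(diag(vcov(out)))   # standard errors via the vcov() extractor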
By plotting the result of the function profile applied to an mle2 object it is
possible to investigate the behaviour of the objective function near the solution and
obtain graphical confidence intervals, see Figure 6.19.
> plot(profile(out))

6.6  Individual wages (Section 2.5.5)

The preceding code is finally applied to the estimation of the linear model (2.2) we
considered in Chapter 2.

WAGE = β₁ + β₂ MALE + β₃ SCHOOL + β₄ EXPER + ERROR
> wages <- read.table(unzip("wages_in_the_USA.zip",
"wages1.dat"), header = TRUE)
> regr <- lm(WAGE ~ MALE + SCHOOL + EXPER, data = wages)
> summary(regr)

Call:
lm(formula = WAGE ~ MALE + SCHOOL + EXPER, data = wages)

Residuals:
   Min     1Q Median     3Q    Max
-7.654 -1.967 -0.457  1.444 34.194

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.38002    0.46498  -7.269 4.50e-13 ***
MALE         1.34437    0.10768  12.485  < 2e-16 ***
SCHOOL       0.63880    0.03280  19.478  < 2e-16 ***
EXPER        0.12483    0.02376   5.253 1.59e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.046 on 3290 degrees of freedom
Multiple R-squared: 0.1326,    Adjusted R-squared: 0.1318
F-statistic: 167.6 on 3 and 3290 DF, p-value: < 2.2e-16
> attach(wages)
> ll <- function(beta) -sum(dnorm(WAGE - model.matrix(regr) %*%
beta[1:4], mean = 0, sd = beta[5]^0.5, log = TRUE))
> beta.start <- c(rnorm(dim(model.matrix(regr))[2]),
10)
> out <- nlm(ll, beta.start, hessian = TRUE)
> beta.hat <- out$estimate
> beta.hat
[1] -3.3797358 1.3443640 0.6387780 0.1248189 9.2677087
> fish <- out$hessian
> round(fish, 4)
          [,1]      [,2]      [,3]       [,4]    [,5]
[1,]  355.4274  186.1301  4133.816  2858.8511 -0.0019
[2,]  186.1301  186.1301  2129.761  1549.7899 -0.0013
[3,] 4133.8157 2129.7606 49054.736 32988.8446 -0.2620
[4,] 2858.8511 1549.7899 32988.845 24859.3271 -0.1328
[5,]   -0.0019   -0.0013    -0.262    -0.1328 19.1680
> round(solve(fish), 4)
        [,1]    [,2]    [,3]   [,4]    [,5]
[1,]  0.2160 -0.0078 -0.0138 -6e-03 -0.0002
[2,] -0.0078  0.0116  0.0003 -3e-04  0.0000
[3,] -0.0138  0.0003  0.0011  1e-04  0.0000
[4,] -0.0060 -0.0003  0.0001  6e-04  0.0000
[5,] -0.0002  0.0000  0.0000  0e+00  0.0522
> ll <- function(beta) -sum(dnorm(WAGE - cbind(1, MALE,
SCHOOL, EXPER) %*% beta[1:4], mean = 0, sd = beta[5]^0.5,
log = TRUE))
> beta.start <- c(rnorm(4), 10)
> out <- nlm(ll, beta.start, hessian = TRUE)
> beta.hat <- out$estimate
> beta.hat
[1] -3.3797292 1.3443662 0.6387777 0.1248184 9.2677158
> fish <- out$hessian
> round(fish, 4)
          [,1]      [,2]      [,3]       [,4]    [,5]
[1,]  355.4274  186.1300  4133.812  2858.8491 -0.0018
[2,]  186.1300  186.1300  2129.759  1549.7887 -0.0013
[3,] 4133.8124 2129.7589 49054.698 32988.8189 -0.2620
[4,] 2858.8491 1549.7887 32988.819 24859.3078 -0.1327
[5,]   -0.0018   -0.0013    -0.262    -0.1327 19.1679
> round(solve(fish), 4)
        [,1]    [,2]    [,3]   [,4]    [,5]
[1,]  0.2159 -0.0078 -0.0138 -6e-03 -0.0002
[2,] -0.0078  0.0116  0.0003 -3e-04  0.0000
[3,] -0.0138  0.0003  0.0011  1e-04  0.0000
[4,] -0.0060 -0.0003  0.0001  6e-04  0.0000
[5,] -0.0002  0.0000  0.0000  0e+00  0.0522
> ll <- function(beta1, beta2, beta3, beta4, sigma2) -sum(dnorm(WAGE -
      cbind(1, MALE, SCHOOL, EXPER) %*% c(beta1, beta2,
          beta3, beta4), mean = 0, sd = sigma2^0.5,
      log = TRUE))
> beta.start <- list(beta1 = 1, beta2 = 2, beta3 = 3,
beta4 = 4, sigma2 = 4)
> library(stats4)
> out <- mle(ll, start = beta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle(minuslogl = ll, start = beta.start)

184

Maximum Likelihood Estimation and Specification Tests

Coefficients:
Estimate
beta1 -3.3800069
beta2
1.3443683
beta3
0.6387972
beta4
0.1248248
sigma2 9.2677234

Std. Error
0.46469418
0.10761051
0.03277593
0.02374833
0.22836335

-2 log L: 16682.18
> library(bbmle)
> out <- mle2(ll, start = beta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle2(minuslogl = ll, start = beta.start)
Coefficients:
        Estimate Std. Error z value     Pr(z)
beta1  -3.380007   0.464694 -7.2736 3.500e-13 ***
beta2   1.344368   0.107611 12.4929 < 2.2e-16 ***
beta3   0.638797   0.032776 19.4898 < 2.2e-16 ***
beta4   0.124825   0.023748  5.2562 1.471e-07 ***
sigma2  9.267723   0.228363 40.5832 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 16682.18
> logLik(out)
'log Lik.' -8341.091 (df=5)
> round(vcov(out), 4)
         beta1   beta2   beta3  beta4 sigma2
beta1   0.2159 -0.0078 -0.0138 -6e-03 0.0000
beta2  -0.0078  0.0116  0.0003 -3e-04 0.0000
beta3  -0.0138  0.0003  0.0011  1e-04 0.0000
beta4  -0.0060 -0.0003  0.0001  6e-04 0.0000
sigma2  0.0000  0.0000  0.0000  0e+00 0.0521
By plotting the result of the function profile applied to an mle2 object it is possible
to investigate the behaviour of the objective function near the solution and obtain
graphical confidence intervals, see Figure 6.20.
> plot(profile(out))

Figure 6.20  Likelihood function profiles

7
Models with Limited Dependent
Variables
7.1  The Impact of Unemployment Benefits on Recipiency (Section 7.1.6)

Data are available in the file BENEFITS.WF1, which is a work file of EViews.
First invoke the package hexView and next the command readEViews to read data.
The file is extracted from the compressed archive ch07.zip with the function unzip.
> library(hexView)
> benefits <- readEViews(unzip("ch07.zip", "Chapter 7/benefits.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
The file BENEFITS contains a sample of 4877 blue collar workers who became
unemployed in the USA between 1982 and 1991. The following variables (many are
dummy variables) are present:

- STATEUR: state unemployment rate (in %)
- STATEMB: state maximum benefit level
- STATE: state of residence code
- AGE: age in years
- AGE2: age squared and then divided by 10
- TENURE: years of tenure in job lost
- SLACK: 1 if job lost due to slack work
- ABOL: 1 if job lost because position abolished
- SEASONAL: 1 if job lost because seasonal job ended
- NWHITE: 1 if non white
- SCHOOL12: 1 if more than 12 years of school
- MALE: 1 if male
- BLUECOL: 1 if blue collar worker
- SMSA: 1 if live in a metropolitan area
- MARRIED: 1 if married
- DKIDS: 1 if kids
- DYKIDS: 1 if young kids (0-5 yrs)
- YRDISPL: year of job displacement (1982=1, ..., 1991=10)
- RR: replacement rate
- RR2: RR squared
- HEAD: 1 if head of household
- Y: 1 if applied for (and received) UI benefits

Verbeek proposes the estimation of three different models to explain the unemployed
workers' choice to apply for unemployment benefits:

- a linear probability model
- a Logit model
- a Probit model

7.1.1  Estimation of the linear probability model

The linear probability model can be estimated (without making any attempt to
constrain the implied probabilities between 0 and 1) with the function lm we used
for linear models, see Chapter 2.
> lpmfit <- lm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
SLACK + ABOL + SEASONAL + HEAD + MARRIED + DKIDS +
DYKIDS + SMSA + NWHITE + YRDISPL + SCHOOL12 +
MALE + STATEMB + STATEUR, data = benefits)
> summary(lpmfit)
Call:
lm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, data = benefits)
Residuals:
    Min      1Q  Median      3Q     Max
-0.9706 -0.5374  0.2231  0.3347  0.6770

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0768689  0.1220560  -0.630  0.52887
RR           0.6288584  0.3842068   1.637  0.10174
RR2         -1.0190587  0.4809550  -2.119  0.03416 *
AGE          0.0157489  0.0047841   3.292  0.00100 **
AGE2        -0.0014595  0.0006016  -2.426  0.01530 *
TENURE       0.0056531  0.0012152   4.652 3.37e-06 ***
SLACK        0.1281283  0.0142249   9.007  < 2e-16 ***
ABOL        -0.0065206  0.0248281  -0.263  0.79285
SEASONAL     0.0578745  0.0357985   1.617  0.10601
HEAD        -0.0437490  0.0166430  -2.629  0.00860 **
MARRIED      0.0485952  0.0161348   3.012  0.00261 **
DKIDS       -0.0305088  0.0174321  -1.750  0.08016 .
DYKIDS       0.0429115  0.0197563   2.172  0.02990 *
SMSA        -0.0351950  0.0140138  -2.511  0.01206 *
NWHITE       0.0165889  0.0187109   0.887  0.37534
YRDISPL     -0.0133149  0.0030686  -4.339 1.46e-05 ***
SCHOOL12    -0.0140365  0.0168433  -0.833  0.40468
MALE        -0.0363176  0.0178142  -2.039  0.04154 *
STATEMB      0.0012394  0.0002039   6.078 1.31e-09 ***
STATEUR      0.0181479  0.0030843   5.884 4.28e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.4501 on 4857 degrees of freedom
Multiple R-squared: 0.06691,    Adjusted R-squared: 0.06326
F-statistic: 18.33 on 19 and 4857 DF, p-value: < 2.2e-16
The predicted probability for each worker to apply for unemployment benefits can be
extracted from lpmfit with lpmfit$fitted.
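Since no constraint was imposed, some of these fitted "probabilities" may fall outside the unit interval; a purely illustrative one-line check is:

> sum(lpmfit$fitted < 0 | lpmfit$fitted > 1)   # fitted values outside [0, 1]
> range(lpmfit$fitted)                         # smallest and largest fitted value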

7.1.2  Estimation of the Logit model

The parameters in the Logit model can be estimated with the function glm which is
used to fit generalized linear models. These models are specified by giving a symbolic
description of (see Hardin and Hilbe (2007)):
1. the probability distribution function (belonging to the exponential family) of
the dependent variable (response).
2. the linear systematic component relating the predictor, η = Xβ, to the product
   of the matrix X containing the explanatory variables with the parameters β.
3. the link function relating the mean of the response to the linear predictor.
This is made with the two main arguments of the function glm:

- a model formula for the linear systematic component, with the same structure
  used for defining linear models,
- the family, which is a description of the probability distribution of the
  dependent variable, with the specification of the link function.

In case of a logit model we have a binomial distribution Bin(1, p) (Bernoulli


distribution) for the dependent variable and the link can be specified as "logit".

Finally the data.frame containing the variables in the model may be specified with
the argument data.
By default the estimation method used by R for generalized linear models consists
in iteratively reweighted least squares (IWLS). See the help ?glm for more information
on the features of the glm function and Hardin and Hilbe (2007) for a detailed
presentation of generalized linear models.
> logitfit <- glm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
SLACK + ABOL + SEASONAL + HEAD + MARRIED + DKIDS +
DYKIDS + SMSA + NWHITE + YRDISPL + SCHOOL12 +
MALE + STATEMB + STATEUR, family = binomial(link = "logit"),
data = benefits)
> summary(logitfit)
Call:
glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
YRDISPL+SCHOOL12+MALE+STATEMB+STATEUR, family = binomial(link =
"logit"), data = benefits)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2024  -1.2216   0.6959   0.8844   1.6015

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.800498   0.604168  -4.635 3.56e-06 ***
RR           3.068078   1.868226   1.642  0.10054
RR2         -4.890616   2.333522  -2.096  0.03610 *
AGE          0.067697   0.023910   2.831  0.00463 **
AGE2        -0.005968   0.003038  -1.964  0.04950 *
TENURE       0.031249   0.006644   4.703 2.56e-06 ***
SLACK        0.624822   0.070639   8.845  < 2e-16 ***
ABOL        -0.036175   0.117808  -0.307  0.75879
SEASONAL     0.270874   0.171171   1.582  0.11354
HEAD        -0.210682   0.081226  -2.594  0.00949 **
MARRIED      0.242266   0.079410   3.051  0.00228 **
DKIDS       -0.157927   0.086218  -1.832  0.06699 .
DYKIDS       0.205894   0.097492   2.112  0.03470 *
SMSA        -0.170354   0.069781  -2.441  0.01464 *
NWHITE       0.074070   0.092956   0.797  0.42555
YRDISPL     -0.063700   0.014997  -4.247 2.16e-05 ***
SCHOOL12    -0.065258   0.082413  -0.792  0.42845
MALE        -0.179829   0.087535  -2.054  0.03994 *
STATEMB      0.006027   0.001009   5.973 2.33e-09 ***
STATEUR      0.095620   0.015912   6.009 1.86e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6086.1  on 4876  degrees of freedom
Residual deviance: 5746.4  on 4857  degrees of freedom
AIC: 5786.4

Number of Fisher Scoring iterations: 4


The predicted probabilities for each worker to apply for unemployment benefits are
stored in the object logitfit as fitted and can be extracted with logitfit$fitted.
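These fitted probabilities correspond to the inverse logit link applied to the linear predictor; a small sketch verifying this relation with the objects already available (the names eta and p.hat are introduced only for this check):

> eta <- model.matrix(logitfit) %*% coef(logitfit)            # linear predictor X beta
> p.hat <- plogis(eta)                                        # inverse logit transformation
> all.equal(as.numeric(p.hat), as.numeric(logitfit$fitted))   # should be TRUE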

7.1.3  Estimation of the Probit model

The parameters in a Probit model can be estimated by using the same function glm,
that allowed the Logit estimates to be obtained.
We only need to specify the "probit" link.
> probitfit <- glm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
SLACK + ABOL + SEASONAL + HEAD + MARRIED + DKIDS +
DYKIDS + SMSA + NWHITE + YRDISPL + SCHOOL12 +
MALE + STATEMB + STATEUR, family = binomial(link = "probit"),
data = benefits)
> summary(probitfit)
Call:
glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
YRDISPL+SCHOOL12+MALE+STATEMB+STATEUR, family = binomial(link =
"probit"), data = benefits)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2247  -1.2269   0.6988   0.8884   1.5834

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.6999888  0.3629508  -4.684 2.82e-06 ***
RR           1.8634727  1.1293243   1.650  0.09893 .
RR2         -2.9804349  1.4119428  -2.111  0.03478 *
AGE          0.0422140  0.0143142   2.949  0.00319 **
AGE2        -0.0037741  0.0018118  -2.083  0.03724 *
TENURE       0.0176943  0.0038476   4.599 4.25e-06 ***
SLACK        0.3754931  0.0423881   8.858  < 2e-16 ***
ABOL        -0.0223136  0.0718629  -0.311  0.75618
SEASONAL     0.1612070  0.1040951   1.549  0.12147
HEAD        -0.1247463  0.0490620  -2.543  0.01100 *
MARRIED      0.1454763  0.0478152   3.042  0.00235 **
DKIDS       -0.0965778  0.0518420  -1.863  0.06247 .
DYKIDS       0.1236097  0.0586377   2.108  0.03503 *
SMSA        -0.1001520  0.0418419  -2.394  0.01668 *
NWHITE       0.0517937  0.0558335   0.928  0.35359
YRDISPL     -0.0384797  0.0090509  -4.251 2.12e-05 ***
SCHOOL12    -0.0415517  0.0497219  -0.836  0.40333
MALE        -0.1067168  0.0527404  -2.023  0.04303 *
STATEMB      0.0036399  0.0006065   6.002 1.95e-09 ***
STATEUR      0.0568271  0.0094328   6.024 1.70e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6086.1  on 4876  degrees of freedom
Residual deviance: 5748.1  on 4857  degrees of freedom
AIC: 5788.1

Number of Fisher Scoring iterations: 4


To obtain the predicted probabilities of applying for unemployment benefits use
probitfit$fitted like for the Logit model.

7.1.4 A unique table for comparing model estimates
To compare the parameter estimates in the preceding model specifications we can use
the function mtable available in the package memisc. See p. 21 for the use of mtable.
> library(memisc)
> table7.2 <- mtable(LPM = lpmfit, Logit = logitfit,
Probit = probitfit)
> table7.2 <- relabel(table7.2, "(Intercept)" = "constant",
      RR = "replacement rate", RR2 = "replacement rate^2",
      AGE = "age", AGE2 = "age^2/10", TENURE = "tenure",
      SLACK = "slack work", ABOL = "abolished position",
      SEASONAL = "seasonal work", HEAD = "head of household",
      MARRIED = "married", DKIDS = "children", DYKIDS = "young children",
      SMSA = "live in SMSA", NWHITE = "non white",
      YRDISPL = "year of displacement", SCHOOL12 = "> 12 years of school",
      MALE = "male", STATEMB = "state max. benefits",
      STATEUR = "state unempl. benefits")
> table7.2
Calls:
LPM: lm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
    SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
    YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, data = benefits)
Logit: glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
    SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
    YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, family =
    binomial(link = "logit"), data = benefits)
Probit: glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
    SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
    YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, family =
    binomial(link = "probit"), data = benefits)

=====================================================
                            LPM       Logit      Probit
-----------------------------------------------------
constant                 -0.077    -2.800***   -1.700***
                         (0.122)   (0.604)     (0.363)
replacement rate          0.629     3.068       1.863
                         (0.384)   (1.868)     (1.129)
replacement rate^2       -1.019*   -4.891*     -2.980*
                         (0.481)   (2.334)     (1.412)
age                       0.016**   0.068**     0.042**
                         (0.005)   (0.024)     (0.014)
age^2/10                 -0.001*   -0.006*     -0.004*
                         (0.001)   (0.003)     (0.002)
tenure                    0.006***  0.031***    0.018***
                         (0.001)   (0.007)     (0.004)
slack work                0.128***  0.625***    0.375***
                         (0.014)   (0.071)     (0.042)
abolished position       -0.007    -0.036      -0.022
                         (0.025)   (0.118)     (0.072)
seasonal work             0.058     0.271       0.161
                         (0.036)   (0.171)     (0.104)
head of household        -0.044**  -0.211**    -0.125*
                         (0.017)   (0.081)     (0.049)
married                   0.049**   0.242**     0.145**
                         (0.016)   (0.079)     (0.048)
children                 -0.031    -0.158      -0.097
                         (0.017)   (0.086)     (0.052)
young children            0.043*    0.206*      0.124*
                         (0.020)   (0.097)     (0.059)
live in SMSA             -0.035*   -0.170*     -0.100*
                         (0.014)   (0.070)     (0.042)
non white                 0.017     0.074       0.052
                         (0.019)   (0.093)     (0.056)
year of displacement     -0.013*** -0.064***   -0.038***
                         (0.003)   (0.015)     (0.009)
> 12 years of school     -0.014    -0.065      -0.042
                         (0.017)   (0.082)     (0.050)
male                     -0.036*   -0.180*     -0.107*
                         (0.018)   (0.088)     (0.053)
state max. benefits       0.001***  0.006***    0.004***
                         (0.000)   (0.001)     (0.001)
state unempl. benefits    0.018***  0.096***    0.057***
                         (0.003)   (0.016)     (0.009)
-----------------------------------------------------
R-squared                 0.067
adj. R-squared            0.063
sigma                     0.450
F                        18.331
p                         0.000     0.000       0.000
Log-likelihood        -3016.708 -2873.197   -2874.071
Deviance                983.900  5746.393    5748.142
AIC                    6075.415  5786.393    5788.142
BIC                    6211.753  5916.239    5917.987
N                      4877      4877        4877
Aldrich-Nelson R-sq.              0.065       0.065
McFadden R-sq.                    0.056       0.056
Cox-Snell R-sq.                   0.067       0.067
Nagelkerke R-sq.                  0.094       0.094
phi                               1.000       1.000
Likelihood-ratio                339.663     337.914
=====================================================

Observe that the R-squared, adj. R-squared, sigma and F statistics at the bottom of the table do not have any statistical relevance for a linear probability model, so they should not be considered.
The estimated marginal effect for TENURE, evaluated at the sample average of the
regressors, can be obtained as:
> xlevels <- apply(logitfit$model[, -1], 2, mean)
> avefitlogit <- c(1, xlevels) %*% logitfit$coef
> exp(avefitlogit)/(1 + exp(avefitlogit))^2 * coef(logitfit)["TENURE"]
[,1]
[1,] 0.00659471
for the logit model and
> avefitprobit <- c(1, xlevels) %*% probitfit$coef
> dnorm(avefitprobit) * coef(probitfit)["TENURE"]
[,1]
[1,] 0.006203453
for the probit model. The estimated marginal effect of being married for the average
person is:
> exp(avefitlogit)/(1 + exp(avefitlogit))^2 * coef(logitfit)["MARRIED"]

[,1]
[1,] 0.05112677
> dnorm(avefitprobit) * coef(probitfit)["MARRIED"]
[,1]
[1,] 0.05100272
in the two specifications.
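The same computation can be wrapped into a small helper; margeff below is a name introduced here (it is not part of any package) and simply repeats, for an arbitrary coefficient, the calculation at the sample average of the regressors shown above:
> margeff <- function(fit, varname) {
      # average levels of the regressors (the first column of fit$model is the response)
      xbar <- c(1, apply(fit$model[, -1], 2, mean))
      xb <- sum(xbar * coef(fit))
      # density of the logistic or standard normal distribution at the linear predictor
      dens <- if (fit$family$link == "logit") exp(xb)/(1 + exp(xb))^2 else dnorm(xb)
      dens * coef(fit)[varname]
  }
> margeff(logitfit, "TENURE")    # should reproduce the value obtained above
> margeff(probitfit, "MARRIED")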

7.1.5 Some additional goodness of fit measures

The Rp2 index, see Verbeek's relationship (7.21), is defined as

Rp2 = 1 - wr1/wr0

where wr1 and wr0 are the proportions of incorrect predictions respectively for the considered model and for a model containing only an intercept.
Rp2 (and the HM index, see Verbeek's Section 7.1.5) can be obtained by defining a function R2p:
> R2p <- function(y, estobject, cutoff = 0.5) {
a <- table(y, (estobject$fitted > cutoff) * 1)
wr_1 <- 1 - sum(diag(a))/sum(a)
phat <- sum(y)/length(y)
wr_0 <- (1 - phat) * (phat > cutoff) + phat *
(phat <= cutoff)
pa <- prop.table(a, 1)
return(list("Cross-tabulation of actual and predicted outcomes" =
a, Rsq_p = round(1 - wr_1/wr_0, 4), HM = round(pa[1,
1] + pa[2, 2], 4)))
}
estobject can be an object of class lm or glm. Observe that $fitted extracts two different types of predicted values; in particular for glm they correspond to the transformation, through the Logit and Probit functions, of the predicted values from a linear specification, see Verbeek's relationship (7.15). cutoff is a threshold value: when the estimated probability is larger than cutoff the fitted response is set equal to 1, otherwise the response is set to 0. Verbeek assumes cutoff = 0.5.
The Rp2 may then be obtained for the preceding models by invoking the function
R2p, specifying the actual outcomes for the y argument, benefits$Y, and the objects
(lpmfit, logitfit and probitfit) resulting respectively from the linear probability,
Logit and Probit model estimation procedures.
> R2p(y = benefits$Y, estobject = lpmfit, cutoff = 0.5)
$"Cross-tabulation of actual and predicted outcomes"
y
0

0
1
184 1358

196

Models with Limited Dependent Variables

130 3205

$Rsq_p
[1] 0.035
$HM
[1] 1.0803
> R2p(benefits$Y, logitfit)
$"Cross-tabulation of actual and predicted outcomes"
y
0
1

0
1
242 1300
171 3164

$Rsq_p
[1] 0.046
$HM
[1] 1.1057
> R2p(benefits$Y, probitfit)
$"Cross-tabulation of actual and predicted outcomes"
y
0
1

0
1
231 1311
162 3173

$Rsq_p
[1] 0.0447
$HM
[1] 1.1012
With the function pR2, available in the package pscl, the following pseudo-R2 measures may be produced for a glm object, like the Logit and Probit ones resulting from the application of the glm function, see Hardin and Hilbe (2007):
• llh: the log-likelihood from the fitted model
• llhNull: the log-likelihood from the intercept-only restricted model
• G2: minus two times the difference in the log-likelihoods
• McFadden: McFadden's pseudo r-squared
• r2ML: maximum likelihood pseudo r-squared
• r2CU: Cragg and Uhler's pseudo r-squared
See the help ?pscl::pR2 and Hardin and Hilbe (2007) for more information.

> library(pscl)
> pR2(logitfit)
          llh       llhNull            G2      McFadden          r2ML          r2CU
-2.873197e+03 -3.043028e+03  3.396629e+02  5.581002e-02  6.727594e-02  9.436996e-02
> pR2(probitfit)
          llh       llhNull            G2      McFadden          r2ML          r2CU
-2.874071e+03 -3.043028e+03  3.379143e+02  5.552271e-02  6.694146e-02  9.390077e-02

A cross-tabulation (like the one returned by the function R2p) of actual outcomes
against predicted outcomes for discrete data models, with summary statistics such as
the percentage of correctly predicted under fitted and null models may be obtained
by applying the function hitmiss available in the package pscl to a glm object.
The user can also specify a classification threshold different from 0.5 for the predicted
probabilities by changing the default argument k=0.5.
> hitmiss(logitfit)
Classification Threshold = 0.5
        y=0  y=1
yhat=0  242  171
yhat=1 1300 3164
Percent Correctly Predicted = 69.84%
Percent Correctly Predicted = 15.69%, for y = 0
Percent Correctly Predicted = 94.87%, for y = 1
Null Model Correctly Predicts 68.38%
[1] 69.83802 15.69390 94.87256
> hitmiss(probitfit)
Classification Threshold = 0.5
        y=0  y=1
yhat=0  231  162
yhat=1 1311 3173
Percent Correctly Predicted = 69.8%
Percent Correctly Predicted = 14.98%, for y = 0
Percent Correctly Predicted = 95.14%, for y = 1
Null Model Correctly Predicts 68.38%
[1] 69.79701 14.98054 95.14243

Observe that the log-likelihood for the naive model, where the probability of applying for benefits is constant, can be obtained, e.g. for the Logit model, from the cross-tabulation of actual and predicted outcomes, see Verbeek's relationship (7.19):
> a <- R2p(benefits$Y, logitfit)[[1]]
> log0 <- sum(a[2, ]) * log(sum(a[2, ])/sum(a)) + sum(a[1,
]) * log(sum(a[1, ])/sum(a))
> log0

[1] -3043.028
The function logLik extracts the log-likelihood from a glm object; so the McFadden pseudo R-squared (returned by the function pR2) can also be computed with:
> 1 - logLik(logitfit)[1]/log0
[1] 0.05581002
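In the same spirit, the r2ML and r2CU values reported by pR2 can be cross-checked from the two log-likelihoods, assuming the usual Cox-Snell and Cragg-Uhler (Nagelkerke) definitions; this is only a sketch of the check:
> ll1 <- logLik(logitfit)[1]                  # log-likelihood of the fitted model
> n <- nrow(benefits)                         # number of observations (4877)
> r2ML <- 1 - exp((2/n) * (log0 - ll1))       # Cox-Snell / maximum likelihood pseudo R-squared
> r2CU <- r2ML/(1 - exp((2/n) * log0))        # Cragg-Uhler / Nagelkerke pseudo R-squared
> c(r2ML = r2ML, r2CU = r2CU)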

7.2 Some remarks on the interpretation of a parameter in a logit model

According to the logistic link function we have

p = exp(X'β) / (1 + exp(X'β)).

It follows that

log(p/(1 - p)) = β0 + β1 X1 + ... + βp Xp,

that is

p/(1 - p) = exp{β0 + β1 X1 + ... + βp Xp}

or

p/(1 - p) = e^β0 e^(β1 X1) ... e^(βp Xp).

If X1 → (X1 + 1) then, ceteris paribus, p → p1 with

(p/(1 - p)) e^β1 = p1/(1 - p1).

Let us now study, for various p and β1, the relationship between p1 and p, see Table 7.1.
> p <- 0:25/25
> p <- p[-c(1, length(p))]
> odds <- p/(1 - p)
> beta <- 0:16/4 - 2
> expbeta <- exp(beta)
> table <- outer(odds, expbeta, "*")
> rownames(table) <- p
> colnames(table) <- round(beta, 2)
> p1 <- round(table/(1 + table), 3)

If p = 0.4 and β1 = -1.25, when X1 increases by 1 we have p1 = 0.16. Were β1 = 1.25 we would have p1 = 0.70.

The dummy variable case
If X1 is a dummy variable the consequence on p is the same, when considering the effect for category 1 with respect to category 0.
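As a quick check of the example above, the two probabilities can be recomputed directly from the odds relationship (the results are approximately 0.16 and 0.70):
> p <- 0.4
> odds <- p/(1 - p)
> odds1 <- odds * exp(-1.25)   # beta1 = -1.25
> odds1/(1 + odds1)            # new probability p1
> odds2 <- odds * exp(1.25)    # beta1 = 1.25
> odds2/(1 + odds2)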

Table 7.1 reports on the rows the starting probabilities p (from 0.04 to 0.96) and on the columns the values of β1 (from -2 to 2 in steps of 0.25); the entries in the table are the new values p1 of the probability for a unitary variation of X1. The full table can be reproduced by printing the matrix p1 built above.

The logarithm case
When X1 is expressed as a logarithm, say β1 log(X1) appears in the model, if X1 → 1.01 X1, that is X1 increases by 1%, we have:

log(X1) → log(1.01 X1) = log(1.01) + log(X1)   [≈ 0.01 + log(X1)]

and

(p/(1 - p)) e^(β1 log(1.01)) = p1/(1 - p1),   that is   (p/(1 - p)) e^(0.01 β1) ≈ p1/(1 - p1).

Exercise
Produce a table similar to Table 7.1 for the logarithm case. Comment on the results.

7.3 Explaining Firms' Credit Ratings (Section 7.2.1)

The parameter estimation of an ordered response model is considered and the results are compared with those pertaining to a Logit framework.
Data are available in the file credit.dta, which is in the Stata format and is available
in the compressed archive ch07.zip. To import data, we have first to invoke the
package foreign and next the command read.dta.
> library(foreign)
> credit <- read.dta(unzip("ch07.zip", "Chapter 7/credit.dta"))
The database contains 921 observations for 2005 on US firms' credit ratings, including a set of firm characteristics. The data are taken from Compustat.
The following variables are available:
• booklev ratio of book value of debt / assets
• marklev ratio of market value of debt / assets
• ebit earnings before interest and taxes / total assets
• invgrade dummy, 1 if investment grade, 0 if speculative grade
• logsales log firm sales
• rating credit rating, from 1 (low) to 7 (high)
• reta retained earnings / total assets
• wka working capital / total assets

Some summary statistics are first obtained, see Verbeek's Table 7.4, which may be reproduced with the following code:
> t(apply(credit, 2, function(x) c(mean = mean(x),
median = median(x), min = min(x), max = max(x))))
or

> t(sapply(credit, function(x) c(mean = mean(x), median = median(x),
      min = min(x), max = max(x))))
               mean    median        min        max
booklev  0.29318678 0.2639547  0.0000000  0.9992067
ebit     0.09389208 0.0904365 -0.3841692  0.6515085
invgrade 0.47231270 0.0000000  0.0000000  1.0000000
logsales 7.99575383 7.8841529  1.1002775 12.7014179
marklev  0.25472868 0.2111638  0.0000000  0.9648595
rating   3.49945711 3.0000000  1.0000000  7.0000000
reta     0.15699418 0.1804797 -0.9958922  0.9799219
wka      0.14041423 0.1228412 -0.4120839  0.7480223
A binary Logit model is then used to describe the probability of obtaining an investment grade rating, which occurs when the credit rating is 4 or more. To this aim we use
the function glm. The independent variables in the model are the book leverage and
ebit ratios, the log firm sales, the retained earnings over total assets and the working
capital over total assets. We are considering the binomial family with the "logit"
link function.
> logitfit <- glm(invgrade ~ booklev + ebit + logsales +
reta + wka, family = binomial(link = "logit"),
data = credit)
Use summary(logitfit) to produce the output of the estimation procedure.
The ordered logit model may be estimated by having recourse to the function polr
available in the package MASS.
polr fits a logistic or probit regression model to an ordered factor response. So we
have first to recode the variable rating as an ordered factor.
> credit$rating <- ordered(credit$rating)
The default logistic case is proportional odds logistic regression, after which the
function is named. It is however possible to change the link function by fixing the
argument method to one of the following options: "logistic", "probit", "cloglog",
"cauchit".
The main arguments of the function polr are thus:
• a model formula with the same structure used for defining linear models, and
• the method specification.

> library(MASS)
> orderedlogitfit <- polr(rating ~ booklev + ebit +
logsales + reta + wka, data = credit, method = "logistic")
The results can be resumed in a unique output with the function mtable, available
in the package memisc.

> library(memisc)
> mtable(logitfit, orderedlogitfit)
Calls:
logitfit: glm(formula = invgrade ~ booklev + ebit + logsales + reta +
wka, family = binomial(link = "logit"), data = credit)
orderedlogitfit: polr(formula = rating ~ booklev + ebit + logsales +
reta + wka, data = credit, method = "logistic")
=====================================================
                        logitfit    orderedlogitfit
-----------------------------------------------------
(Intercept)             -8.214***
                        (0.867)
booklev                 -4.427***      -2.752***
                        (0.771)        (0.477)
ebit                     4.355**        4.731***
                        (1.440)        (0.945)
logsales                 1.082***       0.941***
                        (0.096)        (0.059)
reta                     4.116***       3.560***
                        (0.489)        (0.302)
wka                     -4.012***      -2.580***
                        (0.748)        (0.483)
1|2                                    -0.370
                                       (0.633)
2|3                                     4.881***
                                       (0.521)
3|4                                     7.626***
                                       (0.551)
4|5                                     9.885***
                                       (0.592)
5|6                                    12.883***
                                       (0.673)
6|7                                    14.783***
                                       (0.784)
-----------------------------------------------------
Aldrich-Nelson R-sq.     0.391          0.484
McFadden R-sq.           0.465          0.309
Cox-Snell R-sq.          0.474          0.608
Nagelkerke R-sq.         0.633          0.639
phi                      1.000
Likelihood-ratio       591.796        862.873
p                        0.000          0.000
Log-likelihood        -341.078       -965.307
Deviance               682.155       1930.614
AIC                    694.155       1952.614
BIC                    723.108       2005.694
N                      921            921
=====================================================
Observe that the likelihood ratio test may also be performed by having recourse to
the function lrtest in the package lmtest.
Additional pseudo-R2 measures may be obtained with the function pR2 in the package
pscl.
> library(pscl)
> pR2(logitfit)
         llh      llhNull           G2     McFadden         r2ML         r2CU
-341.0775772 -636.9757787  591.7964028    0.4645360    0.4740549    0.6327213
> pR2(orderedlogitfit)
          llh       llhNull            G2      McFadden          r2ML          r2CU
 -965.3071623 -1396.7436930   862.8730614     0.3088874     0.6081543     0.6389289
To compute the probabilities for the average firm to obtain an investment grade when having a book leverage of .25 and .75, see Verbeek p. 225, we first obtain the linear predictors x'i β corresponding to the first and third quartiles of booklev, by considering the sample average levels for the other variables:
> xlevels <- apply(credit[, c("ebit", "logsales", "reta",
"wka")], 2, mean)
> avefit <- c(1, quantile(credit$booklev, 0.25), xlevels) %*%
logitfit$coef
> avefit1 <- c(1, quantile(credit$booklev, 0.75), xlevels) %*%
logitfit$coef
and then apply the logistic transformation to the linear predictor x'i β (which here includes the intercept), namely

P{yi = 1 | xi} = exp(x'i β)/(1 + exp(x'i β)) = 1/(1 + exp(-x'i β)):

> 1/(1 + exp(-avefit))
          [,1]
[1,] 0.5433236
> 1/(1 + exp(-avefit1))
          [,1]
[1,] 0.3119182
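Equivalently, the base R function plogis (the logistic cumulative distribution function) returns the same probabilities:
> plogis(avefit)    # same value as 1/(1 + exp(-avefit))
> plogis(avefit1)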

For the ordered logit, using the logistic transformation and the threshold γ3 (the 3|4 threshold, estimated as 7.626),

P{yi > 3 | xi} = P{εi > γ3 - x'i β | xi} = 1 - F(γ3 - x'i β) = exp(x'i β - γ3)/(1 + exp(x'i β - γ3)) = 1/(1 + exp(γ3 - x'i β)),

we have
> avefit <- c(quantile(credit$booklev, 0.25), xlevels) %*%
orderedlogitfit$coef
> avefit1 <- c(quantile(credit$booklev, 0.75), xlevels) %*%
orderedlogitfit$coef
> 1/(1 + exp(7.626 - avefit))
[,1]
[1,] 0.5169951
> 1/(1 + exp(7.626 - avefit1))
[,1]
[1,] 0.3701199
According to both models the probability of obtaining an investment grade decreases when the book leverage increases.
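The same ordered-logit probabilities can also be obtained with the predict method of polr, which avoids handling the thresholds explicitly; a minimal sketch (newd is a name introduced here, and the column positions 4:7 are assumed to correspond to the rating categories 4 to 7):
> newd <- data.frame(booklev = quantile(credit$booklev, c(0.25, 0.75)),
      ebit = xlevels["ebit"], logsales = xlevels["logsales"],
      reta = xlevels["reta"], wka = xlevels["wka"])
> probs <- predict(orderedlogitfit, newdata = newd, type = "probs")
> rowSums(probs[, 4:7])   # P(rating >= 4), i.e. investment grade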

7.4 Willingness to Pay for Natural Areas (Section 7.2.4)

As an illustration of the ordered probit model in the presence of fixed thresholds, Verbeek considers the analysis of how the value of a good that is not traded may be determined in public economics. Data are available in the file wtp.dat, a text file that may be read with the command read.table.
> wtp <- read.table(unzip("ch07.zip", "Chapter 7/wtp.dat"),
header = TRUE)
The dataset wtp contains 312 observations with survey results concerning willingness
to pay for the preservation of the Alentejo Natural Park in Portugal. The survey was
held in 1997. The original amounts in escudos were transformed into euros by dividing by 200. The following variables (some are dummy variables) are present:
• BID1 initial bid, in euro
• BIDH higher bid
• BIDL lower bid
• NN dummy, 1 if answers are no,no
• NY dummy, 1 if no,yes
• YN dummy, 1 if yes,no
• YY dummy, 1 if yes,yes
• DEPVAR dependent variable from 1 to 4 (NN,NY,YN,YY)
• AGE age in 6 classes
• FEMALE dummy, 1 if female
• INCOME income in 8 classes

Two ordered probit models are considered for explaining the willingness to pay: the
first one including only an intercept, while the age class, the gender and the income
class are included as explanatory variables for the second model.
To obtain the maximum likelihood parameter estimates we have first to build the corresponding likelihood functions, see Verbeek's relationships (7.33)-(7.34). Recall from Chapter 6 that we have to define the opposite of the log-likelihood function, since the internal optimization routines perform minimization.
For both models we define a variable regr including the regressors (for the first
model only a vector of ones). In the former model the likelihood depends on b1 and
on sigma. In the latter one it depends also on b2, b3 and b4.
> llI <- function(b1, sigma) {
      regr <- as.matrix(rep(1, nrow(wtp)))
      s <- sigma
      -sum(wtp$NN * log(pnorm((wtp$BIDL - regr * b1)/s)) +
          wtp$NY * log(pnorm((wtp$BID1 - regr * b1)/s) -
              pnorm((wtp$BIDL - regr * b1)/s)) + wtp$YN *
          log(pnorm((wtp$BIDH - regr * b1)/s) - pnorm((wtp$BID1 -
              regr * b1)/s)) + wtp$YY * log(1 - pnorm((wtp$BIDH -
              regr * b1)/s)))
  }
> llII <- function(b1, b2, b3, b4, sigma) {
      regr <- cbind(1, wtp$AGE, wtp$FEMALE, wtp$INCOME)
      s <- sigma
      b <- c(b1, b2, b3, b4)
      -sum(wtp$NN * log(pnorm((wtp$BIDL - regr %*% b)/s)) +
          wtp$NY * log(pnorm((wtp$BID1 - regr %*% b)/s) -
              pnorm((wtp$BIDL - regr %*% b)/s)) +
          wtp$YN * log(pnorm((wtp$BIDH - regr %*% b)/s) -
              pnorm((wtp$BID1 - regr %*% b)/s)) + wtp$YY *
          log(1 - pnorm((wtp$BIDH - regr %*% b)/s)))
  }
We can now use the function mle2 in the package bbmle to obtain the parameter
estimates, having defined a list of starting values for the two models.
> library(bbmle)
> b.start <- list(b1 = 10, sigma = 15)
> out <- mle2(llI, start = b.start)
> summary(out)
Maximum likelihood estimation

Call:
mle2(minuslogl = llI, start = b.start)

Coefficients:
      Estimate Std. Error z value     Pr(z)
b1     18.7391     2.4969  7.5049 6.148e-14 ***
sigma  38.6122     2.9332 13.1637 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 818.009
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 5, sigma = 30)
> out <- mle2(llII, start = b.start)
> summary(out)
Maximum likelihood estimation
Call:
mle2(minuslogl = llII, start = b.start)
Coefficients:
      Estimate Std. Error z value     Pr(z)
b1     30.0730     8.2788  3.6325 0.0002806 ***
b2     -6.9309     1.6656 -4.1613 3.164e-05 ***
b3     -5.1561     4.7135 -1.0939 0.2739901
b4      4.8940     1.9114  2.5604 0.0104549 *
sigma  36.4774     2.7488 13.2701 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 782.8182
The parameter estimates for the second model and the asymptotic standard errors for the parameter estimates in both models are similar to those obtained by Verbeek, but not equal. Let us check what happens by changing the starting values in the minimization procedure. The function coef allows only the parameter estimates to be extracted from the object resulting from the application of mle2.
> b.start <- list(b1 = 1, sigma = 15)
> coef(mle2(llI, start = b.start))
      b1    sigma
18.73886 38.61274
> b.start <- list(b1 = 1, sigma = 20)
> coef(mle2(llI, start = b.start))
      b1    sigma
18.73940 38.61309
> b.start <- list(b1 = 4, sigma = 50)
> coef(mle2(llI, start = b.start))
      b1    sigma
18.73680 38.61727
> b.start <- list(b1 = 40, sigma = 100)
> coef(mle2(llI, start = b.start))
      b1    sigma
18.73239 38.61679
The minimization procedure applied to the first model seems to be quite robust with
respect to different sets of initial starting values.
For the second model we have:
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 6, sigma = 30)
> coef(mle2(llII, start = b.start))
       b1        b2        b3        b4     sigma
30.132584 -6.939768 -5.186392  4.887521 36.489069
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 5, sigma = 14)
> coef(mle2(llII, start = b.start))
       b1        b2        b3        b4     sigma
29.978269 -6.916038 -5.156463  4.908729 36.469106
> b.start <- list(b1 = 1, b2 = 2, b3 = 3, b4 = 4, sigma = 14)
> coef(mle2(llII, start = b.start))
       b1        b2        b3        b4     sigma
30.058567 -6.925511 -5.157301  4.893616 36.462669
> b.start <- list(b1 = 30, b2 = -7, b3 = 10, b4 = 4, sigma = 30)
> coef(mle2(llII, start = b.start))
       b1        b2        b3        b4     sigma
30.110749 -6.933783 -5.184297  4.886962 36.481068

The shape of the likelihood function seems to be too flat around its minimum; this issue renders the optimization problem somewhat difficult and the estimation result is unstable, depending on the initial starting values. To overcome this situation we have to specify some parameters controlling the minimization algorithm; in particular we fix the relative convergence tolerance to reltol = 1e-15 and the maximum number of iterations to maxit = 10000. For the first model we have:
> b.start <- list(b1 = 1, sigma = 15)
> coef(mle2(llI, start = b.start, control = list(reltol = 1e-15,
      maxit = 10000)))
      b1    sigma
18.73884 38.61273
> b.start <- list(b1 = 1, sigma = 20)
> coef(mle2(llI, start = b.start, control = list(reltol = 1e-15,
      maxit = 10000)))
      b1    sigma
18.73884 38.61272
> b.start <- list(b1 = 4, sigma = 50)
> coef(mle2(llI, start = b.start, control = list(reltol = 1e-15,
      maxit = 10000)))
      b1    sigma
18.73884 38.61272
> b.start <- list(b1 = 40, sigma = 100)
> coef(mle2(llI, start = b.start, control = list(reltol = 1e-15,
      maxit = 10000)))
      b1    sigma
18.73884 38.61272
and for the second model we have:
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 5, sigma = 30)
> coef(mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
       b1        b2        b3        b4     sigma
30.109369 -6.933365 -5.184575  4.887612 36.477856
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 6, sigma = 30)
> coef(mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
       b1        b2        b3        b4     sigma
30.109394 -6.933368 -5.184583  4.887607 36.477853
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 5, sigma = 14)
> coef(mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
       b1        b2        b3        b4     sigma
30.109367 -6.933365 -5.184581  4.887613 36.477855
> b.start <- list(b1 = 1, b2 = 2, b3 = 3, b4 = 4, sigma = 14)
> coef(mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
       b1        b2        b3        b4     sigma
30.109366 -6.933364 -5.184582  4.887613 36.477855
> b.start <- list(b1 = 30, b2 = -7, b3 = 10, b4 = 4,
sigma = 30)
> coef(mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
       b1        b2        b3        b4     sigma
30.109386 -6.933367 -5.184587  4.887609 36.477855
> b.start <- list(b1 = 5, b2 = -20, b3 = 30, b4 = -10,
sigma = 60)
> coef(mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
       b1        b2        b3        b4     sigma
30.109372 -6.933365 -5.184584  4.887612 36.477854


The estimation results, though a bit different from those proposed by Verbeek for the second model, no longer depend in a significant manner on the initial starting values.
So we have the following final output:
> b.start <- list(b1 = 1, sigma = 15)
> outi <- mle2(llI, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
> summary(outi)
Maximum likelihood estimation
Call:
mle2(minuslogl = llI, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
Coefficients:
      Estimate Std. Error z value     Pr(z)
b1     18.7388     2.4970  7.5047 6.158e-14 ***
sigma  38.6127     2.9333 13.1635 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 818.009
> b.start <- list(b1 = 30, b2 = -7, b3 = 10, b4 = 4,
sigma = 30)
> outc <- mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
> summary(outc)
Maximum likelihood estimation
Call:
mle2(minuslogl = llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
Coefficients:
      Estimate Std. Error z value     Pr(z)
b1     30.1094     8.2791  3.6368  0.000276 ***
b2     -6.9334     1.6657 -4.1625 3.148e-05 ***
b3     -5.1846     4.7138 -1.0999  0.271390
b4      4.8876     1.9113  2.5572  0.010552 *
sigma  36.4779     2.7489 13.2699 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 782.8181
From the model with only the intercept the proportion of population with a negative
willingness to pay is:


> mu <- coef(outi)["b1"]


> sigma <- coef(outi)["sigma"]
> (pneg <- pnorm(-mu/sigma))
b1
0.313731
Reinterpreting the latent variable as desired willingness to pay, that is assuming that
the actual willingness to pay is the maximum between 0 and the desired amount, we
have that the expected value for the truncated normal distribution is:
> (expectpos <- mu + sigma * dnorm(mu/sigma)/pnorm(mu/sigma))
b1
38.69165
Then the estimate for expected willingness to pay over the entire sample will be:
> (expect <- 0 * pneg + (1 - pneg) * expectpos)
b1
26.55288
with a total willingness to pay of about
> expect * 3 * 10^6
b1
79658625
With reference to the model with characteristics we have that the expected willingness
to pay for a male in income class 1, aged between 20 and 29 (1st age class) is:
> sum(coef(outc)[c("b1", "b2", "b4")])
[1] 28.06363
and taking into account the censoring we have
> mu <- sum(coef(outc)[c("b1", "b2", "b4")])
> sigma <- coef(outc)["sigma"]
> (pneg <- pnorm(-mu/sigma))
sigma
0.2208477
> (expectpos <- mu + sigma * dnorm(mu/sigma)/pnorm(mu/sigma))
sigma
41.95654
> (expect <- 0 * pneg + (1 - pneg) * expectpos)
sigma
32.69053
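The censored-expectation calculation used twice above can be collected into a small helper; expwtp is a name introduced here for illustration and simply wraps the formulas already applied:
> expwtp <- function(mu, sigma) {
      pneg <- pnorm(-mu/sigma)                              # P(desired WTP < 0)
      epos <- mu + sigma * dnorm(mu/sigma)/pnorm(mu/sigma)  # E(WTP | WTP > 0)
      (1 - pneg) * epos                                     # expected WTP over the whole population
  }
> expwtp(coef(outi)["b1"], coef(outi)["sigma"])             # intercept-only model
> expwtp(sum(coef(outc)[c("b1", "b2", "b4")]), coef(outc)["sigma"])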

7.5 Patent and R&D Expenditures (Section 7.3.2)

As an illustration of models for count data, Verbeek considers the analysis of the relationship between the number of patents obtained by some firms and their Research and Development expenditures.
Data are available in the file patents.dat, a text file that may be read with the
command read.table.
> patents <- read.table(unzip("ch07.zip", "Chapter 7/patents.dat"),
header = TRUE)
The file patents contains data on 181 international manufacturing firms, with their R&D expenditures, number of patents, industry, etc. for 1990 and 1991. The following variables are available:
• P91 number of patents in 1991
• P90 number of patents in 1990
• LR91 log R&D expenditures in 1991
• LR90 log R&D expenditures in 1990
• AEROSP dummy for aerospace industry
• CHEMIST dummy for chemistry industry
• COMPUTER dummy for computer industry (hardware and software)
• MACHINES dummy for machinery and instruments industry
• VEHICLES dummy for motor vehicles industry
• JAPAN dummy for Japan
• US dummy for USA
The relationship between the number of patents (a count variable) and the expenditures in Research and Development is first analyzed by means of a Poisson regression model.
The maximum likelihood parameter estimates may be obtained by using the function
glm and specifying poisson as family and "log" as link function.
> poissonfit <- glm(P91 ~ LR91 + AEROSP + CHEMIST +
COMPUTER + MACHINES + VEHICLES + JAPAN + US,
family = poisson(link = "log"), data = patents)
> summary(poissonfit)
Call:
glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
    VEHICLES + JAPAN + US, family = poisson(link = "log"), data = patents)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-27.979   -5.246   -1.572    2.352   29.246

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.873731   0.065868  -13.27  < 2e-16 ***
LR91         0.854525   0.008387  101.89  < 2e-16 ***
AEROSP      -1.421850   0.095640  -14.87  < 2e-16 ***
CHEMIST      0.636267   0.025527   24.93  < 2e-16 ***
COMPUTER     0.595343   0.023338   25.51  < 2e-16 ***
MACHINES     0.688953   0.038346   17.97  < 2e-16 ***
VEHICLES    -1.529653   0.041864  -36.54  < 2e-16 ***
JAPAN        0.222222   0.027502    8.08 6.46e-16 ***
US          -0.299507   0.025300  -11.84  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 29669.4  on 180  degrees of freedom
Residual deviance:  9081.9  on 172  degrees of freedom
AIC: 9919.6

Number of Fisher Scoring iterations: 5


Goodness of fit statistics may be produced by means of the function pR2 available in
the package pscl. The pseudo R2 in Verbeek Table 7.7 can be found here as McFadden
pseudo R2 ; while the likelihood ratio appears as G2, see Section 7.1.5.
> library(pscl)
> pR2(poissonfit)
          llh       llhNull            G2      McFadden          r2ML          r2CU
-4.950789e+03 -1.524456e+04  2.058754e+04  6.752422e-01  1.000000e+00  1.000000e+00

The same results (except for the McFadden pseudo R2, see the note below¹) together with the parameter estimates may also be obtained by applying the function mtable of the package memisc to the glm object poissonfit.

¹ mtable computes the McFadden R2 as 1 - deviance(model)/deviance(Saturated Model), which for the present case may be obtained with R as 1-poissonfit$deviance/poissonfit$null.deviance. The definition coincides with Verbeek's (and also McFadden's) only in presence of a Generalized Linear Model with a dichotomous response (Bernoulli, that is Binomial with n = 1) and a logit or probit link function. So it is better to make reference to the value returned by the function pR2.

> library(memisc)
> mtable(poissonfit)

Calls:
poissonfit: glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER +
    MACHINES + VEHICLES + JAPAN + US, family = poisson(link = "log"),
    data = patents)

===============================
(Intercept)           -0.874***
                      (0.066)
LR91                   0.855***
                      (0.008)
AEROSP                -1.422***
                      (0.096)
CHEMIST                0.636***
                      (0.026)
COMPUTER               0.595***
                      (0.023)
MACHINES               0.689***
                      (0.038)
VEHICLES              -1.530***
                      (0.042)
JAPAN                  0.222***
                      (0.028)
US                    -0.300***
                      (0.025)
-------------------------------
Aldrich-Nelson R-sq.      0.991
McFadden R-sq.            0.694
Cox-Snell R-sq.           1.000
Nagelkerke R-sq.          1.000
phi                       1.000
Likelihood-ratio      20587.541
p                         0.000
Log-likelihood        -4950.789
Deviance               9081.901
AIC                    9919.578
BIC                    9948.364
N                       181
===============================
The function coeftest in the package lmtest produces a table of the coefficients with
their robust standard errors, the z statistics and the statistical significance. Robust
standard errors are obtained by using the function vcovHC, available in the package
sandwich, see Section 4.1.6.
> library(sandwich)
> library(lmtest)
> coeftest(poissonfit, vcovHC(poissonfit, type = "HC"))

z test of coefficients:

             Estimate Std. Error z value  Pr(>|z|)
(Intercept) -0.873731   0.742962 -1.1760  0.239591
LR91         0.854525   0.093695  9.1203 < 2.2e-16 ***
AEROSP      -1.421850   0.380168 -3.7401  0.000184 ***
CHEMIST      0.636267   0.225359  2.8233  0.004753 **
COMPUTER     0.595343   0.300803  1.9792  0.047796 *
MACHINES     0.688953   0.414664  1.6615  0.096619 .
VEHICLES    -1.529653   0.280693 -5.4496 5.049e-08 ***
JAPAN        0.222222   0.352840  0.6298  0.528819
US          -0.299507   0.273621 -1.0946  0.273689
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Verbeek performs a Wald test to check for the joint effect of the explanatory variables
included in the model.
We have first to estimate the base glm model including only an intercept and
then call the function waldtest, available in the package lmtest. The first and
second arguments are respectively the baseline model and the complete one; with
the argument test it is possible to specify the type of test ("Chisq" or "F") to be
performed and with the argument vcov the covariance matrix, in the present case a
robust estimate of the covariance matrix.
> poissonfit0 <- glm(P91 ~ 1, family = poisson(link = "log"),
data = patents)
> lmtest::waldtest(poissonfit0, poissonfit, test = "Chisq",
vcov = vcovHC(poissonfit, type = "HC"))
Wald test

Model 1: P91 ~ 1
Model 2: P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES + VEHICLES +
    JAPAN + US
  Res.Df Df  Chisq Pr(>Chisq)
1    180
2    172  8 339.97  < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Wald test rejects the hypothesis that the conditional mean is constant and
independent of the explanatory variables.
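A likelihood ratio test of the same restriction could be obtained, for instance, with the function lrtest of the package lmtest (the comparison is then based on the two log-likelihoods rather than on the robust covariance matrix):
> lrtest(poissonfit0, poissonfit)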
With regard to the interpretation of the coefficients, Verbeek observes that b2 = 0.85, pertaining to the logarithm of the Research & Development expenditures, is to be interpreted as an elasticity.
We obtain the percentage difference in the number of patents for each industry, ceteris paribus, with respect to the reference industries (food, fuel, metal and others) by transforming the parameters b3, ..., b9 as 100[exp(bi) - 1], i = 3, ..., 9:

> round(100 * (exp(poissonfit$coef[3:9]) - 1), 1)
  AEROSP  CHEMIST COMPUTER MACHINES VEHICLES    JAPAN       US
   -75.9     88.9     81.4     99.2    -78.3     24.9    -25.9

The function dispersiontest is available in the package AER to test whether we can expect a dispersion parameter different from 1. The first argument is a Poisson model estimated with glm. By default the argument alternative is set to "greater", thus testing for a situation of overdispersion, implying a value of the conditional variance larger than the value of the conditional mean:

σ² = dispersion · μ,   with dispersion > 1

> library(AER)
> dispersiontest(poissonfit)
Overdispersion test
data: poissonfit
z = 3.5861, p-value = 0.0001678
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
57.36236
A sample estimate of the dispersion parameter is also given. By using the argument trafo it is possible to set to 1 or 2 the power k in the following expression

σ² = μ + α μ^k

corresponding to the formulation of the variance in the Negative Binomial I and II models that will be used below, see Kleiber and Zeileis (2008) for more detailed information. The estimate of the dispersion parameter will then be given for α.
Note that for k = 1 we have dispersion = (1 + α).
> dispersiontest(poissonfit, trafo = 1)
Overdispersion test
data: poissonfit
z = 3.5861, p-value = 0.0001678
alternative hypothesis: true alpha is greater than 0
sample estimates:
alpha
56.36236
> dispersiontest(poissonfit, trafo = 2)
Overdispersion test

data: poissonfit
z = 3.8271, p-value = 6.482e-05
alternative hypothesis: true alpha is greater than 0
sample estimates:
alpha
0.4278121
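As a quick consistency check of the relation dispersion = (1 + α) for k = 1, the two estimates returned above can be compared directly (assuming, as is standard for htest objects, that the estimate is stored in the component estimate):
> dispersiontest(poissonfit)$estimate              # dispersion
> 1 + dispersiontest(poissonfit, trafo = 1)$estimate   # 1 + alpha, should be (almost) equal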
To deal with the overdispersion detected above, Verbeek considers the estimation of two further models, NegBinI and NegBinII, both based on a Negative Binomial distribution for the number of patents but relying on different specifications for the conditional variance, see e.g. Johnson et al. (2005) and Rigby and Stasinopoulos (2009).
Observe that in the statistical literature the names of the distribution functions pertaining to these two models are exchanged. So the model NBI will be used to estimate what in the econometric literature is known as NegBinII and vice versa; later on the econometric convention is used for naming model objects.
Negative Binomial II - 1st estimation
The function glm provides also the family negative.binomial(1) that is used to
estimate the NegBinII model. The corresponding call and output are:
> NegBinII <- glm(P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER +
MACHINES + VEHICLES + JAPAN + US, family = negative.binomial(1),
data = patents)
> summary(NegBinII)
Call:
glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, family = negative.binomial(1), data = patents)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.9205 -1.1676 -0.3058  0.3976  2.9594

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.32576    0.55216  -0.590   0.5560
LR91         0.83129    0.07861  10.575  < 2e-16 ***
AEROSP      -1.49826    0.37399  -4.006 9.17e-05 ***
CHEMIST      0.48779    0.26256   1.858   0.0649 .
COMPUTER    -0.16953    0.27553  -0.615   0.5392
MACHINES     0.05990    0.27957   0.214   0.8306
VEHICLES    -1.53392    0.36098  -4.249 3.51e-05 ***
JAPAN        0.25361    0.40086   0.633   0.5278
US          -0.58792    0.28228  -2.083   0.0388 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

(Dispersion parameter for Negative Binomial(1) family taken to be 1.241)

    Null deviance: 541.47  on 180  degrees of freedom
Residual deviance: 266.65  on 172  degrees of freedom
AIC: 1663.7

Number of Fisher Scoring iterations: 12


> pR2(NegBinII)
         llh      llhNull           G2     McFadden         r2ML         r2CU
-822.8364665 -960.2437462  274.8145593    0.1430963    0.7809187    0.7809380

Observe that the estimates and their standard errors differ somewhat from those provided in Verbeek's Table 7.8, but we have to bear in mind that glm uses an estimation method based upon iteratively reweighted least squares and not on maximum likelihood.
Verbeek's coefficient estimates² for the NegBinII model may be reproduced by applying the function glm.nb available in the package MASS.

² Also the procedures presented below are not based on maximum likelihood.

Negative Binomial II - 2nd estimation
> library(MASS)
> NegBinII.glm.nb <- glm.nb(P91 ~ LR91 + AEROSP + CHEMIST +
COMPUTER + MACHINES + VEHICLES + JAPAN + US,
data = patents)
> summary(NegBinII.glm.nb)
Call:
glm.nb(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
    VEHICLES + JAPAN + US, data = patents, init.theta = 0.7686768238,
    link = log)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.6373 -1.0264 -0.2694  0.3438  2.5966

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.32462    0.56234  -0.577   0.5638
LR91         0.83148    0.08006  10.386  < 2e-16 ***
AEROSP      -1.49746    0.37691  -3.973 7.10e-05 ***
CHEMIST      0.48861    0.26788   1.824   0.0682 .
COMPUTER    -0.17355    0.28086  -0.618   0.5366
MACHINES     0.05926    0.28429   0.208   0.8349
VEHICLES    -1.53065    0.36852  -4.153 3.27e-05 ***
JAPAN        0.25222    0.40983   0.615   0.5383
US          -0.59050    0.28834  -2.048   0.0406 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

(Dispersion parameter for Negative Binomial(0.7687) family taken to be 1)

    Null deviance: 426.38  on 180  degrees of freedom
Residual deviance: 213.94  on 172  degrees of freedom
AIC: 1659.2

Number of Fisher Scoring iterations: 1

              Theta:  0.7687
          Std. Err.:  0.0812
 2 x log-likelihood:  -1639.1910

The estimate of the dispersion parameter (the parameter called d2 in the likelihood functions defined below) may be retrieved as:


> 1/NegBinII.glm.nb$theta
[1] 1.300937
The package gamlss (http://gamlss.org) provides tools for estimating Generalized Additive Models for Location Scale and Shape, of which the NegBinI and NegBinII are special cases.
Negative Binomial I - 1st estimation
> library(gamlss)
> NegBinI <- gamlss(P91 ~ LR91 + AEROSP + CHEMIST +
COMPUTER + MACHINES + VEHICLES + JAPAN + US,
~1, family = NBII, data = patents)
GAMLSS-RS iteration 1: Global Deviance = 1730.599
GAMLSS-RS iteration 2: Global Deviance = 1715.676
GAMLSS-RS iteration 3: Global Deviance = 1705.882
GAMLSS-RS iteration 4: Global Deviance = 1700.497
GAMLSS-RS iteration 5: Global Deviance = 1697.994
GAMLSS-RS iteration 6: Global Deviance = 1696.971
GAMLSS-RS iteration 7: Global Deviance = 1696.595
GAMLSS-RS iteration 8: Global Deviance = 1696.459
GAMLSS-RS iteration 9: Global Deviance = 1696.416
GAMLSS-RS iteration 10: Global Deviance = 1696.401
GAMLSS-RS iteration 11: Global Deviance = 1696.393
GAMLSS-RS iteration 12: Global Deviance = 1696.391
GAMLSS-RS iteration 13: Global Deviance = 1696.391
> summary(NegBinI)
*******************************************************************
Family: c("NBII", "Negative Binomial type II")

Call:
gamlss(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
    VEHICLES + JAPAN + US, sigma.formula = ~1, family = NBII,
    data = patents)

Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
            Estimate Std. Error t value  Pr(>|t|)
(Intercept)   0.6962    0.50450  1.3799 1.694e-01
LR91          0.5779    0.06712  8.6095 4.652e-15
AEROSP       -0.7873    0.33707 -2.3359 2.066e-02
CHEMIST       0.7321    0.18525  3.9518 1.133e-04
COMPUTER      0.1440    0.20644  0.6978 4.863e-01
MACHINES      0.1549    0.25490  0.6076 5.443e-01
VEHICLES     -0.8177    0.26869 -3.0433 2.709e-03
JAPAN         0.4012    0.25742  1.5584 1.210e-01
US            0.1581    0.19856  0.7964 4.269e-01
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
  Estimate Std. Error   t value  Pr(>|t|)
 4.560e+00  1.470e-01 3.103e+01 3.854e-72
------------------------------------------------------------------
No. of observations in the fit: 181
Degrees of Freedom for the fit: 10
      Residual Deg. of Freedom: 171
                      at cycle: 13

Global Deviance:     1696.391
            AIC:     1716.391
            SBC:     1748.376
*******************************************************************
> exp(NegBinI$sigma.coef)
(Intercept)
95.56499
> pR2(NegBinI)
GAMLSS-RS iteration 1: Global Deviance = 1790.816
GAMLSS-RS iteration 2: Global Deviance = 1787.876
GAMLSS-RS iteration 3: Global Deviance = 1786.311
GAMLSS-RS iteration 4: Global Deviance = 1785.551

GAMLSS-RS iteration 5: Global Deviance = 1785.201


GAMLSS-RS iteration 6: Global Deviance = 1785.045
GAMLSS-RS iteration 7: Global Deviance = 1784.982
GAMLSS-RS iteration 8: Global Deviance = 1784.958
GAMLSS-RS iteration 9: Global Deviance = 1784.947
GAMLSS-RS iteration 10: Global Deviance = 1784.943
GAMLSS-RS iteration 11: Global Deviance = 1784.941
GAMLSS-RS iteration 12: Global Deviance = 1784.94
GAMLSS-RS iteration 13: Global Deviance = 1784.939
         llh      llhNull           G2     McFadden
-848.1955684 -892.4697439   88.5483510    0.0496086

Negative Binomial II - 3rd estimation
> NegBinII <- gamlss(P91 ~ LR91 + AEROSP + CHEMIST +
COMPUTER + MACHINES + VEHICLES + JAPAN + US,
sigma.formula = ~1, family = NBI, data = patents)
GAMLSS-RS iteration 1: Global Deviance = 1639.194
GAMLSS-RS iteration 2: Global Deviance = 1639.191
GAMLSS-RS iteration 3: Global Deviance = 1639.191
> summary(NegBinII)
*******************************************************************
Family: c("NBI", "Negative Binomial type I")

Call:
gamlss(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
    VEHICLES + JAPAN + US, sigma.formula = ~1, family = NBI,
    data = patents)

Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
            Estimate Std. Error t value  Pr(>|t|)
(Intercept) -0.32542    0.49878 -0.6524 5.150e-01
LR91         0.83160    0.07681 10.8272 3.893e-21
AEROSP      -1.49745    0.37730 -3.9689 1.061e-04
CHEMIST      0.48875    0.25683  1.9030 5.872e-02
COMPUTER    -0.17365    0.29888 -0.5810 5.620e-01
MACHINES     0.05944    0.27936  0.2128 8.318e-01
VEHICLES    -1.53081    0.37397 -4.0934 6.542e-05
JAPAN        0.25251    0.42654  0.5920 5.546e-01
US          -0.59036    0.27883 -2.1173 3.568e-02
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
 Estimate Std. Error  t value Pr(>|t|)
  0.26311    0.10568  2.48971  0.01374
------------------------------------------------------------------
No. of observations in the fit: 181
Degrees of Freedom for the fit: 10
      Residual Deg. of Freedom: 171
                      at cycle: 3

Global Deviance:     1639.191
            AIC:     1659.191
            SBC:     1691.176
*******************************************************************
> exp(NegBinII$sigma.coef)
(Intercept)
1.300965
> pR2(NegBinII)
GAMLSS-RS iteration 1: Global Deviance = 1784.939
GAMLSS-RS iteration 2: Global Deviance = 1784.939
          llh       llhNull            G2      McFadden
-819.59573868 -892.46969695  145.74791654    0.08165427
Parameter estimates and their standard errors are again close but not equal to the
values provided by Verbeek.
Negative Binomial I and II (maximum likelihood)
We now build the likelihood functions for the NegBinI and NegBinII models by having
recourse respectively to the distribution functions NBII and NBI that are available
in the package gamlss.dist.
> library(gamlss.dist)
> regr <- cbind(1, patents$LR91, patents$AEROSP, patents$CHEMIST,
patents$COMPUTER, patents$MACHINES, patents$VEHICLES,
patents$JAPAN, patents$US)
> llI <- function(b1, b2, b3, b4, b5, b6, b7, b8, b9,
d2) {
b <- c(b1, b2, b3, b4, b5, b6, b7, b8, b9)
mu <- exp(regr %*% b)
-sum(dNBII(patents$P91, mu = mu, sigma = d2,
log = TRUE))
}
> llII <- function(b1, b2, b3, b4, b5, b6, b7, b8,
b9, d2) {
b <- c(b1, b2, b3, b4, b5, b6, b7, b8, b9)
mu <- exp(regr %*% b)
-sum(dNBI(patents$P91, mu = mu, sigma = d2, log = TRUE))
}

We can use the function mle2 in the package bbmle to obtain the maximum likelihood
parameter estimates with their standard errors and significance information.
> library(bbmle)
> b.start <- as.list(c(NegBinI$mu.coef, exp(NegBinI$sigma.coef)))
> names(b.start) <- c(paste("b", 1:9, sep = ""), "d2")
> NegBinIout <- mle2(llI, start = b.start)
> summary(NegBinIout)
Maximum likelihood estimation
Call:
mle2(minuslogl = llI, start = b.start)
Coefficients:
    Estimate Std. Error z value     Pr(z)
b1  0.695898   0.507413  1.3715   0.17023
b2  0.577782   0.067676  8.5375 < 2.2e-16 ***
b3 -0.786540   0.336884 -2.3348   0.01956 *
b4  0.732261   0.185290  3.9520 7.751e-05 ***
b5  0.144167   0.206440  0.6984   0.48496
b6  0.154857   0.255039  0.6072   0.54372
b7 -0.816659   0.268684 -3.0395   0.00237 **
b8  0.400487   0.257415  1.5558   0.11976
b9  0.158445   0.198506  0.7982   0.42476
d2 95.564819  14.100465  6.7774 1.223e-11 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 1696.391
By changing the relative tolerance value and incrementing the maximum number of
iterations we have:
> NegBinIout <- mle2(llI, start = b.start, control=list(reltol = 1e-15,
maxit = 50000))
> summary(NegBinIout)
Maximum likelihood estimation
Call:
mle2(minuslogl = llI, start = b.start, control = list(reltol = 1e-15,
maxit = 50000))
Coefficients:
    Estimate Std. Error z value     Pr(z)
b1  0.690189   0.506968  1.3614  0.173385
b2  0.578394   0.067628  8.5525 < 2.2e-16 ***
b3 -0.786539   0.336789 -2.3354  0.019522 *
b4  0.733320   0.185161  3.9604 7.481e-05 ***
b5  0.144998   0.206314  0.7028  0.482179
b6  0.155770   0.254981  0.6109  0.541259
b7 -0.817559   0.268611 -3.0437  0.002337 **
b8  0.400543   0.257280  1.5568  0.119508
b9  0.158789   0.198397  0.8004  0.423502
d2 95.243438  14.006343  6.8000 1.046e-11 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 1696.39
> b.start <- as.list(c(round(NegBinI$mu.coef, 2), exp(NegBinII$sigma.coef)))
> b.start <- as.list(c(NegBinII.glm.nb$coef, 1/NegBinII.glm.nb$theta))
> names(b.start) <- c(paste("b", 1:9, sep = ""), "d2")
> NegBinIIout <- mle2(llII, start = b.start, control = list(reltol = 1e-15,
      maxit = 50000))
> summary(NegBinIIout)
Maximum likelihood estimation
Call:
mle2(minuslogl = llII, start = b.start, control = list(reltol = 1e-15,
maxit = 50000))
Coefficients:
    Estimate Std. Error z value     Pr(z)
b1 -0.324623   0.498168 -0.6516   0.51464
b2  0.831479   0.076595 10.8555 < 2.2e-16 ***
b3 -1.497458   0.377230 -3.9696 7.199e-05 ***
b4  0.488611   0.256769  1.9029   0.05705 .
b5 -0.173551   0.298809 -0.5808   0.56137
b6  0.059264   0.279293  0.2122   0.83196
b7 -1.530649   0.373899 -4.0937 4.245e-05 ***
b8  0.252223   0.426426  0.5915   0.55420
b9 -0.590497   0.278778 -2.1182   0.03416 *
d2  1.300937   0.137459  9.4641 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 1639.191

Parameter estimates and their standard errors are now closer to the values in Verbeek's Table 7.8.

7.6 Expenditures on Alcohol and Tobacco (Part 1) (Section 7.4.3)
Data are available in the file TOBACCO.WF1, which is a work file of EViews. The function unzip allows the file to be extracted from the compressed archive ch07.zip.
> library(hexView)
> at <- readEViews(unzip("ch07.zip", "Chapter 7/tobacco.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
The file tobacco contains information about 2724 Belgian households, taken from
the Belgian household budget survey of 1995/96. The data are kindly supplied by the
National Institute of Statistics (NIS), Belgium. The following variables are present
(some are dummy variables):
• BLUECOL dummy, 1 if head is blue collar worker (1)
• WHITECOL dummy, 1 if head is white collar worker (1)
• FLANDERS dummy, 1 if living in Flanders (2)
• WALLOON dummy, 1 if living in Wallonie (2)
• NKIDS number of children > 2 years old
• NKIDS2 number of children <= 2 years old
• NADULTS number of adults in household
• LNX log total expenditures
• SHARE2 budget share tobacco
• SHARE1 budget share alcohol
• NADLNX nadults x lnx
• AGELNX age x lnx
• AGE age in brackets (0-4)
• D1 dummy, 1 if share1 > 0
• D2 dummy, 1 if share2 > 0
• W1 budget share alcohol, if > 0, missing otherwise
• W2 budget share tobacco, if > 0, missing otherwise
• LNX2 lnx squared
• AGE2 age squared

The shares of families having zero expenditures on alcohol and tobacco may be determined as:
> sum(at$SHARE1 == 0)/length(at$SHARE1)
[1] 0.171072

> sum(at$SHARE2 == 0)/length(at$SHARE2)


[1] 0.6196769
and the average budget shares over the corresponding subsamples are
> mean(at$SHARE1[at$SHARE1 > 0])
[1] 0.0215073
> mean(at$SHARE2[at$SHARE2 > 0])
[1] 0.03219081
The Engel curves are estimated separately for alcohol and tobacco expenditures³.

³ Since AGE is in brackets, we should transform it into a factor: at$AGE <- as.factor(at$AGE)

To estimate a Tobit model, see Verbeek's Table 7.9, we can have recourse to the tobit function available in the package AER, see Kleiber and Zeileis (2008). The first argument is the model specification and the second one is the data.frame containing the variables in the model. To obtain the estimation output apply summary to the tobit object obtained with the function tobit.
Observe that the tobit function returns the standard error and the z statistic for the estimate of the logarithm of the scale parameter σ. In the output the estimate of the scale parameter is also given. The Wald statistic is approximately equal to the one provided by Verbeek.
> library(AER)
> at7.9a <- tobit(SHARE1 ~ AGE + NADULTS + NKIDS +
NKIDS2 + LNX + AGE:LNX + NADULTS:LNX, data = at)
> summary(at7.9a)
Call:
tobit(formula = SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
    AGE:LNX + NADULTS:LNX, data = at)

Observations:
         Total  Left-censored     Uncensored Right-censored
          2724            466           2258              0

Coefficients:
              Estimate Std. Error  z value Pr(>|z|)
(Intercept) -0.1591526  0.0437780   -3.635 0.000277 ***
AGE          0.0134936  0.0108824    1.240 0.214993
NADULTS      0.0291900  0.0169468    1.722 0.084988 .
NKIDS       -0.0026408  0.0006049   -4.366 1.27e-05 ***
NKIDS2      -0.0038789  0.0023835   -1.627 0.103652
LNX          0.0126678  0.0032156    3.939 8.17e-05 ***
AGE:LNX     -0.0008093  0.0008006   -1.011 0.312072
NADULTS:LNX -0.0022484  0.0012232   -1.838 0.066051 .
Log(scale)  -3.7123577  0.0153342 -242.096  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Scale: 0.02442

Gaussian distribution
Number of Newton-Raphson Iterations: 3
Log-likelihood: 4755 on 9 Df
Wald-statistic: 118.8 on 7 Df, p-value: < 2.22e-16
> at7.9t <- tobit(SHARE2 ~ AGE + NADULTS + NKIDS +
NKIDS2 + LNX + AGE:LNX + NADULTS:LNX, data = at)
> summary(at7.9t)
Call:
tobit(formula = SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
    AGE:LNX + NADULTS:LNX, data = at)

Observations:
         Total  Left-censored     Uncensored Right-censored 
          2724           1688           1036              0 

Coefficients:
              Estimate Std. Error  z value Pr(>|z|)    
(Intercept)  0.5899802  0.0934269    6.315 2.70e-10 ***
AGE         -0.1258530  0.0241783   -5.205 1.94e-07 ***
NADULTS      0.0153697  0.0380475    0.404  0.68624    
NKIDS        0.0042697  0.0013247    3.223  0.00127 ** 
NKIDS2      -0.0099719  0.0054713   -1.823  0.06837 .  
LNX         -0.0444314  0.0068893   -6.449 1.12e-10 ***
AGE:LNX      0.0088221  0.0017832    4.947 7.52e-07 ***
NADULTS:LNX -0.0006007  0.0027501   -0.218  0.82709    
Log(scale)  -3.0366568  0.0246517 -123.183  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Scale: 0.048

Gaussian distribution
Number of Newton-Raphson Iterations: 3
Log-likelihood: 758.7 on 9 Df
Wald-statistic: 171.3 on 7 Df, p-value: < 2.22e-16
To obtain the total expenditure elasticities evaluated at the sample averages of those households that have positive expenditures, we have first to adapt Verbeek's relationship (7.72) to the present context. The budget share for good j,

    w_j = p_j q_j / x,

is modeled according to the following relationship:

    w_j(x) = β_1j + β_2j AGE + β_3j NADULTS + β_4j NKIDS + β_5j NKIDS2 + β_6j ln(x)
             + β_7j AGE·ln(x) + β_8j NADULTS·ln(x)

From the budget share we have:

    q_j = x w_j / p_j = x w_j(x) / p_j

Deriving with respect to x we obtain:

    dq_j/dx = w_j(x)/p_j + x (1/p_j) dw_j(x)/dx
            = w_j/p_j + (x/p_j)(1/x)(β_6j + β_7j AGE + β_8j NADULTS)
            = w_j/p_j + (1/p_j)(β_6j + β_7j AGE + β_8j NADULTS)

The elasticity follows multiplying by (x/q_j):

    e_j = (dq_j/dx)(x/q_j)
        = (w_j/p_j)(x/q_j) + (1/p_j)(x/q_j)(β_6j + β_7j AGE + β_8j NADULTS)
        = 1 + (β_6j + β_7j AGE + β_8j NADULTS)/w_j.

Let averages be the vector containing the average levels of the independent variables in the model we have estimated for alcohol:
> (averages <- apply(model.matrix(at7.9a)[at$SHARE1 >
0, ], 2, mean))
(Intercept)
AGE
NADULTS
NKIDS
NKIDS2
1.00000000 2.43489814 2.00177148 0.56864482 0.04384411
LNX
AGE:LNX NADULTS:LNX
13.77622903 33.42541096 27.74124174
The function model.matrix extracts the matrix of the regressors (including the
constant) of model at7.9a.
[at$SHARE1>0,] selects the observations with a positive expenditure in alcohol.
The column averages are obtained by means of the function apply(,2,mean).
The elements 2 and 3 of averages correspond respectively to the averages of the variables AGE and NADULTS, and coef(at7.9a)[6:8] are the β_6j, β_7j and β_8j coefficient estimates for alcohol.
The elasticity is then:


> w_j <- mean(at$SHARE1[at$SHARE1 > 0])


> 1 + sum(c(1, averages[2:3]) * coef(at7.9a)[6:8])/w_j
[1] 1.288113
For tobacco we have:
> averages <- apply(model.matrix(at7.9t)[at$SHARE2 >
0, ], 2, mean)
> w_j <- mean(at$SHARE2[at$SHARE2 > 0])
> 1 + sum(c(1, averages[2:3]) * coef(at7.9t)[6:8])/w_j
[1] 0.1795591
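Since the same computation is repeated for other models below, it may be convenient to wrap it into a small helper. The following sketch (the name elasticity is introduced here only for illustration and does not appear in the original code) reproduces the steps used above for the Tobit models; note that it subsets model.matrix by the positive shares, which is appropriate for models estimated on the full sample.

> elasticity <- function(model, share) {
      averages <- apply(model.matrix(model)[share > 0, ], 2, mean)
      w_j <- mean(share[share > 0])
      1 + sum(c(1, averages[2:3]) * coef(model)[6:8])/w_j
  }
> elasticity(at7.9a, at$SHARE1)
> elasticity(at7.9t, at$SHARE2)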

7.7  Expenditures on Alcohol and Tobacco (Part 2) (Section 7.5.4)

For data reading and description see the preceding Section 7.6.
Verbeek suggests estimating the Engel curve by means of OLS only for the statistical units that have a positive budget share⁴.
To this aim we can use the function lm with the argument subset.
> at7.10a <- lm(SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX + AGE:LNX + NADULTS:LNX, data = at, sub = which(SHARE1 >
0))
> at7.10t <- lm(SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX + AGE:LNX + NADULTS:LNX, data = at, sub = which(SHARE2 >
0))
> library(memisc)
> mtable(at7.10a, at7.10t)
Calls:
at7.10a: lm(formula = SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
AGE:LNX + NADULTS:LNX, data = at, subset = which(SHARE1 >
0))
at7.10t: lm(formula = SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
AGE:LNX + NADULTS:LNX, data = at, subset = which(SHARE2 >
0))
=====================================
                  at7.10a      at7.10t
-------------------------------------
(Intercept)         0.053        0.490***
                   (0.044)      (0.074)
AGE                 0.008       -0.031
                   (0.011)      (0.021)
NADULTS            -0.013       -0.013
                   (0.016)      (0.032)
NKIDS              -0.002***     0.001
                   (0.001)      (0.001)
NKIDS2             -0.002       -0.003
                   (0.002)      (0.005)
LNX                -0.002       -0.034***
                   (0.003)      (0.005)
AGE x LNX          -0.000        0.002
                   (0.001)      (0.002)
NADULTS x LNX       0.001        0.001
                   (0.001)      (0.002)
-------------------------------------
R-squared           0.051        0.154
adj. R-squared      0.048        0.148
sigma               0.022        0.029
F                  17.270       26.732
p                   0.000        0.000
Log-likelihood   5467.424     2200.044
Deviance            1.043        0.868
AIC            -10916.849    -4382.088
BIC            -10865.348    -4337.599
N                2258         1036
=====================================

⁴ As observed above, since AGE is in brackets, we should transform it into a factor: at$AGE <- as.factor(at$AGE)

Detailed results about single models can be obtained with summary(at7.10a) and
summary(at7.10t).
To obtain the total expenditure elasticities evaluated at the sample averages of those
households that have positive expenditures, we have to follow a procedure similar to
that presented above.
> averages <- apply(model.matrix(at7.10a), 2, mean)
> w_j <- mean(at$SHARE1[at$SHARE1 > 0])
> 1 + sum(c(1, averages[2:3]) * coef(at7.10a)[6:8])/w_j
[1] 0.922836
> averages <- apply(model.matrix(at7.10t), 2, mean)
> w_j <- mean(at$SHARE2[at$SHARE2 > 0])
> 1 + sum(c(1, averages[2:3]) * coef(at7.10t)[6:8])/w_j
[1] 0.1765833
Verbeek then suggests the estimation of two probit models.
> at7.11a <- glm(sign(SHARE1) ~ AGE + NADULTS + NKIDS +
      NKIDS2 + LNX + AGE:LNX + NADULTS:LNX + BLUECOL +
      WHITECOL, family = binomial(link = probit), data = at)
> at7.11t <- glm(sign(SHARE2) ~ AGE + NADULTS + NKIDS +
      NKIDS2 + LNX + AGE:LNX + NADULTS:LNX + BLUECOL +
      WHITECOL, family = binomial(link = probit), data = at)


> library(memisc)
> mtable(at7.11a, at7.11t)
Calls:
at7.11a: glm(formula = sign(SHARE1) ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX+AGE:LNX+NADULTS:LNX+BLUECOL+WHITECOL,family=binomial(link=probit),
data = at)
at7.11t: glm(formula = sign(SHARE2) ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX+AGE:LNX+NADULTS:LNX+BLUECOL+WHITECOL,family=binomial(link=probit),
data = at)
===========================================
                       at7.11a      at7.11t
-------------------------------------------
(Intercept)          -15.882***     8.244***
                      (2.570)      (2.213)
AGE                    0.668       -2.483***
                      (0.652)      (0.560)
NADULTS                2.255*       0.485
                      (1.020)      (0.872)
NKIDS                 -0.077*       0.081**
                      (0.037)      (0.031)
NKIDS2                -0.186       -0.212
                      (0.140)      (0.123)
LNX                    1.236***    -0.632***
                      (0.191)      (0.163)
BLUECOL               -0.061        0.206*
                      (0.098)      (0.084)
WHITECOL               0.051        0.022
                      (0.085)      (0.070)
AGE x LNX             -0.045        0.175***
                      (0.049)      (0.041)
NADULTS x LNX         -0.169*      -0.025
                      (0.074)      (0.063)
-------------------------------------------
Aldrich-Nelson R-sq.   0.060        0.038
McFadden R-sq.         0.069        0.030
Cox-Snell R-sq.        0.062        0.039
Nagelkerke R-sq.       0.103        0.053
phi                    1.000        1.000
Likelihood-ratio     173.176      108.910
p                      0.000        0.000
Log-likelihood     -1159.865    -1754.886
Deviance            2319.730     3509.772
AIC                 2339.730     3529.772
BIC                 2398.828     3588.871
N                   2724         2724
===========================================
Detailed results pertaining to the single models can be obtained with summary(at7.11a) and summary(at7.11t), and the goodness-of-fit statistics with pR2(at7.11a) and pR2(at7.11t).
Consider a household consisting of two adults, the head being a 35-year-old blue-collar worker (belonging to the second AGE class), with two children older than 2 and total expenditures equal to the overall sample average. The implied estimated probabilities of a positive budget share for alcohol and tobacco are computed below, together with the probabilities obtained when total expenditures increase by 10%.
> pnorm(predict(at7.11a, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX), BLUECOL = 1,
WHITECOL = 0)))
1
0.800741
> pnorm(predict(at7.11a, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX) + log(1.1),
BLUECOL = 1, WHITECOL = 0)))
1
0.8215568
> pnorm(predict(at7.11t, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX), BLUECOL = 1,
WHITECOL = 0)))
1
0.5171916
> pnorm(predict(at7.11t, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX) + log(1.1),
BLUECOL = 1, WHITECOL = 0)))
1
0.5045241
The estimated probabilities for alcoholic beverages are slightly different from those reported by Verbeek.
Observe that the standard errors of the parameter estimates obtained with glm are somewhat different from those obtained by Verbeek, since glm uses iteratively reweighted least squares and not maximum likelihood as the estimation method. The probit function in the package sampleSelection provides maximum likelihood estimates and standard errors.
> library(sampleSelection)
> at7.11aa <- probit(sign(SHARE1) ~ AGE + NADULTS +
NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, data = at)
> at7.11tt <- probit(sign(SHARE2) ~ AGE + NADULTS +
      NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
      BLUECOL + WHITECOL, data = at)
> summary(at7.11aa)
--------------------------------------------
Probit binary choice model/Maximum Likelihood estimation
Newton-Raphson maximisation, 5 iterations
Return code 1: gradient close to zero
Log-Likelihood: -1159.865
Model: Y == '1' in contrary to '0'
2724 observations (466 'negative' and 2258 'positive')
and 10 free parameters (df = 2714)
Estimates:
              Estimate Std. error t value   Pr(> t)    
(Intercept) -15.882314   2.573933 -6.1704 6.810e-10 ***
AGE           0.667851   0.652002  1.0243   0.30569    
NADULTS       2.255389   1.024527  2.2014   0.02771 *  
NKIDS        -0.077049   0.037246 -2.0686   0.03858 *  
NKIDS2       -0.185719   0.140827 -1.3188   0.18725    
LNX           1.235531   0.191295  6.4588 1.056e-10 ***
BLUECOL      -0.061166   0.097771 -0.6256   0.53157    
WHITECOL      0.050564   0.084713  0.5969   0.55058    
AGE:LNX      -0.044801   0.048540 -0.9230   0.35602    
NADULTS:LNX  -0.168792   0.074232 -2.2738   0.02298 *  
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Significance test:
chi2(9) = 173.1762 (p=1.344063e-32)
--------------------------------------------
> summary(at7.11tt)
--------------------------------------------
Probit binary choice model/Maximum Likelihood estimation
Newton-Raphson maximisation, 4 iterations
Return code 1: gradient close to zero
Log-Likelihood: -1754.886
Model: Y == '1' in contrary to '0'
2724 observations (1688 'negative' and 1036 'positive')
and 10 free parameters (df = 2714)
Estimates:
             Estimate Std. error t value   Pr(> t)    
(Intercept)  8.244472   2.211077  3.7287 0.0001925 ***
AGE         -2.483000   0.559601 -4.4371 9.118e-06 ***
NADULTS      0.485200   0.871738  0.5566 0.5778085    
NKIDS        0.081283   0.030830  2.6365 0.0083766 ** 
NKIDS2      -0.211662   0.123050 -1.7201 0.0854076 .  
LNX         -0.632080   0.163195 -3.8731 0.0001074 ***
BLUECOL      0.206420   0.083432  2.4741 0.0133566 *  
WHITECOL     0.021534   0.069428  0.3102 0.7564324    
AGE:LNX      0.174736   0.041305  4.2303 2.334e-05 ***
NADULTS:LNX -0.025340   0.062923 -0.4027 0.6871532    
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Significance test:
chi2(9) = 108.9097 (p=2.450886e-19)
--------------------------------------------
Note that the functions mtable and predict cannot (at the moment) be applied to
objects resulting from sampleSelection::probit.
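A possible workaround, sketched below, is to build the design vector by hand and apply pnorm to the linear predictor computed from the maximum likelihood estimates; the household profile is the same one used with predict above, and it is assumed that the coefficient names of the probit fit match the column names produced by model.matrix for the same formula.

> newdata <- data.frame(AGE = 2, NADULTS = 2, NKIDS = 2, NKIDS2 = 0,
      LNX = mean(at$LNX), BLUECOL = 1, WHITECOL = 0)
> X <- model.matrix(~ AGE + NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX +
      NADULTS:LNX + BLUECOL + WHITECOL, data = newdata)
> # align the columns with the estimated coefficients before multiplying
> pnorm(drop(X[, names(coef(at7.11aa))] %*% coef(at7.11aa)))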
The Engel curves are finally re-estimated by Verbeek with the two-step estimation procedure proposed by Heckman. The function selection, available in the package sampleSelection, can be used. The function selection depends on 4 main arguments: selection, a formula specifying the (probit) selection model; outcome, a formula relating the outcome to its explanatory variables; data, a data.frame containing the data to analyze; method, specifying the estimation method, in our case "2step".
> library(sampleSelection)
> at7.12a <- selection(selection = sign(SHARE1) ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, outcome = SHARE1 ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX,
data = at, method = "2step")
> at7.12t <- selection(selection = sign(SHARE2) ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, outcome = SHARE2 ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX,
data = at, method = "2step")
> summary(at7.12a)
--------------------------------------------
Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
2724 observations (466 censored and 2258 observed)
21 free parameters (df = 2704)
Probit selection equation:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -15.88231    2.57393  -6.170 7.83e-10 ***
AGE           0.66785    0.65200   1.024   0.3058    
NADULTS       2.25539    1.02453   2.201   0.0278 *  
NKIDS        -0.07705    0.03725  -2.069   0.0387 *  
NKIDS2       -0.18572    0.14083  -1.319   0.1874    
LNX           1.23553    0.19130   6.459 1.25e-10 ***
BLUECOL      -0.06117    0.09777  -0.626   0.5316    
WHITECOL      0.05056    0.08471   0.597   0.5506    
AGE:LNX      -0.04480    0.04854  -0.923   0.3561    
NADULTS:LNX  -0.16879    0.07423  -2.274   0.0231 *  
Outcome equation:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.0542675  0.1329935   0.408  0.68327   
AGE          0.0077095  0.0130468   0.591  0.55463   
NADULTS     -0.0133444  0.0247045  -0.540  0.58913   
NKIDS       -0.0020244  0.0007637  -2.651  0.00808 **
NKIDS2      -0.0024127  0.0025715  -0.938  0.34821   
LNX         -0.0024288  0.0093674  -0.259  0.79544   
AGE:LNX     -0.0004044  0.0009420  -0.429  0.66773   
NADULTS:LNX  0.0008461  0.0018047   0.469  0.63922   
Multiple R-Squared: 0.051,  Adjusted R-Squared: 0.0476
Error terms:
                Estimate Std. Error t value Pr(>|t|)
invMillsRatio -0.0002045  0.0165285  -0.012     0.99
sigma          0.0214876         NA      NA       NA
rho           -0.0095160         NA      NA       NA
--------------------------------------------
> summary(at7.12t)
--------------------------------------------
Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
2724 observations (1688 censored and 1036 observed)
21 free parameters (df = 2704)
Probit selection equation:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.24447    2.21108   3.729 0.000196 ***
AGE         -2.48300    0.55960  -4.437 9.48e-06 ***
NADULTS      0.48520    0.87174   0.557 0.577855    
NKIDS        0.08128    0.03083   2.637 0.008425 ** 
NKIDS2      -0.21166    0.12305  -1.720 0.085522 .  
LNX         -0.63208    0.16320  -3.873 0.000110 ***
BLUECOL      0.20642    0.08343   2.474 0.013418 *  
WHITECOL     0.02153    0.06943   0.310 0.756456    
AGE:LNX      0.17474    0.04131   4.230 2.41e-05 ***
NADULTS:LNX -0.02534    0.06292  -0.403 0.687185    
Outcome equation:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.4515813  0.1086284   4.157 3.32e-05 ***
AGE         -0.0172991  0.0358591  -0.482 0.629547    
NADULTS     -0.0174378  0.0339635  -0.513 0.607693    
NKIDS        0.0007643  0.0015130   0.505 0.613471    
NKIDS2      -0.0020755  0.0053883  -0.385 0.700128    
LNX         -0.0301094  0.0090459  -3.329 0.000885 ***
AGE:LNX      0.0012243  0.0025454   0.481 0.630568    
NADULTS:LNX  0.0013650  0.0024238   0.563 0.573371    
Multiple R-Squared: 0.1542,  Adjusted R-Squared: 0.1476
Error terms:
               Estimate Std. Error t value Pr(>|t|)
invMillsRatio -0.009018   0.018589  -0.485    0.628
sigma          0.029881         NA      NA       NA
rho           -0.301793         NA      NA       NA
--------------------------------------------
To obtain the total expenditure elasticities evaluated at the sample averages of those
households that have positive expenditures, we have to follow the same procedure
adopted for the preceding models.
> averages <- apply(model.matrix(at7.12a), 2, mean,
na.rm = TRUE)
> w_j <- mean(at$SHARE1[at$SHARE1 > 0])
> 1 + sum(c(1, averages[2:3]) * coef(at7.12a)[16:18])/w_j
[1] 0.9200393
> averages <- apply(model.matrix(at7.12t), 2, mean,
na.rm = TRUE)
> w_j <- mean(at$SHARE2[at$SHARE2 > 0])
> 1 + sum(c(1, averages[2:3]) * coef(at7.12t)[16:18])/w_j
[1] 0.2334157
Robust heckit estimates can be obtained by having recourse to the function heckitrob available in the package ssmrob.
> library(ssmrob)
> with(at, {
at7.12arob <- heckitrob(selection = sign(SHARE1) ~
AGE + NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX +
NADULTS:LNX + BLUECOL + WHITECOL, outcome = SHARE1 ~
AGE + NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX +
NADULTS:LNX,
control = heckitrob.control(weights.x1 = "robCov"))
summary(at7.12arob)
})
---------------------------------------------------------------
Robust 2-step Heckman / heckit M-estimation
Probit selection equation:
                   Estimate  Std.Error t-value  p-value    
XS(Intercept) -15.95924691 2.68714692 -5.9390 2.87e-09 ***
XSAGE           0.69340378 0.68917714  1.0060 3.14e-01    
XSNADULTS       2.28871511 1.06754439  2.1440 3.20e-02 *  
XSNKIDS        -0.09667342 0.03852380 -2.5090 1.21e-02 *  
XSNKIDS2       -0.22720078 0.14400781 -1.5780 1.15e-01    
XSLNX           1.24363619 0.19999064  6.2180 5.02e-10 ***
XSBLUECOL      -0.06540005 0.10183231 -0.6422 5.21e-01    
XSWHITECOL      0.01981345 0.08890862  0.2229 8.24e-01    
XSAGE:LNX      -0.04739224 0.05139110 -0.9222 3.56e-01    
XSNADULTS:LNX  -0.17079467 0.07741752 -2.2060 2.74e-02 *  
Outcome equation:
                    Estimate   Std.Error  t-value p-value
XO(Intercept)  0.0481414810 0.506033190  0.09514   0.924
XOAGE         -0.0056858851 0.030390752 -0.18710   0.852
XONADULTS     -0.0044571146 0.075028273 -0.05941   0.953
XONKIDS       -0.0013210916 0.002505419 -0.52730   0.598
XONKIDS2      -0.0018052433 0.005900382 -0.30600   0.760
XOLNX         -0.0022514209 0.035468795 -0.06348   0.949
XOAGE:LNX      0.0005240257 0.002130696  0.24590   0.806
XONADULTS:LNX  0.0002636993 0.005524299  0.04773   0.962
imrData$IMR1  -0.0011748054 0.067667883 -0.01736   0.986
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
---------------------------------------------------------------
> with(at, {
at7.12trob <- heckitrob(selection = sign(SHARE2) ~
AGE + NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX +
NADULTS:LNX + BLUECOL + WHITECOL, outcome = SHARE2 ~
AGE + NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX +
NADULTS:LNX,
control = heckitrob.control(weights.x1 = "robCov"))
summary(at7.12trob)
})
---------------------------------------------------------------
Robust 2-step Heckman / heckit M-estimation
Probit selection equation:
                  Estimate  Std.Error t-value  p-value    
XS(Intercept)   8.28126218 2.22147554  3.7280 1.93e-04 ***
XSAGE          -2.48452150 0.56216375 -4.4200 9.89e-06 ***
XSNADULTS       0.46546383 0.87382559  0.5327 5.94e-01    
XSNKIDS         0.08116955 0.03087415  2.6290 8.56e-03 ** 
XSNKIDS2       -0.19818390 0.12341443 -1.6060 1.08e-01    
XSLNX          -0.63408431 0.16394126 -3.8680 1.10e-04 ***
XSBLUECOL       0.20400053 0.08368888  2.4380 1.48e-02 *  
XSWHITECOL      0.01799770 0.06979220  0.2579 7.97e-01    
XSAGE:LNX       0.17500330 0.04148927  4.2180 2.46e-05 ***
XSNADULTS:LNX  -0.02425829 0.06306736 -0.3846 7.01e-01    
Outcome equation:
                    Estimate   Std.Error t-value p-value  
XO(Intercept)  0.3979558830 0.162089777  2.4550  0.0141 *
XOAGE         -0.0323871073 0.060120904 -0.5387  0.5900  
XONADULTS     -0.0117157109 0.036582449 -0.3203  0.7490  
XONKIDS        0.0003799716 0.002269950  0.1674  0.8670  
XONKIDS2      -0.0024720481 0.006241712 -0.3961  0.6920  
XOLNX         -0.0264461249 0.014363553 -1.8410  0.0656 .
XOAGE:LNX      0.0022926188 0.004209442  0.5446  0.5860  
XONADULTS:LNX  0.0009147453 0.002477952  0.3692  0.7120  
imrData$IMR1  -0.0074295332 0.035604993 -0.2087  0.8350  
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
---------------------------------------------------------------
8  Univariate Time Series Models

8.1  Some examples of stochastic processes

We first consider the behaviour of some weakly stationary stochastic processes by simulating realizations from a Gaussian White Noise, some autoregressive and some moving average processes.¹

8.1.1  The Gaussian White Noise

The Gaussian White Noise consists of a sequence of identically and independently distributed (i.i.d.) Normal random variables with mean 0 and the same variance:

    ε_t ~ i.i.d. N(μ = 0, σ²).

To create a time series E with 500 Normal pseudo-random numbers with mean μ = 0 and variance σ² = 3 use the code
> E <- ts(data = rnorm(n = 500, mean = 0, sd = 3^0.5))
The function ts is used to create time series objects; its main arguments are the data, consisting of a numeric vector or matrix, start, defining the time of the first observation, which can be a single number or a vector of two integers (a natural time unit and the number of samples into the time unit, e.g. c(2012, 2) for February 2012 in a monthly series), end, defining the time of the last observation, specified in the same way as start, and one of the two options: frequency, the number of observations per unit of time, or deltat, the fraction of the sampling period between successive observations, e.g. 1/12 for monthly data.
By default both frequency and deltat are set to 1.
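For instance, a monthly series of 24 pseudo-random values starting in February 2012 (the numbers are purely illustrative) can be created in either of the two equivalent ways:

> x <- ts(rnorm(24), start = c(2012, 2), frequency = 12)
> x <- ts(rnorm(24), start = c(2012, 2), deltat = 1/12)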
The graphical representation of the stochastic process {ε_t} shows no regular pattern. See Figure 8.1, which can be obtained with the code:
> plot(E)
Observe that {ε_t} is a White Noise process when it consists of a sequence of uncorrelated random variables with mean 0 and the same variance.
The Gaussian White Noise is an example of White Noise.
¹ We address readers interested in the financial implementations of stochastic processes to the Rmetrics site (https://www.rmetrics.org). The e-book by Würtz et al. (2009) is a good reference presenting an overall introduction to the type of classes used by R for dealing with time series.

Figure 8.1  Graphical representation of a Gaussian White Noise process

8.1.2  The Autoregressive Process

Simulation of a realization from an AR(1) Process


Let's consider the following stochastic finite difference equation², where {ε_t} is a White Noise process and |θ_1| < 1:

    y_t = θ_1 y_{t-1} + ε_t.    (8.1)

If we assume {ε_t} assigned, the unique, asymptotic, weakly stationary solution of (8.1) is a stochastic process {y_t} named AutoRegressive process of order 1 and denoted with AR(1).
To simulate a realization y from an AR(1) stochastic process we need the corresponding realization E of a White Noise process {ε_t} and an initial condition for {y_t}: y0, the value of y_t at t = 0

    y[1] = θ_1 y0 + E[1]
    y[2] = θ_1 y[1] + E[2]
    ...                          (8.2)
    y[n] = θ_1 y[n-1] + E[n]

² Verbeek's notation is used: {y_t} is a 0-mean stochastic process, possibly obtained from {Y_t}, a stochastic process whose components have common mean μ, as y_t = Y_t − μ.

Initial data need to be dropped to make their memory effect vanish. So we simulate
a realization longer than n.
> theta1 <- 0.9
> n <- 500
> E <- ts(rnorm(n + 100, mean = 0, sd = 1))
We can now define the initial condition y0 for {y_t}, initialize a variable y with the same length as E and obtain y[1]. Relationships (8.2) are then implemented by means of a for loop.
> y0 <- 0
> y <- E * 0
> y[1] <- theta1 * y0 + E[1]
> for (t in 2:(n + 100)) {
      y[t] <- theta1 * y[t - 1] + E[t]
  }
> y <- y[(length(y) - n + 1):length(y)]
Observe that if, as we assumed, |θ_1| < 1, we can obtain, by recursive substitutions, the causal representation of {y_t} as a linear filter based on the generating process {ε_t}:

    y_t = Σ_{i=0}^∞ θ_1^i ε_{t-i};

this relationship can be implemented by means of the function filter, providing as arguments the generating time series E and the vector of the autoregressive coefficients, and by specifying method = "recursive"
> y <- ts(filter(E, filter = theta1, method = "recursive")[(length(E) -
      n + 1):length(E)])
Figure 8.2 represents the behaviour of an AR(1) stochastic process, which is characterized by large deviations from the mean (here 0) tending to be followed by large deviations, and by a general clustering also of small deviations from the mean.
It can be obtained with:
> plot(y, ylab = paste("Yt AR(1), theta1= ", theta1))
Figure 8.2  Graphical representation of an AR(1) process

To simulate a realization from an AR(1) process it is also possible to make direct use of the function arima.sim, whose main arguments are the model, consisting of a list with

components ar and/or ma giving the AR and MA coefficients respectively (an empty


list gives an ARIMA(0, 0, 0) model, that is a White Noise); n is the length of the
output series. See the help ?arima.sim for more information on the function and see
also ?arima for the definition of the ARIMA process used by the related R functions.
> y <- arima.sim(model = list(ar = theta1), n = 100)
The reader is invited to observe what happens by changing the theta1 parameter, e.g. by considering θ_1 ∈ {−0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8, 1, 1.2}.
She will find that arima.sim returns an error for |θ_1| ≥ 1, since the stochastic process {y_t} is then no longer weakly stationary.
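A minimal sketch of this experiment is given below; try() is used so that the loop does not stop at the non-stationary cases (the values shown are a subset of those listed above):

> for (theta1 in c(-0.8, -0.4, 0.4, 0.8, 1, 1.2)) {
      res <- try(arima.sim(model = list(ar = theta1), n = 100), silent = TRUE)
      cat("theta1 =", theta1, "->",
          if (inherits(res, "try-error")) "error: not stationary\n" else "ok\n")
  }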
Simulation from an AR(2) process
The stochastic process {y_t} named AutoRegressive process of order 2 and denoted with AR(2) is the unique, asymptotic, weakly stationary solution of the stochastic finite difference equation

    y_t = θ_1 y_{t-1} + θ_2 y_{t-2} + ε_t    (8.3)

where {ε_t} is a White Noise process and the roots of the characteristic equation³

    1 − θ_1 z − θ_2 z² = 0

lie outside the unit circle.
To simulate a realization from an AR(2) stochastic process we need the realizations of a White Noise process {ε_t} and 2 initial conditions for {y_t}, say y_1 and y0, the values of {y_t} at times t = −1 and 0

    y[1] = θ_1 y0 + θ_2 y_1 + E[1]
    y[2] = θ_1 y[1] + θ_2 y0 + E[2]
    y[3] = θ_1 y[2] + θ_2 y[1] + E[3]
    y[4] = θ_1 y[3] + θ_2 y[2] + E[4]
    ...                                    (8.4)
    y[n] = θ_1 y[n-1] + θ_2 y[n-2] + E[n]
Initial data are dropped to make their memory effect vanish
> theta1 <- 0.5
> theta2 <- 0.2
> y_1 <- y0 <- 0
> n <- 500
> E <- ts(rnorm(n + 100, mean = 0, sd = 1))
> y <- E * 0
> y[1] <- theta1 * y0 + theta2 * y_1 + E[1]
> y[2] <- theta1 * y[1] + theta2 * y0 + E[2]
> for (i in 3:(n + 100)) {
      y[i] <- theta1 * y[i - 1] + theta2 * y[i - 2] + E[i]
  }
> y <- ts(y[(length(y) - n + 1):length(y)])
The realization can also be obtained by using the linear filter representation of the process:

    y_t = Σ_{i=0}^∞ ψ_i ε_{t-i},

where the ψ_i satisfy

    (1 − θ_1 z − θ_2 z²)(1 + ψ_1 z + ψ_2 z² + ...) = 1.

³ Observe that sometimes the stationarity condition is expressed by means of the auxiliary equation z² − θ_1 z − θ_2 = 0, whose roots must lie inside the unit circle for the process {y_t} to be stationary.


Observe that to define the filter we do not have to compute the coefficients ψ_i, which could also be obtained by recursive substitutions of relationship (8.3): we only need to specify, as for the AR(1) process, the autoregressive coefficients and state "recursive" as the method.
> y <- ts(filter(E, filter = c(theta1, theta2), method =
"recursive")[(length(E) - n + 1):length(E)])
We can also make direct use of the function arima.sim
> y <- arima.sim(model = list(ar = c(theta1, theta2)),
n = 500)

8.1.3  The Moving Average Process

Simulation of a realization from a MA(1) Process


Let {ε_t} be a White Noise process and consider the following stochastic finite difference equation

    y_t = ε_t + α_1 ε_{t-1}.

The unique, asymptotic, weakly stationary solution is a stochastic process {y_t} named Moving Average process of order 1 and denoted with MA(1), which is invertible, that is {y_t} can be expressed as a linear filter of its past values, only if |α_1| < 1.
To simulate a realization from a MA(1) stochastic process we need the realizations of a White Noise process {ε_t} and also an initial condition, that is e0, the value of {ε_t} at time 0:

    y[1] = e[1] + α_1 e0
    y[2] = e[2] + α_1 e[1]
    ...                       (8.5)
    y[n] = e[n] + α_1 e[n-1]

> alpha1 <- 0.7
> n <- 500
> e0 <- rnorm(1, 0, 1)
> E <- ts(rnorm(n, mean = 0, sd = 1))
> y <- E * 0
> y[1] <- E[1] + alpha1 * e0
> for (t in 2:n) {
      y[t] <- E[t] + alpha1 * E[t - 1]
  }

Observe that the for loop can be replaced with the following vectorized instructions:


> y <- E * 0
> y[1] <- E[1] + alpha1 * e0
> y[-1] <- E[-1] + alpha1 * E[-length(y)]
Regarding the linear filter representation, observe that

    y_t = Σ_{i=0}^∞ α_i ε_{t-i},   with α_0 = 1 and α_i = 0 for i > 1.

This can be implemented with the function filter by specifying the moving average coefficients, "convolution" as method and⁴ sides = 1.
> y <- filter(E, filter = c(1, alpha1), method = "convolution",
sides = 1)
The series obtained with filter differs from the one built with the for loop only in the first observation y[1], since filter uses this observation as the initial condition.
Observe that initial conditions do not have any effect on the evolution of a moving average process.
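A quick check of this claim, assuming E, alpha1 and e0 from the code above are still in the workspace, is sketched below: apart from the first observation the two series coincide.

> y.loop <- E * 0
> y.loop[1] <- E[1] + alpha1 * e0
> y.loop[-1] <- E[-1] + alpha1 * E[-length(E)]
> y.filt <- filter(E, filter = c(1, alpha1), method = "convolution", sides = 1)
> all.equal(as.numeric(y.loop)[-1], as.numeric(y.filt)[-1])   # returns TRUE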
We can also make direct use of the function arima.sim:
> y <- arima.sim(model = list(ma = alpha1), n = 100)
The reader can observe what happens by varying α_1 in the set {−0.8, −0.6, ..., 0.6, 0.8}.
Simulation from a MA(2) Process
The unique, asymptotic, weakly stationary solution of the following stochastic finite difference equation, where {ε_t} is a White Noise,

    y_t = ε_t + α_1 ε_{t-1} + α_2 ε_{t-2}

is a stochastic process {y_t} named Moving Average process of order 2 and denoted with MA(2), which is invertible if the roots of the characteristic equation

    1 + α_1 z + α_2 z² = 0

lie outside the unit circle.
To simulate a realization from a MA(2) stochastic process we need the realizations of a White Noise process {ε_t} and two initial conditions, say e0 and e_1, for the values of {ε_t} at times 0 and −1:

    y[1] = e[1] + α_1 e0 + α_2 e_1
    y[2] = e[2] + α_1 e[1] + α_2 e0
    y[3] = e[3] + α_1 e[2] + α_2 e[1]
    y[4] = e[4] + α_1 e[3] + α_2 e[2]
    ...                                    (8.6)
    y[n] = e[n] + α_1 e[n-1] + α_2 e[n-2]

⁴ With sides=2 the following non-causal linear filter would be applied

    Y_t = Σ_{i=-k}^{k} α_i ε_{t-i},   with α_0 = 1,

the option filter defining the vector [α_{-k}, α_{-k+1}, ..., α_{k-1}, α_k].

Figure 8.3  Graphical representation of a MA(1) process

> alpha1 <- 0.5
> alpha2 <- 0.2
> n <- 100
> E <- ts(rnorm(n, mean = 0, sd = 1))
> e0 <- rnorm(1, 0, 1)
> e_1 <- rnorm(1, 0, 1)
> y <- E * 0
> y[1] <- E[1] + alpha1 * e0 + alpha2 * e_1
> y[2] <- E[2] + alpha1 * E[1] + alpha2 * e0
> y[-(1:2)] <- E[-(1:2)] + alpha1 * E[-c(1, length(y))] +
      alpha2 * E[-c(length(y) - 1, length(y))]

We can also use the filter function, with method="convolution" and obtain the
same series except for the 2 initial values.
> y <- filter(E, filter = c(1, alpha1, alpha2), method = "convolution",
sides = 1)
The function arima.sim can also be used
> y <- arima.sim(model = list(ma = c(alpha1, alpha2)),
n = 100)

8.1.4  Simulation of a realization from an AR(1) process with drift

Let us now simulate a realization from the stochastic process {Y_t} defined by the following stochastic finite difference equation:

    Y_t = α + θ_1 Y_{t-1} + ε_t

with α = 2 and θ_1 = 0.8; {ε_t} is an assigned Gaussian White Noise process.
{Y_t} is an autoregressive process of order 1 with the presence of a drift.
To simulate a realization from this process we can use the following code:
> n <- 500
> alpha <- 2
> theta1 <- 0.8
> yt <- arima.sim(model = list(ar = theta1), n = n,
      rand.gen = function(n) {
          alpha + rnorm(n)
      })

The argument rand.gen in arima.sim specifies the generating model for {ε_t}. Here we considered a sequence of i.i.d. normal pseudo-random values shifted by the constant alpha, which is equivalent to specifying a sequence of normal pseudo-random values with mean alpha:
> yt <- arima.sim(model = list(ar = theta1), n = n,
rand.gen = function(n) {
rnorm(n, mean = alpha)
})

Figure 8.4  Graphical representation of an AR(1) process with the presence of a drift

To obtain the plot of the time series {Y_t}, see Fig. 8.4, use
> plot(yt, ylab = paste("Yt AR(1), theta1= ", theta1,
" drift= ", alpha))
We can compute the mean and the variance of Y_t, which can be compared with their theoretical values

    E(Y_t) = α/(1 − θ_1) = 10    and    Var(Y_t) = σ²_ε/(1 − θ_1²) = 2.778.
> mean(yt)
[1] 10.10053
> var(yt)
[1] 2.179227
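The theoretical counterparts can be computed directly (here σ²_ε = 1, so no further input is needed):

> alpha/(1 - theta1)      # E(Yt) = 10
> 1/(1 - theta1^2)        # Var(Yt) = 2.7778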
Finally we repeat the procedure k = 200 times, obtain summary statistics for the mean and the variance and plot a histogram of the estimates. See Figures 8.5 and 8.6.


> k <- 200
> simula <- function(n = 500, alpha = 2, theta1 = 0.8) {
      yt <- arima.sim(model = list(ar = theta1), n = n,
          rand.gen = function(n) {
              rnorm(n, mean = alpha)
          })
      return(c(mean(yt), var(yt)))
  }
> a <- replicate(k, simula())
> rowMeans(a)
[1] 10.011119  2.734807
> apply(a, 1, var)
[1] 0.04990404 0.11620690
> hist(a[1, ], freq = FALSE)
> curve(dnorm(x, mean = mean(a[1, ]), sd = sd(a[1, ])), add = TRUE)
> hist(a[1, ], freq = FALSE, main = "", xlab = "",
      xlim = c(9, 11), ylim = c(0, 2))
> par(new = TRUE)
> plot(density(a[1, ]), main = "histogram with kernel density",
      xlim = c(9, 11), ylim = c(0, 2))

8.2  Autocorrelation, Partial autocorrelation functions and ARMA model identification

We now consider the study of the autocorrelation function and of the partial
autocorrelation function for a time series with the aim of identifying the order of
an autoregressive or of a moving average model.

8.2.1  Autocorrelation and Partial autocorrelation functions for an AR(1) process with drift

Let us consider a realization of length n = 500 from an autoregressive stochastic process of order 1 with drift:

    Y_t = const + θ_1 Y_{t-1} + ε_t    (8.7)

where const = 2, θ_1 = 0.8 and {ε_t} is a Gaussian White Noise.
> n <- 500
> const <- 2
> theta <- 0.8


Figure 8.5  Mean estimates distribution for 200 replications of the simulation of {y_t}

> yt <- arima.sim(model = list(ar = theta), n = n,


rand.gen = function(n) {
rnorm(n, mean = const)
})
The plot of the autocorrelation function (correlogram) of {Yt }, see Fig. 8.7, can be
obtained by using the function acf which also returns the values of the autocorrelation
function.
> (ytcorrelogram <- acf(yt))

Autocorrelations of series 'yt', by lag

     0      1      2      3      4      5      6      7      8 
 1.000  0.801  0.621  0.480  0.383  0.324  0.251  0.191  0.133 
     9     10     11     12     13     14     15     16     17 
 0.071  0.034 -0.001 -0.044 -0.051 -0.075 -0.090 -0.110 -0.140 
    18     19     20     21     22     23     24     25     26 
-0.146 -0.163 -0.158 -0.141 -0.126 -0.120 -0.112 -0.095 -0.091 

Figure 8.6  Mean estimates distribution for 200 replications of the simulation of {y_t} and kernel estimate of the density

We can observe that the autocorrelation function has a quite slow decay; namely, for the AR(1) process defined by relationship (8.7) we have that

    Cor(Y_t, Y_{t-1}) = Cor(y_t, y_{t-1}) = θ_1,
    Cor(Y_t, Y_{t-2}) = Cor(y_t, y_{t-2}) = θ_1²,
    ...,
    Cor(Y_t, Y_{t-k}) = Cor(y_t, y_{t-k}) = θ_1^k,

thus the autocorrelation function for an AR(1) process shows a decay of an exponential type, which is not very fast for θ_1 larger than 0.7.
The plot of the partial autocorrelation function, see Fig. 8.8, can be obtained with
> (ytpacf <- pacf(yt))
Partial autocorrelations of series 'yt', by lag

     1      2      3      4      5      6      7      8      9 
 0.801 -0.058 -0.001  0.034  0.047 -0.066  0.001 -0.033 -0.055 
    10     11     12     13     14     15     16     17     18 
 0.013 -0.031 -0.060  0.058 -0.065 -0.008 -0.037 -0.053  0.008 
    19     20     21     22     23     24     25     26 
-0.051  0.019  0.011  0.005 -0.038  0.009  0.019 -0.053 

Figure 8.7  Autocorrelation function of {y_t}, AR(1) process with drift (θ = 0.8, drift = 2)

From a theoretical point of view the partial autocorrelation function of an AR(1) process cuts off at lag 1.
As observed by Shumway and Stoffer (2011), different scales are used by acf and pacf when plotting the autocorrelation and partial autocorrelation functions: in particular, the autocorrelation plot also shows the non-informative autocorrelation at lag 0, which is always equal to 1. The function acf2 in the package astsa returns the autocorrelation and partial autocorrelation plots with the same scale, in a single device.
> library(astsa)
> t(acf2(yt, 20))

     [,1]  [,2] [,3] [,4] [,5]  [,6] [,7]  [,8]  [,9] [,10]
ACF   0.8  0.62 0.48 0.38 0.32  0.25 0.19  0.13  0.07  0.03
PACF  0.8 -0.06 0.00 0.03 0.05 -0.07 0.00 -0.03 -0.06  0.01
     [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.00 -0.04 -0.05 -0.08 -0.09 -0.11 -0.14 -0.15 -0.16 -0.16
PACF -0.03 -0.06  0.06 -0.07 -0.01 -0.04 -0.05  0.01 -0.05  0.02

Figure 8.8  Partial autocorrelation for {y_t}, AR(1) process with drift (θ = 0.8, drift = 2)

Since the values of the autocorrelation and partial autocorrelation functions are
returned as columns of a matrix, we prefer to use, for typographical convenience, the
transpose operator when invoking the acf2 function. The second argument in acf2
establishes the number of lags to be considered when producing the correlogram.
Observe that it is also possible to use the function PacfPlot available in the package
FitAR, which returns 95% confidence intervals for the partial autocorrelations. See
Fig. 8.10.
> library(FitAR)
> PacfPlot(yt)

Figure 8.9  Autocorrelation and Partial autocorrelation functions, via acf2, for {y_t}, AR(1) process with drift (θ = 0.8, drift = 2)

As already observed the autocorrelation function shows a slow decay, while the partial
autocorrelation function cuts off at lag 1; so we can conclude that an autoregressive
model of order 1 can fit the data.

8.2.2  Autocorrelation and Partial autocorrelation functions for some AR(p) processes with drift

We consider the simulation of a realization from the following stochastic process:

    Y_t = const + θ_1 Y_{t-1} + θ_2 Y_{t-2} + θ_3 Y_{t-3} + θ_4 Y_{t-4} + θ_5 Y_{t-5} + ε_t

with const = 2, {ε_t} a Gaussian White Noise and θ_1 = 0.5, θ_2 = 0.1, θ_3 = 0.2, θ_4 = 0, θ_5 = 0.2.
> n <- 500
> const <- 2
Figure 8.10  Partial autocorrelation for {y_t}, AR(1) process with drift (θ = 0.8, drift = 2) (95% confidence intervals)

Try with
> yt <- arima.sim(model = list(ar = c(0.5, 0.1, 0.2, 0, 0.2)), n = n,
      rand.gen = function(n) {rnorm(n, mean = const)})
To understand why you did not succeed in the generation of a realization from this process, we can check if the roots of the characteristic equation allow for a stationary solution of the stochastic difference equation. This can be done with the function InvertibleQ in the package FitAR, which checks if the roots of the characteristic equation, here

    1 − 0.5z − 0.1z² − 0.2z³ − 0.2z⁵ = 0,

lie outside the unit circle.
> library(FitAR)
> InvertibleQ(c(0.5, 0.1, 0.2, 0, 0.2))
[1] FALSE


Let us now simulate realizations from the following stochastic processes

    Y_t = const + θ_1 Y_{t-1} + θ_2 Y_{t-2} + θ_3 Y_{t-3} + θ_4 Y_{t-4} + θ_5 Y_{t-5} + ε_t

with const = 2, {ε_t} a Gaussian White Noise and:

    Process 1   θ_1 = 0.5   θ_2 = 0.1   θ_3 = 0.2   θ_4 = 0     θ_5 = 0
    Process 2   θ_1 = 0     θ_2 = 0.1   θ_3 = 0.2   θ_4 = 0     θ_5 = 0.5
    Process 3   θ_1 = 0.1   θ_2 = 0.5   θ_3 = 0     θ_4 = 0.2   θ_5 = 0      (8.8)
    Process 4   θ_1 = 0.2   θ_2 = 0.1   θ_3 = 0     θ_4 = 0.5   θ_5 = 0

The autocorrelation function, the partial autocorrelation function and the confidence
intervals for the partial autocorrelation function will also be plotted.
We can define a function for simulating the four processes.
> n <- 500
> const <- 2
> genera <- function(thetas) {
yt <<- arima.sim(model = thetas, n = n, rand.gen = function(n) {
rnorm(n, mean = const)
})
print("theta parameters: ")
print(paste("th", 1:length(thetas[[1]]), "=",
thetas[[1]], sep = "", collapse = ","))
print(t(acf2(yt, 20)))
}
See Figures 8.11, 8.12, 8.13, 8.14.
For these processes it is more complicated to derive the theoretical behaviour of the autocorrelation function analytically. A simpler way (the reader is invited to try this method) is to simulate a very long realization from the processes (e.g. with n = 10⁵) and check the behaviour of the estimated autocorrelation and partial autocorrelation functions, which will be very close to their theoretical counterparts.
We can, however, observe that the partial autocorrelation function can help us identify the order of the autoregressive model apt to describe the involved time series: it cuts off at the maximum autoregressive lag.
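A minimal sketch of the suggested experiment, for Process 2 only (the other processes can be handled in the same way, and n and const are those defined above), is:

> ytlong <- arima.sim(model = list(ar = c(0, 0.1, 0.2, 0, 0.5)), n = 1e5,
      rand.gen = function(n) rnorm(n, mean = const))
> t(acf2(ytlong, 20))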

Figure 8.11  Autocorrelation and Partial autocorrelation plots for a realization from Process 1: AR with drift = 2, autoregressive coefficients 0.5, 0.1, 0.2, 0, 0

> genera(list(ar = c(0.5, 0.1, 0.2, 0, 0)))


[1] "theta parameters: "
[1] "th1=0.5,th2=0.1,th3=0.2,th4=0,th5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.68 0.57 0.55 0.47 0.38 0.36 0.31 0.26 0.19 0.18
PACF 0.68 0.20 0.19 0.02 -0.06 0.06 -0.02 -0.01 -0.08 0.04
     [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.18   0.2  0.20  0.24  0.22  0.25  0.24  0.26  0.29  0.32
PACF  0.06   0.1  0.05  0.09 -0.01  0.07 -0.01  0.04  0.06  0.09

Figure 8.12  Autocorrelation and Partial autocorrelation plots for a realization from Process 2: AR with drift = 2, autoregressive coefficients 0, 0.1, 0.2, 0, 0.5

> genera(list(ar = c(0, 0.1, 0.2, 0, 0.5)))


[1] "theta parameters: "
[1] "th1=0,th2=0.1,th3=0.2,th4=0,th5=0.5"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF -0.1 0.21 0.29 -0.09 0.64 -0.09 0.13 0.23 -0.09 0.42
PACF -0.1 0.20 0.35 -0.07 0.57 -0.05 -0.04 -0.07 -0.01 0.02
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF  -0.13  0.08  0.16 -0.13  0.26 -0.15  0.03   0.1 -0.12  0.18
PACF -0.09 -0.01 -0.02 -0.05 -0.03 -0.04  0.00   0.0  0.05  0.05

Figure 8.13  Autocorrelation and Partial autocorrelation plots for a realization from Process 3: AR with drift = 2, autoregressive coefficients 0.1, 0.5, 0, 0.2, 0

> genera(list(ar = c(0.1, 0.5, 0, 0.2, 0)))


[1] "theta parameters: "
[1] "th1=0.1,th2=0.5,th3=0,th4=0.2,th5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.13 0.59 0.09 0.51 0.11 0.36 0.09 0.23 0.08 0.18 0.09
PACF 0.13 0.58 -0.03 0.25 0.04 -0.03 0.01 -0.08 0.00 0.01 0.04
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.18  0.10  0.12  0.12  0.06  0.08  0.05  0.10  0.06
PACF  0.11  0.03 -0.03  0.05 -0.11 -0.05  0.02  0.03  0.07

Figure 8.14  Autocorrelation and Partial autocorrelation plots for a realization from Process 4: AR with drift = 2, autoregressive coefficients 0.2, 0.1, 0, 0.5, 0

> genera(list(ar = c(0.2, 0.1, 0, 0.5, 0)))


[1] "theta parameters: "
[1] "th1=0.2,th2=0.1,th3=0,th4=0.5,th5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.39 0.29 0.27 0.60 0.36 0.27 0.24 0.40 0.27 0.21 0.21
PACF 0.39 0.16 0.14 0.53 0.00 0.01 0.05 0.03 -0.05 -0.01 0.03
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.26   0.2  0.17  0.17  0.16  0.11  0.14  0.18  0.12
PACF -0.05   0.0  0.01 -0.01 -0.03 -0.05  0.05  0.08 -0.04

Figure 8.15  Identification by means of the BIC criterion of the time series simulated with relationships (8.8) from some autoregressive processes. The correspondence between graphs and processes is: (Process 1, Process 3; Process 2, Process 4).

The function armasubsets available in the package TSA can also be used to establish
the order of the autoregressive process, that is for ARMA model selection. The
selection algorithm orders different models, chosen following a method proposed by
Hannan and Rissanen (1982), according to their BIC (Bayesian Information Criterion)
value. See Fig. 8.15.
> layout(matrix(1:4, 2, 2))
> library(TSA)
> sapply(1:4, function(i) plot(armasubsets(data[, i],
nar = 5, nma = 5)))
> detach("package:TSA")
We detach the package TSA since it re-defines the functions acf and arima.
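Note that the matrix data, whose columns are passed to armasubsets above, is not constructed explicitly in the text; it is assumed to collect the four simulated realizations. One possible way of building it (a sketch using arima.sim directly, with n and const as defined above) is:

> data <- sapply(list(c(0.5, 0.1, 0.2, 0, 0),
      c(0, 0.1, 0.2, 0, 0.5),
      c(0.1, 0.5, 0, 0.2, 0),
      c(0.2, 0.1, 0, 0.5, 0)),
      function(th) arima.sim(model = list(ar = th), n = n,
          rand.gen = function(n) rnorm(n, mean = const)))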


8.2.3  Autocorrelation and Partial autocorrelation functions for a MA(1) process

Let us now simulate a realization from the following stochastic process:

    y_t = ε_t + α_1 ε_{t-1}

with α_1 = 0.8 and {ε_t} a Gaussian White Noise.
> n <- 500
> alpha <- 0.8
> yt <- arima.sim(model = list(ma = alpha), n = n)
and plot the autocorrelation function (correlogram) and the partial autocorrelation
function of {yt }, see Fig. 8.16.
> source("acf2.r")
> t(acf2(yt, 20, ma.test = TRUE))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.52 -0.01 -0.11 -0.08 -0.04 -0.07 -0.05 0.04 0.06 0.01
PACF 0.52 -0.38 0.16 -0.14 0.08 -0.15 0.11 0.00 0.00 -0.02
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.04 -0.10 -0.07 0.02 0.06 0.02 -0.06 -0.07 -0.01 -0.01
PACF -0.03 -0.09 0.04 0.03 0.02 -0.05 -0.06 0.01 0.00 -0.02
We can observe, as expected from theoretical results pertaining to a MA(1) process, that the autocorrelation function cuts off at lag 1 while the partial autocorrelation function shows an exponential decay.
In the autocorrelation plot, two standard error bounds for ρ̂_k based on the estimated variance (1 + 2ρ̂₁² + ... + 2ρ̂²_{k-1})/T have been included, see Verbeek's relationship (8.64). Observe that this estimate of the variance holds under the hypothesis that ρ_q = 0 for q > k, that is, the process is a Moving Average process of order k. On the booksite www.educatt.it/libri/materiali a version of the function acf2 is available, which also includes the logical argument ma.test that can be set to FALSE or TRUE, according respectively to the white noise or the moving average hypothesis. Namely, if we want to check whether a series of residuals is white noise we will leave ma.test=FALSE.
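A sketch of how these bounds can be computed by hand from the sample autocorrelations, following the variance formula recalled above (the division by the series length T is assumed), is:

> rho <- drop(acf(yt, lag.max = 20, plot = FALSE)$acf)[-1]
> se <- sqrt(cumsum(c(1, 2 * rho[-length(rho)]^2))/length(yt))
> cbind(lag = 1:20, rho = round(rho, 2), two.se.bound = round(2 * se, 2))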

8.2.4  Autocorrelation and Partial autocorrelation functions for some MA(p) processes

Let us simulate a realization from the following stochastic process:

    y_t = ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + α_4 ε_{t-4} + α_5 ε_{t-5}    (8.9)

with {ε_t} a Gaussian White Noise and α_1 = 0.5, α_2 = 0.1, α_3 = 0.2, α_4 = 0, α_5 = 0.2.

Figure 8.16  Autocorrelation and partial autocorrelation functions of {y_t}, MA(1) process (α = 0.8)

> n <- 500


> yt <- arima.sim(model = list(ma = c(0.5, 0.1, 0.2,
0, 0.2)), n = n)
To establish if the process is stationary and invertible, we check if the roots of the characteristic equation

    1 − 0.5z − 0.1z² − 0.2z³ − 0.2z⁵ = 0

allow for an invertible solution of the stochastic difference equation (8.9).
> library(FitAR)
> InvertibleQ(c(0.5, 0.1, 0.2, 0, 0.2))
[1] FALSE


Let us now plot the autocorrelation function and the partial autocorrelation function for the following parameter configurations:

    Process 1   α_1 = 0.5   α_2 = 0.1   α_3 = 0.2   α_4 = 0     α_5 = 0
    Process 2   α_1 = 0     α_2 = 0.1   α_3 = 0.2   α_4 = 0     α_5 = 0.5
    Process 3   α_1 = 0.1   α_2 = 0.5   α_3 = 0     α_4 = 0.2   α_5 = 0      (8.10)
    Process 4   α_1 = 0.2   α_2 = 0.1   α_3 = 0     α_4 = 0.5   α_5 = 0

> n <- 500


> genera <- function(alphas, ma.test = TRUE) {
yt <<- arima.sim(model = alphas, n = n)
print("alpha parameters: ")
print(paste("al", 1:length(alphas[[1]]), "=",
alphas[[1]], sep = "", collapse = ","))
source("acf2.r")
print(t(acf2(yt, 20, ma.test = ma.test)))
}
See Figures 8.17, 8.18, 8.19, 8.20.
As before, to study the theoretical behaviour of the autocorrelation and partial autocorrelation functions the reader can simulate a very long realization from the processes (e.g. with n = 10⁵).
Observe that the autocorrelation function (correlogram) can help us identify the order of the moving average model apt to describe the involved time series: it cuts off at the maximum moving average lag.

Figure 8.17  Autocorrelation and Partial autocorrelation plots for a realization from Process 1: MA with moving average parameters 0.5, 0.1, 0.2, 0, 0

> genera(list(ma = c(0.5, 0.1, 0.2, 0, 0)), ma.test = TRUE)


[1] "alpha parameters: "
[1] "al1=0.5,al2=0.1,al3=0.2,al4=0,al5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.52 0.2 0.12 -0.02 0.02 0.08 0.09 0.02 0.02 0.01 0.01
PACF 0.52 -0.1 0.07 -0.13 0.13 0.02 0.05 -0.09 0.06 -0.02 0.03
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.03  0.07  0.07  0.00 -0.02 -0.03 -0.06  0.02  0.04
PACF -0.01 0.08 0.00 -0.06 -0.01 -0.01 -0.03 0.08 -0.04

Figure 8.18  Autocorrelation and Partial autocorrelation plots for a realization from Process 2: MA with moving average parameters 0, 0.1, 0.2, 0, 0.5

> genera(list(ma = c(0, 0.1, 0.2, 0, 0.5)), ma.test = TRUE)


[1] "alpha parameters: "
[1] "al1=0,al2=0.1,al3=0.2,al4=0,al5=0.5"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.01 0.17 0.25 -0.02 0.37 -0.02 -0.05 0.05 -0.05 0.00
PACF 0.01 0.17 0.25 -0.04 0.31 -0.07 -0.16 -0.12 0.01 -0.08
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.04 -0.04 0.04 -0.02 -0.01 -0.04 -0.07 0.04 0.00 -0.04
PACF 0.01 0.07 0.10 -0.01 0.01 -0.08 -0.11 -0.01 0.06 0.00

Figure 8.19  Autocorrelation and Partial autocorrelation plots for a realization from Process 3: MA with moving average parameters 0.1, 0.5, 0, 0.2, 0

> genera(list(ma = c(0.1, 0.5, 0, 0.2, 0)), ma.test = TRUE)


[1] "alpha parameters: "
[1] "al1=0.1,al2=0.5,al3=0,al4=0.2,al5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.09 0.47 -0.03 0.18 -0.04 0.03 0.03 0.00 0.00 0.00
PACF 0.09 0.46 -0.11 -0.03 0.02 -0.06 0.08 0.01 -0.06 0.03
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.04  0.07  0.07  0.11  0.05  0.07  0.06  0.05  0.11  0.08
PACF 0.06 0.07 0.04 0.05 -0.02 0.00 0.07 0.01 0.08 0.06

Figure 8.20  Autocorrelation and Partial autocorrelation plots for a realization from Process 4: MA with moving average parameters 0.2, 0.1, 0, 0.5, 0

> genera(list(ma = c(0.2, 0.1, 0, 0.5, 0)), ma.test = TRUE)


[1] "alpha parameters: "
[1] "al1=0.2,al2=0.1,al3=0,al4=0.5,al5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.16 0.13 0.03 0.4 0.04 0.04 -0.06 -0.02 0.04 0.01
PACF 0.16 0.10 0.00 0.4 -0.09 -0.03 -0.05 -0.20 0.13 0.00
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.03 -0.07 -0.03 -0.02 0.00 -0.07 0.00 0.01 0.04 -0.04
PACF 0.00 0.03 -0.12 0.00 0.02 -0.07 0.11 0.01 0.01 0.00

Figure 8.21  Identification by means of the BIC criterion of the time series simulated with relationships (8.10) from some moving average processes. The correspondence between graphs and processes is: (Process 1, Process 3; Process 2, Process 4).

The function armasubsets, see Section 8.2.2, available in the package TSA can also be used for model selection. See Figure 8.21.
> library(TSA)
> sapply(1:4, function(i) plot(armasubsets(data[, i],
nar = 5, nma = 5)))
> detach("package:TSA")
We can observe that the methods described above are not always conclusive for the identification of ARMA models fitting the 4 time series. For Process 3, the autocorrelation function suggests the correct moving average order of the generating process, while both the partial autocorrelation function and the Bayesian Information Criterion give hints about the presence of an autoregressive model. We consider the issue again in Section 8.2.6.


8.2.5  Autocorrelation and Partial autocorrelation functions for an ARMA(1,1) process

We simulate a realization from the following stochastic process:

    y_t = θ_1 y_{t-1} + ε_t + α_1 ε_{t-1}

with θ_1 = 0.8, α_1 = 0.5 and {ε_t} a Gaussian White Noise.
> n <- 500
> theta <- 0.8
> alpha <- 0.5
> set.seed(1234)
> yt <- arima.sim(model = list(ar = theta, ma = alpha),
      n = n)

and plot the autocorrelation function (correlogram) and the partial autocorrelation
function of {yt }. See Fig. 8.22.
> t(acf2(yt, 20, ma.test = TRUE))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.89 0.70 0.54 0.41 0.3 0.21 0.15 0.10 0.04 0.00 -0.04
PACF 0.89 -0.43 0.20 -0.14 0.0 0.02 0.01 -0.11 0.01 -0.01 -0.05
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.07 -0.1 -0.12 -0.11 -0.08 -0.04 0.00 0.02 0.04
PACF -0.03   0.0  0.00  0.06  0.05  0.03  0.01 -0.04  0.04
We can observe that the autocorrelation and the partial autocorrelation functions die out very slowly.
A large order would be necessary to fit either a pure AR model or a pure MA model to the simulated time series, so recourse to a more parsimonious ARMA model can help reduce model complexity.
Fig. 8.23 shows the identification by means of the BIC criterion using the function armasubsets
> library(TSA)
> plot(armasubsets(yt, nar = 5, nma = 5))
> detach("package:TSA")

8.2.6  Problems in identifying an ARMA model for a time series

We simulate a realization from the ARMA(4,0,1) stochastic process:

    y_t = 0.2 y_{t-1} + 0.7 y_{t-4} + ε_t + 0.4 ε_{t-1}

where {ε_t} is a Gaussian White Noise.

Figure 8.22  Estimate of the autocorrelation and partial autocorrelation functions for a realization from {y_t}, ARMA(1,1) process with parameters θ = 0.8, α = 0.5

> n <- 100


> set.seed(12345)
> yt <- arima.sim(model = list(ar = c(0.2, 0, 0, 0.7),
ma = 0.4), n = n)
and check the autocorrelation function (correlogram) and the partial autocorrelation
function of {yt } for identifying the order of an ARMA model. See Fig. 8.24.
> t(acf2(yt, 20, ma.test = TRUE))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.36 -0.20 0.04 0.61 0.37 -0.12 -0.17 0.27 0.31 0.03
PACF 0.36 -0.37 0.36 0.52 -0.14 0.04 -0.15 0.04 -0.02 0.11
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.10 0.17 0.22 0.08 -0.04 0.11 0.16 0.05 -0.09 0.00
PACF 0.13 0.09 -0.11 0.04 -0.06 0.02 0.05 -0.06 -0.03 -0.11

Figure 8.23  Identification of an ARMA model by means of the BIC criterion for a realization from {y_t}, ARMA(1,1) process with parameters θ = 0.8, α = 0.5

We consider the behaviour of the armasubsets function for different values of the
arguments nar and nma, see Fig. 8.25.
> library(TSA)
> layout(matrix(1:4, 2, 2))
> plot(armasubsets(yt, nar = 5, nma = 5))
Reordering variables and trying again:
> plot(armasubsets(yt, nar = 8, nma = 5))
Reordering variables and trying again:
> plot(armasubsets(yt, nar = 5, nma = 8))
Reordering variables and trying again:
> plot(armasubsets(yt, nar = 8, nma = 8))
Reordering variables and trying again:
> detach("package:TSA")

Figure 8.24  Autocorrelation and partial autocorrelation functions of {y_t}, a realization from the ARMA(4,0,1) process above

We can conclude that identification methods based on the use of the autocorrelation and partial autocorrelation functions, or on the use of information criteria like the BIC, can help the researcher but cannot definitively solve the problem.
In particular, the function armasubsets has to be called for different combinations of the maximum AR and MA orders to check the stability of the proposed identification solution. Here the model suggested by armasubsets clearly depends on the maximum orders nar and nma the researcher has chosen.
Another solution is to use the tools available in the package forecast, which perform automatic model selection and forecasting with reference to Exponential smoothing and ARIMA methods, see Hyndman and Khandakar (2008). In particular the function auto.arima also deals with seasonal and integrated⁵ time series, providing the possible orders of differencing by having recourse to the KPSS and Canova-Hansen tests. See ?auto.arima for more information.
5 See Section 8.4 for Integrated ARMA (ARIMA) models. Seasonal ARMA models are not treated
here.

Figure 8.25 Identification for different choices of nar and nma for {yt}, ARMA(1,0)
process with parameter φ = 0.8

> library(forecast)
> auto.arima(yt, max.p = 10, max.q = 10)
Series: yt
ARIMA(3,1,2)

Coefficients:
          ar1      ar2      ar3     ma1      ma2
      -0.7519  -0.6415  -0.7543  0.3568  -0.2349
s.e.   0.0914   0.0884   0.0674  0.1375   0.1319

sigma^2 estimated as 0.8383:  log likelihood=-133.06
AIC=278.12   AICc=279.03   BIC=293.69
Observe that the AIC and BIC criteria are computed by armasubsets and auto.arima
with different formulae. The reader will note that the proposed solution is again
different from those obtained with armasubsets. We have encountered a realization
giving serious problems in its parameter identification.
Sometimes the frequency of the data may help the researcher solve the identification
problem: e.g. in presence of daily data an AR(7) component may be present, or an
AR(5) component if data are collected only on working days.

8.3

On the bias of the OLS estimator of the autoregressive coefficient for an AR(1) process with AR(1) errors

Let {yt} be an autoregressive stochastic process of order 1 whose evolution does
not depend upon a White Noise generating mechanism for the error but on an
autoregressive process of order 1, say:
yt = β yt-1 + ut
ut = ρ ut-1 + εt
with β = 0.5 and ρ = 0.5 and {εt} a Gaussian White Noise. We consider the
estimation of the parameter β in the first relationship by means of Ordinary Least
Squares without taking into account the autoregressive nature of the error {ut}.
Let us simulate a realization of length n = 100:
> n <- 100
> beta <- 0.5
> rho <- 0.5
> ut <- arima.sim(model = list(ar = rho), n = n)
> yt <- arima.sim(model = list(ar = beta), n = n, innov = ut)

The argument innov in the function arima.sim defines the sequence to be used as
error in the generating process.
The estimate of β results:
> lm(yt[-1] ~ -1 + yt[-n])$coef
   yt[-n]
0.7504439
which is quite different from the theoretical value β = 0.5.
We can expect the following theoretical value for the bias (according to asymptotic
results):

plim(β̂) − β = ρ(1 − β²)/(1 + βρ) = 0.3.
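As a quick check, the asymptotic bias implied by the formula above can be computed
directly in R (a minimal sketch; beta and rho are the values used in the simulation):
> beta <- 0.5
> rho <- 0.5
> rho * (1 - beta^2) / (1 + beta * rho)  # theoretical asymptotic bias of the OLS estimate
[1] 0.3
The single estimate obtained above, 0.7504 − 0.5 ≈ 0.25, is of the same order; the
simulation below shows how the bias behaves over repeated realizations.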
Let us now perform the preceding task k = 1000 times, obtain summary statistics and
plot a histogram of the bias of the estimates of the coefficient β (beta).
To this purpose we can create a function
> simularar <- function(n = 100, beta = 0.5, rho = 0.5) {
ut <- arima.sim(model = list(ar = rho), n = n)
yt <- arima.sim(model = list(ar = beta), n = n,
innov = ut)
lm(yt[-1] ~ -1 + yt[-n])$coef
}

Figure 8.26 Distribution of the parameter estimate bias

To replicate k times the preceding function and collect the results, we can use the
function replicate
> k <- 1000
> betahat <- replicate(k, simularar())
> summary(betahat - beta)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1159  0.2587  0.2953  0.2901  0.3234  0.4174

Figures 8.26 and 8.27 show the histogram and the density estimate for the distribution
of the bias of the parameter estimates. The graphs can be obtained with the code
> hist(betahat - beta)
> plot(density(betahat - beta))
To obtain the histogram, the kernel density and the normal density in a single graph,
see Fig. 8.28, use the following code.

Figure 8.27 Density estimate of the distribution of the parameter estimate bias

> hist(betahat - beta, freq = FALSE, breaks = 11, xlim = c(0.1,
      0.45), ylim = c(0, 10), xlab = "", main = "")
> par(new = TRUE)
> plot(density(betahat - beta), xlim = c(0.1, 0.45),
      ylim = c(0, 10), xlab = "", main = "")
> curve(dnorm(x, mean(betahat - beta), sd(betahat - beta)), lwd = 3, add = TRUE)
Pay attention: the latter code has a disadvantage, in the sense that we have to specify
the ylim explicitly.
It is possible to use code that makes use of the package lattice, see Fig. 8.29 top.
The general instruction is in this case histogram; the panel function establishes
what will effectively appear on the graph: in this case a normal density (with the
same mean and variance as the data) and the kernel estimate of the density function.
With the parameter n, a smoother plot of the density, like in Fig. 8.27, is produced.
The advantage of using the function histogram is that the axis limits need to be
defined only once.

Figure 8.28 Histogram, Kernel density estimate, and Normal distribution approximation
of the distribution of the parameter estimate bias

See Sarkar (2008) for a detailed presentation of the package lattice, and see
demo(lattice) and ?lattice::lattice for its main features.
> library(lattice)
> tp1 <- histogram(~(betahat - beta), type = "density",
breaks = 11, panel = function(x, ...) {
panel.histogram(x, ...)
panel.mathdensity(dmath = dnorm, col = "black",
lwd = 3, args = list(mean = mean(x),
sd = sd(x)), n = 101)
panel.densityplot(x, col = "black", lwd = 1,
n = 101, ...)
})
The package lattice defines the axis limits by internal algorithms, see Fig. 8.29 top;
it is always possible to set their values explicitly when the results of the automatic
procedures are not satisfying, see Fig. 8.29 bottom.

Figure 8.29 Histogram, Normal distribution approximation and Kernel density estimate
of the distribution of the parameter estimate bias by means of the package lattice

> tp2 <- histogram(~(betahat - beta), type = "density",
      breaks = 11, ylim = c(0, 10), panel = function(x,
      ...) {
      panel.histogram(x, ...)
      panel.mathdensity(dmath = dnorm, col = "black",
          lwd = 3, args = list(mean = mean(x),
          sd = sd(x)), n = 101)
      panel.densityplot(x, col = "black", lwd = 1,
          n = 101, ...)
  })
> plot(tp1, split = c(1, 1, 1, 2))
> plot(tp2, split = c(1, 2, 1, 2), newpage = FALSE)


8.3.1

Some remarks on the use of the function curve

Let y be a vector of 1000 pseudo-random numbers from a Normal distribution, with
mean 10 and variance 4.
To produce the histogram of y use the function hist, specifying freq=FALSE to
obtain densities:
> y <- rnorm(1000, mean = 10, sd = 2)
> hist(y, freq = FALSE)
If you want to add to the graph the density function of the normal distribution with
the mean and s.d. estimated from your data, you can use the function curve:
> curve(dnorm(x, mean = mean(y), sd = sd(y)), add = TRUE)
The variable x you see in the function dnorm is an internal variable defining the
domain of the density function. The parameters mean and sd of the function dnorm
must not refer to x, so your vector of random numbers cannot be named x.
Try the following code, which does not work correctly:
> x <- rnorm(1000, mean = 10, sd = 2)
> hist(x, freq = FALSE)
> curve(dnorm(x, mean = mean(x), sd = sd(x)), add = TRUE)
and the following one, which works correctly:
> x <- rnorm(1000, mean = 10, sd = 2)
> y <- x
> hist(x, freq = FALSE)
> curve(dnorm(x, mean = mean(y), sd = sd(y)), add = TRUE)

The problem does not arise when you use the function histogram of the package
lattice; the general call is histogram applied to the data x (remember to use the
~ symbol); the panel function specifies which curves have to be plotted.
> library(lattice)
> histogram(~x, type = "density", panel = function(x,
...) {
panel.histogram(x, ...)
panel.mathdensity(dmath = dnorm, col = "black",
lwd = 3, args = list(mean = mean(x), sd = sd(x)),
n = 101)
})

8.4

Estimation of ARIMA Models with the function arima

To estimate⁶ an ARIMA model it is possible to make use of the function arima,
available in the package stats, which is automatically loaded when R starts.
⁶ In this section the possible presence of unit roots in the characteristic equation pertaining to
the autoregressive part of the model is allowed for, so the more general ARIMA (AutoRegressive
Integrated Moving Average) models (see Verbeek's Section 8.3) are considered. Tests for detecting
the presence of unit roots are presented in Section 8.7 following Verbeek's example 8.4.4.


The main arguments of the function arima are:

- the time series x to be modeled;

- the order = c(p, d, q), where p and q are respectively the orders of the
  autoregressive and of the moving average parts, and the integer d is the possible
  order of differencing needed to render the time series x weakly stationary;

- xreg, a vector or a matrix of external regressors describing a deterministic
  trend for the series x;

- the argument include.mean, which allows a mean term to be included in the
  model; it is by default set to TRUE, while it is ignored when d > 0.

Moreover, but the subject is not treated here, the argument seasonal
= list(order = c(0, 0, 0), period = NA) is available for dealing with
seasonal time series, see Hyndman and Khandakar (2008). A small illustrative call
is sketched below.
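For instance, calls combining these arguments might look as follows (a hypothetical
sketch, not taken from the examples of this section; x and the orders are placeholders):
> # hypothetical example: ARIMA(1,1,1) with the time index as external regressor
> arima(x, order = c(1, 1, 1), xreg = 1:length(x))
> # hypothetical example: stationary ARMA(2,1) estimated without the mean term
> arima(x, order = c(2, 0, 1), include.mean = FALSE)
The sections below show these argument combinations on simulated series.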

The general form for a stationary ARMA(p, q) process, as considered by R, see also
Brockwell and Davis (1991), is:

Yt − μ = φ1(Yt-1 − μ) + ... + φp(Yt-p − μ) + εt + θ1 εt-1 + ... + θq εt-q     (8.11)

where μ = E(Yt) is the mean value common to the components of the stochastic
process {Yt}.
The stochastic difference equation may also be referred to the de-meaned process
yt = Yt − μ, or to a de-trended process yt = Yt − g(t, Xt), where g(t, Xt) is a function
of the time and/or of some other stochastic process {Xt}:

yt = φ1 yt-1 + φ2 yt-2 + ... + φp yt-p + εt + θ1 εt-1 + θ2 εt-2 + ... + θq εt-q,

for which it follows that E(yt) = 0 for all t. The above relationships can be re-written
by means of the polynomials in the backward operator B, with B yt = yt-1:

φp(B) = 1 − φ1 B − φ2 B^2 − ... − φp B^p,
θq(B) = 1 + θ1 B + θ2 B^2 + ... + θq B^q,

as follows:
φp(B)(Yt − μ) = θq(B) εt
or
φp(B) yt = θq(B) εt.
Recall that the process {Yt} is stationary if the roots of φp(z) = 0 lie outside the unit
circle and it is invertible if the roots of θq(z) = 0 lie outside the unit circle. In case
some unit roots, say d, were present in φp+d(z) = 0 then {Yt} must be differenced d
times, that is, an ARIMA(p, d, q) model has to be fitted.
Different configurations for an ARIMA model are now considered: a stationary
ARMA process and two ARIMA processes with integration orders 1 and 2, for the
cases without and with the presence of a drift⁷.
⁷ In case of an ARMA(p,q) model a drift is present when μ ≠ 0, and from (8.11) we have:
Yt = μ(1 − φ1 − ... − φp) + φ1 Yt-1 + ... + φp Yt-p + εt + θ1 εt-1 + ... + θq εt-q
that is
Yt = drift + φ1 Yt-1 + ... + φp Yt-p + εt + θ1 εt-1 + ... + θq εt-q.


Table 8.1 Summary of the code for estimating ARIMA(p, d, q) models with the arima
function and the corresponding code for prediction k steps-ahead. The sarima function in
the package astsa can also be used when d ≤ 1, see Section 8.5.
General form: φ(B) yt = drift + θ(B) εt

typical behaviour          code for parameter estimation;
                           interpretation of intercept/xreg;
                           code for predicting (k steps-ahead)

no unit roots, no drift    arima(x,c(p,0,q),include.mean=FALSE)
(Fig. 8.30)                intercept not present
                           predict(obj,n.ahead=k)

no unit roots, with drift  arima(x,c(p,0,q))
(Fig. 8.31)                intercept = mean of {Yt}
                           predict(obj,n.ahead=k)

1 unit root, no drift      arima(x,c(p,1,q))
(Fig. 8.32)                intercept not present
                           predict(obj,n.ahead=k)

1 unit root, with drift    arima(x,c(p,1,q),xreg=1:length(x))
(Fig. 8.33)                xreg coefficient = linear deterministic trend slope
                           predict(obj,k,newxreg=1:k+length(x))

2 unit roots, no drift     arima(x,c(p,2,q))
(Fig. 8.34)                intercept not present
                           predict(obj,n.ahead=k)

2 unit roots, with drift   arima(x,c(p,2,q),xreg=(1:length(x))^2)
(Fig. 8.35)                xreg coefficient = quadratic deterministic trend coefficient
                           predict(obj,k,newxreg=(1:k+length(x))^2)

We will assume that no roots of φp(z) = 0 are inside the unit circle and that φp(z) = 0
and θq(z) = 0 do not have any common roots. As an example, the code is reported
for estimating the parameters of a time series simulated with the presence of only an
autoregressive coefficient of order 1.
Remember that among the parameters that have to be estimated are also p and q,
the orders respectively of the autoregressive and moving average parts of the model.
We assume here that the orders have already been identified, e.g. by examining the
autocorrelation and partial autocorrelation functions and/or using automatic criteria
like TSA::armasubsets or forecast::auto.arima, see Section 8.2.6.
See Section 8.10.8 for the estimation of non-complete (subset) models, which can be
performed by using the argument fixed in the arima function, as sketched below.
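As a hedged sketch (subset models are treated properly in Section 8.10.8): fixed takes a
numeric vector with one entry per parameter, NA for the parameters to be estimated and
a fixed value (typically 0) for the constrained ones. For an AR(4) with only the first and
fourth autoregressive coefficients left free, and x a generic time series, the call could look like:
> # hypothetical sketch: subset AR(4) with phi2 = phi3 = 0
> # entries of fixed refer to ar1, ar2, ar3, ar4 and the intercept, in this order
> arima(x, order = c(4, 0, 0), fixed = c(NA, 0, 0, NA, NA),
      transform.pars = FALSE)
transform.pars = FALSE is needed when some AR parameters are fixed (arima would
otherwise set it automatically with a warning).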
In Section 8.5 other R functions are presented for the parameter estimation of ARIMA
models.
Table 8.1 summarizes the use of arima and also shows how to use the function
predict for obtaining k steps-ahead forecasts.

8.4.1

No unit roots in the characteristic equation φp(z) = 0

No drift presence
The time series x is stationary; the estimation of
φp(B) yt = θq(B) εt
can be performed with:
> arima(x, c(p, 0, q), include.mean = FALSE)
Example: Estimation of an ARMA(1,0,0) without drift. See Fig. 8.30.
> n <- 2000
> drift <- 0
> set.seed(123)
> y <- arima.sim(model = list(ar = 0.8), n = n, rand.gen = function(n)
      drift + rnorm(n))

If we do not specify include.mean=FALSE, that is we use the code for the case with
the mean of the series different from 0 (a drift is present), we obtain:
> arima(y, c(1, 0, 0))
Series: y
ARIMA(1,0,0) with non-zero mean

Coefficients:
         ar1  intercept
      0.7699     0.1464
s.e.  0.0143     0.0969

sigma^2 estimated as 0.9976:  log likelihood=-2835.97
AIC=5677.94   AICc=5677.95   BIC=5694.74
We can first observe that the coefficient named intercept in the output is the
estimate for the mean μ of {Yt}. Namely, the average value for the realization is
mean(y)=0.1507.
As expected, the estimate for the mean is not significantly different from zero
(0.1464/0.0969 ≈ 1.51 < 1.96, and hence the estimate for the drift is not significant
either); so we can proceed to estimate an autoregressive model with zero mean, that is
without drift, by setting the argument include.mean=FALSE.
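The same check can be performed directly on the fitted object (a minimal sketch; arima
stores the estimates in the component coef and their covariance matrix in var.coef):
> fit <- arima(y, c(1, 0, 0))
> # approximate z statistic for the intercept (about 1.51 here, below 1.96)
> fit$coef["intercept"] / sqrt(fit$var.coef["intercept", "intercept"])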
> (output <- arima(y, c(1, 0, 0), include.mean = FALSE))
Series: y
ARIMA(1,0,0) with zero mean

Coefficients:
         ar1
      0.7719
s.e.  0.0142

sigma^2 estimated as 0.9988:  log likelihood=-2837.1
AIC=5678.2   AICc=5678.2   BIC=5689.4

Figure 8.30 ARIMA(1,0,0) no drift: an autoregressive behaviour with respect to the mean
0 can be observed. Mean reversion is also present, that is the process tends to come back
to its mean value in the short run
Prediction 5 steps-ahead
> predict(output, n.ahead = 5)
$pred
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] -0.009089261 -0.007016222 -0.005415992 -0.004180736
[5] -0.003227212


$se
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 0.9993847 1.2624990 1.3958956 1.4696365 1.5118672

With drift
The time series is again stationary, with mean μ = drift/φp(1).
The estimation of
φp(B)(Yt − μ) = θq(B) εt
can be performed with
> arima(x, c(p, 0, q))
Estimation of an ARMA(1,0,0) with drift. See Fig. 8.31.
> n <- 2000
> drift <- 2
> set.seed(123)
> y <- arima.sim(model = list(ar = 0.8), n = n, rand.gen = function(n)
      drift + rnorm(n))
> (output <- arima(y, c(1, 0, 0)))
Series: y
ARIMA(1,0,0) with non-zero mean

Coefficients:
         ar1  intercept
      0.7699    10.1463
s.e.  0.0143     0.0969

sigma^2 estimated as 0.9977:  log likelihood=-2835.99
AIC=5677.97   AICc=5677.98   BIC=5694.77
Recall that intercept is the estimate for the mean μ of {Yt}: the average value of
the realization results mean(y)=10.1507.
Prediction 5 steps-ahead
> predict(output, n.ahead = 5)
$pred
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 10.02461 10.05261 10.07418 10.09078 10.10356

$se
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 0.9988302 1.2605381 1.3926252 1.4653016 1.5067219

Figure 8.31 ARIMA(1,0,0) with drift: an autoregressive behaviour with respect to the
mean (drift/(1 − φ)) can be observed

8.4.2

1 unit root in the characteristic equation φp+1(z) = 0

If φp+1(z) = 0 has 1 unit root, it follows that φp+1(B) = φp(B)(1 − B) = φp(B)Δ.
Here it is essential to distinguish whether a drift characterizes the differenced series.


No drift presence
When no drift is present, the model
φp(B) ΔYt = θq(B) εt
can be estimated with:
> arima(x, c(p, 1, q))
Observe that the intercept, that is the estimate of the mean, will not be produced.
Estimation of an ARMA(1,1,0) without drift. See Fig. 8.32.
> n <- 2000
> drift <- 0
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.8),
      n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 1, 0)))
Series: y
ARIMA(1,1,0)

Coefficients:
         ar1
      0.7719
s.e.  0.0142

sigma^2 estimated as 0.9988:  log likelihood=-2837.1
AIC=5678.2   AICc=5678.2   BIC=5689.4
which should correspond to
> arima(diff(y), c(1, 0, 0))
Series: diff(y)
ARIMA(1,0,0) with non-zero mean

Coefficients:
         ar1  intercept
      0.7699     0.1464
s.e.  0.0143     0.0969

sigma^2 estimated as 0.9976:  log likelihood=-2835.97
AIC=5677.94   AICc=5677.95   BIC=5694.74
Here the intercept, that is the estimate of the mean of the differenced series, is
produced since no integration was required in the model, but it is not significant;
thus we proceed again to estimate a model without the mean:
> arima(diff(y), c(1, 0, 0), include.mean = FALSE)
Series: diff(y)
ARIMA(1,0,0) with zero mean

Coefficients:
         ar1
      0.7719
s.e.  0.0142

sigma^2 estimated as 0.9988:  log likelihood=-2837.1
AIC=5678.2   AICc=5678.2   BIC=5689.4

which is equivalent to the first estimation result.

Figure 8.32 ARIMA(1,1,0) no drift: an autoregressive behaviour with respect to random
mean levels can be observed. There is no mean reversion
Prediction 5 steps-ahead
> predict(output, n.ahead = 5)
$pred
Time Series:


Start = 2002
End = 2006
Frequency = 1
[1] 301.3907 301.3837 301.3783 301.3741 301.3709
$se
Time Series:
Start = 2002
End = 2006
Frequency = 1
[1] 0.9993847 2.0333772 3.1199618 4.2095757 5.2762076

With drift
When the drift is present for the differenced series, it corresponds to the presence of
a linear (deterministic) trend in {Yt} and the model
φp(B) Δ(Yt − bt) = θq(B) εt
can be estimated with:
> arima(x, c(p, 1, q), xreg = 1:length(x))
where xreg is an external regressor for {Yt}, here the time sequence.
The coefficient corresponding to xreg is the estimate for the linear (deterministic)
slope b, while the estimate for the drift is φp(1) times the xreg coefficient, that is
(1 − φ1 − ... − φp) times the xreg coefficient.
Namely, in case of an ARIMA(1,1,0) we have:
Δ(Yt − bt) = φ Δ(Yt-1 − b(t − 1)) + εt
ΔYt − b = φ[ΔYt-1 − b] + εt
ΔYt = (1 − φ)b + φ ΔYt-1 + εt
the drift (1 − φ)b corresponding to a linear (deterministic) trend with slope b.
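For the simulation below, where drift = 0.2 and φ = 0.8, the implied slope is therefore
b = drift/(1 − φ) = 1, which can be checked with a one-line computation (a minimal
sketch using the simulation values):
> drift <- 0.2
> phi <- 0.8
> drift / (1 - phi)  # implied slope b of the linear deterministic trend
[1] 1
This is consistent with the xreg coefficient estimated below (about 1.12) and with the
estimated drift (about 0.26).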
Estimation of an ARMA(1,1,0) with drift. See Fig. 8.33.
> n <- 2000
> drift <- 0.2
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.8),
      n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 1, 0), xreg = 1:length(y)))
Series: y
ARIMA(1,1,0)

Coefficients:
         ar1  1:length(y)
      0.7700       1.1158
s.e.  0.0143       0.0971

sigma^2 estimated as 0.9977:  log likelihood=-2836.02
AIC=5678.04   AICc=5678.05   BIC=5694.85
The coefficient corresponding to xreg is an estimate for the slope in a linear model
without the intercept describing y as a linear function of the time:
> lm(y ~ -1 + c(1:length(y)))
Call:
lm(formula = y ~ -1 + c(1:length(y)))
Coefficients:
c(1:length(y))
1.114
The estimate of the drift is:
> (1 - output$coef[1]) * output$coef[2]
ar1
0.2565846
Prediction 5 steps-ahead
> predict(output, n.ahead = 5, newxreg = 1:5 + length(y))
$pred
Time Series:
Start = 2002
End = 2006
Frequency = 1
[1] 2302.410 2303.450 2304.507 2305.578 2306.660
$se
Time Series:
Start = 2002
End = 2006
Frequency = 1
[1] 0.9988481 2.0306506 3.1136020 4.1983926 5.2592863

Figure 8.33 ARIMA(1,1,0) with drift: an autoregressive behaviour characterized by a
unit root with respect to a linear deterministic trend is present. Deviations from the
trend were represented in Fig. 8.32; they can be modelled as an ARIMA(1,1,0) process
without drift

8.4.3

2 unit roots in the characteristic equation φp+2(z) = 0

If 2 unit roots are present in φp+2(z) = 0, we have φp+2(B) = φp(B)(1 − B)² = φp(B)Δ².
Also here we have to distinguish whether a drift characterizes the differenced series.

No drift presence
When no drift is present, the model
φp(B) Δ²Yt = θq(B) εt
can be estimated with:
> arima(x, c(p, 2, q))


Estimation of an ARMA(1,2,0) without drift. See Fig. 8.34.
> n <- 2000
> drift <- 0
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 2, 0), ar = 0.8),
      n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 2, 0)))
Series: y
ARIMA(1,2,0)

Coefficients:
         ar1
      0.7719
s.e.  0.0142

sigma^2 estimated as 0.9988:  log likelihood=-2837.1
AIC=5678.2   AICc=5678.2   BIC=5689.4
Prediction 5 steps-ahead
> predict(output, n.ahead = 5)
$pred
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1] 219352.7 219654.1 219955.5 220256.8 220558.2

$se
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1]  0.9993847  2.9449759  5.9209014  9.9226820 14.9209790

With drift
When the drift is present for the differenced series, it corresponds to the presence of
a quadratic (deterministic) trend and the model
φp(B) Δ²(Yt − ct²) = θq(B) εt
can be estimated with:
> arima(x, c(p, 2, q), xreg = (1:length(x))^2)

Figure 8.34 ARIMA(1,2,0) no drift: an autoregressive behaviour with respect to local
linear trends with random slopes can be observed

The coefficient corresponding to xreg is the estimate for the coefficient c of the
quadratic deterministic trend; the estimate for the drift results φp(1) · 2 times the
xreg coefficient, that is (1 − φ1 − ... − φp) · 2 times the xreg coefficient.
Namely, in case of an ARIMA(1,2,0) we have:
Δ²(Yt − ct²) = φ Δ²(Yt-1 − c(t − 1)²) + εt
and, since Δ²(ct²) = Δ[ct² − c(t − 1)²] = Δ(2ct − c) = 2c,
Δ²Yt − 2c = φ[Δ²Yt-1 − 2c] + εt
Δ²Yt = (1 − φ)2c + φ Δ²Yt-1 + εt;
the drift (1 − φ)2c corresponds to a deterministic quadratic trend with coefficient c.
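Analogously, for the simulation below, where drift = 0.2 and φ = 0.8, the implied
quadratic coefficient is c = drift/(2(1 − φ)) = 0.5 (a minimal sketch using the
simulation values):
> drift <- 0.2
> phi <- 0.8
> drift / (2 * (1 - phi))  # implied coefficient c of the quadratic deterministic trend
[1] 0.5
This agrees with the xreg coefficient estimated below (about 0.55) and with the
estimated drift (about 0.25).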
Estimation of an ARMA(1,2,0) with drift. See Fig. 8.35.
> n <- 2000
> drift <- 0.2
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 2, 0), ar = 0.8),
      n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 2, 0), xreg = (1:length(y))^2))
Series: y
ARIMA(1,2,0)

Coefficients:
         ar1  (1:length(y))^2
      0.7701           0.5493
s.e.  0.0143           0.0490

sigma^2 estimated as 0.9978:  log likelihood=-2836.09
AIC=5678.19   AICc=5678.2   BIC=5694.99
The coefficient corresponding to xreg is an estimate for the coefficient in a linear
model without the intercept describing y as a quadratic function of the time:
> lm(y ~ -1 + I((1:length(y))^2))
Call:
lm(formula = y ~ -1 + I((1:length(y))^2))
Coefficients:
I((1:length(y))^2)
0.5493
The estimate of the drift is:
> (1 - output$coef[1]) * 2 * output$coef[2]
ar1
0.2525803
Prediction 5 steps-ahead
> predict(output, n.ahead = 5, newxreg = (1:5 + length(y))^2)
$pred
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1] 2222338 2224642 2226946 2229252 2231558
$se
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1]  0.9988837  2.9417787  5.9114554  9.9022919 14.8840825

Figure 8.35 ARIMA(1,2,0) with drift: an autoregressive behaviour characterized by two
unit roots with respect to a quadratic deterministic trend is present (though we cannot
see it). Deviations from the trend can be modelled as an ARIMA(1,2,0) process without drift

8.5

Some other R functions for ARMA model parameter estimation

Some other functions are present in the packages of the R system to obtain parameter
estimates for an ARIMA model. See the R help for more information.
Results may differ since numerical methods are adopted within the different estimation
techniques; they may also depend on the assumptions made on the starting values of
εt, when the order q of the moving average part of the model is greater than 1.
We simulate an ARIMA(2,1,0) process yt = {Yt} and apply some of the available
functions to the estimation of the parameters of an ARMA(2,0,0) model for Dyt = ΔYt.
> n <- 2000
> drift <- 0.2
> set.seed(123456)
> yt <- arima.sim(model = list(order = c(2, 1, 0),
      ar = c(0.4, 0.2)), n = n, rand.gen = function(n) drift +
      rnorm(n))
> Dyt <- diff(yt)

8.5.1

The arima function

Pay attention: the quantity called intercept is the estimate for the mean
μ = drift/(1 − φ1 − φ2) = 0.5; moreover, the covariance matrix of the estimates is found
from the Hessian of the log-likelihood, and so may be only a rough guide.
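As a quick numerical check, with the values used in the simulation above:
> 0.2 / (1 - 0.4 - 0.2)  # drift / (1 - phi1 - phi2)
[1] 0.5
> mean(Dyt)              # sample counterpart, close to 0.5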

> arima(Dyt, c(2, 0, 0))
Series: Dyt
ARIMA(2,0,0) with non-zero mean

Coefficients:
         ar1     ar2  intercept
      0.4322  0.1971     0.5043
s.e.  0.0219  0.0219     0.0605

sigma^2 estimated as 1.008:  log likelihood=-2846.2
AIC=5700.41   AICc=5700.43   BIC=5722.81

Remember that if we apply an ARIMA(2,1,0) to the undifferenced series yt by means
of the arima function, we also have to use the argument xreg to take into account
the presence of the drift. The argument xreg must be used in arima with the time as
external regressor.
> arima(yt, c(2, 1, 0), xreg = 1:length(yt))
Series: yt
ARIMA(2,1,0)

Coefficients:
         ar1     ar2  1:length(yt)
      0.4322  0.1971        0.5038
s.e.  0.0219  0.0219        0.0605

sigma^2 estimated as 1.008:  log likelihood=-2846.2
AIC=5700.41   AICc=5700.43   BIC=5722.81

Small differences in the parameter estimates may be present since numerical
procedures are used to find the values maximizing the likelihood function. Maximum
likelihood estimators are consistent and the differences vanish for longer series.
The mean of Dyt = ΔYt corresponds to the average difference between consecutive
observations of Yt, that is, in presence of a unit root, to the slope of a linear
deterministic trend describing the evolution of Yt, see Section 8.4.2.

8.5.2

The sarima function in the package astsa

If there is no integration or if the order of integration⁸ is 1, the function sarima in
the package astsa, which wraps the use of xreg, can be used.
For fitting an ARIMA(p,d,q) model (d=0 or 1) to a time series x the call is
sarima(x,p,d,q), see http://www.stat.pitt.edu/stoffer/tsa2/Examples.htm
for more details. It also returns values of the AIC and BIC on the same scale as
Verbeek's formulae (8.68), (8.69).
> library(astsa)
> sarima(Dyt, 2, 0, 0, details = FALSE)
$fit
Series: xdata
ARIMA(2,0,0) with zero mean

Coefficients:
         ar1     ar2   xmean
      0.4322  0.1971  0.5043
s.e.  0.0219  0.0219  0.0605

sigma^2 estimated as 1.008:  log likelihood=-2846.2
AIC=5700.41   AICc=5700.43   BIC=5722.81

$AIC
[1] 1.011115

$AICc
[1] 1.012125

$BIC
[1] 0.01951616
The result is consistent with:
> sarima(yt, 2, 1, 0, details = FALSE)
$fit
Series: xdata
ARIMA(2,1,0) with non-zero mean

Coefficients:
         ar1     ar2  constant
      0.4322  0.1971    0.5038
s.e.  0.0219  0.0219    0.0605

sigma^2 estimated as 1.008:  log likelihood=-2846.2
AIC=5700.41   AICc=5700.43   BIC=5722.81

⁸ For practical purposes the order d=1 is the one most frequently occurring.

$AIC
[1] 1.011113

$AICc
[1] 1.012123

$BIC
[1] 0.01951124

Observe that constant is the estimate for the coefficient defining the linear
deterministic trend.
Some diagnostic graphs for the residuals are also produced, see Fig. 8.36.

Figure 8.36 Diagnostic tools from sarima for the model ARIMA(4,0,0) applied to the
differenced time series (panels: Standardized Residuals; ACF of Residuals; Normal Q-Q
Plot of Std Residuals; p values for Ljung-Box statistic)

8.5.3

The Arima function in the package forecast

It is available in the package forecast. Like sarima it is essentially a wrapper for
arima. Also in this case the quantity in the output named intercept corresponds to
the estimate for the mean. Some error measures for the training set are returned.
> library(forecast)
> ar4 <- Arima(Dyt, c(2, 0, 0))
> summary(ar4)
Series: Dyt
ARIMA(2,0,0) with non-zero mean

Coefficients:
         ar1     ar2  intercept
      0.4322  0.1971     0.5043
s.e.  0.0219  0.0219     0.0605

sigma^2 estimated as 1.008:  log likelihood=-2846.2
AIC=5700.41   AICc=5700.43   BIC=5722.81

Training set error measures:
             ME          RMSE           MAE           MPE          MAPE          MASE
  -7.814393e-04  1.004066e+00  8.052378e-01  6.975735e+01  2.395949e+02  8.727309e-01

8.5.4

The armaFit function

For ARIMA models (also fractionally differenced). Pay attention: the mean of the
time series is first estimated, following also Brockwell and Davis (1991), then the other
parameters are estimated for the de-meaned series (without drift). Also in this case
the quantity in the output named intercept corresponds to the estimate for the mean.
With the function armaFit in the package fArma the estimation methods "MLE"
(Maximum Likelihood) and "ols" (Ordinary Least Squares) may be used.
> library(fArma)
> ar2 <- armaFit(Dyt ~ ar(2), data = Dyt, method = "ols")
> summary(ar2)
Title:
 ARIMA Modelling

Call:
 armaFit(formula = Dyt ~ ar(2), data = Dyt, method = "ols")

Model:
 AR(2) with method: ols

Coefficient(s):
      ar1        ar2  intercept
   0.4323     0.1971     0.5034

Residuals:
     Min       1Q   Median       3Q      Max
-3.75165 -0.70435  0.01726  0.69478  3.68815

Moments:
 Skewness  Kurtosis
 0.012757 -0.004689

Coefficient(s):
           Estimate  Std. Error  t value  Pr(>|t|)
ar1         0.43233     0.02193    19.71    <2e-16 ***
ar2         0.19710     0.02192     8.99    <2e-16 ***
intercept   0.50344     0.02246    22.41    <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

sigma^2 estimated as: NULL
AIC Criterion: 0

Description:
 Fri May 24 17:10:56 2013 by user: gabriele.cantaluppi

8.5.5

The FitARMA function

For ARIMA models (also fractionally differenced). It uses a Fast Maximum Likelihood
estimation method as proposed by McLeod and Zhang (2008). Here the parameter
mean is indicated correctly.
> library(FitARMA)
> a <- FitARMA(Dyt, c(2, 0, 0))
> summary(a)
ARIMA(2,0,0)
length of series = 2000 , number of parameters = 3
loglikelihood = -8.33 , aic = 22.7 , bic = 39.5
> coef(a)
             MLE         sd    Z-ratio
phi(1) 0.4321938 0.02192204 19.7150305
phi(2) 0.1970989 0.02192204  8.9909012
mu     0.5034386 1.18871170  0.4235161


The Ljung-Box statistics, see Section 8.10.2, can also be extracted:
> a$Ljun[1:12, ]
    m   Qm    pvalue
    1 0.03 0.8669619
    2 0.07 0.7984524
    3 1.50 0.2203072
    4 3.00 0.2227739
    5 3.09 0.3774278
    6 4.68 0.3212406
    7 5.80 0.3257347
    8 7.07 0.3145735
    9 7.60 0.3690962
   10 9.00 0.3420269
   11 9.03 0.4348361
   12 9.34 0.5005130
Observe that in case of estimation of a Moving Average model by means of FitARMA
the coefficients have signs opposite to those obtained with arima:
> arima(Dyt, c(0, 0, 2))
Series: Dyt
ARIMA(0,0,2) with non-zero mean

Coefficients:
         ma1     ma2  intercept
      0.4229  0.3051     0.5036
s.e.  0.0218  0.0191     0.0399

sigma^2 estimated as 1.065:  log likelihood=-2900.91
AIC=5809.82   AICc=5809.84   BIC=5832.23

> library(FitARMA)
> a <- FitARMA(Dyt, c(0, 0, 2))
> coef(a)
                MLE         sd   Z-ratio
theta(1) -0.4228582 0.02129473 -19.85741
theta(2) -0.3050715 0.02129473 -14.32615
mu        0.5034386 0.02307363  21.81878

8.5.6

The ar function

The function ar fits an autoregressive time series model to the data, by default
selecting the complexity of the model by the Akaike Information Criterion.
> ar(Dyt, order.max = 8, aic = TRUE)

Call:
ar(x = Dyt, aic = TRUE, order.max = 8)

Coefficients:
     1       2
0.4320  0.1972

Order selected 2   sigma^2 estimated as  1.01

Table 8.2  Functions for forecasting with ARMA models

package            function to estimate     function for predicting
stats              arima                    predict
astsa              sarima by Stoffer        sarima.for
fArma              armaFit                  predict
FitAR              FitAR                    predict

8.5.7

The arima function in the package TSA

For ARIMA models with also the presence of exogenous variables affecting the
response through a transfer function, the function arima in the package TSA can
be used.
> library(TSA)
> arima(Dyt, c(2, 0, 0))
Series: x
ARIMA(2,0,0) with non-zero mean

Coefficients:
         ar1     ar2  intercept
      0.4322  0.1971     0.5043
s.e.  0.0219  0.0219     0.0605

sigma^2 estimated as 1.008:  log likelihood=-2846.2
AIC=5698.41   AICc=5698.43   BIC=5720.81
> detach("package:TSA")

8.6

R functions for predicting with ARMA models

According to the function one uses to obtain the parameter estimates of an ARIMA
model, there exist corresponding functions to forecast future values of the time series,
see Table 8.2.
Remember to use newxreg with stats::arima when integrated ARIMA models
are considered.

Univariate Time Series Models

303

Examples of code

- library: stats, function: predict.
  The first argument of predict is an object obtained with arima, the second
  argument is the number of steps-ahead to consider for the forecast.
> arimaest <- arima(Dyt, c(2, 0, 0))
> predict(arimaest, 4)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 0.1758992 0.2301188 0.3210887 0.3710921
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.093829 1.159756 1.186845
In presence of an integrated process one has to remember to include the time
corresponding to the external regressor xreg used with arima; the argument in
predict is named newxreg and corresponds here to periods n+1, n+2, n+3, n+4.
> n <- length(yt)
> a <- arima(yt, c(2, 1, 0), xreg = 1:n)
> predict(a, 4, newxreg = 1:4 + n)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1007.053 1007.283 1007.603 1007.974
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.753864 2.530035 3.272484

function sarima.for in the package astsa.


  The first argument is the time series to be modeled. The second argument is
  the number of steps-ahead to consider for the forecast; the three successive
  arguments specify the order of the ARIMA model.
> sarima.for(Dyt, 4, 2, 0, 0)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 0.1758992 0.2301188 0.3210887 0.3710921
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.093829 1.159756 1.186845
sarima.for produces consistent forecasts also for the original time series
without having to include the newxreg argument corresponding to the external
regressor.
> sarima.for(yt, 4, 2, 1, 0)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1007.053 1007.283 1007.603 1007.974
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.753864 2.530035 3.272484

The functions predict in the packages fArma and FitAR have the same structure
as the function predict in the package stats.
> library(fArma)
> ar4 <- armaFit(Dyt ~ ar(2), data = Dyt, method = "ols")
> predict(ar4, 4)
$pred
Time Series:

Figure 8.37 Forecasts and their confidence intervals obtained with sarima.for

Start = 2001
End = 2004
Frequency = 1
[1] 0.1746484 0.2283741 0.3188913 0.3686140
$se
Time Series:
Start = 2001
End = 2004
Frequency = 1
[1] 1.004065 1.093883 1.159847 1.186961
$out
Time Series:
Start = 2001
End = 2004
Frequency = 1
      Low 95  Low 80 Forecast High 80 High 95
2001 -1.7933 -1.1121   0.1746  1.4614  2.1426
2002 -1.9156 -1.1735   0.2284  1.6302  2.3723
2003 -1.9544 -1.1675   0.3189  1.8053  2.5921
2004 -1.9578 -1.1525   0.3686  1.8898  2.6950

> library(FitAR)
> a <- FitAR(Dyt, c(2))
> predict(a, 4)
$Forecasts
             1         2         3         4
2000 0.1755649 0.2296387 0.3204801 0.3703991

$SDForecasts
            1        2        3        4
2000 1.003959 1.093712 1.159633 1.186718
sarima.for and the method for fArma produce also graphs with the forecasts and
their confidence intervals, see Fig. 8.37.

8.7

Stock Prices and Earnings (Section 8.4.4)

Data on the ratio of the S&P composite stock price index and S&P composite earnings
over the period 1871-2009 (T = 139) are considered; they can be read by means of
the function readEViews, having extracted the file priceearnings.wf1 from the
compressed archive ch08.zip.
The last line (140) has to be dropped, since the corresponding observation does not exist.
The function ts(object, start, frequency) creates a multiple time series from
the columns of a table; in this case there is no need to specify the frequency since
data are annual.
> library(hexView)
> pe <- readEViews(unzip("ch08.zip", "Chapter 8/priceearnings.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
> pe <- pe[-140, ]
> pe <- ts(pe, start = 1871)
The following variables are available:

- lne, log earnings
- lnp, log price
- lnpe, log price to earnings
To obtain a plot of the log of the stock price and of the earnings series (Fig. 8.38)
use the function xyplot available in the package lattice. Recall that a multiple
time series (mts) object can be treated like a matrix, so it is possible to make
reference to the appropriate columns of the object pe.

Figure 8.38 Log stock price and earnings, 1871-2009
> library(lattice)
> xyplot(pe[, 1:2], superpose = TRUE)

8.7.1

Dickey-Fuller test - construction

As Verbeek observes, it is clear that both the log price and log earnings series are not
weakly stationary; he suggests testing whether the non-stationarity is due to the
presence of a deterministic trend or of one or more unit roots.
To test for the presence of a unit root we have to consider the standard Dickey-Fuller
regression, see Verbeek's equation (8.58):

ΔYt = δ + γt + (ρ − 1)Yt-1 + et     (8.12)

Let y be the log price series pe[,2]. The estimates of the parameters in relationship
(8.12) can be obtained by making use of the function dynlm as:


> y <- pe[, 2]
> library(dynlm)
> summary(dynlm(d(y) ~ c(1:138) + L(y)))

Time series regression with "ts" data:
Start = 1872, End = 2009

Call:
dynlm(formula = d(y) ~ c(1:138) + L(y))

Residuals:
     Min       1Q   Median       3Q      Max
-0.56867 -0.10424  0.01858  0.11953  0.34743

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4370623  0.1647873   2.652  0.00895 **
c(1:138)     0.0017627  0.0007406   2.380  0.01870 *
L(y)        -0.0984286  0.0375499  -2.621  0.00977 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1763 on 135 degrees of freedom
Multiple R-squared: 0.04883,    Adjusted R-squared: 0.03474
F-statistic: 3.465 on 2 and 135 DF,  p-value: 0.03408
The Dickey-Fuller statistic is given by the t statistic of the lagged variable coefficient
(-2.621): the coefficient must be significantly different from 0 in order to reject the
presence of a unit root; its t statistic has to be compared with proper critical values
to reach a conclusion.

8.7.2

Dickey-Fuller test - direct function

The result can be obtained directly by using the function ur.df available in the
package urca. The function ur.df has four parameters. The first is a time series.
With the parameter type it is possible to specify whether only a constant has to be
included in model (8.12) (type="drift"), or both a trend and the drift (type="trend"),
or neither the drift nor the trend (type="none"). The parameter
lags specifies the number of lags of ΔYt to include in the regression (8.12); selectlags,
which by default is equal to "Fixed", may be set to "AIC" or "BIC" to obtain an
automatic lag selection according to the Akaike or the Bayesian Information criteria,
within the maximum number of lags specified by lags.
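For instance, an automatic choice of the number of lagged differences could be requested
as follows (a minimal sketch, not run in the text; it uses the same series y):
> library(urca)
> summary(ur.df(y, type = "trend", lags = 6, selectlags = "BIC"))
Below, instead, the lags are fixed one value at a time, reproducing Verbeek's
ADF(0)-ADF(6) statistics.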
> library(urca)
> summary(ur.df(y, type = "trend", lags = 0))

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt)

Residuals:
     Min       1Q   Median       3Q      Max
-0.56867 -0.10424  0.01858  0.11953  0.34743

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4370623  0.1647873   2.652  0.00895 **
z.lag.1     -0.0984286  0.0375499  -2.621  0.00977 **
tt           0.0017627  0.0007406   2.380  0.01870 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1763 on 135 degrees of freedom
Multiple R-squared: 0.04883,    Adjusted R-squared: 0.03474
F-statistic: 3.465 on 2 and 135 DF,  p-value: 0.03408

Value of test-statistic is: -2.6213 2.8316 3.465

Critical values for test statistics:
      1pct  5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2  6.22  4.75  4.07
phi3  8.43  6.49  5.47
The resulting statistic (the first value in the section Value of test-statistic is:)
is the augmented Dickey-Fuller test for a unit root.
Verbeek provides tests including up to six additional lags of ΔYt. The ADF(1)-ADF(6)
statistics can be obtained by using the function ur.df specifying the value of lags
from 1 to 6.
> summary(ur.df(y, type = "trend", lags = 1))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend


Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.54915 -0.10718  0.01243  0.11533  0.32751

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.468720   0.169427   2.767  0.00647 **
z.lag.1     -0.106162   0.038692  -2.744  0.00691 **
tt           0.001897   0.000760   2.496  0.01376 *
z.diff.lag   0.077647   0.086853   0.894  0.37293
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1771 on 133 degrees of freedom
Multiple R-squared: 0.05456,
Adjusted R-squared: 0.03324
F-statistic: 2.559 on 3 and 133 DF, p-value: 0.05781

Value of test-statistic is: -2.7438 3.0122 3.7954


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
> summary(ur.df(y, type = "trend", lags = 2))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.57092 -0.10709  0.01747  0.12584  0.38166

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4050820  0.1746234   2.320   0.0219 *
z.lag.1     -0.0907927  0.0399401  -2.273   0.0246 *
tt           0.0016462  0.0007765   2.120   0.0359 *
z.diff.lag1  0.0766940  0.0867307   0.884   0.3782
z.diff.lag2 -0.1370071  0.0906890  -1.511   0.1333
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1768 on 131 degrees of freedom


Multiple R-squared: 0.07057,
Adjusted R-squared: 0.04219
F-statistic: 2.487 on 4 and 131 DF, p-value: 0.04657

Value of test-statistic is: -2.2732 2.4769 2.6283


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
> summary(ur.df(y, type = "trend", lags = 3))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.61682 -0.11229  0.01993  0.11150  0.39296

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4710862 0.1780298
2.646 0.00916 **
z.lag.1
-0.1067164 0.0407667 -2.618 0.00991 **
tt
0.0018926 0.0007863
2.407 0.01750 *
z.diff.lag1 0.1131589 0.0887265
1.275 0.20447
z.diff.lag2 -0.1370854 0.0902885 -1.518 0.13138
z.diff.lag3 0.1613740 0.0910693
1.772 0.07876 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1761 on 129 degrees of freedom


Multiple R-squared: 0.09282,


F-statistic: 2.64 on 5 and 129 DF,

Adjusted R-squared:
p-value: 0.02625

0.05765

Value of test-statistic is: -2.6177 2.8556 3.4626


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
> summary(ur.df(y, type = "trend", lags = 4))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.59392 -0.10608  0.02363  0.10998  0.35148

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4221223 0.1843159
2.290
0.0237 *
z.lag.1
-0.0952906 0.0422614 -2.255
0.0259 *
tt
0.0017278 0.0008067
2.142
0.0341 *
z.diff.lag1 0.1168995 0.0891027
1.312
0.1919
z.diff.lag2 -0.1617383 0.0935295 -1.729
0.0862 .
z.diff.lag3 0.1631230 0.0913932
1.785
0.0767 .
z.diff.lag4 -0.0994732 0.0926253 -1.074
0.2849
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1766 on 127 degrees of freedom
Multiple R-squared: 0.1009,
Adjusted R-squared: 0.05844
F-statistic: 2.376 on 6 and 127 DF, p-value: 0.03292

Value of test-statistic is: -2.2548 2.4539 2.6072


Critical values for test statistics:


1pct 5pct 10pct


tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
> summary(ur.df(y, type = "trend", lags = 5))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.59366 -0.10540  0.02747  0.11002  0.34461

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4220206 0.1890821
2.232
0.0274 *
z.lag.1
-0.0934907 0.0434077 -2.154
0.0332 *
tt
0.0016215 0.0008219
1.973
0.0507 .
z.diff.lag1 0.1141232 0.0910061
1.254
0.2122
z.diff.lag2 -0.1582504 0.0937244 -1.688
0.0938 .
z.diff.lag3 0.1553505 0.0946890
1.641
0.1034
z.diff.lag4 -0.0971210 0.0927473 -1.047
0.2970
z.diff.lag5 -0.0172506 0.0931058 -0.185
0.8533
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1767 on 125 degrees of freedom
Multiple R-squared: 0.1005,
Adjusted R-squared: 0.05009
F-statistic: 1.994 on 7 and 125 DF, p-value: 0.06091

Value of test-statistic is: -2.1538 2.4698 2.3375


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
> summary(ur.df(y, type = "trend", lags = 6))


###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.60505 -0.09813  0.02635  0.10937  0.36751

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4665094 0.1942144
2.402
0.0178 *
z.lag.1
-0.1045930 0.0446025 -2.345
0.0206 *
tt
0.0018028 0.0008365
2.155
0.0331 *
z.diff.lag1 0.1306554 0.0923672
1.415
0.1597
z.diff.lag2 -0.1370080 0.0958230 -1.430
0.1553
z.diff.lag3 0.1501264 0.0949723
1.581
0.1165
z.diff.lag4 -0.0682406 0.0960788 -0.710
0.4789
z.diff.lag5 -0.0208425 0.0933334 -0.223
0.8237
z.diff.lag6 0.1097285 0.0933387
1.176
0.2420
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1771 on 123 degrees of freedom
Multiple R-squared: 0.1107,
Adjusted R-squared: 0.05282
F-statistic: 1.913 on 8 and 123 DF, p-value: 0.06374

Value of test-statistic is: -2.345 2.5898 2.7733


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
Observe that the function adf.test is also available in the package tseries; but this
function incorporates both the drift and the trend in the regression, by default. Thus
tseries::adf.test(y,k=k) is equivalent to ur.df(y,type="trend",lags=k).
> library(tseries)
> adf.test(y, k = 0)


Augmented Dickey-Fuller Test


data: y
Dickey-Fuller = -2.6213, Lag order = 0, p-value = 0.3179
alternative hypothesis: stationary

8.7.3

How to produce the Dickey-Fuller statistic for different lags

It is possible to create a function that simplifies the code writing, without repeating
the same command 7 times. Observe that the function ur.df uses classes of type S4,
so one has to use the @ sign, and not the $ sign, to extract a slot⁹ from an object of
class S4 produced by ur.df, see str(ur.df(y,type="drift",lags=6)).
> f <- function(x) {
urtest <- ur.df(y, type = "trend", lags = x)
c(stat = urtest@teststat[1], "5% crit. value" = urtest@cval[1,
2])
}
> a <- 0:6
> names(a) <- c("DF", paste("ADF(", 1:6, ")", sep = ""))
> round(sapply(a, f), 3)
DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
stat
-2.621 -2.744 -2.273 -2.618 -2.255 -2.154 -2.345
5% crit. value -3.430 -3.430 -3.430 -3.430 -3.430 -3.430 -3.430
The function f with argument x is defined, which extracts from the object resulting
from ur.df applied to the time series y, with type="trend" and lags=x, the first
element of the slot teststat, which is the Dickey-Fuller statistic, and the element in
the first row, second column of the slot cval, which is the 5% critical value (see also
the Critical values for test statistics section in the preceding outputs).
The variable a contains the desired lags for the unit root test.
The names DF and ADF(1) to ADF(6) are assigned to the elements of a.
The function sapply is finally used to call the function f for the different values of
the lags in the array a.

8.7.4

Other tests for unit roots detection

None of the preceding tests implies a rejection of the null hypothesis of a unit root.
Verbeek suggests using the Phillips-Perron and the KPSS tests¹⁰ for a unit root; the
tests can be obtained with the functions ur.pp and ur.kpss available in the package
urca.
The arguments of the function ur.pp are the time series x to be tested for a unit root;
the type, which can be "Z-alpha" or "Z-tau"; the model, with values "constant"
or "trend", determining the deterministic part in the test regression; and lags,
specifying the lags used for the correction of the error term, which can be "short" or
"long"; an exact number of lags can be specified with the argument use.lag. The
output has a structure similar to that of ur.df. See the help ?ur.pp for more information.
The arguments of the function ur.kpss are the time series x to be tested for a unit
root; the type, which can be "mu" or "tau"; and lags, specifying the maximum number
of lags used for the correction of the error term, which can be "short", "long", or "nil".
An exact number of lags can be specified with the argument use.lag. The output
has a structure similar to that of ur.df. Only the version with Bartlett weights is
implemented. See the help ?ur.kpss for more information.
⁹ Elements of objects belonging to an S4 class are named slots.
¹⁰ Remember that the null hypothesis of the KPSS test is stationarity or trend stationarity.
> summary(ur.pp(y, type = "Z-tau", model = "trend",
use.lag = 6))
##################################
# Phillips-Perron Unit Root Test #
##################################
Test regression with intercept and trend

Call:
lm(formula = y ~ y.l1 + trend)
Residuals:
     Min       1Q   Median       3Q      Max
-0.56867 -0.10424  0.01858  0.11953  0.34743

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5586889 0.2065375
2.705 0.00771 **
y.l1
0.9015714 0.0375499 24.010 < 2e-16 ***
trend
0.0017627 0.0007406
2.380 0.01870 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1763 on 135 degrees of freedom
Multiple R-squared: 0.9512,
Adjusted R-squared: 0.9504
F-statistic: 1315 on 2 and 135 DF, p-value: < 2.2e-16

Value of test-statistic, type: Z-tau  is: -2.6634

            aux. Z statistics
Z-tau-mu               2.5749
Z-tau-beta             2.4219

Critical values for Z statistics:
                     1pct      5pct    10pct
critical values -4.02682 -3.442804 -3.14582
> summary(ur.kpss(y, type = "tau", use.lag = 6))
#######################
# KPSS Unit Root Test #
#######################
Test is of type: tau with 6 lags.
Value of test-statistic is: 0.2233
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.119 0.146 0.176 0.216
The KPSS statistic 0.2233 is larger than the critical value 0.146 thus rejecting trend
stationarity in favour of a unit root.
The KPSS statistic can also be obtained by means of the function kpss.test available
in the package tseries but the function does not allow an exact lag to be specified.

8.7.5

Testing for multiple unit roots

By imposing a first unit root it is possible to test for the presence of a second unit
root with regressions of the form, see Verbeek p. 298:

Δ²Yt = δ + π ΔYt-1 + c1 Δ²Yt-1 + ... + εt     (8.13)

the null hypothesis corresponds to π = 0.


We can define a function g computing the Dickey-Fuller statistic with type drift
since as Verbeek observes it seems unlikely that stock returns exhibit a deterministic
trend for the differenced series diff(y) and then use the function sapply to obtain
the results of the function g applied to the different values of the lags in the array a,
defined above. (The 5% critical value is the same as before).
> g <- function(x) summary(ur.df(diff(y), type = "drift",
lags = x))@teststat[1]
> (adf <- round(sapply(a, g), 3))
DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
-11.289 -9.602 -6.558 -6.522 -5.997 -4.921 -4.106
The ADF(6) value strongly rejects the unit root hypothesis. The KPSS test with
Bartlett weights gives:
> summary(ur.kpss(diff(y), type = "mu", use.lag = 6))
#######################
# KPSS Unit Root Test #
#######################

318

Univariate Time Series Models

Test is of type: mu with 6 lags.


Value of test-statistic is: 0.054
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.347 0.463 0.574 0.739
The KPSS statistic 0.054 is lower than the 5% critical value 0.463, also indicating
that the first-differenced price series is likely to be stationary.
With regard to the log earnings series we have for the ADF(6) statistic:
> y <- pe[, 1]
> summary(ur.df(y, type = "trend", lags = 6))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.99802 -0.10972  0.03045  0.13696  0.50098

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.502842
0.177367
2.835 0.00536 **
z.lag.1
-0.269302
0.096935 -2.778 0.00632 **
tt
0.004042
0.001498
2.698 0.00796 **
z.diff.lag1 0.057546
0.115829
0.497 0.62021
z.diff.lag2 -0.122193
0.109751 -1.113 0.26772
z.diff.lag3 -0.034512
0.105715 -0.326 0.74463
z.diff.lag4 -0.085099
0.102001 -0.834 0.40573
z.diff.lag5 -0.267248
0.095763 -2.791 0.00610 **
z.diff.lag6 0.049989
0.097278
0.514 0.60826
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.2292 on 123 degrees of freedom
Multiple R-squared: 0.2382,
Adjusted R-squared: 0.1887
F-statistic: 4.808 on 8 and 123 DF, p-value: 3.644e-05


Value of test-statistic is: -2.7782 3.5752 3.8965


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
the value -2.7782 does not reject the presence of a unit root; for the KPSS(6) statistic
we have:
> summary(ur.kpss(y, type = "tau", use.lag = 6))
#######################
# KPSS Unit Root Test #
#######################
Test is of type: tau with 6 lags.
Value of test-statistic is: 0.1576
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.119 0.146 0.176 0.216
marginally rejecting trend stationarity at the 5% level. The Phillips-Perron
statistic is -4.908, clearly rejecting the unit root hypothesis:
> summary(ur.pp(y, type = "Z-tau", model = "trend",
use.lag = 6))
##################################
# Phillips-Perron Unit Root Test #
##################################
Test regression with intercept and trend

Call:
lm(formula = y ~ y.l1 + trend)
Residuals:
     Min       1Q   Median       3Q      Max
-1.04265 -0.11320  0.04181  0.13670  0.54002

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.929064   0.180139   5.157 8.72e-07 ***
y.l1        0.676123   0.063411  10.662  < 2e-16 ***
trend       0.004739   0.001050   4.515 1.37e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.2319 on 135 degrees of freedom
Multiple R-squared: 0.8791,	Adjusted R-squared: 0.8773
F-statistic: 490.9 on 2 and 135 DF, p-value: < 2.2e-16

Value of test-statistic, type: Z-tau  is: -4.908

           aux. Z statistics
Z-tau-mu              5.9746
Z-tau-beta            4.3123

Critical values for Z statistics:
                     1pct      5pct    10pct
critical values -4.02682 -3.442804 -3.14582
Verbeek then analyses the log of the price/earnings ratio. To obtain a graphical
representation of the series, see Fig. 8.39, use:
> library(lattice)
> xyplot(pe[, 3])
Verbeek observes that the series seems to fluctuate around a long-run average, but
mean reversion seems to take several years to happen. For this series we have ADF(0)
and ADF(6) statistics without trend of -4.424 and -2.208 respectively, the first one
rejecting the presence of a unit root.
> y <- pe[, 3]
> ur.df(y, type = "drift", lags = 0)
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -4.424 9.8065
> ur.df(y, type = "drift", lags = 6)
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -2.2078 2.567
Verbeek observes that by considering the presence of a trend the situation does not
make much difference; namely we have:
> ur.df(y, type = "trend", lags = 0)

Figure 8.39  Log price/earnings ratio, 1871-2009

###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -4.6422 7.2047 10.7863
> ur.df(y, type = "trend", lags = 6)
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -2.3615 1.9641 2.8168
The KPSS(6) statistic with error correction using the Bartlett kernel is 0.331
and does not reject the null of stationarity (no unit root):
> summary(ur.kpss(y, type = "mu", use.lag = 6))
#######################
# KPSS Unit Root Test #


#######################
Test is of type: mu with 6 lags.
Value of test-statistic is: 0.3308
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.347 0.463 0.574 0.739
The conclusion is that data are not sufficiently informative to distinguish between the
two hypotheses; and mean reversion, if present, is very slow.

8.8

Some remarks on the function ur.df

As the reader has surely observed, the function ur.df produces from one to three
statistics, depending on the option chosen: tau1 (type "none"), tau2 and phi1 (type
"drift"), and tau3, phi2 and phi3 (type "trend"). We now consider the hypotheses
underlying those test statistics. For simplicity, no autocorrelations of the differenced
series are taken into account in the augmented Dickey-Fuller test.

8.8.1

The Dickey-Fuller test for a unit root, type "none"

The following process is considered:

    Y_t = u_t
    u_t = θ u_{t-1} + ε_t ,                                    (8.14)

by substituting the first relationship in the latter one we have an AR(1) process.
Note that if θ = 1 we have a random walk

    Y_t = Y_{t-1} + ε_t .

The augmented Dickey-Fuller test consists in testing the null θ - 1 = 0 in the following
model

    ΔY_t = (θ - 1) Y_{t-1} + ε_t .
In the ur.df output will appear only one statistic: tau1, which corresponds to the
hypothesis:

H0 : random walk without drift
H1 : stationary AR(1) without drift
1%, 5% and 10% critical values for the tau1 statistic are reported in the output. The
null hypothesis of a unit root is rejected at a given level if the Dickey-Fuller statistic
is lower than the corresponding critical value; this happens when the statistic is
negative and far from 0, which gives evidence that the process is stationary.
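A small numerical check, not taken from the text: for a simulated stationary AR(1) without drift the tau1 statistic should fall well below the 5% critical value -1.95, so the unit root null is rejected.

library(urca)
set.seed(123)
x.stat <- arima.sim(list(ar = 0.5), n = 500)    # stationary AR(1), no drift
summary(ur.df(x.stat, type = "none", lags = 1))@teststat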

8.8.2

Dickey-Fuller test for a unit root, type drift

The following process is considered:

    Y_t = α + u_t
    u_t = θ u_{t-1} + ε_t .                                    (8.15)

Substitute u_t = Y_t - α in the second relationship to obtain an AR(1) process with
drift:

    Y_t = α (1 - θ) + θ Y_{t-1} + ε_t .                        (8.16)

Note that if θ = 1 the last relationship becomes

    Y_t = Y_{t-1} + ε_t

that is an ARIMA(0,1,0) without drift (a random walk).
The augmented Dickey-Fuller test consists in testing the null θ - 1 = 0 in the following
model

    ΔY_t = a + (θ - 1) Y_{t-1} + ε_t .                         (8.17)
The test is named tau2 in the ur.df output and corresponds to the following set of
hypotheses

H0 : random walk, see (8.16)
H1 : stationary AR(1) with drift, see (8.15) and (8.16).
1%, 5% and 10% critical values for the tau2 statistic are reported in the output. The
null hypothesis of a unit root is rejected at a given level if the Dickey-Fuller statistic
is lower than the corresponding critical value; this happens when the statistic is
negative and far from 0, which gives evidence that the process is stationary.
Another test, named phi1, regards the joint null H0: a = (θ - 1) = 0 (that is, whether Y_t is a
random walk without drift), corresponding to the following hypotheses

H0 : random walk without drift
H1 : stationary AR(1) without drift or random walk with drift;
the second option under H1 is described by the presence of a linear trend and clearly
requires a unit root test with type trend; the behaviour of such a time series can be
clearly detected from its graphical representation, see Section 8.4.2.

8.8.3

Dickey-Fuller test for a unit root, type trend

The following process is considered:

    Y_t = α + γ t + u_t
    u_t = θ u_{t-1} + ε_t                                      (8.18)

that is, we have a linear deterministic trend with the presence of an autocorrelated
error; if θ < 1, {Y_t} is said to be trend stationary. Note that {Y_t} is not an ARIMA process.
Substitute u_t = Y_t - α - γ t in the second relationship to obtain:

    Y_t = [α (1 - θ) + θ γ] + [γ (1 - θ)] t + θ Y_{t-1} + ε_t .    (8.19)

Note that if θ = 1 the last relationship becomes

    Y_t = γ + Y_{t-1} + ε_t                                    (8.20)

that is an ARIMA(0,1,0) (a random walk) with drift, and {Y_t} is said to be difference
stationary; namely, it results in

    ΔY_t = γ + ε_t .

The augmented Dickey-Fuller test consists in testing the null θ - 1 = 0 in the following
model

    ΔY_t = a + b t + (θ - 1) Y_{t-1} + ε_t .
The test is named tau3 in the ur.df output and corresponds to the following set of
hypotheses

H0 : difference stationary (random walk with drift)
H1 : trend stationary
1%, 5% and 10% critical values for the tau3 statistic are reported in the output. The
null hypothesis of a unit root is rejected at a given level if the Dickey-Fuller statistic
is lower than the corresponding critical value; this happens when the statistic is
negative and far from 0, which gives evidence that the process is trend stationary.
Other tests, say phi2 and phi3, regard the null hypotheses:

phi2:  a = b = (θ - 1) = 0   (Y_t is a random walk without drift rather than trend stationary)

phi3:  b = (θ - 1) = 0   (Y_t is a random walk with drift)

A parabolic behaviour can also be described under the alternative hypotheses (when
b ≠ 0 and θ - 1 = 0), but this would be clearly visible from the graphical
representation of the time series, requiring one to verify the presence of a second unit
root, see Section 8.4.3.
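The parabolic case can be illustrated with a short simulation (not from the text): integrating a random walk with drift once more produces the roughly quadratic path mentioned above, which is the situation where a test for a second unit root is called for.

set.seed(1)
rw.drift <- cumsum(0.5 + rnorm(300))   # random walk with drift
y2 <- ts(cumsum(rw.drift))             # integrated once more: roughly parabolic
plot(y2)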

8.8.4

Example

We simulate a realization from process (8.18), with n = 1000, α = 0.4, γ = 0.5 and
θ = 0.8.
> set.seed(456)
> n <- 1000
> alpha <- 0.4
> gamma <- 0.5
> theta <- 0.8
> epsilon <- rnorm(n)
> ut <- arima.sim(model <- list(ar = theta), n, innov = epsilon)
> t <- 1:n
> yt <- alpha + gamma * t + ut

Figure 8.40 shows the first 200 observations of the simulated time series.

Figure 8.40  Simulation of a trend stationary series, see (8.18)

> library(lattice)
> xyplot(ts(cbind(yt[1:200], seq(1, 100, length = 200))),
superpose = TRUE, auto.key = FALSE)
Figure 8.41 shows 200 observations from the simulation of a difference stationary time
series.
> library(lattice)
> xyplot(ts(cbind(cumsum(gamma + epsilon[1:200]), seq(1,
100, length = 200))), superpose = TRUE, auto.key = FALSE)
We now estimate model (8.19) by making use of dynlm

    Y_t = δ_0 + γ_0 t + θ Y_{t-1} + ε_t

> library(dynlm)
> summary(dynlm(yt ~ t + L(yt, 1)))

Figure 8.41  Simulation of a difference stationary series, see (8.20)

Time series regression with "ts" data:


Start = 2, End = 1000
Call:
dynlm(formula = yt ~ t + L(yt, 1))
Residuals:
Min
1Q
-3.0157 -0.6895

Median
0.0007

3Q
0.7159

Max
3.0052

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.565388
0.062490
9.048
<2e-16 ***
t
0.099609
0.009481 10.507
<2e-16 ***
L(yt, 1)
0.800671
0.018970 42.207
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.9812 on 996 degrees of freedom
Multiple R-squared: 1,	Adjusted R-squared: 1
F-statistic: 1.078e+07 on 2 and 996 DF, p-value: < 2.2e-16
> alpha * (1 - theta) + theta * gamma
[1] 0.48
> gamma * (1 - theta)
[1] 0.1
> theta
[1] 0.8
Let us now check for the presence of a unit root by using the augmented Dickey-Fuller
test, introducing, only for completeness, an autocorrelation lag of order 1. The options
"none" and "drift" are considered only to illustrate the behaviour of the function ur.df:
in these instances conclusions on the presence of a unit root can be misleading, since
the series clearly shows the presence of a trend, see Fig. 8.40.
none
We can only test for the presence of a unit root
> library(dynlm)
> summary(dynlm(d(yt) ~ -1 + L(yt, 1) + L(d(yt), 1)))
Time series regression with "ts" data:
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ -1 + L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
-3.0465 -0.6226

Median
0.1114

3Q
0.8558

Max
3.0957

Coefficients:
Estimate Std. Error t value Pr(>|t|)
L(yt, 1)
0.0015126 0.0001259 12.015
<2e-16 ***
L(d(yt), 1) -0.0056115 0.0317110 -0.177
0.86
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 1.064 on 996 degrees of freedom
Multiple R-squared: 0.1435,
Adjusted R-squared: 0.1418
F-statistic: 83.46 on 2 and 996 DF, p-value: < 2.2e-16
The Dickey-Fuller statistic corresponds to the t value pertaining to L(yt, 1):


> library(urca)
> summary(ur.df(yt, type = "none", lags = 1))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression none

Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
Residuals:
Min
1Q
-3.0465 -0.6226

Median
0.1114

3Q
0.8558

Max
3.0957

Coefficients:
Estimate Std. Error t value Pr(>|t|)
z.lag.1
0.0015126 0.0001259 12.015
<2e-16 ***
z.diff.lag -0.0056115 0.0317110 -0.177
0.86
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 1.064 on 996 degrees of freedom
Multiple R-squared: 0.1435,
Adjusted R-squared: 0.1418
F-statistic: 83.46 on 2 and 996 DF, p-value: < 2.2e-16

Value of test-statistic is: 12.0145


Critical values for test statistics:
1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62

drift
We first test for the presence of a unit root
> summary(dynlm(d(yt) ~ L(yt, 1) + L(d(yt), 1)))
Time series regression with "ts" data:
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ L(yt, 1) + L(d(yt), 1))
Residuals:
    Min      1Q  Median      3Q     Max
-3.1111 -0.7205  0.0281  0.7169  3.1238

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.400e-01 6.752e-02
7.997 3.52e-15 ***
L(yt, 1)
-1.666e-05 2.269e-04 -0.073
0.9415
L(d(yt), 1) -6.498e-02 3.164e-02 -2.054
0.0403 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 1.032 on 995 degrees of freedom
Multiple R-squared: 0.004227,	Adjusted R-squared: 0.002226
F-statistic: 2.112 on 2 and 995 DF, p-value: 0.1215

The Dickey-Fuller statistic corresponds to the t value pertaining to L(yt, 1) and
appears (tau2) at the first position in the ur.df output after Value of the
test statistic is:.
Now we test the null hypothesis a = θ - 1 = 0
> anova(dynlm(d(yt) ~ L(yt, 1) + L(d(yt), 1)), dynlm(d(yt) ~
-1 + L(d(yt), 1)))
Analysis of Variance Table
Model 1: d(yt) ~ L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ -1 + L(d(yt), 1)
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
995 1059.6
2
997 1291.2 -2
-231.54 108.71 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The F test appears (phi1) at the second position in the ur.df output after
Value of the test statistic is:
> ur.df(yt, type = "drift", lags = 1)
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -0.0734 108.7121
> summary(ur.df(yt, type = "drift", lags = 1))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################


Test regression drift

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
Min
1Q
-3.1111 -0.7205

Median
0.0281

3Q
0.7169

Max
3.1238

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.400e-01 6.752e-02
7.997 3.52e-15 ***
z.lag.1
-1.666e-05 2.269e-04 -0.073
0.9415
z.diff.lag -6.498e-02 3.164e-02 -2.054
0.0403 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 1.032 on 995 degrees of freedom
Multiple R-squared: 0.004227,	Adjusted R-squared: 0.002226
F-statistic: 2.112 on 2 and 995 DF, p-value: 0.1215

Value of test-statistic is: -0.0734 108.7121


Critical values for test statistics:
1pct 5pct 10pct
tau2 -3.43 -2.86 -2.57
phi1 6.43 4.59 3.78

trend
We first test for the presence of a unit root
> t <- t[-n]
> summary(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)))
Time series regression with "ts" data:
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ t + L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
-2.99426 -0.68188

Median
0.00284

3Q
0.71159

Max
3.03239


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.649385
0.065077
9.979
<2e-16 ***
t
0.103120
0.009998 10.314
<2e-16 ***
L(yt, 1)
-0.206347
0.020006 -10.314
<2e-16 ***
L(d(yt), 1) 0.037754
0.031690
1.191
0.234
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.9813 on 994 degrees of freedom
Multiple R-squared: 0.1005,
Adjusted R-squared: 0.09778
F-statistic: 37.02 on 3 and 994 DF, p-value: < 2.2e-16
The Dickey-Fuller statistic corresponds to the t value pertaining to L(yt, 1) and
appears (tau3) at the first position in the ur.df output after Value of the
test statistic is:
Now we test the null hypothesis a = b = θ - 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ -1 + L(d(yt), 1)))
Analysis of Variance Table
Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ -1 + L(d(yt), 1)
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
994 957.17
2
997 1291.15 -3
-333.98 115.61 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The F test will appear (phi2) at the second position in the ur.df output after
Value of the test statistic is:.
We finally test the null hypothesis b = θ - 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ 1 + L(d(yt), 1)))
Analysis of Variance Table
Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ 1 + L(d(yt), 1)
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1    994  957.17
2    996 1059.61 -2   -102.44 53.193 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

The F test appears (phi3) at the third position in the ur.df output after Value
of the test statistic is:.


> ur.df(yt, type = "trend", lags = 1)


###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -10.3143 115.6103 53.1927

> summary(ur.df(yt, type = "trend", lags = 1))


###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
Min
1Q
-2.99426 -0.68188

Median
0.00284

3Q
0.71159

Max
3.03239

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.649385
0.065077
9.979
<2e-16 ***
z.lag.1
-0.206347
0.020006 -10.314
<2e-16 ***
tt
0.103120
0.009998 10.314
<2e-16 ***
z.diff.lag
0.037754
0.031690
1.191
0.234
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.9813 on 994 degrees of freedom
Multiple R-squared: 0.1005,
Adjusted R-squared: 0.09778
F-statistic: 37.02 on 3 and 994 DF, p-value: < 2.2e-16

Value of test-statistic is: -10.3143 115.6103 53.1927


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2 6.09 4.68 4.03
phi3 8.27 6.25 5.34


8.8.5

Exercise

Let's now consider an AR(1) process with a unit root


> set.seed(456)
> n <- 1000
> alpha <- 0.4
> gamma <- 0.5
> theta1 <- 0.8
> ut <- arima.sim(model <- list(order = c(1, 1, 0),
      ar = theta1), n)
> t <- ts(0:n)
> yt <- alpha + gamma * t + ut
Testing for a unit root
> summary(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)))
Time series regression with "ts" data:
Start = 3, End = 1001
Call:
dynlm(formula = d(yt) ~ t + L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
Median
-2.91494 -0.68226 -0.00539

3Q
0.70942

Max
3.09277

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1487382 0.0652815
2.278
0.0229 *
t
0.0008830 0.0005898
1.497
0.1347
L(yt, 1)
-0.0011549 0.0007256 -1.592
0.1118
L(d(yt), 1) 0.7975308 0.0191505 41.645
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.9806 on 995 degrees of freedom
Multiple R-squared: 0.6359,
Adjusted R-squared: 0.6348
F-statistic: 579.3 on 3 and 995 DF, p-value: < 2.2e-16
Testing for a = b = θ - 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ -1 + L(d(yt), 1)))
Analysis of Variance Table
Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ -1 + L(d(yt), 1)


  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    995 956.70
2    998 977.21 -3   -20.511 7.1106 9.956e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Testing for b = θ - 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ 1 + L(d(yt), 1)))
Analysis of Variance Table
Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ 1 + L(d(yt), 1)
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
995 956.70
2
997 959.27 -2
-2.5686 1.3357 0.2634
> summary(ur.df(yt, type = "trend", lags = 1))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
Min
1Q
Median
-2.91494 -0.68226 -0.00539

3Q
0.70942

Max
3.09277

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1487382 0.0652815
2.278
0.0229 *
z.lag.1
-0.0011549 0.0007256 -1.592
0.1118
tt
0.0008830 0.0005898
1.497
0.1347
z.diff.lag
0.7975308 0.0191505 41.645
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.9806 on 995 degrees of freedom
Multiple R-squared: 0.6359,
Adjusted R-squared: 0.6348
F-statistic: 579.3 on 3 and 995 DF, p-value: < 2.2e-16

Value of test-statistic is: -1.5916 7.1106 1.3357


Critical values for test statistics:


1pct 5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2 6.09 4.68 4.03
phi3 8.27 6.25 5.34
What happens if we add another lag?
> summary(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1:2)))
Time series regression with "ts" data:
Start = 4, End = 1001
Call:
dynlm(formula = d(yt) ~ t + L(yt, 1) + L(d(yt), 1:2))
Residuals:
Min
1Q
Median
-2.90554 -0.69296 -0.00589

3Q
0.70502

Max
3.10600

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
0.1575230 0.0656766
2.398
0.0166 *
t
0.0008424 0.0005911
1.425
0.1544
L(yt, 1)
-0.0011135 0.0007270 -1.532
0.1259
L(d(yt), 1:2)1 0.8165655 0.0316832 25.773
<2e-16 ***
L(d(yt), 1:2)2 -0.0233147 0.0317436 -0.734
0.4628
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.9806 on 993 degrees of freedom
Multiple R-squared: 0.6366,
Adjusted R-squared: 0.6351
F-statistic: 434.9 on 4 and 993 DF, p-value: < 2.2e-16
> summary(ur.df(yt, type = "trend", lags = 2))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
Min
1Q
Median
-2.90554 -0.69296 -0.00589

3Q
0.70502

Max
3.10600


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1575230 0.0656766
2.398
0.0166 *
z.lag.1
-0.0011135 0.0007270 -1.532
0.1259
tt
0.0008424 0.0005911
1.425
0.1544
z.diff.lag1 0.8165655 0.0316832 25.773
<2e-16 ***
z.diff.lag2 -0.0233147 0.0317436 -0.734
0.4628
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.9806 on 993 degrees of freedom
Multiple R-squared: 0.6366,
Adjusted R-squared: 0.6351
F-statistic: 434.9 on 4 and 993 DF, p-value: < 2.2e-16

Value of test-statistic is: -1.5317 7.3383 1.2714


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2 6.09 4.68 4.03
phi3 8.27 6.25 5.34

8.8.6

Exercise

Let's now consider a random walk


> set.seed(456)
> n <- 1000
> yt <- ts(cumsum(rnorm(n)))
> library(dynlm)
> summary(dynlm(d(yt) ~ -1 + L(yt, 1) + L(d(yt), 1)))
Time series regression with "ts" data:
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ -1 + L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
-2.99524 -0.65714

Median
0.02312

3Q
0.73171

Max
3.03899

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
L(yt, 1)    0.0007409  0.0008270   0.896    0.371
L(d(yt), 1) 0.0333952  0.0317223   1.053    0.293

Residual standard error: 0.9818 on 996 degrees of freedom


Multiple R-squared: 0.002023,	Adjusted R-squared: 1.943e-05
F-statistic: 1.01 on 2 and 996 DF, p-value: 0.3647
> summary(ur.df(yt, type = "none", lags = 1))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression none

Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
Residuals:
Min
1Q
-2.99524 -0.65714

Median
0.02312

3Q
0.73171

Max
3.03899

Coefficients:
Estimate Std. Error t value Pr(>|t|)
z.lag.1
0.0007409 0.0008270
0.896
0.371
z.diff.lag 0.0333952 0.0317223
1.053
0.293
Residual standard error: 0.9818 on 996 degrees of freedom
Multiple R-squared: 0.002023,	Adjusted R-squared: 1.943e-05
F-statistic: 1.01 on 2 and 996 DF, p-value: 0.3647

Value of test-statistic is: 0.8959


Critical values for test statistics:
1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62
> summary(ur.df(yt, type = "none", lags = 0))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression none

Call:
lm(formula = z.diff ~ z.lag.1 - 1)


Residuals:
Min
1Q
-3.00751 -0.66268

Median
0.02014

3Q
0.73729

Max
3.01637

Coefficients:
Estimate Std. Error t value Pr(>|t|)
z.lag.1 0.0007877 0.0008256
0.954
0.34
Residual standard error: 0.9816 on 998 degrees of freedom
Multiple R-squared: 0.0009112,	Adjusted R-squared: -8.988e-05
F-statistic: 0.9102 on 1 and 998 DF, p-value: 0.3403

Value of test-statistic is: 0.9541


Critical values for test statistics:
1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62

8.9

Long-run Purchasing Power Parity (Part 1) (Section 8.5)

To import the data from the file ppp2.wf1, which is a work file of EViews, call first
the package hexView and next the command readEViews.
> library(hexView)
> ppp <- readEViews(unzip("ch08.zip", "Chapter 8/ppp2.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
Observations from January 1988 to December 2010 (T=276) on price indices and
exchange rates for the United Kingdom, the United States and the Euro area are
available¹¹.

cpieuro: price index Euro area

cpiuk: price index United Kingdom

cpius: price index United States

fxde: exchange rate dollar/euro

fxdp: exchange rate dollar/pound

fxep: exchange rate euro/pound

logcpieuro: log price index Euro area (Jan 1988=100)

logcpiuk: log price index United Kingdom (Jan 1988=100)

logcpius: log price index United States (Jan 1988=100)

11 As Verbeek observes in Sections 8.5, 9.3 and 9.5.4, the exchange rates are accidentally not
converted to logs when producing the results. Here they are converted to logs before obtaining the results.

Figure 8.42  Log consumer price index UK and Euro area, Jan 1988-Dec 2010

By using the function ts(object, start, frequency) it is possible to create a


multiple time series from the columns of a table or a matrix; here we have to specify
freq=12 since data have a monthly frequency.
> ppp <- ts(ppp, start = c(1988, 1), freq = 12)
Figure 8.42 represents the log consumer price index for UK and the Euro area.
> library(lattice)
> xyplot(ppp[, 7:8], lty = c(1, 2), superpose = TRUE)
Before analysing whether the purchasing power parity condition, see Verbeek's relationship
(8.60), referred to log variables:

    s_t = p_t - p*_t

holds, we have to analyse, as a first necessary step, the properties of the variables involved.
The Dickey-Fuller test statistic can be obtained for the log Euro series, obviously in
the version with the trend and the drift, see Fig. 8.42, by using the following code.


> library(urca)
> y <- ppp[, 7]
> summary(ur.df(y, type = "trend", lags = 0))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt)
Residuals:
Min
1Q
Median
-0.0098260 -0.0013041 -0.0000486

3Q
0.0013773

Max
0.0092253

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.010e-01 3.480e-02
2.902 0.00401 **
z.lag.1
-2.106e-02 7.467e-03 -2.821 0.00514 **
tt
3.306e-05 1.399e-05
2.363 0.01882 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.002523 on 272 degrees of freedom
Multiple R-squared: 0.06164,
Adjusted R-squared: 0.05474
F-statistic: 8.934 on 2 and 272 DF, p-value: 0.0001746

Value of test-statistic is: -2.8211 63.4224 8.9344


Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.98 -3.42 -3.13
phi2 6.15 4.71 4.05
phi3 8.34 6.30 5.36
From the preceding output the 5% critical value can be recovered: -3.42 for the
test with drift and trend.
It is possible to create a function to simultaneously run a range of augmented Dickey-Fuller
tests and reproduce Verbeek's Table 8.2.
To obtain the results for the Euro area and the UK separately we also consider a function f1,
which returns TRUE if the corresponding (augmented) Dickey-Fuller test does not reject
the presence of a unit root:


> f <- function(x) {


wt <- summary(ur.df(y, type = "trend", lags = x))@teststat[1]
}
> f1 <- function(x) {
wt <- summary(ur.df(y, type = "trend", lags = x))
wt <- (wt@teststat[1] >= wt@cval[1, 2])
}
> print("Euro")
[1] "Euro"
> y <- ppp[, 7]
> a <- c(0:12, 24, 36)
> names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
> out <- sapply(a, f)
> out1 <- sapply(a, f1)
> print(round(out, 3))
DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(7)
-2.821 -2.810 -2.912 -3.029 -3.241 -3.402 -3.173 -3.368
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12) ADF(24) ADF(36)
-3.518 -3.600 -3.704 -3.730 -3.506 -3.098 -3.361
> print(out1)
    DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(7)
  TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12) ADF(24) ADF(36)
 FALSE  FALSE   FALSE   FALSE   FALSE    TRUE    TRUE
For the UK consumer price index we have:
> print("UK")
[1] "UK"
> y <- ppp[, 8]
> out <- sapply(a, f)
> out1 <- sapply(a, f1)
> print(round(out, 3))
    DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(7)
-3.587 -3.535 -3.697 -3.706 -3.785 -3.936 -3.316 -3.439
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12) ADF(24) ADF(36)
-3.401 -3.763  -3.816  -3.840  -3.678  -4.068  -2.262
> print(out1)
    DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(7)
 FALSE  FALSE  FALSE  FALSE  FALSE  FALSE   TRUE  FALSE
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12) ADF(24) ADF(36)
  TRUE  FALSE   FALSE   FALSE   FALSE   FALSE    TRUE
And for Euro and UK simultaneously


> for (ind in 7:8) {


y <- ppp[, ind]
a <- c(0:12, 24, 36)
names(a) <- c("DF", paste("ADF(", a[-1], ")",
sep = ""))
print(switch(ind - 6, "Euro", "UK"))
out <- sapply(a, f)
print(round(out, 3))
}
[1] "Euro"
DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
-2.821 -2.810 -2.912 -3.029 -3.241 -3.402 -3.173
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12) ADF(24) ADF(36)
-3.518 -3.600 -3.704 -3.730 -3.506 -3.098 -3.361
[1] "UK"
DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
-3.587 -3.535 -3.697 -3.706 -3.785 -3.936 -3.316
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12) ADF(24) ADF(36)
-3.401 -3.763 -3.816 -3.840 -3.678 -4.068 -2.262

ADF(7)
-3.368

ADF(7)
-3.439

We can observe that for the UK consumer price index the null hypothesis of a unit root
is not rejected at the 5% level only for lags 6, 8 and 36, while for the other lags the
hypothesis of trend stationarity is accepted at the 5% significance level. If we consider
a significance level of 1%, to which corresponds the critical value -3.98, the null of a
unit root is never rejected: a more negative Dickey-Fuller statistic would be necessary
to reject the null of a unit root and accept trend stationarity.
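The statement about the 1% level can be verified directly (a sketch, assuming y still holds the UK log price index and a the vector of lags used above): TRUE means that the unit root is not rejected at the 1% level.

f1pct <- function(x) {
    wt <- summary(ur.df(y, type = "trend", lags = x))
    wt@teststat[1] >= wt@cval[1, 1]   # compare with the 1% critical value of tau3
}
sapply(a, f1pct)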
With reference to the log exchange rate Euro/UK we have the following unit root
test results
> f <- function(x) {
nt <- summary(ur.df(y, type = "drift", lags = x))@teststat[1]
wt <- summary(ur.df(y, type = "trend", lags = x))@teststat[1]
return(c(nt, wt))
}
> f1 <- function(x) {
nt <- summary(ur.df(y, type = "drift", lags = x))
nt <- (nt@teststat[1] >= nt@cval[1, 2])
wt <- summary(ur.df(y, type = "trend", lags = x))
wt <- (wt@teststat[1] >= wt@cval[1, 2])
return(c(nt, wt))
}
> a <- 0:6
> names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
> y <- log(ppp[, 6])
> out <- sapply(a, f)
> out1 <- sapply(a, f1)
> rownames(out) <- rownames(out1) <- c("Without trend",
      "With trend")
> print(round(out, 3))
                  DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
Without trend -1.264 -1.215 -1.243 -1.340 -1.525 -1.404 -1.284
With trend    -1.249 -1.195 -1.222 -1.319 -1.505 -1.375 -1.248
> print(out1)
                DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
Without trend TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
With trend    TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
we have no rejection of the null hypothesis of unit root.
With reference to the log real exchange rate between the Euro area and the UK:

    rs_t = s_t - (p_t - p*_t)
see Fig. 8.43 obtained with the code
> xyplot(ts(log(ppp[, 6]) - (ppp[, 7] - ppp[, 8]),
start = c(1988, 1), freq = 12))
we have the following unit root test results
> y <- ts(log(ppp[, 6]) - (ppp[, 7] - ppp[, 8]), start = c(1988,
1), freq = 12)
> a <- c(0:6, 12)
> names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
> out <- sapply(a, f)
> out1 <- sapply(a, f1)
> rownames(out) <- rownames(out1) <- c("Without trend",
"With trend")
> print(round(out, 3))
                  DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(12)
Without trend -1.492 -1.473 -1.427 -1.476 -1.627 -1.520 -1.389  -1.993
With trend    -1.490 -1.469 -1.418 -1.466 -1.616 -1.504 -1.367  -1.966
> print(out1)
                DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(12)
Without trend TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE    TRUE
With trend    TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE    TRUE
the null hypothesis of a unit root in rst cannot be rejected.
The KPSS test (Bartlett weights) with a lag length of 6 results:

Figure 8.43  Log real exchange rate Euro area-UK, 1988:1-2010:12

> summary(ur.kpss(y, type = "mu", use.lag = 6))


#######################
# KPSS Unit Root Test #
#######################
Test is of type: mu with 6 lags.
Value of test-statistic is: 0.5789
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.347 0.463 0.574 0.739
Based on the results from the Standard Dickey-Fuller regression (without a trend) we
obtain an autocorrelation coefficient equal to:
> library(dynlm)
> (DFreg <- summary(dynlm(y ~ L(y))))


Time series regression with "ts" data:


Start = 1988(2), End = 2010(12)
Call:
dynlm(formula = y ~ L(y))
Residuals:
Min
1Q
Median
-0.091770 -0.009943 -0.000087

3Q
0.014394

Max
0.053296

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.006740
0.004856
1.388
0.166
L(y)
0.981693
0.012267 80.024
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.02116 on 273 degrees of freedom
Multiple R-squared: 0.9591,
Adjusted R-squared: 0.959
F-statistic: 6404 on 1 and 273 DF, p-value: < 2.2e-16
Verbeek observes that according to this result a proportion of 0.982 of any shock in
the real exchange rate will still remain after one month. Thus the shock after two
months is
> DFreg$coef[2]^2
[1] 0.9637211
and the half life of a shock, describing how long it would take for the effect of a shock
to die out 50%, results
> log(0.5)/log(DFreg$coef[2])
[1] 37.51477
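A cross-check of the half-life (a sketch): iterating the estimated AR(1) impulse response and locating the first month in which less than half of a unit shock remains gives a result consistent with the 37.5 months computed above.

theta.hat <- DFreg$coef[2]            # estimated autocorrelation coefficient
which(theta.hat^(1:120) < 0.5)[1]     # first month with less than 50% of the shock left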

8.10

The Persistence of Inflation (Section 8.8)

Data are available in the file inflation.dat.


The last value in the file is missing and the first column contains time references, so
we drop the pertaining rows/columns.
The file inflation.dat contains the data series of the quarterly U.S. inflation rate,
after seasonal adjustment. The inflation rate is expressed in % per year. Source:
Bureau of Labor Statistics.
> infl <- read.table(unzip("ch08.zip", "Chapter 8/inflation.dat"),
header = TRUE)
> infl <- infl[-nrow(infl), -1]
> infl <- ts(infl, start = c(1960, 1), freq = 4)

Figure 8.44  Quarterly inflation in the United States, 1960-2010

To investigate persistence of inflation, that is how strongly a current shock to inflation


affects future inflation rates, and how long the inflation rate needs to return to its
previous level Verbeek investigates the dynamic properties of the quarterly inflation
rate in the US.
See Fig. 8.44, for a graphical representation of the series
> library(lattice)
> xyplot(infl)
The ADF test for a unit root with two and four lags gives -3.145 and -2.811
respectively, rejecting the presence of a unit root at the 5% level (in the case of 4 lags
we are on the boundary of the rejection region):
> library(urca)
> summary(ur.df(infl, type = "drift", lags = 2))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################


Test regression drift

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
Min
1Q
-16.6299 -0.9669

Median
0.0608

3Q
1.1217

Max
7.3940

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.76124    0.28860   2.638  0.00901 **
z.lag.1     -0.18593    0.05911  -3.145  0.00192 **
z.diff.lag1 -0.52310    0.07622  -6.863 8.57e-11 ***
z.diff.lag2 -0.29818    0.06921  -4.308 2.60e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.373 on 197 degrees of freedom


Multiple R-squared: 0.3554,
Adjusted R-squared: 0.3456
F-statistic: 36.2 on 3 and 197 DF, p-value: < 2.2e-16

Value of test-statistic is: -3.1454 4.9552


Critical values for test statistics:
1pct 5pct 10pct
tau2 -3.46 -2.88 -2.57
phi1 6.52 4.63 3.81
> summary(ur.df(infl, type = "drift", lags = 4))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression drift

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-16.2318  -1.0063   0.0886   1.0133   7.2619

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.72058    0.30166   2.389 0.017873 *
z.lag.1     -0.17547    0.06242  -2.811 0.005447 **
z.diff.lag1 -0.51017    0.08459  -6.031 8.16e-09 ***
z.diff.lag2 -0.30975    0.09210  -3.363 0.000929 ***
z.diff.lag3 -0.02201    0.08857  -0.249 0.803994
z.diff.lag4 -0.09943    0.07281  -1.366 0.173639
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.378 on 193 degrees of freedom


Multiple R-squared: 0.3601,
Adjusted R-squared: 0.3436
F-statistic: 21.73 on 5 and 193 DF, p-value: < 2.2e-16

Value of test-statistic is: -2.8111 3.9564


Critical values for test statistics:
1pct 5pct 10pct
tau2 -3.46 -2.88 -2.57
phi1 6.52 4.63 3.81
By considering more lags (from 0 to 12) Verbeek observes that it becomes increasingly
less likely to reject the null hypothesis of unit root:
> a <- sapply(0:12, function(lag) ur.df(infl, type = "drift",
lags = lag)@teststat[1])
> names(a) <- c("DF", paste("ADF(", 1:12, ")", sep = ""))
> round(a, 3)
DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(7)
-7.044 -4.388 -3.145 -3.133 -2.811 -2.489 -2.680 -3.168
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12)
-2.897 -2.591 -2.277 -2.222 -2.403
Figure 8.45 reports the autocorrelation and partial autocorrelation functions.
Verbeek, when plotting the autocorrelation function, includes two standard error
bounds for r_k based on the estimated variance (1 + 2ρ²_1 + ... + 2ρ²_{k-1})/T. Observe that
this estimate of the variance holds under the hypothesis that ρ_q = 0 for q > k,
that is the process is a Moving Average process of order k. On the booksite
www.educatt.it/libri/materiali a version of the function acf2 is present, which
includes also the logical argument ma.test that can be set to FALSE or TRUE,
according respectively to the white noise or moving average hypotheses.
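The bounds described above can also be computed by hand; the following sketch evaluates sqrt((1 + 2*rho_1^2 + ... + 2*rho_{k-1}^2)/T) for the sample autocorrelations of infl (the exact formula used inside acf2 is an assumption here).

rho <- acf(infl, lag.max = 30, plot = FALSE)$acf[-1]   # sample autocorrelations
T <- length(infl)
se.bartlett <- sqrt(cumsum(c(1, 2 * rho[-length(rho)]^2)) / T)
round(se.bartlett[1:10], 3)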
> source("acf2.r")
> t(acf2(infl, max.lag = 30, ma.test = TRUE))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.61 0.59 0.60 0.46 0.49 0.49 0.36 0.29 0.30 0.26 0.26

Figure 8.45  Sample autocorrelation and partial autocorrelation functions of the inflation rate.
The scale for lags is years; lag 2 (years) is at the eighth position since the data are quarterly.

PACF 0.61 0.35 0.28 -0.04 0.09 0.12 -0.11 -0.18 0.02 0.08 0.07
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
ACF
0.26 0.19 0.27 0.22 0.23 0.29 0.23 0.23 0.27 0.25
PACF 0.03 -0.05 0.21 -0.04 0.00 0.07 0.01 -0.02 0.02 0.04
[,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
ACF
0.29 0.21 0.21 0.16 0.13 0.16 0.14 0.06 0.06
PACF 0.12 -0.19 -0.02 -0.06 -0.05 0.05 0.01 -0.08 0.02
The autocorrelation function confirms the high persistence of inflation. Verbeek
initially assumes that there are no unit roots and proposes the estimation of an
AR(3) model
    Y_t = δ + θ_1 Y_{t-1} + θ_2 Y_{t-2} + θ_3 Y_{t-3} + ε_t
since the partial autocorrelation function cuts off at lag 3.

350

Univariate Time Series Models

8.10.1

AR estimation

We can use the function dynlm in the package dynlm to obtain the estimate of the
AR(3) model by OLS:
> library(dynlm)
> ar3regr <- dynlm(infl ~ L(infl) + L(infl, 2) + L(infl,
3))
> summary(ar3regr)
Time series regression with "ts" data:
Start = 1960(4), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl) + L(infl, 2) + L(infl, 3))
Residuals:
Min
1Q
-16.6299 -0.9669

Median
0.0608

3Q
1.1217

Max
7.3940

Coefficients:
Estimate Std. Error t value
(Intercept) 0.76124
0.28860
2.638
L(infl)
0.29097
0.06823
4.264
L(infl, 2)
0.22492
0.06931
3.245
L(infl, 3)
0.29818
0.06921
4.308
--Signif. codes: 0 "***" 0.001 "**" 0.01

Pr(>|t|)
0.00901
3.11e-05
0.00138
2.60e-05

**
***
**
***

"*" 0.05 "." 0.1 " " 1

Residual standard error: 2.373 on 197 degrees of freedom


Multiple R-squared: 0.4909,
Adjusted R-squared: 0.4831
F-statistic: 63.31 on 3 and 197 DF, p-value: < 2.2e-16
The code dynlm(infl ~ L(infl, 1:3)) gives the same result.
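A quick check (a sketch): the compact lag specification yields the same coefficient estimates, up to the coefficient names.

all.equal(unname(coef(dynlm(infl ~ L(infl, 1:3)))), unname(coef(ar3regr)))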
The estimate of the mean of the process can be obtained as:
> mean(infl)
[1] 3.952869
or in an indirect way by using the estimate of the intercept and of the autoregressive
parameters:
> ar3regr$coef[[1]]/(1 - sum(ar3regr$coef[-1]))
[1] 4.094233
Estimation via maximum likelihood can be obtained with the function arima, see
Section 8.5 also for some other R functions performing ARMA model parameter
estimation.

Univariate Time Series Models

8.10.2

351

The Ljung-Box statistic - construction

The Ljung-Box statistic for the first 6 autocorrelations can be obtained by applying
Verbeek's relationship (8.67) to the residuals of the regression

    Q_K = T (T + 2) Σ_{k=1}^{K} r_k² / (T - k) ,

where r_k² is the square of the autocorrelation of order k. With regard to Q_6:

> T <- length(ar3regr$res)


> T * (T + 2) * sum((1/(T - 1:6)) * acf(ar3regr$res)$acf[2:7]^2)
[1] 10.67575

8.10.3

The Ljung-Box statistic - direct function

It is possible to make direct use of the function Box.test(x, lag, type, fitdf),
where x is a univariate time series object; lag specifies the number of lags to consider;
type the statistic to compute, "Box-Pierce" or "Ljung-Box"; fitdf is the number of
degrees of freedom to be subtracted from lag. If x is a series of residuals, usually fitdf
= p+q (where p and q are the orders respectively of the AR and MA parts of the
ARMA model describing the process level), so the degrees of freedom are lag - (p + q),
provided of course that lag > fitdf.
The Ljung-Box statistics for the first 6 and 12 autocorrelations result:
> Box.test(residuals(ar3regr), lag = 6, type = "Ljung-Box",
fitdf = 3)
Box-Ljung test
data: residuals(ar3regr)
X-squared = 10.6758, df = 3, p-value = 0.01361
> Box.test(residuals(ar3regr), lag = 12, type = "Ljung-Box",
fitdf = 3)
Box-Ljung test
data: residuals(ar3regr)
X-squared = 16.8408, df = 9, p-value = 0.05127
To obtain AIC and BIC according to Verbeek formulae (8.68) and (8.69) use12
> T = length(infl)
> log(summary(ar3regr)$sigma^2) + 2 * (3 + 1)/T
[1] 1.767248
> log(summary(ar3regr)$sigma^2) + (3 + 1)/T * log(T)
[1] 1.83231
12 In the output Verbeek reports AIC and BIC computed from the log likelihood, say
AIC = -2l/T + 2(p + q)/T where l is the log likelihood, and not from the variance of the residuals.
The function arima, see the next subsection, uses -2l + 2(p + q) for AIC.

352

Univariate Time Series Models

8.10.4

AR estimation via Maximum Likelihood

We can use the function arima to obtain the estimate of the AR(3) model:
> (ar3est <- arima(infl, c(3, 0, 0)))
Series: infl
ARIMA(3,0,0) with non-zero mean
Coefficients:
ar1
ar2
0.2925 0.2278
s.e. 0.0669 0.0681

ar3
0.2970
0.0681

intercept
3.7845
0.8628

sigma^2 estimated as 5.489: log likelihood=-463.63


AIC=937.26
AICc=937.56
BIC=953.85
where intercept is an estimate for the mean of the process.
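The AIC printed by arima can be reconciled with the log likelihood as follows (a sketch; the parameter count, AR coefficients plus intercept plus innovation variance, is an assumption consistent with the value 937.26 reported above).

-2 * ar3est$loglik + 2 * (length(coef(ar3est)) + 1)   # should reproduce the printed AIC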

8.10.5

AR(4) estimation

Verbeek extends the model by adding an additional autoregressive term; the estimate
of the model with OLS is:
> ar4regr <- dynlm(infl ~ L(infl) + L(infl, 2) + L(infl,
3) + L(infl, 4))
> summary(ar4regr)
Time series regression with "ts" data:
Start = 1961(1), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl) + L(infl, 2) + L(infl, 3) + L(infl,
4))
Residuals:
Min
1Q
-16.5054 -1.0348

Median
0.1046

3Q
1.0461

Max
7.3404

Coefficients:
Estimate Std. Error t value
(Intercept) 0.77308
0.29583
2.613
L(infl)
0.30700
0.07179
4.277
L(infl, 2)
0.23416
0.07170
3.266
L(infl, 3)
0.31272
0.07258
4.309
L(infl, 4) -0.04466
0.07266 -0.615
--Signif. codes: 0 "***" 0.001 "**" 0.01

Pr(>|t|)
0.00967
2.97e-05
0.00129
2.60e-05
0.53950

**
***
**
***

"*" 0.05 "." 0.1 " " 1

Univariate Time Series Models

Residual standard error: 2.38 on 195 degrees of freedom


Multiple R-squared: 0.4927,
Adjusted R-squared:
F-statistic: 47.35 on 4 and 195 DF, p-value: < 2.2e-16

353

0.4823

while by using maximum likelihood we have


> (ar4est <- arima(infl, c(4, 0, 0)))
Series: infl
ARIMA(4,0,0) with non-zero mean
Coefficients:
ar1
ar2
0.3055 0.2381
s.e. 0.0701 0.0701

ar3
0.3097
0.0711

ar4
-0.0440
0.0712

intercept
3.8046
0.8301

sigma^2 estimated as 5.478: log likelihood=-463.44


AIC=938.88
AICc=939.31
BIC=958.79
The estimate for the fourth autoregressive lag is not significant. The Ljung-Box
statistics for the first 6 and 12 autocorrelations result:
> Box.test(residuals(ar4regr), lag = 6, type = "Ljung-Box",
fitdf = 4)
Box-Ljung test
data: residuals(ar4regr)
X-squared = 11.2495, df = 2, p-value = 0.003608
> Box.test(residuals(ar4regr), lag = 12, type = "Ljung-Box",
fitdf = 4)
Box-Ljung test
data: residuals(ar4regr)
X-squared = 17.2358, df = 8, p-value = 0.02774

8.10.6

ARMA estimation

Verbeek adds a moving average term to the AR(3) model. The model

    Y_t = δ + θ_1 Y_{t-1} + θ_2 Y_{t-2} + θ_3 Y_{t-3} + ε_t + α_1 ε_{t-1}

can be estimated as follows (observe that the mean is returned under the name intercept):
> (maest <- arima(infl, c(3, 0, 1)))
Series: infl
ARIMA(3,0,1) with non-zero mean

354

Univariate Time Series Models

Coefficients:
ar1
ar2
0.1047 0.3029
s.e. 0.2078 0.1055

ar3
0.3607
0.0871

ma1
0.2069
0.2221

intercept
3.8094
0.8235

sigma^2 estimated as 5.471: log likelihood=-463.31


AIC=938.63
AICc=939.05
BIC=958.54
and the Ljung-Box tests for the first 6 and 12 autocorrelations result
> sapply(c(6, 12), function(i) Box.test(residuals(maest),
lag = i, type = "Ljung-Box", fitdf = 4))
[,1]
[,2]
statistic 10.83081
17.11687
parameter 2
8
p.value
0.004447542
0.02891488
method
"Box-Ljung test"
"Box-Ljung test"
data.name "residuals(maest)" "residuals(maest)"

8.10.7

AR(6) estimation

Since the three estimated models still exhibit some residual serial correlation, Verbeek
suggests inspecting the residual autocorrelation and partial autocorrelation functions,
see Fig. 8.46,
> library(astsa)
> t(acf2(ar3regr$res, max.lag = 30))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.02 -0.02 -0.05 -0.08 0.09 0.18 -0.03 -0.16 -0.04 -0.01
PACF 0.02 -0.02 -0.05 -0.08 0.09 0.18 -0.04 -0.16 -0.01 0.01
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.00 0.04 -0.14 0.07 -0.05 0.00 0.11 -0.02 -0.05 0.07
PACF -0.05 -0.01 -0.12 0.14 -0.06 -0.03 0.10 0.00 -0.05 0.05
[,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
ACF
0.06 0.17 0.01 0.04 -0.08 -0.04 0.08 0.09 -0.06 -0.04
PACF 0.08 0.20 -0.07 0.06 0.00 -0.07 0.06 0.04 -0.04 0.03
According to Verbeek the inclusion of a sixth lag seems to be appropriate. We have:
> ar6regr <- dynlm(infl ~ L(infl, 1:6))
> summary(ar6regr)
Time series regression with "ts" data:
Start = 1961(3), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl, 1:6))

Figure 8.46  Sample autocorrelation and partial autocorrelation functions of the residuals.

Residuals:
Min
1Q
-16.089 -1.049

Median
0.150

3Q
1.043

Max
7.286

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
0.66109
0.30552
2.164 0.03172 *
L(infl, 1:6)1 0.30011
0.07196
4.170 4.61e-05 ***
L(infl, 1:6)2 0.21447
0.07514
2.854 0.00479 **
L(infl, 1:6)3 0.24636
0.07775
3.168 0.00178 **
L(infl, 1:6)4 -0.10612
0.07759 -1.368 0.17301
L(infl, 1:6)5 0.05719
0.07591
0.753 0.45215
L(infl, 1:6)6 0.13007
0.07300
1.782 0.07636 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 2.367 on 191 degrees of freedom

356

Univariate Time Series Models

Multiple R-squared: 0.5012,


Adjusted R-squared:
F-statistic: 31.98 on 6 and 191 DF, p-value: < 2.2e-16
> (ar4est <- arima(infl, c(6, 0, 0)))
Series: infl
ARIMA(6,0,0) with non-zero mean
Coefficients:
ar1
ar2
0.2991 0.2186
s.e. 0.0695 0.0727

ar3
0.2474
0.0751

ar4
-0.1038
0.0753

ar5
0.0595
0.0736

ar6
0.1287
0.0707

0.4855

intercept
3.6914
1.0147

sigma^2 estimated as 5.336: log likelihood=-460.84


AIC=937.67
AICc=938.41
BIC=964.22
> sapply(c(6, 12), function(i) Box.test(residuals(ar4est),
lag = i, type = "Ljung-Box", fitdf = 6))
[,1]
[,2]
statistic 2.92888
13.39927
parameter 0
6
p.value
0
0.03711593
method
"Box-Ljung test"
"Box-Ljung test"
data.name "residuals(ar4est)" "residuals(ar4est)"
The estimates of the three additional lags are not significantly different from 0; this
model has the lowest AIC, but the BIC is better for the AR(3).

8.10.8

Non complete models

Finally, Verbeek suggests the estimation of a non-complete (subset) model, whose AR
part includes the first three lags and the sixth lag. OLS parameter estimates can be
obtained by using the function dynlm.
> summary(ar6regrrestr <- dynlm(infl ~ L(infl, c(1:3,
6))))
Time series regression with "ts" data:
Start = 1961(3), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl, c(1:3, 6)))
Residuals:
Min
1Q
-16.4714 -0.9931

Median
0.0571

3Q
1.1065

Max
7.4546

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           0.64992    0.30139   2.156 0.032291 *
L(infl, c(1:3, 6))1   0.27279    0.06928   3.937 0.000115 ***
L(infl, c(1:3, 6))2   0.21183    0.06983   3.034 0.002750 **
L(infl, c(1:3, 6))3   0.23985    0.07591   3.160 0.001835 **
L(infl, c(1:3, 6))6   0.12020    0.06657   1.806 0.072532 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.368 on 193 degrees of freedom


Multiple R-squared: 0.4957,
Adjusted R-squared: 0.4852
F-statistic: 47.42 on 4 and 193 DF, p-value: < 2.2e-16
The model can also be estimated with maximum likelihood: the estimation of subset
ARMA models can be obtained by using the argument fixed in the arima function.
Only the parameters corresponding to NA in the vector, which must have the same
length as the number of parameters, will be estimated. Ljung-Box statistics
are also reported for lags 6 and 12 (the number of degrees of freedom is named
parameter).
> (ar6restr <- arima(infl, c(6, 0, 0), fixed = c(NA,
NA, NA, 0, 0, NA, NA)))
Series: infl
ARIMA(6,0,0) with non-zero mean
Coefficients:
ar1
ar2
0.2723 0.2170
s.e. 0.0672 0.0677

ar3
0.2414
0.0737

ar4
0
0

ar5
0
0

ar6
0.1206
0.0649

intercept
3.6820
1.0307

sigma^2 estimated as 5.395: log likelihood=-461.92


AIC=935.84
AICc=936.58
BIC=962.39
> sapply(c(6, 12), function(i) Box.test(residuals(ar6restr),
lag = i, type = "Ljung-Box", fitdf = 6))
[,1]
[,2]
statistic 5.246681
14.90244
parameter 0
6
p.value
0
0.02102923
method
"Box-Ljung test"
"Box-Ljung test"
data.name "residuals(ar6restr)" "residuals(ar6restr)"
Verbeek proposes three scalar measures for evaluating the persistence of inflation.

the sum of the coefficients in the autoregressive specification, which for the
AR(3) and the non complete AR(6) models results:
> sum(coef(ar3regr)[-1])
[1] 0.8140705

358

Univariate Time Series Models

> sum(coef(ar6regrrestr)[-1])
[1] 0.8446656

The maximum root of the auxiliary equations:

    z³ - θ_1 z² - θ_2 z - θ_3 = 0
    z⁶ - θ_1 z⁵ - θ_2 z⁴ - θ_3 z³ - θ_6 = 0

which must lie inside the unit circle. The roots can be obtained with the function
polyroot, whose argument is the vector of the coefficients of the preceding
polynomial equations ordered according to increasing powers of z. The function abs
returns the modulus.
> coefs <- rev(coef(ar3regr)[-1])
> abs(polyroot(c(-coefs, 1)))
[1] 0.5742232 0.5742232 0.9043114
> coefs <- rev(coef(ar6regrrestr)[-1])
> abs(polyroot(c(-coefs[1], 0, 0, -coefs[-1], 1)))
[1] 0.6130880 0.6344583 0.7332198 0.6130880 0.7332198 0.9375408

The estimates of the half-life of shocks

    h = log(0.5) / log( Σ_{j=1}^{p} θ_j )

> log(0.5)/log(sum(coef(ar3regr)[-1]))
[1] 3.369564
> log(0.5)/log(sum(coef(ar6regrrestr)[-1]))
[1] 4.105969
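As a complement to the measures above, the dominant root of the AR(3) auxiliary equation can itself be turned into a half-life (a sketch, not one of Verbeek's measures): the slowest-decaying component of a shock dies out at a rate given by the largest root modulus.

coefs <- rev(coef(ar3regr)[-1])
lambda <- max(abs(polyroot(c(-coefs, 1))))   # dominant root of the AR(3) polynomial
log(0.5) / log(lambda)                       # half-life implied by the dominant root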

8.11

The Expectations Theory of the Term Structure


(Section 8.10)

Data can be read by means of the function read.table, having extracted the
file irates.dat from the compressed archive ch08.zip. The function ts(object,
start, frequency) creates a multiple time series from the columns of a table; in
this case we have a monthly frequency so frequency=12.
> irates <- read.table(unzip("ch08.zip", "Chapter 8/irates.dat"),
header = TRUE)
> irates <- ts(irates, start = c(1946, 12), frequency = 12)

Figure 8.47  1-month and 5-year interest rates, 1970:1-1991:2

The file irates.dat contains monthly interest rates for the United States taken from
McCulloch and Kwon (1993). The series starts in December 1946 and ends in February
1991.
All interest rates are expressed in % per year. The variables are coded as:
ri interest rate for a maturity of i months (i = 1, 2, 3, 5, 6, 11, 12, 36, 60, 120).
Verbeek observes that in the text a subsample is used starting in January 1970.
> irates1m <- window(irates[, 1], start = c(1970, 1),
end = c(1991, 2))
To obtain the plot of the 1-month and 5-year interest rates use, as usual, the function
xyplot in the package lattice, see Fig. 8.47.
> library(lattice)
> xyplot(window(irates[, c(1, 9)], start = c(1970,
1), end = c(1991, 2)), superpose = TRUE)

360

Univariate Time Series Models

The OLS estimate of an AR(1) model for the 1 month interest rate can be obtained,
by making use of the function dynlm.
> library(dynlm)
> irates1moutput <- dynlm(irates1m ~ L(irates1m))
> summary(irates1moutput)
Time series regression with "ts" data:
Start = 1970(2), End = 1991(2)
Call:
dynlm(formula = irates1m ~ L(irates1m))
Residuals:
Min
1Q
-4.2955 -0.3207

Median
0.0160

3Q
0.2993

Max
2.9569

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.34902
0.15246
2.289
0.0229 *
L(irates1m) 0.95120
0.01963 48.466
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.8215 on 251 degrees of freedom
Multiple R-squared: 0.9035,
Adjusted R-squared: 0.9031
F-statistic: 2349 on 1 and 251 DF, p-value: < 2.2e-16
From this the estimate of the unconditional mean follows:
> (uncmean <- as.numeric(irates1moutput$coef[1]/(1 - irates1moutput$coef[2])))
[1] 7.151415
The sample average results:
> mean(irates1m)
[1] 7.302512
The Dickey-Fuller statistics, in presence of drift, to test for the presence of a unit root
can be obtained by means of the function ur.df available in the package urca.
> library(urca)
> summary(ur.df(irates1m, lags = 0, type = "drift"))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression drift

Univariate Time Series Models

361

Call:
lm(formula = z.diff ~ z.lag.1 + 1)
Residuals:
Min
1Q
-4.2955 -0.3207

Median
0.0160

3Q
0.2993

Max
2.9569

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.34902
0.15246
2.289
0.0229 *
z.lag.1
-0.04880
0.01963 -2.487
0.0135 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.8215 on 251 degrees of freedom
Multiple R-squared: 0.02404,
Adjusted R-squared: 0.02016
F-statistic: 6.184 on 1 and 251 DF, p-value: 0.01354

Value of test-statistic is: -2.4867 3.1029


Critical values for test statistics:
1pct 5pct 10pct
tau2 -3.44 -2.87 -2.57
phi1 6.47 4.61 3.79
To obtain the augmented Dickey-Fuller tests with one, three and six additional lags
included use:
> library(urca)
> sapply(c(1, 3, 6), function(i) summary(ur.df(irates1m,
type = "drift", lags = i))@teststat[1])
[1] -2.619786 -2.280889 -1.885723
The plot of the autocorrelation function of the residuals can be obtained by using
the function acf2, see Fig. 8.48. Observe the scale on the x axis which follows from
the monthly frequency of the time series: lag 12 corresponds to 1 (year) and lag 6 (6
months) corresponds to a half (0.5 years).
> library(astsa)
> t(acf2(irates1moutput$residuals, 20))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.06 0.01 -0.11 -0.09 0.00 -0.04 -0.05 0.15 0.03 0.08
PACF 0.06 0.00 -0.11 -0.07 0.01 -0.05 -0.06 0.16 0.01 0.06
     [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.02  0.01 -0.06  0.09 -0.07  0.05  0.09  0.09  0.05 -0.16
PACF  0.04  0.03 -0.06  0.13 -0.07  0.04  0.11  0.06  0.02 -0.15

Figure 8.48: Residual autocorrelation function, AR(1) model, 1970:1-1991:2

Verbeek proposes the following regressions to study the expectations hypothesis,
regressing a long interest rate on the short rate:

    r_nt = β1 + β2 r_1t + u_t

with n = 3, 12, 60. Let irates3m, irates12m and irates60m be the involved
dependent variables over the proper time window. We have:
> irates3m <- window(irates[, 3], start = c(1970, 1),
end = c(1991, 2))
> irates12m <- window(irates[, 7], start = c(1970,
1), end = c(1991, 2))
> irates60m <- window(irates[, 9], start = c(1970,
1), end = c(1991, 2))
> n3 <- lm(irates3m ~ irates1m)
> n12 <- lm(irates12m ~ irates1m)
> n60 <- lm(irates60m ~ irates1m)
> library(memisc)
> mtable(n3, n12, n60, summary.stats = c("R-squared"))


Calls:
n3: lm(formula = irates3m ~ irates1m)
n12: lm(formula = irates12m ~ irates1m)
n60: lm(formula = irates60m ~ irates1m)
==============================================
                  n3         n12        n60
----------------------------------------------
(Intercept)     0.321***   1.292***   3.352***
               (0.066)    (0.128)    (0.217)  
irates1m        1.009***   0.947***   0.739***
               (0.009)    (0.017)    (0.028)  
----------------------------------------------
R-squared       0.982      0.929      0.735   
==============================================
The values of λ_n = (1 - θ^n)/(n(1 - θ)), see Verbeek's relationship (8.86), with θ = 0.95
are, for the three series:
> (1 - 0.95^3)/(1 - 0.95)/3
[1] 0.9508333
> (1 - 0.95^12)/(1 - 0.95)/12
[1] 0.7660665
> (1 - 0.95^60)/(1 - 0.95)/60
[1] 0.3179767
Forecasts under the random walk hypothesis correspond to the last observation:
> irates1m[length(irates1m)]
[1] 5.677
Using the estimate θ = 0.95, the 10- and 120-period-ahead forecasts are:
> uncmean + 0.95^10 * (irates1m[length(irates1m)] - uncmean)
[1] 6.268628
> uncmean + 0.95^120 * (irates1m[length(irates1m)] - uncmean)
[1] 7.148285
The latter forecast is very close to the unconditional mean.
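The same mean-reverting forecast formula can be wrapped in a small helper. This is a minimal
sketch, not part of Verbeek's code; the function name ar1forecast and the use of the estimate
θ = 0.95 are choices made here purely for illustration.

> ar1forecast <- function(last, uncmean, theta, h) {
      # h-step-ahead AR(1) forecast: reverts to the unconditional mean at rate theta^h
      uncmean + theta^h * (last - uncmean)
  }
> ar1forecast(irates1m[length(irates1m)], uncmean, 0.95, c(1, 10, 120))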

8.12 Autoregressive Conditional Heteroscedasticity

8.12.1 A Brief Presentation of ARCH Processes

When a white noise time series {ε_t} (e.g. the residuals from an ARMA model) shows
volatility clustering, it can be modelled by means of an ARCH model.


The following relationship defines an ARCH process of order p

    ε_t = η_t √h_t
    h_t = Var(ε_t | I_{t-1}) = α_0 + α_1 ε²_{t-1} + α_2 ε²_{t-2} + . . . + α_p ε²_{t-p},        (8.21)

where α_0 > 0, α_1 ≥ 0, . . . , α_p ≥ 0; the stochastic process η_t is a white noise, usually
a sequence of i.i.d. standard Normal random variables, but sometimes other random
variables are also considered, like the Student's t for financial time series, to take
into account a leptokurtic behaviour; h_t is the conditional variance based on the
information set available at time t.
Let ν_t be the deviations of the squared process ε²_t from the conditional variance

    ν_t = ε²_t - h_t = η²_t h_t - h_t = (η²_t - 1) h_t
    ν_t = ε²_t - h_t = ε²_t - α_0 - α_1 ε²_{t-1} - . . . - α_p ε²_{t-p}.        (8.22)

We can obtain the autoregressive specification for the squared process by solving the
second relationship for ε²_t

    ε²_t = α_0 + α_1 ε²_{t-1} + . . . + α_p ε²_{t-p} + ν_t.        (8.23)

We always have to remember that, according to relationship (8.21), the genesis of an
ARCH process is of a multiplicative type; however, the autoregressive specification
(8.23) can be used to identify the order of the model and to obtain provisional
estimates of the α parameters, which can serve as starting values for the numerical
procedures related to maximum likelihood, as in the sketch below.
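A minimal sketch of this idea, not taken from the text: assuming eps is a zero-mean
white-noise series stored as a ts object, OLS on relationship (8.23) with p = 2 lags gives
rough starting values for α_0, α_1 and α_2.

> library(dynlm)
> eps2 <- as.ts(eps)^2                                     # squared series
> startvals <- coef(dynlm(eps2 ~ L(eps2, 1) + L(eps2, 2))) # OLS on (8.23)
> startvals                                                # provisional alpha_0, alpha_1, alpha_2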
Recall that the process {ε_t} is weakly stationary if α_1 + . . . + α_p < 1.
The presence of conditional heteroscedasticity can be detected graphically by
examining the autocorrelation function of the squared process and by considering
a test consisting of T times the R² of the regression of ε²_t on an intercept and p
lagged values of ε²_t, which is distributed as χ²_p.
The order p of an ARCH model can also be determined by examining the behaviour
of the partial autocorrelation function of the squared process. If p is too high, a more
parsimonious GARCH(p,q) model can be adopted
    h_t = Var(ε_t | I_{t-1}) = α_0 + α_1 ε²_{t-1} + . . . + α_p ε²_{t-p} + β_1 h_{t-1} + . . . + β_q h_{t-q}

where α_0 > 0, α_i ≥ 0 and β_i ≥ 0. The process {ε_t} is weakly stationary if
α_1 + . . . + α_p + β_1 + . . . + β_q < 1.
We can re-express ε²_t in a fashion similar to (8.22)

    ν_t = ε²_t - h_t
        = ε²_t - α_0 - α_1 ε²_{t-1} - . . . - α_p ε²_{t-p} - β_1 (ε²_{t-1} - ν_{t-1}) - . . . - β_q (ε²_{t-q} - ν_{t-q});

we have

    ε²_t = α_0 + α_1 ε²_{t-1} + . . . + α_p ε²_{t-p} + ν_t + β_1 (ε²_{t-1} - ν_{t-1}) + . . . + β_q (ε²_{t-q} - ν_{t-q})

and

    ε²_t = α_0 + α_1 ε²_{t-1} + . . . + α_p ε²_{t-p} + β_1 ε²_{t-1} + . . . + β_q ε²_{t-q} + ν_t - β_1 ν_{t-1} - . . . - β_q ν_{t-q}


Observe how the latter specification for the squared process resembles that of an
ARMA(max(p, q), q), but always keep in mind the multiplicative nature of the process {ε_t}.
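A short simulation sketch (parameter values chosen here purely for illustration, not taken
from the text) shows the multiplicative genesis of relationship (8.21) and the ARMA-like
autocorrelation of the squared series:

> set.seed(1)
> n <- 1000; alpha0 <- 0.2; alpha1 <- 0.7
> eps <- numeric(n); h <- numeric(n)
> h[1] <- alpha0 / (1 - alpha1)                  # unconditional variance as starting value
> eps[1] <- rnorm(1) * sqrt(h[1])
> for (t in 2:n) {
      h[t]   <- alpha0 + alpha1 * eps[t - 1]^2   # conditional variance
      eps[t] <- rnorm(1) * sqrt(h[t])            # eps_t = eta_t * sqrt(h_t)
  }
> acf(eps^2)                                     # autocorrelation of the squared process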
Once the model has been estimated, we have to check whether the standardized residuals
ε_t / √ĥ_t are white noise and whether they follow the distribution assumed for η_t.
To check the white noise assumption, the autocorrelation and partial autocorrelation
functions can be examined and tests performed both on the autocorrelations of the
standardized residuals and on those of the squared standardized residuals.
The distributional assumption can be checked graphically by means of the QQ plot
and by having recourse to the Jarque-Bera and Shapiro-Wilk tests in case a normal
distribution was assumed, see Section 2.8.
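A minimal sketch of these diagnostic checks, assuming fit is an object returned by garchFit
(the use of the standardize argument of the residuals method and the lag choice are
assumptions made here, not prescriptions from the text):

> library(fGarch)
> z <- residuals(fit, standardize = TRUE)        # eps_t / sqrt(h_t)
> Box.test(z,   lag = 10, type = "Ljung-Box")    # white noise of the standardized residuals
> Box.test(z^2, lag = 10, type = "Ljung-Box")    # no remaining ARCH effects
> qqnorm(z); qqline(z)                           # graphical check of the distribution
> shapiro.test(z)                                # normality test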

8.12.2 A First Example

Identify a proper ARMA model for the time series ats stored in the workspace
exercise.RData, available on the booksite www.educatt.it/libri/materiali.
To load the R data set use the function load. By examining the behaviour of the time
series, see Fig. 8.49, and of its autocorrelation and partial autocorrelation functions,
see Fig. 8.50, one can conclude that an AR(2) model could fit the data:
> load("exercise.RData")
> library(TSA)
> plot(as.ts(ats))
> t(acf2(ats, 20, ma.test = TRUE))
We can also use, for completeness, the function armasubsets in the package TSA to
select the model with the lowest BIC statistic:
> library(TSA)
> plot(armasubsets(ats, nar = 5, nma = 5))
> detach("package:TSA")
Figure 8.51 shows that the best model is an AR(2) with a drift presence. It can be
estimated by using the function dynlm.
> library(dynlm)
> regr <- dynlm(ats ~ L(ats, 1) + L(ats, 2))
> summary(regr)
Time series regression with "zoo" data:
Start = 2012-01-10, End = 2013-05-21
Call:
dynlm(formula = ats ~ L(ats, 1) + L(ats, 2))

Figure 8.49: Graph of the ats series

Residuals:
     Min       1Q   Median       3Q      Max 
-12.6518  -1.1904  -0.0193   1.2209  13.6909 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.66923    0.14675   4.560 6.44e-06 ***
L(ats, 1)    0.51766    0.04423  11.704  < 2e-16 ***
L(ats, 2)    0.18209    0.04423   4.117 4.50e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.669 on 495 degrees of freedom


Multiple R-squared: 0.4203,
Adjusted R-squared: 0.418
F-statistic: 179.5 on 2 and 495 DF, p-value: < 2.2e-16
By having a look at the residual time series, see Fig. 8.52, the presence of
heteroscedasticity is evident.


Figure 8.50: Autocorrelation function and partial autocorrelation function for the ats series

> plot(regr$res)
By checking the behaviour of the autocorrelation function of the squared residuals, see
Fig. 8.53, the presence of autoregressive conditional heteroscedasticity is confirmed
and an ARCH(2) model is suggested.
> acf2(regr$res^2)
Tests for ARCH effects are performed by regressing the squared residuals on a
constant and p lagged squared residuals series.
> (etsumm <- summary(dynlm(regr$res^2 ~ L(I(regr$res^2),
1))))
Time series regression with "zoo" data:
Start = 2012-01-11, End = 2013-05-21
Call:
dynlm(formula = regr$res^2 ~ L(I(regr$res^2), 1))

Figure 8.51: BIC comparison for the best models

Residuals:
    Min      1Q  Median      3Q     Max 
-43.154  -5.175  -4.319  -0.740 175.485 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          4.99773    0.86551   5.774 1.37e-08 ***
L(I(regr$res^2), 1)  0.29382    0.04297   6.838 2.37e-11 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 18.06 on 495 degrees of freedom
Multiple R-squared: 0.08631,   Adjusted R-squared: 0.08446
F-statistic: 46.76 on 1 and 495 DF, p-value: 2.374e-11

T times the R² gives the test statistic, which is distributed according to a χ² random
variable with p degrees of freedom.

Figure 8.52: Residuals from the AR model

> etsumm$r.squared * (length(regr$res) - 1)


[1] 42.89536
> qchisq(0.95, 1)
[1] 3.841459
In the package FinTS the function ArchTest is available, which allows an LM test for
the presence of Autoregressive Conditional Heteroscedasticity in the residual series
to be performed:
> library(FinTS)
> ArchTest(regr$res, lags = 1, demean = FALSE)
ARCH LM-test; Null hypothesis: no ARCH effects
data: regr$res
Chi-squared = 42.8954, df = 1, p-value = 5.775e-11
> lags = 1:5
> fun <- function(i) ArchTest(regr$res, lags = i, demean = FALSE)
> sapply(lags, fun)


Figure 8.53: Autocorrelation and partial autocorrelation functions of the residuals from the AR model

          [,1]          [,2]          [,3]          [,4]          [,5]         
statistic 42.89536      56.07503      55.92459      56.12527      56.11349     
parameter 1             2             3             4             5            
p.value   5.774747e-11  6.660228e-13  4.359513e-12  1.887501e-11  7.700796e-11 
method    "ARCH LM-test; Null hypothesis: no ARCH effects"  (the same in all columns)
data.name "regr$res"  (the same in all columns)


It is possible to estimate jointly the parameters in the ARMA-ARCH specification,


by having recourse to the function garchFit in the package fGarch, and considering
different model specifications.
The main arguments13 of the function garchFit are:

formula a formula object describing the mean and variance equation of the
ARMA-GARCH/APARCH model. A pure GARCH(1,1) model is selected
when e.g. formula=~garch(1,1). To specify for example an ARMA(2,1)-APARCH(1,1) use formula = ~arma(2,1)+aparch(1,1).

data an optional timeSeries or data frame object containing the variables in the
model. If not found in data, the variables are taken from environment(formula),
typically the environment from which armaFit is called. If data is an univariate
series, then the series is converted into a numeric vector and the name of the
response in the formula will be neglected.

skew a numeric value, the skewness parameter of the conditional distribution.

shape a numeric value, the shape parameter of the conditional distribution.

cond.dist a character string naming the desired conditional distribution. Valid


values are "dnorm", "dged", "dstd", "dsnorm", "dsged", "dsstd" and "QMLE".
The default value is the normal distribution.

include.mean this flag determines if the parameter for the mean will be
estimated or not. If include.mean=TRUE this will be the case, otherwise the
parameter will be kept fixed during the process of parameter optimization.

include.shape logical flag which determines if the parameter for the shape of
the conditional distribution will be estimated or not. If include.shape=FALSE
then the shape parameter will be kept fixed during the process of parameter
optimization.

trace a logical flag. Should the optimization process of fitting the model
parameters be printed? By default trace=TRUE.

13 From the R-help system. Have a look at ?fGarch::garchFit for more information.


APARCH models and skew distributions for the errors are also implemented.
We start by considering the larger model Y_t ~ AR(4) with ε_t ~ ARCH(4).
> garchFit(formula = ~arma(4, 0) + garch(4, 0), data = ats,
trace = FALSE)
Title:
GARCH Modelling
Call:
garchFit(formula = ~arma(4, 0) + garch(4, 0), data = ats,
trace = FALSE)
Mean and Variance Equation:
data ~ arma(4, 0) + garch(4, 0)
<environment: 0x00000000144b8728>
[data = ats]
Conditional Distribution:
norm
Coefficient(s):
         mu          ar1          ar2          ar3          ar4 
 0.60499579   0.51475833   0.18565747  -0.01889278   0.01062030 
      omega       alpha1       alpha2       alpha3       alpha4 
 1.79948380   0.78609851   0.01896518   0.00000001   0.00000001 

Std. Errors:
based on Hessian
Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     6.050e-01   8.374e-02    7.225 5.02e-13 ***
ar1    5.148e-01   4.313e-02   11.934  < 2e-16 ***
ar2    1.857e-01   3.694e-02    5.026 5.01e-07 ***
ar3   -1.889e-02   3.484e-02   -0.542    0.588    
ar4    1.062e-02   3.047e-02    0.349    0.727    
omega  1.799e+00   2.198e-01    8.188 2.22e-16 ***
alpha1 7.861e-01   1.071e-01    7.338 2.16e-13 ***
alpha2 1.897e-02   3.607e-02    0.526    0.599    
alpha3 1.000e-08   4.095e-02    0.000    1.000    
alpha4 1.000e-08          NA       NA       NA    
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-1079.044
normalized:

-2.158088


Description:
Fri May 24 17:11:13 2013 by user: gabriele.cantaluppi
From the preceding results we observe that the coefficients pertaining to the 3rd and 4th
autoregressive lags, as well as α_3 and α_4, are not significantly different from 0, so we
can proceed by estimating the model Y_t ~ AR(2) with ε_t ~ ARCH(1).
By applying the function summary to the object resulting from garchFit, diagnostic
results pertaining to the standardized residuals ε_t/√ĥ_t are also produced.

> archmodel <- garchFit(formula = ~arma(2, 0) + garch(1,


0), data = ats, trace = FALSE)
> summary(archmodel)
Title:
GARCH Modelling
Call:
garchFit(formula = ~arma(2, 0) + garch(1, 0), data = ats,
trace = FALSE)
Mean and Variance Equation:
data ~ arma(2, 0) + garch(1, 0)
<environment: 0x0000000015507c70>
[data = ats]
Conditional Distribution:
norm
Coefficient(s):
     mu      ar1      ar2    omega   alpha1 
0.60017  0.52472  0.17563  1.78951  0.82856 

Std. Errors:
based on Hessian
Error Analysis:
       Estimate Std. Error t value Pr(>|t|)    
mu      0.60017    0.08381   7.161 8.00e-13 ***
ar1     0.52472    0.03971  13.215  < 2e-16 ***
ar2     0.17563    0.03493   5.029 4.94e-07 ***
omega   1.78951    0.20572   8.699  < 2e-16 ***
alpha1  0.82856    0.10820   7.658 1.89e-14 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-1077.762
normalized:

-2.155525


Description:
Fri May 24 17:11:13 2013 by user: gabriele.cantaluppi

Standardised Residuals Tests:
                                  Statistic  p-Value  
 Jarque-Bera Test    R    Chi^2   1.325538   0.5154222
 Shapiro-Wilk Test   R    W       0.9976004  0.6966819
 Ljung-Box Test      R    Q(10)   3.376161   0.9711369
 Ljung-Box Test      R    Q(15)   8.124189   0.9187073
 Ljung-Box Test      R    Q(20)  11.41074    0.9348674
 Ljung-Box Test      R^2  Q(10)   6.433391   0.777633 
 Ljung-Box Test      R^2  Q(15)   7.772612   0.9325749
 Ljung-Box Test      R^2  Q(20)  14.90346    0.7819045
 LM Arch Test        R    TR^2    7.920425   0.7913176

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
4.331050 4.373196 4.330852 4.347588 
To assess whether the standardized residuals are white noise, the Ljung-Box tests up to
order 10, 15 and 20 are reported both for the standardized residuals and for their squared
values. There is no evidence of autocorrelation nor of remaining conditional
heteroscedasticity.
To assess the normality assumption, the Jarque-Bera statistic and the Shapiro-Wilk test
are reported; neither test rejects the null hypothesis of normality.
The function plot applied to an fGARCH object deriving from a garchFit estimate,
produces the graphs in Fig. 8.54 and 8.55.
> layout(matrix(1:8, 4, 2))
> plot(archmodel, which = 1:8)
> layout(matrix(1:6, 3, 2))
> plot(archmodel, which = 9:13)
In Fig. 8.54 are reported, by column:
1. the graphical representation of the time series y_t, in the present case ats
2. the graphical representation of the conditional standard deviations √h_t
3. the graphical representation of the time series y_t with the ±2 conditional standard
   deviations ±2√h_t superposed
4. the autocorrelation function of the time series y_t
5. the autocorrelation function of the squared series y²_t
6. the cross correlation function between y²_t and y_t
7. the graphical representation of the residuals ε_t


Figure 8.54: fGARCH plots

8. the graphical representation of the conditional standard deviations √h_t

while in Fig. 8.55 are reported:
9. the graphical representation of the standardized residuals ε_t/√h_t
10. the autocorrelation function of the standardized residuals ε_t/√h_t
11. the partial autocorrelation function of the standardized residuals ε_t/√h_t
12. the cross correlation function between ε²_t/h_t and ε_t/√h_t
13. the QQ plot for the standardized residuals ε_t/√h_t

A single graph can be extracted by using plot(archmodel, which = number), where
number is in the set {1, . . . , 13}.
To obtain forecasts the function predict can be used
> predict(archmodel, 4)
meanForecast meanError standardDeviation
1
2.691235 1.381014
1.381014

Figure 8.55: fGARCH plots

Simulation of a GARCH process


The following code was used to generate the ats series.
> library(fGarch)
> set.seed(123456)
> spec = garchSpec(model = list(mu = 0.5, ar = c(0.5,
0.2), omega = 2, alpha = c(0.75), beta = 0))
> ats <- garchSim(spec, n = 500)
The reader can check the estimation results in the presence of a longer time series.
> ats <- garchSim(spec, n = 10000)
> summary(garchFit(formula = ~arma(2, 0) + garch(1, 0),
data = ats, trace = FALSE))


Title:
GARCH Modelling
Call:
garchFit(formula = ~arma(2, 0) + garch(1, 0), data = ats,
trace=FALSE)
Mean and Variance Equation:
data ~ arma(2, 0) + garch(1, 0)
<environment: 0x0000000019fff210>
[data = ats]
Conditional Distribution:
norm
Coefficient(s):
     mu      ar1      ar2    omega   alpha1 
0.53781  0.49843  0.19361  1.94839  0.77242 

Std. Errors:
based on Hessian
Error Analysis:
       Estimate Std. Error t value Pr(>|t|)    
mu     0.537809   0.018706   28.75   <2e-16 ***
ar1    0.498429   0.008842   56.37   <2e-16 ***
ar2    0.193607   0.007809   24.79   <2e-16 ***
omega  1.948387   0.047459   41.05   <2e-16 ***
alpha1 0.772418   0.023185   33.32   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-21554.46
normalized:

-2.155446

Description:
Fri May 24 17:11:16 2013 by user: gabriele.cantaluppi

Standardised Residuals Tests:
                                  Statistic  p-Value   
 Jarque-Bera Test    R    Chi^2   4.09366    0.1291436 
 Shapiro-Wilk Test   R    W            NA           NA 
 Ljung-Box Test      R    Q(10)  12.31859    0.2643003 
 Ljung-Box Test      R    Q(15)  15.11452    0.4431998 
 Ljung-Box Test      R    Q(20)  22.38218    0.3201377 
 Ljung-Box Test      R^2  Q(10)  17.56984    0.06266804
 Ljung-Box Test      R^2  Q(15)  21.53238    0.1206646 
 Ljung-Box Test      R^2  Q(20)  22.21691    0.3288577 
 LM Arch Test        R    TR^2   19.71963    0.07257902

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
4.311891 4.315496 4.311891 4.313112 

8.13 Volatility in Daily Exchange Rates (Section 8.11.3)

Data can be read by means of the function read.table, having extracted the file
usd.dat from the compressed archive ch08.zip.
> crates <- read.table(unzip("ch08.zip", "Chapter 8/usd.dat"),
header = TRUE)
The file usd contains daily exchange rate changes from 5 January 1999 to 28 February
2011 (T=3108), without gaps.
The following variables are available:

dlogusd100 change in log exchange rate dollar/euro (x 100)

dayoftheweek 1=monday, 2=tuesday, etc.

monday dummy for monday

tuesday dummy for tuesday

wednesday dummy for wednesday

thursday dummy for thursday

Since data are irregular (5 days in a week) it is preferable to create an undated time
series. The differenced US$/Euro exchange rate series is analyzed.
> yt <- as.ts(crates[, 3])
The graphical representation can be obtained as:
> library(lattice)
> xyplot(yt)
By regressing the series y_t on a constant, the residuals e_t are obtained, which can
be modelled by means of an ARCH specification if conditional heteroscedasticity is
present.
> library(dynlm)
> et <- dynlm(yt ~ 1)$res
Tests for ARCH effects can be obtained by means of the function ArchTest in the
package FinTS. Verbeek considers 1 and 6 lags.

Figure 8.56: Daily change in log exchange rate US$/DM, 2 January 1980-21 May 1987

> library(FinTS)
> ArchTest(et, lags = 1, demean = FALSE)
ARCH LM-test; Null hypothesis: no ARCH effects
data: et
Chi-squared = 136.3154, df = 1, p-value < 2.2e-16
> ArchTest(et, lags = 6, demean = FALSE)
ARCH LM-test; Null hypothesis: no ARCH effects
data: et
Chi-squared = 208.243, df = 6, p-value < 2.2e-16
Both tests reject the hypothesis of homoscedasticity.
The following four models are estimated: an ARCH(6), a GARCH(1,1), an
EGARCH(1,1) and a GARCH(1,1) model with t-distributed errors. The parameter
estimates of the first two and the fourth models can be obtained by using the function
garchFit available in the package fGarch.


> library(fGarch)
> arch6 <- garchFit(formula = ~garch(6, 0), data = as.vector(et),
include.mean = FALSE, trace = FALSE)
> summary(arch6)
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(6, 0), data = as.vector(et),
include.mean = FALSE, trace = FALSE)
Mean and Variance Equation:
data ~ garch(6, 0)
<environment: 0x0000000016613a18>
[data = as.vector(et)]
Conditional Distribution:
norm
Coefficient(s):
   omega    alpha1    alpha2    alpha3    alpha4    alpha5 
0.237534  0.072566  0.026908  0.084783  0.112259  0.095242 
  alpha6 
0.078275 

Std. Errors:
based on Hessian
Error Analysis:
       Estimate Std. Error t value Pr(>|t|)    
omega   0.23753    0.01524  15.590  < 2e-16 ***
alpha1  0.07257    0.02194   3.308 0.000939 ***
alpha2  0.02691    0.02074   1.298 0.194433    
alpha3  0.08478    0.02270   3.736 0.000187 ***
alpha4  0.11226    0.02451   4.579 4.67e-06 ***
alpha5  0.09524    0.02284   4.169 3.05e-05 ***
alpha6  0.07827    0.01945   4.025 5.69e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-3045.473
normalized:

-0.979882

Description:
Fri May 24 17:11:19 2013 by user: gabriele.cantaluppi


Standardised Residuals Tests:
                                  Statistic  p-Value     
 Jarque-Bera Test    R    Chi^2  203.2454    0           
 Shapiro-Wilk Test   R    W        0.992023  4.006602e-12
 Ljung-Box Test      R    Q(10)    5.577922  0.8493912   
 Ljung-Box Test      R    Q(15)   18.3471    0.2448572   
 Ljung-Box Test      R    Q(20)   21.25551   0.382233    
 Ljung-Box Test      R^2  Q(10)    9.355185  0.4987593   
 Ljung-Box Test      R^2  Q(15)   23.09638   0.08211484  
 Ljung-Box Test      R^2  Q(20)   43.9503    0.001528166 
 LM Arch Test        R    TR^2    11.90132   0.4536374   

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
1.964269 1.977876 1.964258 1.969154 
The estimate of the parameter α_2 is not significantly different from 0, but we have
to observe that the current version of the function garchFit does not allow the
estimation of incomplete models.
The function plot applied to an fGARCH object deriving from a garchFit estimate,
produces the graphs in Fig. 8.57 and 8.58.
> layout(matrix(1:8, 4, 2))
> plot(arch6, which = 1:8)
> layout(matrix(1:6, 3, 2))
> plot(arch6, which = 9:13)
Tests for the autocorrelation of the standardized residuals do not reject the null of white
noise, while the tests for autocorrelation of the squared residuals give evidence of the
presence of conditional heteroscedasticity up to lag 20; this behaviour is not so evident
from the analysis of the autocorrelation function, see Fig. 8.58.
The Jarque-Bera and Shapiro-Wilk statistics reject the null of a normal distribution
for the standardized residuals. The QQ plot gives evidence that the tails are fatter
than those of a normal distribution, see Fig. 8.59.
> plot(arch6, which = 13)
The function predict can be used to obtain forecasts
> predict(arch6, 4)
  meanForecast meanError standardDeviation
1            0 0.5944364         0.5944364
2            0 0.5804800         0.5804800
3            0 0.5721766         0.5721766
4            0 0.5828789         0.5828789
The code for estimating a GARCH(1,1) model is:


Figure 8.57: fGARCH plots

> garch1_1 <- garchFit(formula = ~garch(1, 1), data = as.vector(et),


include.mean = FALSE, trace = FALSE)
> summary(garch1_1)
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 1), data = as.vector(et),
include.mean = FALSE, trace = FALSE)
Mean and Variance Equation:
data ~ garch(1, 1)
<environment: 0x00000000170f00c8>
[data = as.vector(et)]
Conditional Distribution:
norm

Figure 8.58: fGARCH plots

Coefficient(s):
    omega     alpha1      beta1 
0.0016224  0.0308557  0.9658062 

Std. Errors:
based on Hessian
Error Analysis:
        Estimate Std. Error t value Pr(>|t|)    
omega  0.0016224  0.0006968   2.328   0.0199 *  
alpha1 0.0308557  0.0041378   7.457 8.86e-14 ***
beta1  0.9658062  0.0044704 216.047  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-2978.583
normalized:

-0.9583602


Figure 8.59: QQ plot for the standardized residuals from a GARCH(1,1) model under the normality assumption

Description:
Fri May 24 17:11:19 2013 by user: gabriele.cantaluppi

Standardised Residuals Tests:
                                  Statistic  p-Value     
 Jarque-Bera Test    R    Chi^2  135.6136    0           
 Shapiro-Wilk Test   R    W        0.993768  2.814197e-10
 Ljung-Box Test      R    Q(10)    4.401897  0.9274011   
 Ljung-Box Test      R    Q(15)   15.36537   0.4254333   
 Ljung-Box Test      R    Q(20)   18.42054   0.5597264   
 Ljung-Box Test      R^2  Q(10)    3.850639  0.9538334   
 Ljung-Box Test      R^2  Q(15)    6.46133   0.970914    
 Ljung-Box Test      R^2  Q(20)   11.21603   0.9404265   
 LM Arch Test        R    TR^2     4.583747  0.9704592   

Figure 8.60: QQ plot for the standardized residuals from a GARCH(1,1) model under the normality assumption

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
1.918651 1.924483 1.918649 1.920745 
Also in this case the tests for the autocorrelation of the standardized residuals do not
reject the null of white noise, and the tests for autocorrelation of the squared residuals
do not give evidence of the presence of conditional heteroscedasticity.
The Jarque-Bera and Shapiro-Wilk statistics reject the null of a normal distribution
for the standardized residuals and the QQ plot gives evidence that the tails are fatter
than those of a normal distribution, see Fig. 8.60.
> plot(garch1_1, which = 13)
The code for estimating a GARCH(1,1) model with t distributed errors is:
> garch1_1t <- garchFit(formula = ~garch(1, 1), data = as.vector(et),
cond.dist = "std", include.mean = FALSE, trace = FALSE)
> summary(garch1_1t)


Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 1), data=as.vector(et), cond.dist =
"std", include.mean = FALSE, trace = FALSE)
Mean and Variance Equation:
data ~ garch(1, 1)
<environment: 0x00000000179a6cc0>
[data = as.vector(et)]
Conditional Distribution:
std
Coefficient(s):
     omega     alpha1      beta1      shape 
 0.0018883  0.0313289  0.9648050 10.0000000 

Std. Errors:
based on Hessian
Error Analysis:
        Estimate Std. Error t value Pr(>|t|)    
omega  1.888e-03  8.235e-04   2.293   0.0218 *  
alpha1 3.133e-02  4.817e-03   6.504 7.84e-11 ***
beta1  9.648e-01  5.246e-03 183.899  < 2e-16 ***
shape  1.000e+01  1.497e+00   6.681 2.38e-11 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-2952.596
normalized:

-0.9499987

Description:
Fri May 24 17:11:20 2013 by user: gabriele.cantaluppi

Standardised Residuals Tests:
                                  Statistic   p-Value     
 Jarque-Bera Test    R    Chi^2  136.7413     0           
 Shapiro-Wilk Test   R    W        0.9937392  2.608938e-10
 Ljung-Box Test      R    Q(10)    4.436802   0.9254986   
 Ljung-Box Test      R    Q(15)   15.43505    0.4205569   
 Ljung-Box Test      R    Q(20)   18.49223    0.5550169   
 Ljung-Box Test      R^2  Q(10)    3.906787   0.951454    
 Ljung-Box Test      R^2  Q(15)    6.532642   0.9693467   
 Ljung-Box Test      R^2  Q(20)   11.22761    0.9401049   
 LM Arch Test        R    TR^2     4.622786   0.9694076   

Figure 8.61: QQ plot for the standardized residuals from a GARCH(1,1) model with t distributed errors

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
1.902571 1.910347 1.902568 1.905363 
Also in this case the tests for the autocorrelation of the standardized residuals do not
reject the null of white noise, and the tests for autocorrelation of the squared residuals
do not give evidence of the presence of conditional heteroscedasticity.
The Jarque-Bera and Shapiro-Wilk statistics reject the null of a normal distribution
for the standardized residuals and the QQ plot shows how the t distribution can better
capture the behaviour on the tails of standardized residuals, see Fig. 8.61.
> plot(garch1_1t, which = 13)
Also the Information Criterion Statistics give evidence of a better fit.


The variance specification of an EGARCH(1,1) process is

    h_t = exp( ω + β log(h_{t-1}) + γ ε_{t-1}/√h_{t-1} + α |ε_{t-1}|/√h_{t-1} ).

To estimate an EGARCH(1,1) model we have to construct the likelihood function
> "Exponential GARCH likelihood"
[1] "Exponential GARCH likelihood"
> T <- length(et)
> egarchllik <- function(par) {
omega <- par[1]
beta <- par[2]
gamma <- par[3]
alpha <- par[4]
sigma2 <- rep(var(et), T)
for (i in 2:T) {
sigma2[i] <- exp(omega + beta * log(sigma2[i - 1]) + gamma * et[i - 1]/sigma2[i - 1]^0.5 +
alpha * abs(et[i - 1])/sigma2[i - 1]^0.5)
}
llik <- -sum(dnorm(et, mean = mean(et), sd = sigma2^0.5,
log = TRUE))
llik
}
and use a numerical procedure to obtain estimates and their standard errors
> out <- optim(c(0, 0.5, 0, 0.5), egarchllik,
control = list(reltol = 1e-15), hessian = TRUE)
which can be extracted from the object out as
> out$par
[1] -0.062172568 0.994893276 -0.008454532 0.074520295
> diag(solve(out$hessian))^0.5
[1] 0.007948912 0.001939568 0.005554193 0.009590379
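From the same optim output, approximate t-values and p-values can be obtained; this is a
minimal sketch added here for convenience, not part of Verbeek's code, and it relies on the
usual asymptotic normal approximation.

> se <- sqrt(diag(solve(out$hessian)))           # standard errors from the numerical Hessian
> tval <- out$par / se                           # t-values
> cbind(estimate = out$par, std.error = se,
        t.value = tval, p.value = 2 * pnorm(-abs(tval)))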
To extract conditional standard deviations use
> omegahat <- out$par[1]
> betahat <- out$par[2]
> gammahat <- out$par[3]
> alphahat <- out$par[4]
> sigma2hat <- rep(var(et), T)
> for (i in 2:T) sigma2hat[i] <- exp(omegahat + betahat *
      log(sigma2hat[i - 1]) + gammahat * et[i - 1]/sigma2hat[i - 1]^0.5 +
      alphahat * abs(et[i - 1])/sigma2hat[i - 1]^0.5)


Figure 8.62 shows the conditional standard deviations implied by the EGARCH model
and the QQ plot for checking the normality of the standardized residuals.
> xyplot(as.ts(sigma2hat^0.5))
> stdres <- et/sigma2hat^0.5
> qqnorm(stdres)
The function egarch in the package egarch is also available for estimating the
parameters of an EGARCH(1,1) model, but it refers to the following EGARCH model
specification:

    log(σ²_t) = α_0 + β_1 log(σ²_{t-1}) + γ_1 ε_{t-1}/√h_{t-1} + α_1 ( |ε_{t-1}|/√h_{t-1} - E|ε_{t-1}|/√h_{t-1} ),

different from that proposed by Verbeek, see ?egarch::egarch.

Figure 8.62: Conditional standard deviations implied by the EGARCH model and normal QQ plot of the standardized residuals

9 Multivariate Time Series Models

9.1 Spurious Regression (Section 9.2.1)

Data on two stochastic processes X_t and Y_t, generated by two independent random
walks, see Verbeek's relationships (9.15) and (9.16):

    Y_t = Y_{t-1} + ε_{1t},    ε_{1t} ~ IID(0, σ²_1)
    X_t = X_{t-1} + ε_{2t},    ε_{2t} ~ IID(0, σ²_2)

where ε_{1t} and ε_{2t} are mutually independent,


are available in the data set spurious.dat. This can be read by means of the function
read.table, having extracted the file spurious.dat from the compressed archive
ch09.zip.
> spurreg <- read.table(unzip("ch09.zip", "Chapter 9/spurious.dat"),
header = TRUE)
The regression model:

    Y_t = α + β X_t + ε_t

is estimated to show the spurious regression results. The estimates in Verbeek's Table
9.1 can be obtained by means of the following code, having computed the Durbin-Watson
statistic by using the function dwtest in the package lmtest, see Section 4.2.2.
> tab9.1 <- lm(Y ~ X, data = spurreg)
> summary(tab9.1)
Call:
lm(formula = Y ~ X, data = spurreg)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.4625 -2.5223 -0.0414  2.1179  8.2879 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.90971    0.24618   15.88   <2e-16 ***
X           -0.44348    0.04733   -9.37   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 3.27 on 198 degrees of freedom
Multiple R-squared: 0.3072,   Adjusted R-squared: 0.3037
F-statistic: 87.8 on 1 and 198 DF, p-value: < 2.2e-16
> library(lmtest)
> dwtest(tab9.1)

        Durbin-Watson test

data: tab9.1
DW = 0.1331, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0
As Verbeek remarks, the usual t and F tests may be misleading in situations like the
present one, where we observe a quite reasonable R-squared together with a low
Durbin-Watson statistic1, which signals the possible presence of a unit root in the
residuals and hence the non-existence of a cointegrating relationship between {X_t} and
{Y_t}, contrary to what the spurious relationship would suggest. He suggests including
lagged values of both the dependent and the independent variables in the regression to
avoid the spurious regression problem, as in the sketch below.
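A minimal sketch of that suggestion (not part of Verbeek's code; the single lag and the
coercion of spurreg to a ts object are assumptions made here) uses dynlm on the
spurious.dat data:

> library(dynlm)
> dynreg <- dynlm(Y ~ L(Y, 1) + X + L(X, 1), data = ts(spurreg))
> summary(dynreg)

Once the lagged dependent variable is included, one would typically expect the
contemporaneous effect of X to lose its apparent significance.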
Let's perform 1000 Monte Carlo replications of the above spurious regression
experiment to show the large variability that characterizes the parameter estimates,
since there is no actual relationship between {X_t} and {Y_t}.
> set.seed(12345)
> library(lmtest)
> sim <- function(n = 200) {
X <- c(0, cumsum(rnorm(n - 1)))
Y <- c(0, cumsum(rnorm(n - 1)))
res <- lm(Y ~ X)
res1 <- list(int = res$coef[1], slope = res$coef[2],
int.pv=summary(res)$coeff[1,4],slope.pv=summary(res)$coeff[2,
4], r2 = summary(res)$r.squared, dw = dwtest(res)$statistic)
}
> a <- replicate(1000, sim(n = 200))
The function sim generates the data for the present experiment2; it returns the
list res1 with elements: the intercept and the slope estimates, the corresponding
p-values, the multiple R-squared and the Durbin-Watson statistic.
1 We recall that dw ≈ 2 - 2ρ, so approximately 0 ≤ dw ≤ 4, with dw = 0 when ρ = 1, that is in the
presence of a positive unit root, dw = 2 when ρ = 0, and dw = 4 when ρ = -1, that is in the presence of
a negative unit root.
2 Observe that to generate a random walk the following for loop can also be used:
X<-rep(0,n)
for (i in 2:n) X[i]<-X[i-1]+rnorm(1)
but it is less efficient and slower; check the two versions of the code for n=1000000.
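A quick way to carry out that check, added here as a small sketch (the actual timings will of
course depend on the machine), is to wrap both versions in system.time:

> n <- 1e6
> system.time(X <- c(0, cumsum(rnorm(n - 1))))      # vectorized random walk
> system.time({
      X <- rep(0, n)
      for (i in 2:n) X[i] <- X[i - 1] + rnorm(1)    # loop-based random walk
  })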


The function replicate (a shortcut for sapply) returns the matrix a, which
contains the results for the 1000 replications of the experiment; in the rows of a are the
values of the intercept, the slope, their significances, the R2 and the Durbin-Watson
statistic, obtained for each replication of the experiment.
By computing the following summary statistics we can observe that 89.5% of the
replications present a significant3 estimate for the intercept, 82.5% a significant
estimate for the slope, 73.8% significant estimates for both the intercept and the slope,
33.7% a multiple R-squared larger than 0.3 and 98.8% a value of the Durbin-Watson
statistic lower than 0.25.
> sum((a[3, ] < 0.05))/1000
[1] 0.895
> sum((a[4, ] < 0.05))/1000
[1] 0.825
> sum((a[3, ] < 0.05) * (a[4, ] < 0.05))/1000
[1] 0.738
> sum(a[5, ] > 0.3)/1000
[1] 0.337
> sum(a[6, ] < 0.25)/1000
[1] 0.988
Summary statistics for the intercept and slope show that their estimates are
quite different from simulation to simulation, see also Fig. 9.1, giving evidence of the
spurious nature of the estimated relationships: {X_t} and {Y_t} do not have any actual
relationship.
> summary(as.numeric(a[1, ]))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-21.1800  -4.5880  -0.2781  -0.2166   3.7530  27.2800 
> summary(as.numeric(a[2, ]))
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-2.5020000 -0.3608000  0.0091750  0.0004262  0.3686000  2.7430000 
> layout(matrix(1:2, 1, 2))
> hist(as.numeric(a[1, ]), freq = FALSE, main = "",
xlab = "intercept")
> hist(as.numeric(a[2, ]), freq = FALSE, main = "",
xlab = "slope")

9.2 Long-run Purchasing Power Parity (Part 2) (Section 9.3)

The analysis of Long-run Purchasing Power Parity, started in Section 8.9 (Verbeek's
Section 8.5), is continued. To import the data from the file ppp2.wf1, which is a work
file of EViews, invoke first the package hexView and next the command readEViews.

3 At the 5% level.

Figure 9.1: Histograms for the intercept and slope estimates
> library(hexView)
> ppp <- readEViews(unzip("ch08.zip", "Chapter 8/ppp2.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
Verbeek observes that the relationship s_t = p_t - p*_t, where s_t, p_t and p*_t are
respectively the log of the spot exchange rate, the log of domestic prices and that
of foreign prices, may be interpreted as an equilibrium (long-run) or cointegrating
relationship.
In the example, observations for the Euro area and the UK from January 1988 until
December 2010 are considered.
See Section 8.9 for the analysis to detect the non-stationarity of the real exchange
rate rs_t = s_t - p_t + p*_t.
Verbeek suggests testing whether a cointegrating relationship involving the log exchange
rate s_t and the log price ratio p_t - p*_t can be established. The results in Section
8.9 suggested that s_t is I(1).
Augmented Dickey-Fuller tests can be performed to establish whether the ratio p_t - p*_t
is also I(1), by having recourse to the function ur.df available in the package urca. So
Verbeek's Table 9.4 can be reproduced with the following code4.
> library(urca)
> x <- ppp$LOGCPIEURO - ppp$LOGCPIUK
> a <- c(0:6, 12, 24, 36)
> names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
> f <- function(k) {
      nt <- summary(ur.df(x, type = "drift", lags = k))@teststat[1]
      wt <- summary(ur.df(x, type = "trend", lags = k))@teststat[1]
      return(c(nt, wt))
  }
> out <- sapply(a, f)
> rownames(out) <- c("Without trend", "With trend")
> print(t(round(out, 3)))
        Without trend With trend
DF             -2.487     -2.564
ADF(1)         -2.533     -2.622
ADF(2)         -2.518     -2.639
ADF(3)         -2.137     -2.288
ADF(4)         -2.070     -2.229
ADF(5)         -2.037     -2.213
ADF(6)         -2.103     -2.227
ADF(12)        -2.989     -3.041
ADF(24)        -3.131     -3.424
ADF(36)        -2.027     -1.975
Remembering that the 5% critical values for the Dickey-Fuller statistics are -2.88 and
-3.43, respectively for the situations with only a drift and with both a drift and a trend,
the hypothesis of non-stationarity cannot be rejected. The ADF(24) statistic is only
marginally significant.
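As a small aside, not part of Verbeek's code, the critical values can also be read off the
cval slot of the object returned by ur.df (assuming the urca class structure; the values
displayed depend on the chosen deterministic component):

> ur.df(x, type = "drift", lags = 1)@cval
> ur.df(x, type = "trend", lags = 1)@cval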
The parameters in the cointegrating regression (see Verbeek's Table 9.5)

    s_t = α + β (p_t - p*_t) + ε_t

can be estimated by having, as usual, recourse to the function lm.
> a <- lm(log(FXEP) ~ I(LOGCPIEURO - LOGCPIUK), data = ppp)
> summary(a)
Call:
lm(formula = log(FXEP) ~ I(LOGCPIEURO - LOGCPIUK), data = ppp)
4 As Verbeek observes in Sections 8.5, 9.3 and 9.5.4, the exchange rates are accidentally not
converted to logs when producing the results. Here they are converted to logs for obtaining the results.

Residuals:
     Min       1Q   Median       3Q      Max 
-0.24561 -0.08619  0.01368  0.06477  0.22179 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)               0.38246    0.01806  21.172  < 2e-16 ***
I(LOGCPIEURO - LOGCPIUK)  1.01657    0.28125   3.614 0.000358 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1045 on 274 degrees of freedom
Multiple R-squared: 0.04551,
Adjusted R-squared: 0.04203
F-statistic: 13.06 on 1 and 274 DF, p-value: 0.0003581
Tests on the residuals for establishing the possible presence of non-stationarity can
also be performed.
The cointegrating regression Durbin-Watson (CRDW) statistic can be obtained with
the function dwtest available in the package lmtest; pay attention to consider only
the value of the statistic and not its p-value, which is computed against the null of
no autocorrelation and not against the null of no cointegration. See Verbeek's Table 9.3
for the 5% critical values of the CRDW test for no cointegration.
> library(lmtest)
> dwtest(a)$stat
DW
0.04120717
The Augmented Dickey-Fuller cointegration test is also performed on the residuals,
see Verbeek's Table 9.6. We can have recourse again to the function f constructed
above, considering only the first row of the resulting output, which refers to the
situation with a drift only.
> x <- a$residuals
> a <- 0:6
> out <- sapply(a, f)
> print(t(round(out[1, ], 3)))
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] -1.497 -1.478 -1.431 -1.479 -1.628 -1.522 -1.392
Also in this case only the value of the statistic can be considered. See Verbeek's Table
9.2 for the 1%, 5% and 10% asymptotic critical values of the residual-based unit root ADF
test for no cointegration (with constant term). In the present case the 5% critical
value is -3.34, since two variables were considered in the cointegrating relationship.
So the null hypothesis of a unit root cannot be rejected.
Verbeek then suggests considering a more general cointegrating relationship between
the three variables s_t, p_t and p*_t, by estimating the parameters in the model

    s_t = α + β_1 p_t + β_2 p*_t + ε_t,


see Verbeek's Table 9.7, and performing the corresponding tests on the residuals as
done above, see Verbeek's Table 9.8.
> a <- lm(log(FXEP) ~ LOGCPIEURO + LOGCPIUK, data = ppp)
> summary(a)
Call:
lm(formula = log(FXEP) ~ LOGCPIEURO + LOGCPIUK, data = ppp)
Residuals:
     Min       1Q   Median       3Q      Max 
-0.24406 -0.08568  0.01416  0.06590  0.22198 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.4193     0.2082   2.014 0.045027 *  
LOGCPIEURO    1.0076     0.2863   3.520 0.000506 ***
LOGCPIUK     -1.0151     0.2819  -3.601 0.000376 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1047 on 273 degrees of freedom


Multiple R-squared: 0.04562,
Adjusted R-squared: 0.03863
F-statistic: 6.525 on 2 and 273 DF, p-value: 0.001706
> library(lmtest)
> dwtest(a)$stat
DW
0.04120736
> x <- a$residuals
> a <- 0:6
> out <- sapply(a, f)
> print(t(round(out[1, ], 3)))
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] -1.508 -1.489 -1.439 -1.485 -1.633 -1.528 -1.399
The 5% critical value for the residual-based unit root ADF test for no cointegration
(with constant term) is, from Verbeek's Table 9.2, -3.74 and again the null hypothesis
of no cointegrating relationship cannot be rejected. Verbeek concludes that the sample
period is just not long enough to find sufficient evidence for a cointegrating
relationship.

9.3 Long-run Purchasing Power Parity (Part 3) (Section 9.5.4)

The same data set of the preceding Section is considered.
The Johansen technique is applied to analyse the existence of one or more
cointegrating relationships between the three variables s_t, p_t and p*_t.


Verbeek suggests temporarily considering p = 3 as the maximum order of the lags in
the autoregressive representation he gives in relationship (9.42), that is:

    Ỹ_t = δ + Θ_1 Ỹ_{t-1} + Θ_2 Ỹ_{t-2} + Θ_3 Ỹ_{t-3} + ε_t

and to include the presence of unrestricted intercepts. He remarks that the first step
of the Johansen technique consists in determining the order p.
Later, maximum eigenvalue tests for cointegration will be obtained by means of the
function ca.jo available in the package urca, see Section 9.4, which however does not
deal with unrestricted intercepts.
In order to perform the current analysis we can have recourse to the function
johansen available in the package oekfinm by Wolfgang Scherrer. The package was
built under R version 2.13.1, so you need that version of R to apply the johansen
function (R version 2.13.1 can be installed in a new folder). The arguments of the
johansen function are y: the data to analyse; r an integer specifying the cointegration
rank; p the order of the VAR; det the deterministic component, which can be "none"
or "const" or "linear"; and restricted which is a logical variable specifying if
restrictions have to be imposed on the deterministic components.
Eigenvalues and trace tests are the elements lambda and tracestat in the list
produced by the function johansen.
> sppstar <- ppp[, c("FXEP", "LOGCPIEURO", "LOGCPIUK")]
> sppstar[, 1] <- log(sppstar[, 1])
> library(oekfinm)
> tab9.10 <- johansen(sppstar, r = 0, p = 3, det = "const", restr = FALSE)
> tab9.10$tracestat
         stat   90%   95%   99%
r=0 37.520270 26.70 29.38 34.87
r=1 10.761479 13.31 15.34 19.69
r=2  4.053898  2.71  3.84  6.64
> tab9.10$lambda
       r=1        r=2        r=3 
0.09336701 0.02427051 0.01473973 

Results are consistent with those reported by Verbeek. Verbeek observes that the
test results may be sensitive to the number of lags that are included and shows what
happens when a lag order of p = 13 is used, see Verbeek's Table 9.11.
> tab9.11 <- johansen(sppstar, r = 0, p = 13, det = "const", restr = FALSE)
> tab9.11$tracestat
         stat   90%   95%   99%
r=0 25.386442 26.70 29.38 34.87
r=1  8.444536 13.31 15.34 19.69
r=2  3.358870  2.71  3.84  6.64
> tab9.11$lambda
       r=1        r=2        r=3 
0.06238690 0.01915137 0.01269016 
The test in this case does not reject the null hypothesis of 0 cointegrating vectors.

9.4 Money Demand and Inflation (Section 9.6)

Verbeek considers an empirical example to study cointegration in a five-dimensional


vector process.
Quarterly data are considered for the United States from 1954:1 to 1994:4 (T = 164)
for the following variables:

cpr commercial paper rate (in % per year)

infl quarterly inflation rate (change in log prices), % per year

m log of real M1 money stock

quarter quarter of observation (yyyyqq)

tbr treasury bill rate

y log real GDP (in billions of 1987 dollars).

They can be read by means of the function readEViews, available in the package
hexView, having extracted the work file of EViews money.wf1 from the compressed
archive ch09.zip. We can create a multiple time series from the columns of a table
or a data.frame with the function ts(object, start, frequency); in this case we
have to specify frequency=4 since we are dealing with quarterly data.
We drop the column with the time reference.
> library(hexView)
> money <- readEViews(unzip("ch09.zip", "Chapter 9/money.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
> money <- money[, -4]
> money <- ts(money, start = c(1954, 1), frequency = 4)
Verbeek first considers three theoretical relationships governing the long-run
behaviour of these variables, that can be assumed as theoretical cointegrating
relationships. Verbeek performs three separate OLS estimates. The parameter
estimates for the equations describing the money demand, the inflation rate and
the commercial paper rate:

    m_t    = α_1 + β_14 y_t + β_15 tbr_t + ε_1t
    infl_t = α_2 + β_25 tbr_t + ε_2t
    cpr_t  = α_3 + β_35 tbr_t + ε_3t
are obtained by means of the function dynlm available in the package dynlm and the
results organized by the function mtable available in the package memisc.
> library(dynlm)
> library(lmtest)
> library(urca)
> demols <- dynlm(M ~ Y + TBR, data = money)
> inflols <- dynlm(INFL ~ TBR, data = money)


> commpapols <- dynlm(CPR ~ TBR, data = money)


> library(memisc)
> mtable(demols, inflols, commpapols, summary.stats = "R-squared")
Calls:
demols: dynlm(formula = M ~ Y + TBR, data = money)
inflols: dynlm(formula = INFL ~ TBR, data = money)
commpapols: dynlm(formula = CPR ~ TBR, data = money)
=============================================
              demols     inflols   commpapols
---------------------------------------------
(Intercept)   3.189***   1.158***   0.461*** 
             (0.122)    (0.333)    (0.065)   
Y             0.423***                       
             (0.016)                         
TBR          -0.031***   0.558***   1.038*** 
             (0.002)    (0.053)    (0.010)   
---------------------------------------------
R-squared     0.815      0.409      0.984    
=============================================
The cointegrating regression Durbin-Watson (CRDW) statistic and the residual-based
unit root ADF statistic for no cointegration (with constant term) can be computed
by using the functions dwtest and ur.df; remember to check Verbeek's Tables 9.2
and 9.3 for the proper critical values.
> dwtest(demols)$stat
DW
0.1989882
> ur.df(demols$res, type = "drift", lags = 6)@teststat[1]
[1] -3.163297
> dwtest(inflols)$stat
DW
0.7841199
> ur.df(inflols$res, type = "drift", lags = 6)@teststat[1]
[1] -1.915916
> dwtest(commpapols)$stat
DW
0.704969
> ur.df(commpapols$res, type = "drift", lags = 6)@teststat[1]
[1] -4.0456
The Durbin-Watson statistics, according to the critical values from Verbeek's Table
9.3 (which are based on the assumption of random walks, possibly not satisfied
by the money supply and GDP series), suggest rejecting the null hypothesis of
no cointegration at the 5% level for the last two equations, while, according to the
residual-based unit root ADF statistic for no cointegration (with constant term), the
hypothesis of no cointegration is rejected only for the third equation, describing the
risk premium5.
Verbeek suggests performing a multivariate vector analysis to have stronger evidence
for the existence of cointegrating relationships between the five variables. He also
performs a graphical analysis of the residuals of the three equations to check for
stationarity, see Figures 9.2, 9.3 and 9.4.
> plot(demols$res, type = "l")
> abline(h = 0)
> plot(inflols$res, type = "l")
> abline(h = 0)
> plot(commpapols$res, type = "l")
> abline(h = 0)
To perform the Johansen procedure, the maximum length p in the vector
autoregressive model has to be chosen. Verbeek suggests the orders 5 and 6 and
reports in Table 9.14 the trace and maximum eigenvalue tests for cointegration.
The maximum eigenvalue tests for cointegration can be obtained by means of the
function ca.jo available in the package urca, see Verbeek's Table 9.10 (footnote 6). The main
arguments of the function ca.jo are: x, the data matrix to be investigated for
cointegration; type, the test to be conducted, either "eigen" or "trace"; ecdet,
which can be set to "none" for no intercept in cointegration, "const" for constant
term in cointegration and "trend" for trend variable in cointegration; K, the lag
order of the series (levels) in the VAR, spec which determines the specification of
the VECM and can be "longrun" or "transitory". See the help ?urca::ca.jo for
more information.
> allvar <- money[, c("M", "INFL", "CPR", "Y", "TBR")]
> library(urca)
> summary(ca.jo(allvar, ecdet = "const", type = "trace", K = 5,
spec = "longrun"))
######################
# Johansen-Procedure #
5 Remember that both the CRDW and ADF statistics are null if a unit root is present, that is
under the null hypothesis of no cointegration. Appropriate 5% critical values from Verbeeks Table
9.2 for the ADF test are 3.34 for possible cointegrating relationships involving 2 variables and
3.74 for possible cointegrating relationships involving 3 variables; while with regard to the CRDW
test from Verbeeks Table 9.3 the 5% critical values are 0.20 for possible cointegrating relationships
involving 2 variables and 0.25 for possible cointegrating relationships involving 3 variables (number
of observations: 200).
6 Critical values for the max-eigenvalue test are taken from Osterwald-Lenum (1992). Though quite
similar, they differ somewhat from the critical values reported by Verbeek in Table 9.9. Observe that
tests are reversed with respect to Verbeeks output.

Figure 9.2: Residuals of money demand regression

######################
Test type: trace statistic,
without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 2.4582e-01 1.7317e-01 1.0279e-01 5.1003e-02 2.2397e-02 7.2499e-16
Values of teststatistic and critical values of test:

           test 10pct  5pct  1pct
r <= 4 |   3.60  7.52  9.24 12.97
r <= 3 |  11.93 17.85 19.96 24.60
r <= 2 |  29.17 32.00 34.91 41.07
r <= 1 |  59.41 49.65 53.12 60.16
r = 0  | 104.26 71.86 76.07 84.45

Figure 9.3: Residuals of Fisher regression

Eigenvectors, normalised to first column:


(These are the cointegration relations)
              M.l5    INFL.l5    CPR.l5      Y.l5    TBR.l5  constant
M.l5      1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
INFL.l5   0.027280  0.033616  0.018488  0.262027 -0.199582 -0.019474
CPR.l5   -0.070161 -1.144488  0.016785 -0.303908 -0.069297 -0.032074
Y.l5     -0.432566 -0.624159 -0.565694 -0.669171 -1.419789 -0.122782
TBR.l5    0.099270  1.210553  0.021598  0.046574  0.212904  0.045646
constant -3.282767 -0.923727 -2.172837 -0.326824  4.972280 -5.362733
Weights W:
(This is the loading matrix)
             M.l5     INFL.l5     CPR.l5        Y.l5      TBR.l5    constant
M.d     -0.009147  0.0047624 -0.0262606  -0.0013178 -0.00076057  4.9488e-14
INFL.d  -2.819053 -1.3318367 -2.7246923   0.1724486  0.26087439  8.2817e-12
CPR.d    2.351476  0.2971658 -1.5680125   0.2378696 -0.01334883 -2.3114e-12
Y.d     -0.053129  0.0022981 -0.0055104   0.0017462 -0.00051361  8.3207e-14
TBR.d    1.749904 -0.0049994 -1.5903476   0.2031231 -0.06967642 -7.2022e-13

Figure 9.4: Residuals of risk premium regression
> summary(ca.jo(allvar, ecdet = "const", type = "trace", K = 6,
spec = "longrun"))
######################
# Johansen-Procedure #
######################
Test type: trace statistic,
without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 2.6902e-01 1.9711e-01 1.2999e-01 7.2538e-02 1.7324e-02 2.4472e-16
Values of teststatistic and critical values of test:

           test 10pct  5pct  1pct
r <= 4 |   2.76  7.52  9.24 12.97
r <= 3 |  14.66 17.85 19.96 24.60
r <= 2 |  36.66 32.00 34.91 41.07
r <= 1 |  71.35 49.65 53.12 60.16
r = 0  | 120.86 71.86 76.07 84.45

Eigenvectors, normalised to first column:


(These are the cointegration relations)
              M.l6     INFL.l6    CPR.l6      Y.l6    TBR.l6    constant
M.l6      1.000000  1.0000000  1.000000  1.000000  1.000000  1.0000000
INFL.l6   0.027522  0.0048977  0.017315  0.619718 -0.098270 -0.0049827
CPR.l6   -0.128709  0.4810277 -0.067558 -0.077271 -0.072337 -0.1438921
Y.l6     -0.439317 -0.3650629 -0.537060 -1.773107 -0.716382  0.1387489
TBR.l6    0.158646 -0.4620892  0.107309 -0.419051  0.130337  0.1247853
constant -3.188704 -4.0767044 -2.332719  8.273357 -0.491810 -7.2161580
Weights W:
(This is the loading matrix)
M.l6
INFL.l6
CPR.l6
Y.l6
TBR.l6
constant
M.d
-0.0033322 -0.018747 -0.0138332 -0.00072754 -0.00139569 2.9591e-14
INFL.d -3.3139036 1.641475 -6.2992151 0.07074960 0.49103886 7.1952e-12
CPR.d
3.2451187 -0.653724 -1.0908952 0.13929680 0.01714385 -1.9459e-12
Y.d
-0.0536283 -0.012044 0.0023838 0.00107612 0.00019001 4.2005e-14
TBR.d
1.8023190 -0.224897 -1.6813019 0.12665321 -0.08587212 -1.1704e-13
> summary(ca.jo(allvar, ecdet = "const", type = "eigen", K = 5,
spec = "longrun"))
######################
# Johansen-Procedure #
######################
Test type: maximal eigenvalue statistic (lambda max),
without linear trend and constant in cointegration

Eigenvalues (lambda):
[1] 2.4582e-01 1.7317e-01 1.0279e-01 5.1003e-02 2.2397e-02 7.2499e-16
Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 4 |  3.60  7.52  9.24 12.97
r <= 3 |  8.32 13.75 15.67 20.20
r <= 2 | 17.25 19.77 22.00 26.81
r <= 1 | 30.24 25.56 28.14 33.24
r = 0  | 44.86 31.66 34.40 39.79
Eigenvectors, normalised to first column:
(These are the cointegration relations)

              M.l5    INFL.l5     CPR.l5       Y.l5     TBR.l5   constant
M.l5      1.000000   1.000000   1.000000   1.000000   1.000000   1.000000
INFL.l5   0.027280   0.033616   0.018488   0.262027  -0.199582  -0.019474
CPR.l5   -0.070161  -1.144488   0.016785  -0.303908  -0.069297  -0.032074
Y.l5     -0.432566  -0.624159  -0.565694  -0.669171  -1.419789  -0.122782
TBR.l5    0.099270   1.210553   0.021598   0.046574   0.212904   0.045646
constant -3.282767  -0.923727  -2.172837  -0.326824   4.972280  -5.362733

Weights W:
(This is the loading matrix)

             M.l5    INFL.l5     CPR.l5       Y.l5      TBR.l5    constant
M.d     -0.009147  0.0047624 -0.0262606 -0.0013178 -0.00076057  4.9488e-14
INFL.d  -2.819053 -1.3318367 -2.7246923  0.1724486  0.26087439  8.2817e-12
CPR.d    2.351476  0.2971658 -1.5680125  0.2378696 -0.01334883 -2.3114e-12
Y.d     -0.053129  0.0022981 -0.0055104  0.0017462 -0.00051361  8.3207e-14
TBR.d    1.749904 -0.0049994 -1.5903476  0.2031231 -0.06967642 -7.2022e-13
> summary(ca.jo(allvar, ecdet = "const", type = "eigen", K = 6,
spec = "longrun"))
######################
# Johansen-Procedure #
######################
Test type: maximal eigenvalue statistic (lambda max),
without linear trend and constant in cointegration

Eigenvalues (lambda):
[1] 2.6902e-01 1.9711e-01 1.2999e-01 7.2538e-02 1.7324e-02 2.4472e-16
Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 4 |  2.76  7.52  9.24 12.97
r <= 3 | 11.90 13.75 15.67 20.20
r <= 2 | 22.00 19.77 22.00 26.81
r <= 1 | 34.69 25.56 28.14 33.24
r = 0  | 49.51 31.66 34.40 39.79

Eigenvectors, normalised to first column:
(These are the cointegration relations)

              M.l6     INFL.l6     CPR.l6       Y.l6     TBR.l6    constant
M.l6      1.000000   1.0000000   1.000000   1.000000   1.000000   1.0000000
INFL.l6   0.027522   0.0048977   0.017315   0.619718  -0.098270  -0.0049827
CPR.l6   -0.128709   0.4810277  -0.067558  -0.077271  -0.072337  -0.1438921
Y.l6     -0.439317  -0.3650629  -0.537060  -1.773107  -0.716382   0.1387489
TBR.l6    0.158646  -0.4620892   0.107309  -0.419051   0.130337   0.1247853
constant -3.188704  -4.0767044  -2.332719   8.273357  -0.491810  -7.2161580

Weights W:
(This is the loading matrix)

              M.l6    INFL.l6     CPR.l6         Y.l6       TBR.l6    constant
M.d     -0.0033322  -0.018747 -0.0138332  -0.00072754  -0.00139569  2.9591e-14
INFL.d  -3.3139036   1.641475 -6.2992151   0.07074960   0.49103886  7.1952e-12
CPR.d    3.2451187  -0.653724 -1.0908952   0.13929680   0.01714385 -1.9459e-12
Y.d     -0.0536283  -0.012044  0.0023838   0.00107612   0.00019001  4.2005e-14
TBR.d    1.8023190  -0.224897 -1.6813019   0.12665321  -0.08587212 -1.1704e-13
Verbeek suggests restricting the rank of the long-run cointegrating matrix to be equal
to 2 and obtains maximum likelihood estimates of the cointegrating vectors and of
the error correction model. In the package urca the function cajorls is available,
which performs ordinary least squares regression of a restricted VECM. To obtain
the restricted model in Verbeek's Table 9.15 we first have to change the order of
the columns in allvar, since the restrictions refer to the first 2 variables, reapply the
function ca.jo assigning the results to an object, and then use the function
cajorls with the argument r=2 setting the cointegration rank.
> allvar <- allvar[, c(1, 3, 2, 4, 5)]
> restr.coint <- ca.jo(allvar, ecdet = "const", type = "eigen",
K = 6, spec = "longrun")
> restr.coint1 <- cajorls(restr.coint, r = 2)
The element beta in restr.coint1, see str(restr.coint1), contains the OLS
estimates of cointegrating vectors (after normalization) in a restricted VECM.
Observe that summary(cajools(restr.coint)) and summary(restr.coint1$rlm)
return the OLS regressions respectively of the unrestricted and restricted VECM.
> round(restr.coint1$beta, 3)
           ect1   ect2
M.l6      1.000  0.000
CPR.l6    0.000  1.000
INFL.l6   0.023 -0.037
Y.l6     -0.424  0.122
TBR.l6    0.028 -1.018
constant -3.376 -1.456


The matrix of adjustment coefficients can be obtained with


> restr.coint1$rlm$coef[1:2, ]
            M.d    CPR.d  INFL.d        Y.d    TBR.d
ect1 -0.0220795  2.59139 -1.6724 -0.0656728  1.57742
ect2 -0.0085891 -0.73214  1.2161  0.0011087 -0.34016
Observe that the signs are opposite to Verbeek's.
To obtain the estimates and the corresponding standard errors we can use the code below (note that, despite its name, the object tstat contains the standard errors, taken from the second column of the coeftest output):
> coef <- sapply(1:5, function(i) coeftest(cajorls(restr.coint,
r = 2, reg = i)$rlm)[1:2, 1])
> tstat <- sapply(1:5, function(i) coeftest(cajorls(restr.coint,
r = 2, reg = i)$rlm)[1:2, 2])
> colnames(coef) <- colnames(tstat) <- paste(colnames(allvar),
".d", sep = "")
> coef
            M.d    CPR.d  INFL.d        Y.d    TBR.d
ect1 -0.0220795  2.59139 -1.6724 -0.0656728  1.57742
ect2 -0.0085891 -0.73214  1.2161  0.0011087 -0.34016
> tstat
            M.d   CPR.d  INFL.d       Y.d   TBR.d
ect1  0.0109021 1.19861 2.36702 0.0126428 1.08157
ect2  0.0025669 0.28221 0.55731 0.0029767 0.25465

10 Models based on panel data

10.1 Explaining Individual Wages (Section 10.3)

Verbeek considers the application of the Between, the Fixed effects, the OLS and the
Random effects estimators to deal with a panel data linear model for an individual
wage equation. The data are saved in the file males.dta, which is in the Stata format
and is available in the compressed archive ch10.zip. To read the data we first have to
load the package foreign and then use the command read.dta.
> library(foreign)
> wages <- read.dta(unzip("ch10.zip", "Chapter 10/males.dta"))
Data are taken from the Youth Sample of the National Longitudinal Survey held in
the USA, and comprise a sample of 545 full-time working males who completed their
schooling by 1980 and were then followed over the period 1980-1987. The males in the
sample are young, with an age in 1980 ranging from 17 to 23, and entered the labour
market fairly recently, with an average of 3 years of experience at the beginning of
the sample period.
The following variables are available:

NR: Observations number

YEAR: Year of observation

School: Years of schooling

Exper: Age-6-School

Exper2: Experience Squared

LogExper: Log(1+Experience)

Union: Wage set by collective bargaining

Mar: Married

Black: Black

Hisp: Hispanic

Health: Has health disability

Rural: Lives in rural area

NE: Lives in North East


NC: Lives in Northern Central

S: Lives in South

Wage: Log of hourly wage

12 Dummy variables for the different industries

9 Dummy variables for the occupational status

Verbeek supposes that log wages are explained by years of schooling, years of
experience and its square, dummy variables for being a union member, working in
the public sector and being married, and two racial dummies.
The package plm can be used to deal with models based on panel data; the plm
procedures are thoroughly described in Croissant and Millo (2008).
A data.frame containing panel data must be characterized by the presence of two
variables defining respectively individual and time indices. The first two columns in
the data.frame wages contain such information.
> wages[1:5, 1:2]
NR YEAR
1 13 1982
2 13 1981
3 13 1986
4 13 1983
5 13 1984
Repeated measurements on each statistical unit are not ordered according to time;
the following code can be used to reorder the data frame when needed.
> i <- order(wages[, 1], wages[, 2])
> wages <- wages[i, ]
> wages[1:5, 1:2]
NR YEAR
7 13 1980
2 13 1981
1 13 1982
4 13 1983
5 13 1984
The panel model may be estimated with the function plm which requires the following
arguments:

formula: the usual formula for a linear model.


Instrumental variables estimation may be obtained by using a two-part formula,
the second part indicating the instrumental variables. To specify that y depends
on x1, x2 and x3, with x1 and x2 endogenous and z1 and z2 external
instruments, use:
formula = y ~ x1 + x2 + x3 | x3 + z1 + z2,
or
formula = y ~ x1 + x2 + x3 | . - x1 - x2 + z1 + z2;


data, subset and na.action specify respectively the data.frame, a possible


subset to analyze and the missing value treatment;

effect specifies the kind of effect to introduce in the model; the argument may
assume one of the values individual, time or twoways;

model specifies the kind of model: within, random, ht, between, pooling or
fd; the options refer respectively to the within or fixed effects estimator,
to the random effects estimator, to the Hausman-Taylor estimator, to the between
effects estimator, to the pooling estimator (which is equivalent to OLS) and to
the first-difference estimator.

random.method defines the method of estimation of the variance components in


the random effects model; the options are swar, walhus, amemiya and nerlove;

inst.method defines the instrumental variable transformation (bvk or


baltagi),

index defines the indices for the individual and the time dimensions; it needs to be
specified whenever these two variables are not placed in the first two columns of the
data.frame we are analysing (a minimal example is sketched right after this list).

See the help ?plm::plm for more information.
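For instance, had the individual and time identifiers not been stored in the first two
columns of the data frame, they could be declared explicitly through the index argument
(a minimal sketch using the identifier variables NR and YEAR of the wages data):

> fixed <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
      BLACK + HISP + PUB, data = wages, model = "within",
      index = c("NR", "YEAR"))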


With the following instructions the parameter estimates are obtained for the models
presented by Verbeek in Table 10.2, that is a between, a fixed effects, an OLS (pooling)
and a random effects model.
> library(plm)
> between <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION +
MAR + BLACK + HISP + PUB, data = wages, model = "between")
> fixed <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION +
MAR + BLACK + HISP + PUB, data = wages, model = "within")
> ols <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION +
MAR + BLACK + HISP + PUB, data = wages, model = "pooling")
> random <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION +
MAR + BLACK + HISP + PUB, data = wages, model = "random")
We can inspect the output for each different model separately by using the function
summary.
The functions pvcovHC (or vcovHC.plm), vcovBK and vcovSCC can be used to obtain
robust parameter covariance matrix estimators for the fixed effects, random effects
and pooling models. The arguments of pvcovHC, which computes robust covariance
matrices à la White, are x: an object of class "plm", which should be the result of
a random effects or a within model, or a model of class "pgmm"; method: one of
"arellano", "white1", "white2"; type: one of "HC0", "HC1", "HC2", "HC3", "HC4";
cluster: one of "group" or "time". vcovBK and vcovSCC respectively compute
unconditional robust parameter covariance matrix estimators à la Beck and Katz for
panel models and nonparametric robust covariance matrix estimators à la Driscoll
and Kraay for panel models with cross-sectional and serial correlation. See the R
help system for more information.

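As an illustration, a White-type robust covariance matrix with a specific small-sample
correction could be requested by combining the argument names listed above (a sketch
only; the remainder of this section relies on the default settings of pvcovHC):

> library(lmtest)
> coeftest(fixed, vcov = pvcovHC(fixed, method = "white1", type = "HC1"))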

Output for the between effects estimation method


> summary(between)
Oneway (individual) effect Between Model

Call:
plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
    BLACK + HISP + PUB, data = wages, model = "between")

Balanced Panel: n=545, T=8, N=4360

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
 -1.130  -0.240   0.028   0.232   1.740

Coefficients :
              Estimate Std. Error t-value  Pr(>|t|)
(Intercept)  0.4903902  0.2211917  2.2170 0.0270394 *
SCHOOL       0.0947911  0.0109178  8.6822 < 2.2e-16 ***
EXPER       -0.0502077  0.0503689 -0.9968 0.3193120
EXPER2       0.0051068  0.0032142  1.5888 0.1126871
UNION        0.2743194  0.0471273  5.8208 1.009e-08 ***
MAR          0.1445897  0.0412654  3.5039 0.0004968 ***
BLACK       -0.1391368  0.0489084 -2.8448 0.0046132 **
HISP         0.0054832  0.0427436  0.1283 0.8979738
PUB         -0.0563215  0.1090691 -0.5164 0.6057992
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    83.06
Residual Sum of Squares: 64.819
R-Squared      : 0.2196
Adj. R-Squared : 0.21598
F-statistic: 18.8539 on 8 and 536 DF, p-value: < 2.22e-16
Output for the fixed effects or within estimation method
> summary(fixed)
Oneway (individual) effect Within Model

Call:
plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
    BLACK + HISP + PUB, data = wages, model = "within")

Balanced Panel: n=545, T=8, N=4360

Residuals :
     Min.  1st Qu.   Median  3rd Qu.     Max.
 -4.17000 -0.12600  0.00992  0.15900  1.47000

Coefficients :
           Estimate  Std. Error t-value  Pr(>|t|)
EXPER    0.11645699  0.00843090 13.8131 < 2.2e-16 ***
EXPER2  -0.00428857  0.00060544 -7.0834 1.668e-12 ***
UNION    0.08120303  0.01931592  4.2039 2.683e-05 ***
MAR      0.04510613  0.01831141  2.4633   0.01381 *
PUB      0.03492672  0.03860819  0.9046   0.36571
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    572.05
Residual Sum of Squares: 470.1
R-Squared      : 0.17822
Adj. R-Squared : 0.15574
F-statistic: 165.256 on 5 and 3810 DF, p-value: < 2.22e-16
> library(lmtest)
> coeftest(fixed, vcov = pvcovHC(fixed))

t test of coefficients:

          Estimate  Std. Error t value  Pr(>|t|)
EXPER   0.11645699  0.01070551 10.8782 < 2.2e-16 ***
EXPER2 -0.00428857  0.00068517 -6.2591 4.301e-10 ***
UNION   0.08120303  0.02270999  3.5757 0.0003537 ***
MAR     0.04510613  0.02096824  2.1512 0.0315259 *
PUB     0.03492672  0.03762350  0.9283 0.3532994
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Output for the OLS estimation method


> summary(ols)
Oneway (individual) effect Pooling Model

Call:
plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
    BLACK + HISP + PUB, data = wages, model = "pooling")

Balanced Panel: n=545, T=8, N=4360

Residuals :
    Min. 1st Qu.  Median 3rd Qu.    Max.
 -5.2700 -0.2490  0.0332  0.2960  2.5600

Coefficients :
               Estimate  Std. Error t-value  Pr(>|t|)
(Intercept) -0.03437245  0.06467230 -0.5315    0.5951
SCHOOL       0.09936782  0.00468289 21.2194 < 2.2e-16 ***
EXPER        0.08913805  0.01012149  8.8068 < 2.2e-16 ***
EXPER2      -0.00284682  0.00070771 -4.0226 5.854e-05 ***
UNION        0.17990427  0.01721460 10.4507 < 2.2e-16 ***
MAR          0.10762116  0.01570528  6.8525 8.271e-12 ***
BLACK       -0.14382268  0.02356305 -6.1037 1.126e-09 ***
HISP         0.01565030  0.02081966  0.7517    0.4523
PUB          0.00354615  0.03747396  0.0946    0.9246
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    1236.5
Residual Sum of Squares: 1005.8
R-Squared      : 0.18659
Adj. R-Squared : 0.1862
F-statistic: 124.759 on 8 and 4351 DF, p-value: < 2.22e-16
> coeftest(ols, vcov = pvcovHC(ols))

t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)
(Intercept) -0.0343724  0.1201077 -0.2862  0.774754
SCHOOL       0.0993678  0.0092085 10.7909 < 2.2e-16 ***
EXPER        0.0891380  0.0124250  7.1741 8.514e-13 ***
EXPER2      -0.0028468  0.0008687 -3.2771  0.001057 **
UNION        0.1799043  0.0274501  6.5539 6.259e-11 ***
MAR          0.1076212  0.0260702  4.1281 3.726e-05 ***
BLACK       -0.1438227  0.0500258 -2.8750  0.004060 **
HISP         0.0156503  0.0391447  0.3998  0.689319
PUB          0.0035462  0.0501168  0.0708  0.943594
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Output for the random effects estimation method
> summary(random)
Oneway (individual) effect Random Effect Model
   (Swamy-Arora's transformation)

Call:
plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
    BLACK + HISP + PUB, data = wages, model = "random")

Balanced Panel: n=545, T=8, N=4360

Effects:
                 var std.dev share
idiosyncratic 0.1234  0.3513 0.539
individual    0.1055  0.3248 0.461
theta:  0.6429

Residuals :
    Min. 1st Qu.  Median 3rd Qu.    Max.
 -4.5800 -0.1450  0.0234  0.1860  1.5400

Coefficients :
               Estimate  Std. Error t-value  Pr(>|t|)
(Intercept) -0.10431133  0.11083404 -0.9411 0.3466808
SCHOOL       0.10102372  0.00892187 11.3232 < 2.2e-16 ***
EXPER        0.11178514  0.00827093 13.5154 < 2.2e-16 ***
EXPER2      -0.00405745  0.00059198 -6.8540 8.189e-12 ***
UNION        0.10641339  0.01786690  5.9559 2.791e-09 ***
MAR          0.06254646  0.01677617  3.7283 0.0001952 ***
BLACK       -0.14400263  0.04764392 -3.0225 0.0025218 **
HISP         0.01972690  0.04263026  0.4627 0.6435709
PUB          0.03015546  0.03646707  0.8269 0.4083261
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    656.8
Residual Sum of Squares: 539.65
R-Squared      : 0.17837
Adj. R-Squared : 0.178
F-statistic: 118.07 on 8 and 4351 DF, p-value: < 2.22e-16
> coeftest(random, vcov = pvcovHC(random))

t test of coefficients:

               Estimate  Std. Error t value  Pr(>|t|)
(Intercept) -0.10431133  0.11498178 -0.9072 0.3643519
SCHOOL       0.10102372  0.00888409 11.3713 < 2.2e-16 ***
EXPER        0.11178514  0.01052801 10.6179 < 2.2e-16 ***
EXPER2      -0.00405745  0.00067325 -6.0266 1.813e-09 ***
UNION        0.10641339  0.02080771  5.1141 3.287e-07 ***
MAR          0.06254646  0.01896620  3.2978 0.0009823 ***
BLACK       -0.14400263  0.05018606 -2.8694 0.0041327 **
HISP         0.01972690  0.03987080  0.4948 0.6207870
PUB          0.03015546  0.03378064  0.8927 0.3720755
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

To obtain a single summary output for the results of different objects of class plm,
we have to load the package tonymisc, which allows the function mtable in
the package memisc to work also in the presence of objects of class plm.


> library(tonymisc)
> library(memisc)
> mtable(between, fixed, ols, random)
Calls:
between: plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "between")
fixed: plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "within")
ols: plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "pooling")
random: plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "random")
========================================================
                  between       fixed         ols      random
--------------------------------------------------------
(Intercept)        0.490*                  -0.034      -0.104
                  (0.221)                  (0.065)     (0.111)
SCHOOL             0.095***                 0.099***    0.101***
                  (0.011)                  (0.005)     (0.009)
EXPER             -0.050       0.116***     0.089***    0.112***
                  (0.050)     (0.008)      (0.010)     (0.008)
EXPER2             0.005      -0.004***    -0.003***   -0.004***
                  (0.003)     (0.001)      (0.001)     (0.001)
UNION              0.274***    0.081***     0.180***    0.106***
                  (0.047)     (0.019)      (0.017)     (0.018)
MAR                0.145***    0.045*       0.108***    0.063***
                  (0.041)     (0.018)      (0.016)     (0.017)
BLACK             -0.139**                 -0.144***   -0.144**
                  (0.049)                  (0.024)     (0.048)
HISP               0.005                    0.016       0.020
                  (0.043)                  (0.021)     (0.043)
PUB               -0.056       0.035        0.004       0.030
                  (0.109)     (0.039)      (0.037)     (0.036)
--------------------------------------------------------
R-squared          0.220       0.178        0.187       0.178
adj. R-squared     0.216       0.156        0.186       0.178
F (omnibus)       18.854     165.256      124.759     118.070
p-val (omnibus)    0.000       0.000        0.000       0.000
N                    545        4360         4360        4360
========================================================
The function ercomp returns the variance components from a random effects panel
model, together with the estimate of the transformation parameter, here named theta; see Verbeek p. 382.


> ercomp(random)
                 var std.dev share
idiosyncratic 0.1234  0.3513 0.539
individual    0.1055  0.3248 0.461
theta:  0.6429
The Hausman test can be performed by having recourse to the function phtest
> phtest(fixed, random)
Hausman Test
data: WAGE~SCHOOL + EXPER + EXPER2 + UNION + MAR + BLACK + HISP + PUB
chisq = 31.7531, df = 5, p-value = 6.649e-06
alternative hypothesis: one model is inconsistent
Observe that the function plm returns only:

the between R2 for the Between estimator,

the within R2 for the Fixed effects and the Random Effects estimators,

the overall R2 for the OLS estimator.

It is possible to compute the three goodness-of-fit statistics for all the estimators by
having recourse to Verbeek's relationships (10.29)-(10.31):

R^2_{within}(\hat{\beta}_{FE}) = corr^2\{ \hat{y}^{FE}_{it} - \hat{y}^{FE}_{i},\; y_{it} - \bar{y}_{i} \},  where  \hat{y}^{FE}_{it} - \hat{y}^{FE}_{i} = (x_{it} - \bar{x}_{i})' \hat{\beta}_{FE}

R^2_{between}(\hat{\beta}_{B}) = corr^2\{ \hat{y}^{B}_{i},\; \bar{y}_{i} \},  where  \hat{y}^{B}_{i} = \bar{x}_{i}' \hat{\beta}_{B}

R^2_{overall}(\hat{\beta}) = corr^2\{ \hat{y}_{it},\; y_{it} \},  where  \hat{y}_{it} = x_{it}' \hat{\beta}

With regard to the between estimator we first need the complete
data for the variables involved in the between model (that is the model matrix for
the OLS estimator), say XdataOLS:
> XdataOLS <- model.matrix(ols)
then we have
> yit.hat <- XdataOLS %*% coef(between)
> yi.hat <- tapply(yit.hat, wages$NR, mean)
> yi.bar <- tapply(wages$WAGE, wages$NR, mean)
> yiB.hat <- model.matrix(between) %*% coef(between)

Observe that in this case we have a regular panel (that is, complete time data are
available for each statistical unit). The goodness-of-fit statistics are:


> (Between.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),


(wages$WAGE - rep(yi.bar, each = 8)))^2)
[,1]
[1,] 0.04698327
> (Between.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.2196042
> (Between.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.1371153
With regard to the fixed effects estimator let Xdata be the complete data for the
variables involved in the corresponding model (which is a subset of the columns of
the model matrix for the OLS estimator):
> Xdata <- model.matrix(ols)[, c("EXPER", "EXPER2",
"UNION", "MAR", "PUB")]
and
> yit.hat <- Xdata %*% coef(fixed)
> yi.hat <- tapply(yit.hat, wages$NR, mean)
> yi.bar <- tapply(wages$WAGE, wages$NR, mean)
> ff <- function(i) tapply(Xdata[, i], wages$NR, mean)
> modelmatrix <- sapply(1:dim(model.matrix(fixed))[2], ff)
> yiB.hat <- modelmatrix %*% coef(fixed)
The corresponding goodness-of-fit statistics are:
> (Fixed.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),
(wages$WAGE - rep(yi.bar, each = 8)))^2)
[,1]
[1,] 0.1782206
> (Fixed.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.0005952516
> (Fixed.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.06416925
For the OLS estimator we have:
> yit.hat <- XdataOLS %*% coef(ols)
> yi.hat <- tapply(yit.hat, wages$NR, mean)
> yi.bar <- tapply(wages$WAGE, wages$NR, mean)
> ff <- function(i) tapply(model.matrix(ols)[, i], wages$NR, mean)
> modelmatrix <- sapply(1:dim(model.matrix(ols))[2], ff)
> yiB.hat <- modelmatrix %*% coef(ols)
with the corresponding goodness of fit statistics
> (OLS.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),
(wages$WAGE - rep(yi.bar, each = 8)))^2)
[,1]
[1,] 0.1679288
> (OLS.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.2026535
> (OLS.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.1865882
And finally, for the random effects estimator:
> yit.hat <- XdataOLS %*% coef(random)
> yi.hat <- tapply(yit.hat, wages$NR, mean)
> yi.bar <- tapply(wages$WAGE, wages$NR, mean)
> ff <- function(i) tapply(model.matrix(random)[, i], wages$NR, mean)
> modelmatrix <- sapply(1:dim(model.matrix(random))[2], ff)
> yiB.hat <- modelmatrix %*% coef(random)
with the corresponding goodness of fit statistics
> (random.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),
(wages$WAGE - rep(yi.bar, each = 8)))^2)
[,1]
[1,] 0.1776096
> (random.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.183495
> (random.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.1807709

10.2 Explaining Capital Structure (Section 10.5)

Verbeek applies a dynamic linear panel model to the Flannery and Rangan (2006)
theory for explaining adjustments performed by firms to reach their target capital
structure.


Data are available in the file debtratio, a Stata file containing information on
US firms over the years 1987-2001. The panel is unbalanced. Data are taken from
Compustat.
> library(foreign)
> debtratio <- read.dta(unzip("ch10.zip", "Chapter 10/debtratio.dta"))
We can check the structure of the panel data with the function pdim available in the
package plm
> library(plm)
> pdim(debtratio)
Unbalanced Panel: n=5449, T=1-16, N=27762
The following variables are available. (Except for bdr and mdr all variables are already
lagged and refer to the (end of the) previous year).

gvkey: firm identification number

yeara: year of observation

mdr: market debt ratio

bdr: book debt ratio

lagebit_ta: earnings before interest and taxes / total assets (lagged)

lagmb: ratio of market value to book value of assets

lagdep_ta: depreciation expenses / total assets

laglnta: log of total assets

lagfa_ta: fixed assets / total assets

lagrd_dum: dummy, 1 if rd_ta is missing

lagrd_ta: R&D expenditures / total assets

lagindmedian: industry median debt ratio

lagrated: dummy, 1 if firm has public debt rating

In Verbeek's Table 10.3 results pertaining to the OLS, the within (fixed effects) and the first-difference estimators are reported. Robust standard errors have been computed.
Output for the OLS estimation method
> ols <- plm(mdr ~ lag(mdr, 1) + lagebit_ta + lagmb +
lagdep_ta + laglnta + lagfa_ta + lagrd_dum +
lagrd_ta + lagindmedian + lagrated, model = "pooling",
data = debtratio)
> summary(ols)
Oneway (individual) effect Pooling Model

Call:
plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
    laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
    lagrated, data = debtratio, model = "pooling")

Unbalanced Panel: n=3777, T=1-15, N=19573

Residuals :
    Min. 1st Qu.  Median 3rd Qu.    Max.
 -0.7840 -0.0593 -0.0225  0.0558  0.7690

Coefficients :
                Estimate  Std. Error  t-value  Pr(>|t|)
(Intercept)   0.05818177  0.01089409   5.3407 9.364e-08 ***
lag(mdr, 1)   0.88350360  0.00455677 193.8880 < 2.2e-16 ***
lagebit_ta   -0.03233775  0.00570742  -5.6659 1.483e-08 ***
lagmb         0.00164320  0.00078139   2.1029 0.0354844 *
lagdep_ta    -0.26051795  0.03346611  -7.7845 7.344e-15 ***
laglnta      -0.00067042  0.00060575  -1.1068 0.2684048
lagfa_ta      0.02012146  0.00514792   3.9087 9.312e-05 ***
lagrd_dum     0.00688957  0.00202285   3.4059 0.0006609 ***
lagrd_ta     -0.12020508  0.01423761  -8.4428 < 2.2e-16 ***
lagindmedian  0.03212249  0.00910841   3.5267 0.0004218 ***
lagrated      0.00713406  0.00291144   2.4504 0.0142803 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    1184.8
Residual Sum of Squares: 306.93
R-Squared      : 0.74093
Adj. R-Squared : 0.74052
F-statistic: 5594.77 on 10 and 19562 DF, p-value: < 2.22e-16
> library(lmtest)
> coeftest(ols, vcov = pvcovHC)

t test of coefficients:

                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   0.05818177  0.01073874   5.4179 6.100e-08 ***
lag(mdr, 1)   0.88350360  0.00530071 166.6764 < 2.2e-16 ***
lagebit_ta   -0.03233775  0.00667448  -4.8450 1.276e-06 ***
lagmb         0.00164320  0.00067849   2.4219 0.0154502 *
lagdep_ta    -0.26051795  0.03511707  -7.4186 1.232e-13 ***
laglnta      -0.00067042  0.00060327  -1.1113 0.2664423
lagfa_ta      0.02012146  0.00556306   3.6170 0.0002988 ***
lagrd_dum     0.00688957  0.00215206   3.2014 0.0013699 **
lagrd_ta     -0.12020508  0.01293281  -9.2946 < 2.2e-16 ***
lagindmedian  0.03212249  0.00975079   3.2943 0.0009883 ***
lagrated      0.00713406  0.00279809   2.5496 0.0107916 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Output for the Within estimation method
> within <- plm(mdr ~ lag(mdr, 1) + lagebit_ta + lagmb +
lagdep_ta + laglnta + lagfa_ta + lagrd_dum +
lagrd_ta + lagindmedian + lagrated, model = "within",
data = debtratio)
> summary(within)
Oneway (individual) effect Within Model

Call:
plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
    laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
    lagrated, data = debtratio, model = "within")

Unbalanced Panel: n=3777, T=1-15, N=19573

Residuals :
     Min.  1st Qu.   Median  3rd Qu.     Max.
 -0.61700 -0.04940 -0.00208  0.04310  0.57600

Coefficients :
                 Estimate  Std. Error t-value  Pr(>|t|)
lag(mdr, 1)    5.3498e-01  7.6646e-03 69.7992 < 2.2e-16 ***
lagebit_ta    -5.0033e-02  8.0860e-03 -6.1876 6.260e-10 ***
lagmb          2.2776e-03  1.1358e-03  2.0052   0.04495 *
lagdep_ta     -1.2395e-01  5.7544e-02 -2.1541   0.03125 *
laglnta        3.8030e-02  2.0593e-03 18.4678 < 2.2e-16 ***
lagfa_ta       5.9344e-02  1.2635e-02  4.6969 2.664e-06 ***
lagrd_dum      5.9768e-05  5.8840e-03  0.0102   0.99190
lagrd_ta      -6.5676e-02  2.7093e-02 -2.4241   0.01536 *
lagindmedian   1.6722e-01  1.8959e-02  8.8201 < 2.2e-16 ***
lagrated       2.0590e-02  4.6521e-03  4.4259 9.670e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    307.38
Residual Sum of Squares: 202.75
R-Squared      : 0.3404
Adj. R-Squared : 0.27454
F-statistic: 814.653 on 10 and 15786 DF, p-value: < 2.22e-16
> coeftest(within, vcov = pvcovHC)

t test of coefficients:

                 Estimate  Std. Error t value  Pr(>|t|)
lag(mdr, 1)    5.3498e-01  1.1903e-02 44.9438 < 2.2e-16 ***
lagebit_ta    -5.0033e-02  1.1097e-02 -4.5085 6.575e-06 ***
lagmb          2.2776e-03  1.0083e-03  2.2589 0.0239022 *
lagdep_ta     -1.2395e-01  7.0913e-02 -1.7480 0.0804852 .
laglnta        3.8030e-02  3.0676e-03 12.3974 < 2.2e-16 ***
lagfa_ta       5.9344e-02  1.7073e-02  3.4759 0.0005104 ***
lagrd_dum      5.9768e-05  8.0735e-03  0.0074 0.9940935
lagrd_ta      -6.5676e-02  2.6391e-02 -2.4886 0.0128350 *
lagindmedian   1.6722e-01  2.2355e-02  7.4800 7.823e-14 ***
lagrated       2.0590e-02  5.8272e-03  3.5334 0.0004114 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
To obtain the first-difference estimates we have to resort to the following trick,
applying pooled OLS to the differenced series, since the argument model="fd" does not
work correctly, with the current version (1.3-1) of plm, on unbalanced data with holes,
and the current data frame has some holes.1

1 The code should be
> fd <- plm(mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta + laglnta + lagfa_ta +
    lagrd_dum + lagrd_ta + lagindmedian + lagrated, model = "fd", data = debtratio)
Results are consistent with those present in the third edition of Verbeek's book.
Output for the First-difference estimation method
> fdmod01 <- plm(diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) +
diff(lagfa_ta) + diff(lagrd_dum) + diff(lagrd_ta) +
diff(lagindmedian) + diff(lagrated), model = "pooling",
data = debtratio)
> summary(fdmod01)
Oneway (individual) effect Pooling Model
Call:
plm(formula = diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) + diff(lagfa_ta) +
diff(lagrd_dum)+diff(lagrd_ta)+diff(lagindmedian)+diff(lagrated),
data = debtratio, model = "pooling")
Unbalanced Panel: n=2996, T=1-14, N=15039
Residuals :
     Min.  1st Qu.   Median  3rd Qu.     Max.
 -0.83700 -0.05340 -0.00987  0.05090  0.76400
Coefficients :
                     Estimate Std. Error  t-value  Pr(>|t|)
(Intercept)         0.0088779  0.0010578   8.3927 < 2.2e-16 ***
diff(lag(mdr, 1))  -0.1138871  0.0093735 -12.1499 < 2.2e-16 ***
diff(lagebit_ta)   -0.0451704  0.0076222  -5.9261 3.169e-09 ***
diff(lagmb)         0.0027903  0.0011550   2.4157   0.01572 *
diff(lagdep_ta)     0.1095609  0.0659193   1.6620   0.09652 .
diff(laglnta)       0.0644041  0.0039349  16.3672 < 2.2e-16 ***
diff(lagfa_ta)      0.1055631  0.0157632   6.6968 2.206e-11 ***
diff(lagrd_dum)    -0.0170642  0.0078069  -2.1858   0.02885 *
diff(lagrd_ta)     -0.0592139  0.0278048  -2.1296   0.03322 *
diff(lagindmedian)  0.1815726  0.0250867   7.2378 4.781e-13 ***
diff(lagrated)      0.0094495  0.0063567   1.4865   0.13716
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    238.04
Residual Sum of Squares: 231.35
R-Squared      : 0.02813
Adj. R-Squared : 0.028109
F-statistic: 43.4973 on 10 and 15028 DF, p-value: < 2.22e-16
> coeftest(fdmod01, vcov = pvcovHC)

t test of coefficients:

                      Estimate  Std. Error t value  Pr(>|t|)
(Intercept)         0.00887786  0.00094621  9.3825 < 2.2e-16 ***
diff(lag(mdr, 1))  -0.11388713  0.01198604 -9.5016 < 2.2e-16 ***
diff(lagebit_ta)   -0.04517037  0.01012320 -4.4621 8.176e-06 ***
diff(lagmb)         0.00279029  0.00110282  2.5301   0.01141 *
diff(lagdep_ta)     0.10956092  0.07875898  1.3911   0.16422
diff(laglnta)       0.06440411  0.00510375 12.6190 < 2.2e-16 ***
diff(lagfa_ta)      0.10556310  0.01796875  5.8748 4.323e-09 ***
diff(lagrd_dum)    -0.01706422  0.00907665 -1.8800   0.06013 .
diff(lagrd_ta)     -0.05921391  0.02865177 -2.0667   0.03878 *
diff(lagindmedian)  0.18157256  0.02607573  6.9633 3.462e-12 ***
diff(lagrated)      0.00944946  0.00656023  1.4404   0.14977
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Having loaded the packages memisc and tonymisc, the function mtable can be
used to collect the results in a single output.
> mtable(ols, within)
Calls:
ols: plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
    laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
    lagrated, data = debtratio, model = "pooling")
within: plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
    laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
    lagrated, data = debtratio, model = "within")
====================================
                     ols      within
------------------------------------
(Intercept)       0.058***
                 (0.011)
lag(mdr, 1)       0.884***   0.535***
                 (0.005)    (0.008)
lagebit_ta       -0.032***  -0.050***
                 (0.006)    (0.008)
lagmb             0.002*     0.002*
                 (0.001)    (0.001)
lagdep_ta        -0.261***  -0.124*
                 (0.033)    (0.058)
laglnta          -0.001      0.038***
                 (0.001)    (0.002)
lagfa_ta          0.020***   0.059***
                 (0.005)    (0.013)
lagrd_dum         0.007***   0.000
                 (0.002)    (0.006)
lagrd_ta         -0.120***  -0.066*
                 (0.014)    (0.027)
lagindmedian      0.032***   0.167***
                 (0.009)    (0.019)
lagrated          0.007*     0.021***
                 (0.003)    (0.005)
------------------------------------
R-squared         0.741      0.340
adj. R-squared    0.741      0.275
F (omnibus)    5594.769    814.653
p-val (omnibus)   0.000      0.000
N                19573      19573
====================================
Finally Verbeek obtains estimates for the current dynamic panel data model by using
the Anderson-Hsiao instrumental variables estimators and the Arellano-Bond GMM
estimators, which can be obtained by making use of the following code.

Anderson-Hsiao instrumental variables2
(instrumented variable: diff(lag(mdr, 1)); instrument: diff(lag(mdr, 2)))

2 Results are consistent with those present in the third edition of Verbeek's book.
> fd <- plm(diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
      diff(lagmb) + diff(lagdep_ta) + diff(laglnta) +
      diff(lagfa_ta) + diff(lagrd_dum) + diff(lagrd_ta) +
      diff(lagindmedian) + diff(lagrated) | . - diff(lag(mdr,
      1)) + diff(lag(mdr, 2)), model = "pooling", data = debtratio)
> summary(fd)
Oneway (individual) effect Pooling Model
Instrumental variable estimation
   (Balestra-Varadharajan-Krishnakumar's transformation)

Call:
plm(formula = diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
    diff(lagmb) + diff(lagdep_ta) + diff(laglnta) + diff(lagfa_ta) +
    diff(lagrd_dum) + diff(lagrd_ta) + diff(lagindmedian) + diff(lagrated) |
    . - diff(lag(mdr, 1)) + diff(lag(mdr, 2)), data = debtratio,
    model = "pooling")

Unbalanced Panel: n=2371, T=1-13, N=11732

Residuals :
    Min. 1st Qu.  Median 3rd Qu.    Max.
 -4.2000 -0.3410  0.0378  0.3680  7.1200

Coefficients :
                    Estimate Std. Error t-value Pr(>|t|)
(Intercept)        -0.016634   0.018670 -0.8910   0.3730
diff(lag(mdr, 1))   7.033029   5.494342  1.2800   0.2006
diff(lagebit_ta)    1.207597   0.970555  1.2442   0.2134
diff(lagmb)         0.244267   0.185376  1.3177   0.1876
diff(lagdep_ta)    -1.858345   1.577202 -1.1783   0.2387
diff(laglnta)      -0.521408   0.455800 -1.1439   0.2527
diff(lagfa_ta)     -1.091279   0.927732 -1.1763   0.2395
diff(lagrd_dum)    -0.023127   0.056904 -0.4064   0.6844
diff(lagrd_ta)      0.881936   0.759792  1.1608   0.2458
diff(lagindmedian) -3.377779   2.746221 -1.2300   0.2187
diff(lagrated)     -0.272466   0.222713 -1.2234   0.2212

Total Sum of Squares:    175.88
Residual Sum of Squares: 6851.9
R-Squared      : 0.0080057
Adj. R-Squared : 0.0079982
F-statistic: -1142.01 on 10 and 11721 DF, p-value: 1
> coeftest(fd, vcov = pvcovHC)

t test of coefficients:

                     Estimate Std. Error  t value  Pr(>|t|)
(Intercept)        -0.0166345  0.0058748  -2.8315  0.004641 **
diff(lag(mdr, 1))   7.0330294  0.2525266  27.8507 < 2.2e-16 ***
diff(lagebit_ta)    1.2075966  0.1054279  11.4542 < 2.2e-16 ***
diff(lagmb)         0.2442670  0.0183950  13.2790 < 2.2e-16 ***
diff(lagdep_ta)    -1.8583446  0.7100345  -2.6173  0.008875 **
diff(laglnta)      -0.5214084  0.0537632  -9.6982 < 2.2e-16 ***
diff(lagfa_ta)     -1.0912794  0.1782544  -6.1220 9.534e-10 ***
diff(lagrd_dum)    -0.0231265  0.0790461  -0.2926  0.769856
diff(lagrd_ta)      0.8819365  0.2833939   3.1121  0.001862 **
diff(lagindmedian) -3.3777791  0.2218749 -15.2238 < 2.2e-16 ***
diff(lagrated)     -0.2724663  0.0514317  -5.2976 1.194e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
With instrumental variables, with constant3

3 Results are consistent with those present in the third edition of Verbeek's book.
> fd <- plm(diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) +
diff(lagfa_ta) + diff(lagrd_dum) + diff(lagrd_ta) +
diff(lagindmedian) + diff(lagrated) | . - diff(lag(mdr,
1)) + lag(mdr, 2), model = "pooling", data = debtratio)
> summary(fd)
Oneway (individual) effect Pooling Model
Instrumental variable estimation
   (Balestra-Varadharajan-Krishnakumar's transformation)

Call:
plm(formula = diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
    diff(lagmb) + diff(lagdep_ta) + diff(laglnta) + diff(lagfa_ta) +
    diff(lagrd_dum) + diff(lagrd_ta) + diff(lagindmedian) + diff(lagrated) |
    . - diff(lag(mdr, 1)) + lag(mdr, 2), data = debtratio, model = "pooling")

Unbalanced Panel: n=2996, T=1-14, N=15039

Residuals :
    Min. 1st Qu.  Median 3rd Qu.    Max.
 -1.5400 -0.0875  0.0069  0.0937  1.3400

Coefficients :
                     Estimate Std. Error t-value  Pr(>|t|)
(Intercept)         0.0015331  0.0018349  0.8355  0.403434
diff(lag(mdr, 1))   1.3581587  0.1294915 10.4884 < 2.2e-16 ***
diff(lagebit_ta)    0.2027082  0.0249465  8.1257 4.791e-16 ***
diff(lagmb)         0.0468071  0.0042789 10.9391 < 2.2e-16 ***
diff(lagdep_ta)    -0.2268575  0.1110867 -2.0422  0.041152 *
diff(laglnta)      -0.0532186  0.0121026 -4.3973 1.104e-05 ***
diff(lagfa_ta)     -0.1658858  0.0349078 -4.7521 2.032e-06 ***
diff(lagrd_dum)    -0.0211894  0.0126926 -1.6694  0.095052 .
diff(lagrd_ta)      0.1265905  0.0480137  2.6365  0.008384 **
diff(lagindmedian) -0.5839094  0.0783180 -7.4556 9.432e-14 ***
diff(lagrated)     -0.0524240  0.0116591 -4.4964 6.963e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    238.04
Residual Sum of Squares: 611.01
R-Squared      : 0.0065943
Adj. R-Squared : 0.0065894
F-statistic: -917.329 on 10 and 15028 DF, p-value: 1
> coeftest(fd, vcov = pvcovHC)

t test of coefficients:

                     Estimate Std. Error  t value  Pr(>|t|)
(Intercept)         0.0015331  0.0011090   1.3825   0.16684
diff(lag(mdr, 1))   1.3581587  0.0484357  28.0404 < 2.2e-16 ***
diff(lagebit_ta)    0.2027082  0.0212571   9.5360 < 2.2e-16 ***
diff(lagmb)         0.0468071  0.0032282  14.4996 < 2.2e-16 ***
diff(lagdep_ta)    -0.2268575  0.1495174  -1.5173   0.12922
diff(laglnta)      -0.0532186  0.0099653  -5.3404 9.410e-08 ***
diff(lagfa_ta)     -0.1658858  0.0360364  -4.6033 4.193e-06 ***
diff(lagrd_dum)    -0.0211894  0.0163588  -1.2953   0.19524
diff(lagrd_ta)      0.1265905  0.0495030   2.5572   0.01056 *
diff(lagindmedian) -0.5839094  0.0469868 -12.4271 < 2.2e-16 ***
diff(lagrated)     -0.0524240  0.0116359  -4.5054 6.675e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Arellano-Bond one-step

The function pgmm can be used to perform generalized method of moments estimation
for static or dynamic models with panel data. The main arguments are: formula: a
symbolic description of the model to be estimated; the preferred interface is now
to indicate a multi-part formula, the first two parts describing the covariates and
the GMM instruments and, if any, the third part the normal instruments; data: a
data.frame; effect: the effects introduced in the model, one of "twoways" (the
default) or "individual"; model: one of "onestep" (the default) or "twosteps";
transformation: the kind of transformation to apply to the model, either "d" (the
default value) for the difference GMM model or "ld" for the system GMM; fsm: the
matrix for the one-step estimator, one of "I" (identity matrix) or "G" (= D'D, where
D is the first-difference operator) if transformation="d", one of "GI" or "full" if
transformation="ld".
> gmm <- pgmm(mdr ~ lag(mdr) + lagebit_ta + lagmb +
      lagdep_ta + laglnta + lagfa_ta + lagrd_dum +
      lagrd_ta + lagindmedian + lagrated | lag(mdr,
      2:99), data = debtratio, effect = "individual",
      fsm = "I")
> summary(gmm)
Oneway (individual) effect One step model

Call:
pgmm(formula = mdr ~ lag(mdr) + lagebit_ta + lagmb + lagdep_ta +
    laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
    lagrated | lag(mdr, 2:99), data = debtratio, effect = "individual",
    fsm = "I")

Unbalanced Panel: n=5449, T=1-16, N=27762

Number of Observations Used: 15039

Residuals
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -0.96820  0.00000  0.00000  0.00106  0.00000  0.86950

Coefficients
               Estimate Std. Error z-value  Pr(>|z|)
lag(mdr)      0.4538191  0.0505589  8.9761 < 2.2e-16 ***
lagebit_ta    0.0490861  0.0151092  3.2488  0.001159 **
lagmb         0.0206848  0.0022631  9.1402 < 2.2e-16 ***
lagdep_ta    -0.0227627  0.0957791 -0.2377  0.812146
laglnta       0.0271979  0.0065750  4.1366 3.526e-05 ***
lagfa_ta     -0.0057013  0.0235412 -0.2422  0.808639
lagrd_dum    -0.0178886  0.0105722 -1.6920  0.090640 .
lagrd_ta      0.0209480  0.0324596  0.6454  0.518697
lagindmedian  0.1221109  0.0383102  3.1874  0.001435 **
lagrated     -0.0089387  0.0077516 -1.1531  0.248851
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Sargan Test: chisq(104) = 434.3268 (p.value=< 2.22e-16)
Autocorrelation test (1): normal = -9.836012 (p.value=< 2.22e-16)
Autocorrelation test (2): normal = -3.352629 (p.value=0.00040024)
Wald test for coefficients: chisq(10) = 451.9441 (p.value=< 2.22e-16)
Arellano-Bond two-step

> gmm <- pgmm(mdr ~ lag(mdr) + lagebit_ta + lagmb +
      lagdep_ta + laglnta + lagfa_ta + lagrd_dum +
      lagrd_ta + lagindmedian + lagrated | lag(mdr,
      2:99), data = debtratio, effect = "individual",
      model = "twosteps", fsm = "I")


> summary(gmm)
Oneway (individual) effect Two steps model

Call:
pgmm(formula = mdr ~ lag(mdr) + lagebit_ta + lagmb + lagdep_ta +
    laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
    lagrated | lag(mdr, 2:99), data = debtratio, effect = "individual",
    model = "twosteps", fsm = "I")

Unbalanced Panel: n=5449, T=1-16, N=27762

Number of Observations Used: 15039

Residuals
      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 -0.945700  0.000000  0.000000  0.001129  0.000000  0.860300

Coefficients
               Estimate Std. Error z-value  Pr(>|z|)
lag(mdr)      0.3868871  0.0725827  5.3303 9.805e-08 ***
lagebit_ta    0.0371411  0.0174741  2.1255  0.033545 *
lagmb         0.0155606  0.0026849  5.7955 6.812e-09 ***
lagdep_ta     0.0784848  0.1090149  0.7199  0.471559
laglnta       0.0298020  0.0082703  3.6035  0.000314 ***
lagfa_ta      0.0187859  0.0279833  0.6713  0.502013
lagrd_dum    -0.0191112  0.0117298 -1.6293  0.103253
lagrd_ta     -0.0044324  0.0352264 -0.1258  0.899870
lagindmedian  0.0878274  0.0439521  1.9983  0.045689 *
lagrated     -0.0087387  0.0098295 -0.8890  0.373987
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Sargan Test: chisq(104) = 418.3848 (p.value=< 2.22e-16)
Autocorrelation test (1): normal = -6.258917 (p.value=1.9383e-10)
Autocorrelation test (2): normal = -3.619932 (p.value=0.00014734)
Wald test for coefficients: chisq(10) = 218.1505 (p.value=< 2.22e-16)

11 References
Belgorodski N, Greiner M, Tolksdorf K and Schueller K 2012 rriskDistributions: Fitting
distributions to given data or known quantiles. R package version 1.8. http://CRAN.Rproject.org/package=rriskDistributions
Bolker B and R Development Core Team 2012 bbmle: Tools for general maximum likelihood
estimation. R package version 1.0.5.2. http://CRAN.R-project.org/package=bbmle
Brockwell PJ and Davis RA 1991 Time Series: Theory and Methods, Springer Verlag.
Chambers JM, Cleveland WS, Kleiner B and Tukey PA 1983 Graphical Methods for Data Analysis,
Wadsworth & Brooks/Cole.
Chan K-S, Ripley B 2012 TSA: Time Series Analysis. R package version 1.01. http://CRAN.Rproject.org/package=TSA
Chausse P 2010 Computing Generalized Method of Moments and Generalized Empirical Likelihood
with R. Journal of Statistical Software 34(11), 135, http://www.jstatsoft.org/v34/i11/.
Cookson JA 2012 tonymisc: Functions for Econometrics Output. R package version 1.1.1.
http://CRAN.R-project.org/package=tonymisc
Cribari-Neto F 2004 Asymptotic Inference Under Heteroskedasticity of Unknown Form. Computational Statistics & Data Analysis 45, 215-233.
Croissant Y and Millo G 2008 Panel Data Econometrics in R: The plm Package. Journal of Statistical Software 27(2), 1-43, http://www.jstatsoft.org/v27/i02/.
Davidson R and MacKinnon JG 1993 Estimation and Inference in Econometrics, Oxford University
Press.
Elff M 2013 memisc: Tools for Management of Survey Data, Graphics, Programming, Statistics, and
Simulation. R package version 0.96-4. http://CRAN.R-project.org/package=memisc
Faraway JJ 2002 Practical Regression and Anova using R, July 2002, http://stat.ethz.ch/CRAN/doc/contrib/Faraway-PRA.pdf.
Flannery MJ and Rangan KP 2006 Partial Adjustment toward Target Capital. Journal of Financial
Economics 41, 4173.
Fox J and Weisberg S 2011 An R Companion to Applied Regression, Second Edition. Thousand
Oaks. CA: Sage. http://socserv.socsci.mcmaster.ca/jfox/Books/Companion
Fox J, Nie Z and Byrnes J 2013 sem: Structural Equation Models. R package version 3.1-3.
http://CRAN.R-project.org/package=sem
Graves S 2012 FinTS: Companion to Tsay (2005) Analysis of Financial Time Series. R package
version 0.4-4. http://CRAN.R-project.org/package=FinTS
Hannan EJ, Rissanen J 1982 Recursive Estimation of Mixed Autoregressive-Moving Average Order. Biometrika 69(1), 81-94.
Hardin JW Hilbe JM 2007 Generalized Linear Models and Extensions, Stata Press.
Hyndman RJ, Khandakar Y 2008 Automatic Time Series Forecasting: The forecast Package for R.
Journal of Statistical Software 27(3), 122, http://www.jstatsoft.org/v27/i3/.
Hyndman RJ with contributions from G Athanasopoulos, S Razbash, D Schmidt, Z Zhou and Y
Khan 2013 forecast: Forecasting functions for time series and linear models. R package version
4.03. http://CRAN.R-project.org/package=forecast
Jackman S 2012 pscl: Classes and Methods for R Developed in the Political Science Computational
Laboratory, Stanford University. Department of Political Science, Stanford University. Stanford,
California. R package version 1.04.4. URL http://pscl.stanford.edu/
Jarque CM, Bera A 1987 A Test for Normality of Observations and Regression Residuals. International Statistical Review 55(2), 163-172.
Johnson NL, Kemp AW, Kotz S 2005 Univariate Discrete Distributions. Wiley.
Johnston J, Di Nardo J 1997 Econometric Methods, 4th edn. McGraw-Hill.


Kleiber C, Zeileis A 2008 Applied Econometrics with R, Springer-Verlag http://CRAN.Rproject.org/package=AER.


Koenker R 2012 Quantile Regression in R: a vignette. 120, http://cran.rproject.org/web/packages/quantreg/vignettes/rq.pdf.
Koenker R 2013 quantreg: Quantile Regression. R package version 4.97. http://CRAN.Rproject.org/package=quantreg
Konnerth K 2010 egarch: EGARCH simulation and fitting. R package version 1.0.0. http://CRAN.Rproject.org/package=egarch
Long JS, Ervin LH 2000 Using Heteroskedasticity Consistent Standard Errors in the Linear Regression Model. The American Statistician 54, 217-224.
Longhow Lam 2010 An introduction to R, January 2010, http://www.splusbook.com/RIntro/RCourse.pdf.
Lumley T using Fortran code by Alan Miller 2009 leaps: regression subset selection. R package
version 2.9. http://CRAN.R-project.org/package=leaps
MacKinnon JG, White H 1985 Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties. Journal of Econometrics 29, 305-325.
Mann HB, Wald A 1943 On the Statistical Treatment of Linear Stochastic Difference Equations. Econometrica 11, 173-220.
McCulloch JH and Kwon HC 1993 U.S. Term Structure Data, 1947-1991. Ohio State working paper
93-6, Ohio State University.
McLeod AI, Zhang Y 2007 Faster ARMA maximum likelihood estimation. Computational Statistics & Data Analysis 52(4), http://dx.doi.org/10.1016/j.csda.2007.07.020.
McLeod AI, Zhang Y 2008 Improved Subset Autoregression: With R Package. Journal of Statistical Software 28(2), http://www.jstatsoft.org/v28/i02/.
Mood AM, Graybill F, Boes DC 1974 Introduction to the Theory of Statistics, McGraw-Hill.
Murrell P 2010 hexView: Viewing Binary Files. R package version 0.3-2. http://CRAN.Rproject.org/package=hexView
Osterwald-Lenum M 1992 A Note with Quantiles of the Asymptotic Distribution of the Maximum Likelihood Cointegration Rank Test Statistics. Oxford Bulletin of Economics and Statistics 55(3), 461-472.
Pfaff B 2008 Analysis of Integrated and Cointegrated Time Series with R. Second Edition. Springer,
New York.
Rigby RA and Stasinopoulos DM 2005 Generalized additive models for location, scale and
shape,(with discussion), Appl. Statist., 54, part 3, pp 507-554.
Rigby B and Stasinopoulos M 2010 The gamlss.family distributions, http://finzi.psych.upenn.edu/R/library/gamlss.dist/doc/Distributions-2010.pdf.
R Development Core Team 2013 R: A language and environment for statistical computing, R
Foundation for Statistical Computing, Vienna, Austria http://www.R-project.org.
R Core Team 2013 foreign: Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, ....
R package version 0.8-53. http://CRAN.R-project.org/package=foreign
Sarkar D 2008 Lattice: Multivariate Data Visualization with R, Springer.
Shapiro SS, Wilk MB 1965 An analysis of variance test for normality (complete samples). Biometrika 52(3 and 4), 591-611.
Shumway RH and Stoffer DS 2011 Time Series Analysis and Its Applications. With R Examples,
Springer Verlag.
Stasinopoulos M, Rigby B with contributions from C Akantziliotou, G Heller, R Ospina, N Motpan,
F McElduff, V Voudouris and M Djennad. 2012 gamlss.dist: Distributions to be used for GAMLSS
modelling.. R package version 4.2-0. http://CRAN.R-project.org/package=gamlss.dist
Stoffer D 2012 astsa: Applied Statistical Time Series Analysis. R package version 1.1. http://CRAN.Rproject.org/package=astsa
Toomet O, Henningsen A 2008 Sample Selection Models in R: Package sampleSelection. Journal of Statistical Software 27(7), http://www.jstatsoft.org/v27/i07/.
Trapletti A and Hornik K 2012 tseries: Time Series Analysis and Computational Finance. R package
version 0.10-30.
Venables WN Ripley BD 2002 Modern Applied Statistics with S. Fourth Edition. Springer, New
York.
Verbeek M 2008 A guide to modern econometrics, 3rd edn. John Wiley.
Verbeek M 2012 A guide to modern econometrics, 4th edn. John Wiley.
White H 1980 A Heteroskedasticity-Consistent Covariance Matrix and a Direct Test for Heteroskedasticity. Econometrica 48, 817-838.
Würtz D, Chalabi Y and Ellis A 2009 A Discussion of Time Series Objects for R in Finance, https://www.rmetrics.org/ebooks-tseries.


Wuertz D, Chalabi Y with contribution from M Miklovic, C Boudt, P Chausse and others 2012
fGarch: Rmetrics - Autoregressive Conditional Heteroskedastic Modelling. R package version
2150.81. http://CRAN.R-project.org/package=fGarch
Wuertz D, many others and see the SOURCE file 2012 fArma: ARMA Time Series Modelling. R
package version 2160.78. http://CRAN.R-project.org/package=fArma
Zappa D, Bramante R, Nai Ruscone M 2012 Appunti di Metodi Statistici per la Finanza e le
Assicurazioni, Educatt.
Zeileis A 2004 Econometric Computing with HC and HAC Covariance Matrix Estimators. Journal of Statistical Software 11(10), 1-17, http://www.jstatsoft.org/v11/i10/.
Zeileis A 2006 Object-oriented Computation of Sandwich Estimators. Journal of Statistical Software
16(9), 1-16. URL http://www.jstatsoft.org/v16/i09/.
Zeileis A 2011 dynlm: Dynamic Linear Regression. R package version 0.3-1. http://CRAN.Rproject.org/package=dynlm
Zeileis A, Hothorn T 2002 Diagnostic Checking in Regression Relationships. R News 2(3), 7-10.
http://CRAN.R-project.org/doc/Rnews/
Zhelonkin M, Genton MG, Ronchetti E 2013 ssmrob: Robust estimation and inference in sample
selection models. R package version 0.2. http://CRAN.R-project.org/package=ssmrob

A Some useful R functions
This Appendix includes some excerpts from the documentation available in the R help
system and some advice regarding the creation of graphs.
The topics covered are:

how to install R

how to install and update packages

functions useful to read/import data

how to write a formula to define a model

how to estimate the parameters of a linear model

the package Deducer


A.1 How to Install R

You can find the R installation files on http://www.R-project.org.

A.2 How to Install and Update Packages

If a package is not present in your R installation, you can download it by using the
option Install package(s) from the menu Packages in the R Console.
It is also possible to use the function install.packages, whose main argument is
pkgs, a character vector with the names of the packages whose current versions should
be downloaded from the repositories, e.g.
install.packages("lmtest")
You can update the packages available on your system by using
update.packages(ask=FALSE)
The following code (do not execute it if not really needed!) will install all the packages
available on the CRAN site which are not present on your system (more than 4,000
packages, requiring more than 4 gigabytes of space).
a <- new.packages()
if (length(a) > 0) install.packages(a)
The latter code can be useful to prepare a system that will later be used without an
Internet connection.
??"keyword1 keyword2"
will search for keyword1 and keyword2 in the help documentation of all the
installed packages.

A.3 Data Reading

On Verbeek's site data are saved in the text, Stata and EViews formats, and are
compressed in zip files. We describe the procedures to uncompress a zip file and read
the data.
Once the data have been read into R, it is possible to check the consistency of the
imported data with the information contained in the txt file available in the zip file
by using the functions summary, head and tail.

A.3.1 zip files

To see the content of a zip file use the function unzip, available in the utils library,
which is automatically loaded when R starts.
unzip(zipfile, files = NULL, list = FALSE, overwrite = TRUE,
junkpaths = FALSE, exdir = ".", unzip = "internal", setTimes = FALSE)

zipfile: The pathname of the zip file: tilde expansion (see path.expand) will
be performed.


files: A character vector of recorded filepaths to be extracted: the default is
to extract all files.

list: If TRUE, list the files and extract none.

See the R help for the remaining options and more information.
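For instance, to inspect one of Verbeek's archives and then extract a single file (a minimal sketch, assuming the archive ch10.zip used in Chapter 10 is in the working directory):

unzip("ch10.zip", list = TRUE)              # list the files contained in the archive
unzip("ch10.zip", "Chapter 10/males.dta")   # extract only the Stata file males.dta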

A.3.2 Reading from a text file

To read from a text file use the command read.table, available in the utils library.
It reads a file in table format and creates a data frame from it, with cases
corresponding to lines and variables to fields in the file.
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", row.names, col.names,
as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text)

file: the name of the file which the data are to be read from. Each row of the
table appears as one line of the file.
file can also be a complete URL. (For the supported URL schemes, see the
URLs section of the help for url.)

header: a logical value indicating whether the file contains the names of the
variables as its first line. If missing, the value is determined from the file format:
header is set to TRUE if and only if the first row contains one fewer field than
the number of columns.

sep: the field separator character. Values on each line of the file are separated
by this character. If sep = "" (the default for read.table) the separator is white
space, that is one or more spaces, tabs, newlines or carriage returns.

dec: the character used in the file for decimal points.

row.names: a vector of row names. This can be a vector giving the actual row
names, or a single number giving the column of the table which contains the
row names, or character string giving the name of the table column containing
the row names.
If there is a header and the first row contains one fewer field than the number
of columns, the first column in the input is used for the row names. Otherwise
if row.names is missing, the rows are numbered.

col.names: a vector of optional names for the variables. The default is to use
V followed by the column number.

438

Some useful R functions

as.is: the default behavior of read.table is to convert character variables (which


are not converted to logical, numeric or complex) to factors. The variable as.is
controls the conversion of columns not otherwise specified by colClasses. Its
value is either a vector of logicals (values are recycled if necessary), or a vector
of numeric or character indices which specify which columns should not be
converted to factors.

na.strings: a character vector of strings which are to be interpreted as NA


values. Blank fields are also considered to be missing values in logical, integer,
numeric and complex fields.

See the R help for the remaining arguments and more information.
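A minimal sketch, assuming that a file wages1.txt (a hypothetical name standing for one of the text files extracted from its zip archive) lies in the working directory, has the variable names on its first line and white-space separated fields:
wages1 <- read.table("wages1.txt", header = TRUE)
summary(wages1)   # compare with the information in the accompanying txt description
head(wages1)      # first six observations
tail(wages1)      # last six observations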

A.3.3 Reading from a Stata file
To read a Stata file use the function read.dta which reads a file in Stata version 5-11
binary format into a data frame. The function is available in the package foreign.
read.dta(file, convert.dates = TRUE, convert.factors = TRUE,
missing.type = FALSE,
convert.underscore = FALSE, warn.missing.labels = TRUE)
See the R help for more information.
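A minimal sketch (the file name housing.dta is hypothetical and stands for the Stata version of one of the data sets):
library(foreign)                    # the package providing read.dta
housing <- read.dta("housing.dta")  # import the Stata file as a data frame
summary(housing)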

A.3.4 Reading from an EViews file
To read an EViews file use the function readEViews, which is available in the package hexView.
readEViews(filename, as.data.frame = TRUE)
Messages of the form Skipping boilerplate variable will be returned: they warn that the two variables c and resid, which EViews always creates by default and which are therefore present in the file being converted, are not read.
See the R help for more information.
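A minimal sketch, assuming the package providing readEViews is installed (the file name capm.wf1 is hypothetical and stands for the EViews version of one of the data sets):
library(hexView)
capm <- readEViews("capm.wf1", as.data.frame = TRUE)
summary(capm)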

A.3.5 Reading from a Microsoft Excel file
See Appendix A.6.

A.4 formula {stats}

Description
The generic function formula1 and its specific methods provide a way of extracting formulae which have been included in other objects.
Usage
formula(x, ...)
Arguments
x: an R object.
...: further arguments passed to or from other methods.
Details
The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models.
An expression of the form y ~ model is interpreted as a specification that the
response y is modelled by a linear predictor specified symbolically by model. Such a
model consists of a series of terms separated by + operators.
The terms themselves consist of variable and factor names separated by the
interaction : operator.
Such a term is interpreted as the interaction of all the variables and factors
appearing in the term.
In addition to + and :, a number of other operators are useful in model formulae.
The * operator denotes factor crossing:
a*b is interpreted as a+b+a:b.
The ^ operator indicates crossing to the specified degree.
(a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula
containing the main effects for a, b and c together with their second-order interactions.
The %in% operator indicates that the terms on its left are nested within those on the
right.
a + b %in% a expands to the formula a + a:b.
The - operator removes the specified terms. (a+b+c)^2 - a:b is identical to a + b
+ c + b:c + a:c.
It can also be used to remove the intercept term: y ~ x - 1 is a line through the
origin.
A model with no intercept can be also specified as y ~ x + 0 or y ~ 0 + x.
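A minimal sketch showing how the symbolic operators expand; the names a, b and c are arbitrary and the variables need not exist, since terms only parses the formula:
attr(terms(~ a * b), "term.labels")                # expands to a, b and a:b
attr(terms(~ (a + b + c)^2), "term.labels")        # main effects and all second-order interactions
attr(terms(~ (a + b + c)^2 - a:b), "term.labels")  # as above, with the a:b interaction removed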
While formulae usually involve just variable and factor names, they can also involve
arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such
arithmetic expressions involve operators which are also used symbolically in model
formulae, there can be confusion between arithmetic and symbolic operator use.
To avoid this confusion, the function I() can be used to bracket those portions of a
model formula where the operators are used in their arithmetic sense.
For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as
the sum of b and c.
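A minimal sketch with simulated data, contrasting the symbolic and the arithmetic use of +:
set.seed(1)
a <- rnorm(50); b <- rnorm(50); c <- rnorm(50)
y <- 1 + 2 * a + 3 * (b + c) + rnorm(50)
coef(lm(y ~ a + b + c))      # b and c enter as two separate regressors
coef(lm(y ~ a + I(b + c)))   # I(b + c) enters as a single regressor, the sum of b and c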
There are two special interpretations of . in a formula.
1 The function is available in the package stats, which is automatically loaded when R starts.
The usual one is in the context of a data argument of model fitting functions and
means all columns not otherwise in the formula: see terms.formula.
In the context of update.formula, only, it means what was previously in this part
of the formula.

References
Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical
Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

Examples

In Section 3.1.4 we used the function
update(regr3.2, . ~ . + prefarea:bedrooms, data = housing)
to add the interaction term between prefarea and bedrooms to the linear model
regr3.2 resulting from:
regr3.1<-lm(log(price) ~ log(lotsize) + bedrooms + bathrms +
airco, data=housing)
regr3.2<-update(regr3.1, . ~ . + driveway + recroom + fullbase +
gashw + garagepl + prefarea + stories, data = housing)

In Section 3.3.3 we used the syntax
model.matrix(~MALE * EDUC + MALE * LNEXPER, indwages)
to define a model matrix containing the main effects of MALE, EDUC and
LNEXPER and their interactions, that is (in a more cumbersome form):
model.matrix(~MALE + EDUC + LNEXPER + MALE:EDUC + MALE:LNEXPER,
indwages)
and then the syntax
model.matrix(~MALE + EDUC * LNEXPER)
for
model.matrix(~MALE + EDUC + LNEXPER + EDUC:LNEXPER)

A.5 linear model
The function lm is used to fit linear models. It can be used to carry out regression,
single stratum analysis of variance and analysis of covariance (although aov may
provide a more convenient interface for these). It is available in the package stats
which is automatically loaded when R starts.
Usage
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
Arguments

formula: an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under Details.

data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.

subset: an optional vector specifying a subset of observations to be used in the fitting process.

weights: an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also Details.

na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

method: the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below).

model, x, y, qr: logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.

singular.ok: logical. If FALSE (the default in S but not in R) a singular fit is an error.

contrasts: an optional list. See the contrasts.arg of model.matrix.default.

offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.

...: additional arguments to be passed to the low level regression fitting functions (see below).

Details
Models for lm are specified symbolically. See Section A.4.
If the formula includes an offset, this is evaluated and subtracted from the response.
If response is a matrix a linear model is fitted separately by least-squares to each
column of the matrix.
See model.matrix for some further details. The terms in the formula will be reordered so that main effects come first, followed by the interactions, all second-order,
all third-order and so on: to avoid this pass a terms object as the formula (see aov
and demo(glm.vr) for an example).
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.
Non-NULL weights can be used to indicate that different observations have different
variances (with the values in weights being inversely proportional to the variances); or
equivalently, when the elements of weights are positive integers wi , that each response
yi is the mean of wi unit-weight observations (including the case that there are wi
observations equal to yi and the data have been summarized).
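A minimal sketch of weighted least squares with simulated heteroscedastic data (all names are arbitrary):
set.seed(123)
x <- runif(80, 1, 2)
y <- 1 + 2 * x + rnorm(80, sd = x)   # the error standard deviation grows with x
wls <- lm(y ~ x, weights = 1 / x^2)  # weights inversely proportional to the error variances
summary(wls)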
lm calls the lower level functions lm.fit, etc, see below, for the actual numerical
computations. For programming only, you may consider doing likewise.
All of weights, subset and offset are evaluated in the same way as variables in formula,
that is first in data and then in the environment of formula.
Value
lm returns an object of class "lm" or for multiple responses of class c("mlm", "lm").
The functions summary and anova are used to obtain and print a summary and
analysis of variance table of the results. The generic accessor functions coefficients,
effects, fitted.values and residuals extract various useful features of the value returned
by lm.
An object of class "lm" is a list containing at least the following components:

coefficients: a named vector of coefficients.

residuals: the residuals, that is response minus fitted values.

fitted.values: the fitted mean values.

rank: the numeric rank of the fitted linear model.

weights: (only for weighted fits) the specified weights.

df.residual: the residual degrees of freedom.

call: the matched call.

terms: the terms object used.

contrasts: (only where relevant) the contrasts used.

xlevels: (only where relevant) a record of the levels of the factors used in fitting.

offset: the offset used (missing if none were used).

y: if requested, the response used.

x: if requested, the model matrix used.

model: if requested (the default), the model frame used.

na.action: (where relevant) information returned by model.frame on the special handling of NAs.
In addition, non-null fits will have components assign, effects and (unless not
requested) qr relating to the linear fit, for use by extractor functions such as summary
and effects.
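A minimal sketch, reusing the housing data of Chapter 3 (assumed to be already available as the data frame housing):
fit <- lm(log(price) ~ log(lotsize) + bedrooms + bathrms + airco, data = housing)
coefficients(fit)      # estimated regression coefficients
head(residuals(fit))   # response minus fitted values
head(fitted.values(fit))
summary(fit)           # coefficient table, R-squared, residual standard error
anova(fit)             # analysis of variance table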
Using time series
Considerable care is needed when using lm with time series.
Unless na.action = NULL, the time series attributes are stripped from the variables
before the regression is done. (This is necessary as omitting NAs would invalidate the
time series attributes, and if NAs are omitted in the middle of the series the result
would no longer be a regular time series.)
Even if the time series attributes are retained, they are not used to line up series, so
that the time shift of a lagged or differenced regressor would be ignored. It is good
practice to prepare a data argument by ts.intersect(..., dframe = TRUE), then
apply a suitable na.action to that data frame and call lm with na.action = NULL so
that residuals and fitted values are time series.
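A minimal sketch of the workflow just described, with two simulated monthly series:
set.seed(1)
y <- ts(rnorm(100), start = c(2000, 1), frequency = 12)
x <- ts(rnorm(100), start = c(2000, 1), frequency = 12)
dat <- ts.intersect(y, xlag1 = lag(x, -1), dframe = TRUE)  # align y with x lagged one period
fit <- lm(y ~ xlag1, data = dat, na.action = NULL)         # na.action = NULL as recommended above
summary(fit)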
Note
Offsets specified by offset will not be included in predictions by predict.lm, whereas
those specified by an offset term in the formula will be.
Author(s)
The design was inspired by the S function of the same name described in Chambers
(1992). The implementation of model formula by Ross Ihaka was based on Wilkinson
& Rogers (1973).
References
Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Wilkinson, G. N. and Rogers, C. E. (1973) Symbolic descriptions of factorial models
for analysis of variance. Applied Statistics, 22, 392-9.

We observe that with time series it is also possible to use the function dynlm, available in the package dynlm. See the R help system for more information.
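A minimal sketch, assuming the package dynlm is installed, using the monthly series UKDriverDeaths shipped with R:
library(dynlm)
fit <- dynlm(UKDriverDeaths ~ L(UKDriverDeaths, 1) + trend(UKDriverDeaths))
summary(fit)   # regression on the first lag and a linear trend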

A.6 Deducer
The package Deducer contains an intuitive, cross-platform graphical data analysis system. It uses menus and dialogs to guide the user efficiently through the data manipulation and analysis process, and has an Excel-like spreadsheet for easy data frame visualization and editing.
After having called library(Deducer) the R GUI menus are enriched with the
options Deducer, Data, Analysis and Plots.
In particular the option Open Data in the menu Deducer allows SPSS, SAS,
DBase, Stata, Systat, ARFF, Epiinfo, Minitab and Excel files to be imported in
R as data.frames.
The menu Data contains tools for data manipulation and the menu Analysis tools
for statistical analyses. Figures A.1 and A.2-A.3 report the interface to define a linear
model and the output2.
The menus can be further enriched by calling the package DeducerExtras, with
tools for inferential statistics and multivariate statistical analysis, DeducerPlugInScaling
with tools for reliability and factor analysis, DeducerSpatial with tools for spatial
statistics, DeducerSurvival with tools for survival analysis, and DeducerText with
tools for the analysis of textual data.
Have a look at http://www.deducer.org for more information.

2 The corresponding R code is
indwages$EDUC <- as.factor(indwages$EDUC)
summary(lm(LNWAGE ~ MALE + EDUC + LNEXPER, data = indwages))

Figure A.1  Interface to define a linear model with Deducer

Figure A.2  Linear model output and diagnostics

Figure A.3  Linear model component + residual and added-variable plots

B Addendum 3rd edition
This chapter can be downloaded from the book site www.educatt.it/libri/materiali.
