
ARTICLE IN PRESS

Journal of Econometrics 137 (2007) 68–111


www.elsevier.com/locate/jeconom

Asymptotic distribution of the cointegrating vector estimator in error correction models with conditional heteroskedasticity
Byeongseon Seo
Department of Economics, Soongsil University and Texas A&M University, Seoul 156-743, Korea
Available online 12 May 2006

Abstract
This paper explores the asymptotic distribution of the cointegrating vector estimator in error
correction models with conditionally heteroskedastic errors. Asymptotic properties of the maximum
likelihood estimator (MLE) of the cointegrating vector, which estimates the cointegrating vector and
the multivariate GARCH process jointly, are provided. The MLE of the cointegrating vector is
asymptotically mixed normal, and its asymptotic distribution depends on the conditional heteroskedasticity and
the kurtosis of standardized innovations. The reduced rank regression (RRR) estimator and the
regression-based cointegrating vector estimators do not consider conditional heteroskedasticity, and
thus the efficiency gain of the MLE emerges as the magnitude of conditional heteroskedasticity
increases. The simulation results indicate that the relative power of the t-statistics based on the MLE
improves significantly as the GARCH effect increases.
© 2006 Elsevier B.V. All rights reserved.
JEL classification: C13; C32
Keywords: Cointegrating vector; Efficiency gain; Multivariate GARCH

1. Introduction
The notion of cointegration was developed by Engle and Granger (1987), and since then
has been considered important in the recent development of time series econometrics.
Many statistical methods have been developed for the analysis of the cointegrated systems,
Tel.: +82 2 820 0552; fax: +82 2 824 4384.

E-mail address: seo@ssu.ac.kr.


0304-4076/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2006.03.008


and several methods of estimating the cointegrating vector have been proposed. Another
development, generalized autoregressive conditional heteroskedasticity (GARCH), was
made by Engle (1982) and Bollerslev (1986) to explain the time-varying volatility in the
data. This paper explores the asymptotic properties of the maximum likelihood estimator
(MLE) of the cointegrating vector in the vector error correction model with conditional
heteroskedasticity. Because the existing estimation methods do not exploit conditional
heteroskedasticity in the data, such an analysis is needed.
The main objective is to develop the asymptotic properties of the MLE of the
cointegrating vector, which estimates the error correction model and the multivariate
GARCH process jointly. The existing estimation methods, including the reduced rank
regression (RRR) and the regression-based estimators, allow for, but do not explicitly model,
conditional heteroskedasticity. Their asymptotic distributions are invariant to conditional
heteroskedasticity. However, these estimators ignore the information coming from
conditional heteroskedasticity. Many authors, including Bollerslev et al. (1992), show
that economic variables such as stock prices and exchange rates have time-varying
variances. The clustered volatility and thick-tailed distribution are typical characteristics of
these variables. Although there is vast literature on the cointegrating vector and GARCH,
the literature on the distribution theory for the cointegrating vector estimator with
conditionally heteroskedastic errors is still sparse. This paper fills this gap in the literature
by developing an asymptotic theory for the cointegrating vector estimator in error
correction models with conditional heteroskedasticity.
In this paper, we find that the MLE of the cointegrating vector is asymptotically mixed normal,
and its asymptotic distribution depends on the conditional heteroskedasticity and the
kurtosis of standardized errors. The RRR and the regression-based cointegrating vector
estimators do not consider conditional heteroskedasticity in the data, and thus the MLE
improves efficiency significantly. Statistical inference on the cointegrating vector also
depends on heteroskedasticity. The simulation study reveals that the efficiency gain of the
MLE becomes significant as the GARCH effect increases.
The limiting distribution of the cointegrating vector estimator with heteroskedastic
errors has been explored by Li et al. (2001) and Seo (2001). Li et al. (2001) investigated the
limiting distribution of the cointegrating vector estimator in the partially nonstationary
vector autoregressive model with ARCH(1) errors. We consider the multivariate GARCH
errors, which is a natural extension considering the stylized facts of the real data. The
distribution theory of the cointegrating vector estimator, found by Li et al. (2001), depends
on two correlated Brownian motions, which implies nonstandard asymptotic distribution.
In this paper, we show that the MLE of the cointegrating vector follows the mixed normal
distribution, and provide an explicit analysis of efficiency gain. This study also extends Seo
(2001) by allowing for multiple cointegration rank.
There are other related papers by Wong and Li (1997), Ling and McAleer (2003), Ling
and Li (1998, 2003), and Seo (1999). Wong and Li (1997) and Ling and McAleer (2003)
consider the vector autoregressive model with the GARCH errors, but they do not
consider nonstationarity and cointegration. Ling and Li (1998, 2003) and Seo (1999)
explore the asymptotic theory for unit root tests with conditional heteroskedasticity. Here,
we consider the cointegrating vector, and thus extend the former results to the
nonstationary cointegrated models.
We denote $\xrightarrow{p}$ as convergence in probability, $\xrightarrow{d}$ as convergence in distribution, and $\Rightarrow$ as weak convergence with respect to the uniform metric. $BM(\Omega)$ represents a Brownian motion with long-run variance $\Omega$. Also, $[\cdot]$ is the integer operator, $|\cdot|$ is the Euclidean norm, and $\mathrm{vec}$ is the column-stacking operator.
The paper is organized as follows. Section 2 introduces the model and the cointegrating vector estimators. Section 3 develops the asymptotic theory for the
cointegrating vector estimators. The error correction model with an intercept is analyzed
in Section 4. Section 5 deals with simulation results on the properties of the cointegrating
vector estimators.
2. The model
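Consider a $p$-dimensional time series $x_t$ generated by the error correction model (ECM)
as follows:
$$\Delta x_t = \alpha \begin{pmatrix} I_r \\ \beta \end{pmatrix}' x_{t-1} + \sum_{i=1}^{l} \Gamma_i \Delta x_{t-i} + u_t, \tag{1}$$
where $\alpha$ is the $p \times r$ adjustment vector, and $\beta$ is the $(p-r) \times r$ cointegrating vector.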
We assume that the cointegration rank is known and equals $r$. Thus, if we denote Eq. (1)
as $\Pi(L) x_t = u_t$, then the rank of $\Pi = \Pi(1)$ is $r$. We use the normalization of the
cointegrating vector with respect to the first $r$ elements of $x_t$. According to our
normalization, the cointegrating relationship $w_t(\beta)$ is defined as follows:
$$w_t(\beta) = x_{1t} + \beta' x_{2t}, \tag{2}$$
where $x_{1t}$ is $r$-dimensional and $x_{2t}$ is $(p-r)$-dimensional.
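As an illustration of Eqs. (1) and (2), the following minimal sketch simulates a bivariate ECM with $p = 2$, $r = 1$, and no lagged differences. The function names, the parameter values ($\alpha = (-0.5, 0)'$, $\beta = -1$), and the use of i.i.d. Gaussian errors are assumptions made for the example only; the paper's errors are conditionally heteroskedastic. The cointegrating relationship stays bounded in variance while $x_{2t}$ wanders like a random walk.

```python
import random

def simulate_ecm(n=2000, alpha=(-0.5, 0.0), beta=-1.0, seed=0):
    """Simulate Delta x_t = alpha * w_{t-1} + u_t with w_t = x_1t + beta * x_2t.

    Illustrative assumptions: p = 2, r = 1, no lagged differences,
    and i.i.d. N(0, 1) errors (no GARCH effect).
    """
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    w_path, x2_path = [], []
    for _ in range(n):
        w_prev = x1 + beta * x2                      # cointegrating relationship, Eq. (2)
        x1 += alpha[0] * w_prev + rng.gauss(0.0, 1.0)
        x2 += alpha[1] * w_prev + rng.gauss(0.0, 1.0)
        w_path.append(x1 + beta * x2)
        x2_path.append(x2)
    return w_path, x2_path

def sample_variance(values):
    """Population-style sample variance (divides by the number of observations)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

w_path, x2_path = simulate_ecm()
print("variance of w_t :", sample_variance(w_path))
print("variance of x_2t:", sample_variance(x2_path))
```

With these assumed values, $w_t$ follows a stationary AR(1) process with coefficient $1 + \alpha_1 + \beta\alpha_2 = 0.5$, which is why its sample variance remains small relative to that of the integrated component.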


As defined in Engle and Granger (1987), the cointegrating relationship is stationary. Our
model is based on the normalization (2). The cointegrating vector can be identified from
this representation. The same normalization has been used in many studies such as Phillips
(1991).
The error process $u_t$ is assumed to be a vector-valued martingale difference sequence
(MDS) satisfying $E(u_t \mid \mathcal{F}_{t-1}) = 0$ and $E(u_t u_t' \mid \mathcal{F}_{t-1}) = \Omega_t$, where $\mathcal{F}_t$ is the $\sigma$-field generated
by $x_{t-i}$ for $i = 0, 1, 2, \ldots$. Thus, our model allows for the time-varying conditional
variance, which generalizes the error condition of Engle and Granger (1987) and Johansen
(1988, 1991).
Many models of multivariate conditional heteroskedasticity have been developed to
explain time-varying covariance, common persistence, and volatility causality. Bollerslev
et al. (1988) proposed vector GARCH and diagonal GARCH models. Each element of
covariance follows the GARCH process, and thus we need to estimate a huge number of
parameters.¹ Bollerslev (1990) proposed a multivariate GARCH model with constant
conditional correlation. This model reduces the number of parameters to a manageable
size and it satisfies positive definiteness, and so the model has been used in many empirical
studies.
Our model is based on the constant-correlation GARCH specification, which was
proposed by Bollerslev (1990).
$$\Omega_t = \Lambda^{-1} \Sigma_t \Lambda^{-1\prime}, \tag{3}$$
where $\Lambda$ is a lower triangular matrix and $\Sigma_t$ is a diagonal matrix as follows:
$$\Sigma_t = \begin{pmatrix} \sigma_{1t}^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma_{2t}^2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma_{3t}^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_{pt}^2 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \lambda_{21} & 1 & 0 & \cdots & 0 \\ \lambda_{31} & \lambda_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \lambda_{p1} & \lambda_{p2} & \lambda_{p3} & \cdots & 1 \end{pmatrix}.$$

¹ For example, if $p = 3$, the vector GARCH model has 78 parameters and the diagonal GARCH model has 18
parameters even though we assume a minimal lag order.
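The parameter counts quoted in the footnote can be reproduced with a short sketch. The function names are hypothetical, and the counting rules assume the usual vech parameterization with one ARCH and one GARCH lag; the last function counts the volatility parameters of the constant-correlation form used here ($3p$ GARCH parameters plus the $p(p-1)/2$ free elements of $\Lambda$).

```python
def vech_dim(p):
    """Number of distinct elements in a symmetric p x p covariance matrix."""
    return p * (p + 1) // 2

def vector_garch_params(p):
    """Vector GARCH(1,1) in vech form: intercept vector plus two full m x m matrices."""
    m = vech_dim(p)
    return m + 2 * m * m

def diagonal_garch_params(p):
    """Diagonal GARCH(1,1): intercept, ARCH and GARCH coefficient per vech element."""
    return 3 * vech_dim(p)

def constant_correlation_params(p):
    """Specification (3)-(4): (omega_j, psi_j, phi_j) per series plus the lambdas."""
    return 3 * p + p * (p - 1) // 2

for p in (2, 3, 4):
    print(p, vector_garch_params(p), diagonal_garch_params(p),
          constant_correlation_params(p))
```

For $p = 3$ the sketch gives 78, 18, and 12 parameters, respectively, which matches the footnote for the first two models.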
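Define $e_t = \Lambda u_t$, where $e_t$ is an orthogonalized innovation of $u_t$, satisfying $E(e_t \mid \mathcal{F}_{t-1}) = 0$ and $E(e_t e_t' \mid \mathcal{F}_{t-1}) = \Sigma_t$. We assume that each element of $e_t$ follows the GARCH process
as follows:
$$\sigma_{jt}^2 = \omega_j + \psi_j e_{j,t-1}^2 + \phi_j \sigma_{j,t-1}^2, \tag{4}$$
where $\omega_j > 0$, $\psi_j \ge 0$, and $\phi_j \ge 0$ for $j = 1, 2, \ldots, p$.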


We note that our model is the vector error correction model with multivariate GARCH
errors. The RRR estimator is based on the mean equation, but the MLE estimates the
mean and volatility equations jointly. We use the multivariate GARCH model with
constant correlation coefficient, and our analysis can be extended to other specifications
such as the factor GARCH and the asymmetric GARCH models.
If we denote $X_{t-1}(\beta)$ as the vector of stationary regressors and $U$ as its coefficient matrix,
then the mean equation (1) can be written as follows:
$$\Delta x_t = U X_{t-1}(\beta) + u_t, \tag{5}$$
where $X_{t-1}(\beta) = (w_{t-1}(\beta)', \Delta x_{t-1}', \ldots, \Delta x_{t-l}')'$ and $U = (\alpha, \Gamma_1, \Gamma_2, \ldots, \Gamma_l)$.


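We define the parameter vector $\theta = (b', \theta_2')'$, where $b = \mathrm{vec}(\beta)$, $\theta_2 = (\mathrm{vec}(U)', \gamma', \lambda')'$,
$\gamma = (\gamma_1', \gamma_2', \ldots, \gamma_p')'$, $\gamma_j = (\omega_j, \psi_j, \phi_j)'$ for $j = 1, 2, \ldots, p$, and $\lambda = (\lambda_{21}, \lambda_{31}, \lambda_{32}, \ldots, \lambda_{p,p-1})'$.
Let $\theta_0$ be the true parameter value. We denote $u_t = u_t(\theta_0)$, $e_t = e_t(\theta_0)$, and $\Sigma_t = \Sigma_t(\theta_0)$.
Define the parameter space $\Theta$ as $\theta \in \Theta \subset R^k$, where $k = r(2p - r) + p(pl + (p-1)/2 + 3)$.
Let $\Sigma = E\Sigma_t < \infty$ be the unconditional variance of the orthogonalized errors $e_t$, which
requires $\phi_j + \psi_j < 1$ for all $j = 1, 2, \ldots, p$. Thus, the volatility process is stationary, which
implies a moving average representation.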
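The dimension $k$ can be checked by adding up the blocks of $\theta$. The sketch below uses hypothetical function names and compares the closed-form expression with a block-by-block count ($b$, $\alpha$, the $\Gamma_i$, the GARCH parameters, and the free elements of $\Lambda$).

```python
from fractions import Fraction

def parameter_dimension(p, r, l):
    """Closed form: k = r(2p - r) + p(pl + (p - 1)/2 + 3)."""
    k = r * (2 * p - r) + p * (p * l + Fraction(p - 1, 2) + 3)
    return int(k)  # k is always an integer because p(p - 1)/2 is an integer

def parameter_dimension_by_blocks(p, r, l):
    """Same count, block by block."""
    n_b = r * (p - r)            # b = vec(beta)
    n_alpha = p * r              # adjustment coefficients
    n_gamma_lags = p * p * l     # Gamma_1, ..., Gamma_l
    n_garch = 3 * p              # (omega_j, psi_j, phi_j) for each j
    n_lambda = p * (p - 1) // 2  # below-diagonal elements of Lambda
    return n_b + n_alpha + n_gamma_lags + n_garch + n_lambda

print(parameter_dimension(2, 1, 1), parameter_dimension_by_blocks(2, 1, 1))
```

For a bivariate system with one cointegrating relationship and one lagged difference, both counts give $k = 14$.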
We define the following:
$$\bar\sigma_{jt}^2 = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{\infty} \phi_j^k e_{j,t-k-1}^2,$$
for $j = 1, 2, \ldots, p$.
The process $\bar\sigma_{jt}^2$ follows the law of motion (4) with infinite past history. However, based
on a sample of $\{x_1, x_2, \ldots, x_n\}$, the volatility process $\bar\sigma_{jt}^2$ cannot be observed by an
econometrician.
The volatility process (4), given the startup condition $\sigma_{j0}^2 = \omega_j/(1 - \phi_j)$, has a moving
average representation in the form
$$\sigma_{jt}^2 = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{t-1} \phi_j^k e_{j,t-k-1}^2,$$
for $t = 1, 2, 3, \ldots, n$ and $j = 1, 2, \ldots, p$.
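The equivalence between the recursion (4) with the startup condition and the finite-horizon moving average form is purely algebraic, and a short numerical check makes this concrete. The sketch below uses hypothetical function names and arbitrary illustrative parameter values; the input series is just a list of numbers standing in for $e_{j0}, \ldots, e_{j,n-1}$.

```python
import random

def sigma2_recursive(e, omega, psi, phi):
    """sigma^2_t for t = 0..n from recursion (4), started at omega / (1 - phi).

    e[t] holds e_{jt} for t = 0..n-1.
    """
    s2 = [omega / (1.0 - phi)]
    for t in range(1, len(e) + 1):
        s2.append(omega + psi * e[t - 1] ** 2 + phi * s2[t - 1])
    return s2

def sigma2_moving_average(e, omega, psi, phi):
    """Same quantity from the finite-horizon moving average representation."""
    s2 = [omega / (1.0 - phi)]
    for t in range(1, len(e) + 1):
        weighted = sum(phi ** k * e[t - k - 1] ** 2 for k in range(t))
        s2.append(omega / (1.0 - phi) + psi * weighted)
    return s2

rng = random.Random(1)
e = [rng.gauss(0.0, 1.0) for _ in range(200)]
rec = sigma2_recursive(e, 0.1, 0.2, 0.7)
ma = sigma2_moving_average(e, 0.1, 0.2, 0.7)
max_gap = max(abs(a - b) for a, b in zip(rec, ma))
print("largest discrepancy:", max_gap)
```

The two series agree up to floating-point rounding, which is the sense in which the startup condition pins down the moving average form.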


The distribution theory of the GARCH process has been based on the unobserved
volatility representation because the finite horizon representation is not stationary. As
discussed in Lee and Hansen (1994) and Lumsdaine (1996), the initial conditions are
asymptotically negligible, and the distribution theory using the finite horizon representation is asymptotically equivalent to that of the infinite horizon representation given some
regularity conditions. This paper develops the distribution theory in accordance with the
asymptotic equivalence of these two volatility representations.
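The log-likelihood function, with the auxiliary condition that $u_t \mid \mathcal{F}_{t-1} \sim N(0, \Omega_t)$, is given
by
$$L_n(\theta) = n^{-1} \sum_{t=1}^{n} l_t(\theta), \tag{6}$$
where
$$\begin{aligned} l_t(\theta) &= -0.5 \log|\Omega_t(\theta)| - 0.5\, u_t(\theta)' \Omega_t(\theta)^{-1} u_t(\theta) \\ &= -0.5 \log|\Sigma_t(\theta)| - 0.5\, e_t(\theta)' \Sigma_t(\theta)^{-1} e_t(\theta) \\ &= -0.5 \sum_{j=1}^{p} \left( \log \sigma_{jt}^2(\theta) + \frac{e_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} \right), \end{aligned}$$
where $\sigma_{jt}^2(\theta)$ and $e_{jt}(\theta)$ satisfy Eqs. (1)–(4).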
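The three expressions for $l_t(\theta)$ coincide because $\Lambda$ has a unit diagonal, so $|\Omega_t| = |\Sigma_t|$ and the quadratic form can be evaluated on the orthogonalized innovations. The sketch below checks this for $p = 2$ with hypothetical function names and arbitrary illustrative inputs; the $2 \times 2$ inverse is written out by hand to keep the example dependency-free.

```python
import math

def loglik_from_omega(u, lam, s):
    """-0.5 log|Omega| - 0.5 u' Omega^{-1} u, with Omega = Lambda^{-1} Sigma Lambda^{-1}'.

    u = (u1, u2), lam = lambda_21, s = (sigma_1^2, sigma_2^2); the case p = 2 only.
    """
    u1, u2 = u
    s1, s2 = s
    # Lambda^{-1} = [[1, 0], [-lam, 1]], so Omega has the entries below.
    o11, o12, o22 = s1, -lam * s1, lam * lam * s1 + s2
    det = o11 * o22 - o12 * o12
    quad = (o22 * u1 * u1 - 2.0 * o12 * u1 * u2 + o11 * u2 * u2) / det
    return -0.5 * math.log(det) - 0.5 * quad

def loglik_from_orthogonalized(u, lam, s):
    """-0.5 sum_j (log sigma_j^2 + e_j^2 / sigma_j^2), with e = Lambda u."""
    u1, u2 = u
    e = (u1, lam * u1 + u2)
    return -0.5 * sum(math.log(sj) + ej * ej / sj for ej, sj in zip(e, s))

a = loglik_from_omega((0.3, -1.2), 0.5, (0.8, 1.7))
b = loglik_from_orthogonalized((0.3, -1.2), 0.5, (0.8, 1.7))
print(a, b)
```

The last form is the one that is convenient in practice, since it reduces the multivariate likelihood to a sum of $p$ univariate GARCH contributions.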
The MLE $\hat\theta_n$ can be defined as follows:
$$\hat\theta_n = \arg\max_{\theta \in \Theta} L_n(\theta). \tag{7}$$

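We use the following derivatives:
$$\frac{\partial l_t(\theta)}{\partial b} = \sum_{j=1}^{p} A_j' \otimes \left[ \frac{x_{2,t-1} e_{jt}(\theta)}{\sigma_{jt}^2(\theta)} - \frac{h_j(L) x_{2,t-2} e_{j,t-1}(\theta)\, \eta_{jt}(\theta)}{\sigma_{jt}^2(\theta)} \right],$$
$$\begin{aligned} \frac{\partial^2 l_t(\theta)}{\partial b\, \partial b'} = -\sum_{j=1}^{p} A_j' A_j \otimes \Bigg[ & \frac{x_{2,t-1} x_{2,t-1}'}{\sigma_{jt}^2(\theta)} + \frac{2\,(1 + 2\eta_{jt}(\theta))\, h_j(L) x_{2,t-2} e_{j,t-1}(\theta)\, h_j(L) x_{2,t-2}' e_{j,t-1}(\theta)}{\sigma_{jt}^4(\theta)} \\ & - \frac{2\, h_j(L) x_{2,t-2} e_{j,t-1}(\theta)\, x_{2,t-1}' e_{jt}(\theta)}{\sigma_{jt}^4(\theta)} - \frac{2\, x_{2,t-1} e_{jt}(\theta)\, h_j(L) x_{2,t-2}' e_{j,t-1}(\theta)}{\sigma_{jt}^4(\theta)} \\ & - \frac{h_j(L) x_{2,t-2} x_{2,t-2}'\, \eta_{jt}(\theta)}{\sigma_{jt}^2(\theta)} \Bigg], \end{aligned}$$
where $\eta_{jt}(\theta) = e_{jt}^2(\theta)/\sigma_{jt}^2(\theta) - 1$, $h_j(L) x_{2,t-2} e_{j,t-1}(\theta) = \psi_j \sum_{k=0}^{t-1} \phi_j^k x_{2,t-k-2} e_{j,t-k-1}(\theta)$, and $A_j$ is
the $j$th row of $A = \Lambda\alpha$ for $j = 1, 2, \ldots, p$.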
Because the MLE $\hat\theta_n$ maximizes the likelihood function, we get $\sum_{t=1}^{n} \partial l_t(\hat\theta_n)/\partial\theta = 0$. In
our model, the conditional variance depends on the cointegrating vector, and thus the first-order condition accompanies the volatility adjustment. The Hessian matrix and the outer
product of gradients also entail the volatility adjustment.


The likelihood function depends on a number of parameters, and redundant parameters
may lead to a singularity error. In particular, the Hessian matrix tends to be near-singular when the volatility equation is specified with redundant parameters. Thus, it is
necessary to achieve a parsimonious specification by using the associated diagnostic
tests. In some cases, the factor GARCH model and the conditional error correction model
can be used to reduce the number of redundant parameters. Once the likelihood function is
specified, the MLE can be computed in any statistical software that provides
a maximum likelihood procedure.
The RRR estimator is based on the mean equation (5), which can be computed by using
RRR (Ahn and Reinsel, 1988) or canonical analysis (Box and Tiao, 1977). Other slope
parameters can be estimated by least squares once the cointegrating vector is estimated.
We denote the RRR estimator as $\tilde b_n$ and other estimators as $\tilde U_n$. The RRR estimator $\tilde b_n$ is
super-consistent, and thus its estimates can be used as the initial values for an algorithm to
maximize the likelihood function.
3. Main results
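If the cointegration rank is known and equals $r$, then there exist $p \times r$ full column rank
matrices $\alpha$ and $\beta^*$ satisfying $\Pi = \alpha\beta^{*\prime}$. Let $\alpha_\perp$ and $\beta^*_\perp$ be $p \times (p - r)$ full column rank
matrices such that $\alpha_\perp' \alpha = 0$ and $\beta^{*\prime}_\perp \beta^* = 0$. From the representation theorem by Engle and
Granger (1987), the error correction model (1) has the following representation:
$$\Delta x_t = C(L) u_t, \tag{8}$$
$$x_t = C(1) \sum_{i=1}^{t} u_i + F(L) u_t, \tag{9}$$
$$w_t = \beta^{*\prime} x_t = \beta^{*\prime} F(L) u_t, \tag{10}$$
where $C(1) = \beta^*_\perp (\alpha_\perp' \Pi^*(1) \beta^*_\perp)^{-1} \alpha_\perp'$, $\Pi^*(L) = (\Pi(L) - \Pi(1))/(1 - L)$, and $F(L) = (C(L) - C(1))/(1 - L)$.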
The ECM representation holds if $E|u_t|^2 < \infty$ and $\sum_{k=1}^{\infty} k|C_k| < \infty$. Thus, $x_t$ involves
stochastic trends and a stationary component. Because the null space of $C(1)$ is spanned by
the cointegration space, $\beta^{*\prime} C(1) = 0$ and $C(1)\alpha = 0$. The cointegration vector eliminates
the stochastic trends; hence, the cointegrating relationship $w_t$ is stationary. We denote
$C_2(1)$ as a partitioned matrix of $C(1)$ which corresponds to $x_{2t}$, hence its dimension is
$(p - r) \times p$.
Define the standardized innovations $\varepsilon_t = \Sigma_t^{-1/2} e_t$. That is, $\varepsilon_{jt} = e_{jt}/\sigma_{jt}$ for $j = 1, 2, \ldots, p$.
We assume the following conditions.
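Assumption 1. (a) $\varepsilon_{jt} \sim$ i.i.d.$(0, 1)$, $E\varepsilon_{jt}^6 < \infty$, and $\varepsilon_{jt}$ has a continuous and symmetric density
for $j = 1, 2, \ldots, p$.
(b) $E|u_t|^{m_1} < \infty$ for some $m_1 > 2$.
(c) $\sum_{k=1}^{\infty} k^2 |C_k| < \infty$, where $C(L) = \sum_{k=0}^{\infty} C_k L^k$ and $\Delta x_t = C(L) u_t$.
(d) $\omega_j \ge \underline{\omega}_j > 0$ for $j = 1, 2, \ldots, p$.
(e) $\Theta$ is compact.
(f) For some $m_1 > m_2 > 2$, $\{w_{t-1}/\sigma_{jt}^2,\ \Delta x_{t-i}/\sigma_{jt}^2,\ w_{t-k-1} e_{j,t-k}^2/\sigma_{jt}^4,\ \Delta x_{t-k-1} e_{j,t-k}^2/\sigma_{jt}^4;\ i = 1, 2, \ldots, l;\ j = 1, 2, \ldots, p;\ k \ge 1\}$ is a zero mean, strictly stationary, and strong mixing
process with mixing coefficient $\alpha_k = O(k^{-c})$ such that $c > m_1 m_2/(m_1 - m_2)$.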


Assumptions 1(a) and 1(b) imply that $\{\sigma_{jt}^2, e_{jt}\}$ is strictly stationary and $\beta$-mixing (or
absolutely regular) with exponential decay for $j = 1, 2, \ldots, p$ as shown by Carrasco and
Chen (2002) and He and Teräsvirta (1999). In addition, the volatility process $\{\sigma_{jt}^2\}$ is weakly
stationary from Assumption 1(b), which justifies the moving average representation of
$\{\sigma_{jt}^2\}$. Assumptions 1(b) and 1(c) imply that $\{\Delta x_t, w_t\}$ is square integrable. The volatility
processes are strictly positive from Assumption 1(d). Assumption 1(e) implies that the
parameter space is bounded. The volatility parameters are bounded from Assumption 1(b).
Assumption 1(f) can be verified by assuming the smooth density condition because the
ECM representations (8) and (10) imply that the process $\{w_t, \Delta x_t\}$ is stationary and satisfies
the sufficient conditions for strong mixing suggested in Chanda (1974) and Gorodetskii
(1977).
The multivariate invariance principle of Phillips and Durlauf (1986) implies the
following:
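Lemma 1. Under Assumption 1,
$$n^{-1/2} \sum_{t=1}^{[ns]} u_t \Rightarrow U(s) = BM(\Omega), \tag{11}$$
$$n^{-1/2} x_{2[ns]} \Rightarrow C_2(1) U(s), \tag{12}$$
$$n^{-1/2} \sum_{t=1}^{[ns]} w_t \Rightarrow \beta^{*\prime} F(1) U(s), \tag{13}$$
where $\Omega = E\Omega_t$.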
3.1. Stochastic equicontinuity
The asymptotic theory of the cointegrating vector estimator involves the tightness of the
Hessian matrix and the parameter restriction, which can be verified if consistency holds. As
Saikkonen (1993, 1995) has shown, the asymptotic distribution and consistency of the
cointegrating vector estimator in nonstationary cointegrated models cannot be achieved by
the standard tightness condition, which has been used in the model with stationary
variables such as Andrews (1987) and Newey (1991). The cointegrated systems involve
nonstationary variables with unbounded variance. Besides, the convergence rate of the
cointegrating vector estimator is different from that of short-run parameters. Thus, an
appropriate tightness condition is necessary to show the distribution theory.
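We define a diagonal matrix $D_n = \mathrm{diag}(D_{1n}, D_{2n})$, where $D_{1n} = \mathrm{diag}(n, n, \ldots, n)$ and
$D_{2n} = \mathrm{diag}(\sqrt{n}, \sqrt{n}, \ldots, \sqrt{n})$ correspond to the parameter vectors $b$ and $\theta_2$, respectively.
The gradient vector, the Hessian matrix, and the outer product of gradients can be defined
as follows:
$$G_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial\theta},$$
$$H_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial\theta\, \partial\theta'} D_n^{-1},$$
and
$$P_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial\theta} \frac{\partial l_t(\theta)}{\partial\theta'} D_n^{-1}.$$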
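Definition 1 (Stochastic equicontinuity). $X_n(\theta)$ is stochastically equicontinuous on $\Theta_\delta$ if,
for every $\varepsilon > 0$ and $\eta > 0$, there exists $N(\varepsilon, \eta)$ such that $n \ge N(\varepsilon, \eta)$ implies, for all $\delta > 0$,
$$P\left( \sup_{\theta \in \Theta_\delta} |X_n(\theta) - X_n(\theta_0)| > \varepsilon \right) \le \eta,$$
where $\Theta_\delta = \{\theta \in \Theta \mid |D_n(\theta - \theta_0)| \le \delta\}$ and $D_n(\theta) = (\sqrt{n}\, b', \theta_2')'$.
Our definition is the tightness condition of Saikkonen (1993, 1995), and it is based on the
normalized parameter space to allow for difference in the convergence rates. We denote
$X_n(\theta) = X_n(\theta_0) + o_p(1)$ if $X_n(\theta)$ is stochastically equicontinuous.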
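Lemma 2. (1) Under Assumption 1, $L_n(\theta) = L_n(\theta_0) + o_p(1)$.
(2) Assumption 1 implies
$$\lim_{n} P_{n,\theta_0}\left\{ \sup_{\theta \in \bar N_\delta} L_n(\theta) - L_n(\theta_0) < 0 \right\} = 1,$$
for every $\delta > 0$, where $\bar N_\delta = \{\theta \in \Theta \mid |\sqrt{n}(b - b_0)| > \delta\}$.
Therefore, $\lim_n P_{n,\theta_0}\{\hat\theta_n \in N_\delta\} = 1$ and $\sqrt{n}(\hat b_n - b_0) \xrightarrow{p} 0$, where $N_\delta = \{\theta \in \Theta \mid |\sqrt{n}(b - b_0)| \le \delta\}$.
(3) If Assumption 1 holds and $\|u_t\|_6 < \infty$, then $H_n(\theta) = H_n(\theta_0) + o_p(1)$.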

Lemma 2(1) shows that the likelihood function is stochastically equicontinuous on the
local neighborhood of the true parameter value. As the parameter values deviate farther
from the local neighborhood, the integrated regressors amplify the squared errors, which
lowers the likelihood function sharply. However, the volatility increases at the same time,
which moderates the decline in the likelihood function.
The MLE $\hat\theta_n$ exists because the likelihood function is continuous and the parameter
space is compact. The consistency of the MLE $\hat b_n$ in Lemma 2(2) is based on the sufficient
condition for consistency, which has been used in Wu (1981) and Saikkonen (1995). The
standard theory of consistency does not apply as the model involves the different rates of
convergence. However, this condition holds under Assumption 1, and hence the MLE $\hat b_n$ is
consistent.
The consistency of the short-run parameters can be based on the consistency of the long-run parameters. Given the convergence rate of $\hat b_n$, the analysis of the ECM reduces to that
of the stationary VAR. The standard theory of consistency such as Ling and McAleer
(2003) can be applied to show the consistency of the short-run parameters.
Lemma 2(3) shows that the tightness condition of the Hessian matrix can be satisfied
under the moment condition $\|u_t\|_6 < \infty$. Lee and Hansen (1994) and Lumsdaine (1996)
derived the stochastic equicontinuity of the Hessian matrix in the GARCH model. In
particular, Lee and Hansen (1994) have shown that $\|\varepsilon_t\|_4 < \infty$ is sufficient for stochastic
equicontinuity. However, this result cannot be applied to our analysis as our model allows
nonstationary regressors in the mean equation.


Because the likelihood function and the Hessian matrix of our model contain
volatility adjustments, the asymptotic theory of the MLE depends on strong moment
conditions. In particular, the distribution theory requires the tightness of the Hessian
matrix, which can be justified when the errors $u_t$ satisfy strong moment conditions.
For example, Ling and McAleer (2003) developed the distribution theory for the
vector ARMA-GARCH model under the moment condition $\|u_t\|_6 < \infty$. Li et al. (2001)
developed the limiting distribution of the cointegrating vector estimator with ARCH(1)
process under the condition $\|u_t\|_4 < \infty$. However, it is not easy to compare the moment
conditions directly because their distribution theory is based on the finite-dimensional
convergence.
The condition of bounded moments may well be treated as the sufficient condition for
the main results, and it may not be a crucial burden in a practical sense. However, as noted
by Lumsdaine (1996), the moment condition restricts the parameter space severely, and
hence the estimated parameter values in empirical studies often fail to satisfy even the
fourth moment condition.
As shown by Carrasco and Chen (2002), the moment condition $\|u_t\|_{2m} < \infty$ can be
implied by $\|\varepsilon_t\|_{2m} < \infty$ and $E|\phi_j + \psi_j \varepsilon_{jt}^2|^m < 1$ for an integer $m \ge 1$ and for all $j = 1, 2, \ldots, p$.
As mentioned before, the moment condition restricts the parameter space seriously when
we set $m = 2$ or $3$. Therefore, we assume stochastic equicontinuity of the Hessian matrix
directly and explore the asymptotic distribution of the cointegrating vector estimator
under the minimal restriction on the parameter space.
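Assumption 2. $\sup_{\theta \in \Theta_\delta} |H_n(\theta) - H_n(\theta_0)| \xrightarrow{p} 0$.
Lemma 3. Under Assumption 1,
$$H_{12n}(\theta_0) \xrightarrow{p} 0,$$
where $H_{12n}(\theta) = n^{-3/2} \sum_{t=1}^{n} \partial^2 l_t(\theta)/\partial b\, \partial\theta_2'$.
Therefore, under Assumptions 1–2,
$$n(\hat b_n - b_0) = -h_n(\theta_0)^{-1} g_n(\theta_0) + o_p(1), \tag{14}$$
where
$$g_n(\theta) = n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial b},$$
and
$$h_n(\theta) = n^{-2} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial b\, \partial b'}.$$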
3.2. Asymptotic distribution


Let $z_{jt} = e_{jt}/\sigma_{jt}^2$ and $q_{jt,k} = e_{j,t-k}\eta_{jt}/\sigma_{jt}^2$ for $k \ge 1$ and $j = 1, 2, \ldots, p$. Because $e_{jt}$, $z_{jt}$,
and $q_{jt,k}$ are martingale difference sequences for all $j = 1, 2, \ldots, p$ and $k \ge 1$,
Assumptions 1(a) and 1(b) imply the following from the invariance principle of Phillips
and Durlauf (1986).
$$\begin{pmatrix} n^{-1/2} \sum_{t=1}^{[ns]} e_{jt} \\ n^{-1/2} \sum_{t=1}^{[ns]} z_{jt} \\ n^{-1/2} \sum_{t=1}^{[ns]} q_{jt,k} \end{pmatrix} \Rightarrow \begin{pmatrix} E_j(s) \\ Z_j(s) \\ Q_{j,k}(s) \end{pmatrix} = BM \begin{pmatrix} \sigma_j^2 & 1 & 0 \\ 1 & 1/B_j^2 & 0 \\ 0 & 0 & (\kappa_j - 1)\xi_{j,k}^2 \end{pmatrix}, \tag{15}$$
where $\sigma_j^2 = E\sigma_{jt}^2$, $1/B_j^2 = E(1/\sigma_{jt}^2)$, $\xi_{j,k}^2 = E(e_{j,t-k}^2/\sigma_{jt}^4)$, and $\kappa_j = E\varepsilon_{jt}^4$.
We note that $Z_j(s)$ and $Q_{j,k}(s)$ are independent for each $j = 1, 2, \ldots, p$ and $k \ge 1$. Also,
$Z_j(s)$ is independent of $Q_{j,k}(s)$ for each $j = 1, 2, \ldots, p$ and $k \ge 1$.
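Let $Q_j(s) = \sum_{k=1}^{\infty} h_{j,k} Q_{j,k}(s) = BM((\kappa_j - 1)\Xi_j)$, where $h_{j,k} = \psi_j \phi_j^{k-1}$ and $\Xi_j = \sum_{k=1}^{\infty} h_{j,k}^2 \xi_{j,k}^2$,
which is finite because $h_{j,k}$ decays exponentially and $\xi_{j,k}^2$ is finite for all $j$ and $k$. We denote $E(s)$,
$Z(s)$, and $Q(s)$ as $p$-dimensional vectors of $E_j(s)$, $Z_j(s)$, and $Q_j(s)$, respectively. Note that
$E(s) = \Lambda U(s)$.
Define $W_1(s)$ and $W_2(s)$ as follows:
$$\begin{pmatrix} W_1(s) \\ W_2(s) \end{pmatrix} = \begin{pmatrix} A'(Z(s) - Q(s)) \\ C_2(1) U(s) \end{pmatrix} = BM \begin{pmatrix} \mu & 0 \\ 0 & C_2(1)\, \Omega\, C_2(1)' \end{pmatrix},$$
where $\mu = A' \Sigma^{-1/2} M \Sigma^{-1/2} A$, $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)$, $M$ is a diagonal matrix with the
element $\sigma_j^2/B_j^2 + (\kappa_j - 1) H_j$, and $H_j = \sum_{k=1}^{\infty} h_{j,k}^2 E(\sigma_j^2 e_{j,t-k}^2/\sigma_{jt}^4)$ for $j = 1, 2, \ldots, p$.
We can show that $W_1(s)$ and $W_2(s)$ are mutually independent by using the ECM
representation theorem and Eq. (15). We also define $\nu = A' \Sigma^{-1/2} N \Sigma^{-1/2} A$, where $N$ is a
diagonal matrix with the element $\sigma_j^2/B_j^2 + 2H_j$ for $j = 1, 2, \ldots, p$. If $r = 1$, then
$\mu = \sum_{j=1}^{p} (A_j^2/\sigma_j^2)\,[\sigma_j^2/B_j^2 + (\kappa_j - 1) H_j]$ and $\nu = \sum_{j=1}^{p} (A_j^2/\sigma_j^2)\,[\sigma_j^2/B_j^2 + 2H_j]$.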
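Lemma 4. Under Assumption 1,
$$g_n(\theta_0) \Rightarrow \int_0^1 dW_1(s) \otimes W_2(s), \tag{16}$$
$$-h_n(\theta_0) \Rightarrow \nu \otimes \int_0^1 W_2(s) W_2'(s)\, ds, \tag{17}$$
and
$$p_n(\theta_0) \Rightarrow \mu \otimes \int_0^1 W_2(s) W_2'(s)\, ds, \tag{18}$$
where $p_n(\theta) = n^{-2} \sum_{t=1}^{n} (\partial l_t(\theta)/\partial b)(\partial l_t(\theta)/\partial b')$.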

If $\varepsilon_{jt}$ is Gaussian for each $j$, then $\kappa_j = 3$ and $\mu = \nu$, which implies that the negative
Hessian and the outer product have the same distribution. However, if the distribution of
$\varepsilon_{jt}$ is not normal for some $j$, then the variance of score does not coincide with that based on
the Hessian matrix, as discussed in White (1982). Thus, statistical inference on the
cointegrating vector depends on the covariance estimation.
To derive the asymptotic distribution of the RRR estimator $\tilde b_n$, we define $W_{1r}(s) = A' \Sigma^{-1} E(s) = BM(\mu_r)$, where $\mu_r = A' \Sigma^{-1} A$. We note that $W_{1r}(s)$ is independent of $W_2(s)$.


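Theorem 1. Under Assumptions 1–2,
$$n(\hat\beta_n - \beta_0) \Rightarrow \left( \int_0^1 W_2 W_2' \right)^{-1} \int_0^1 W_2\, dW_1'\, \nu^{-1}. \tag{19}$$
If Assumptions 1(b), (c), and (e) hold,
$$n(\tilde\beta_n - \beta_0) \Rightarrow \left( \int_0^1 W_2 W_2' \right)^{-1} \int_0^1 W_2\, dW_{1r}'\, \mu_r^{-1}. \tag{20}$$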

First, we note that the asymptotic distribution of the MLE is a mixture normal with a
variance of $\nu^{-1}\mu\nu^{-1} \otimes (\int_0^1 W_2 W_2')^{-1}$. Li et al. (2001) considered the cointegrating vector
estimator in the vector autoregressive model with ARCH(1) errors. The limiting
distribution of the cointegrating vector estimator, found by Li et al. (2001), is a functional
of two correlated Brownian motions, which implies that the cointegrating vector estimator
follows nonstandard asymptotic distribution. Theorem 1 shows that the MLE of the
cointegrating vector has the mixed normal distribution, and therefore the inference on the
cointegrating vector can be based on the standard theory.
Second, the RRR estimator is also asymptotically distributed as mixture normal with a
variance of $\mu_r^{-1} \otimes (\int_0^1 W_2 W_2')^{-1}$. Because the first-order condition and the Hessian
matrix of the RRR estimator do not accompany the volatility adjustment, Assumptions
1(b), (c), and (e) are sufficient for the limiting distribution of the RRR estimator. The
distribution theory for the RRR estimator has been explored by Johansen (1988, 1991) and
Seo (1998), where white noise errors are assumed. This paper considers conditional
heteroskedastic errors, and we find that the asymptotic distribution of the RRR estimator
is invariant to conditional heteroskedasticity.
Third, if there is no GARCH effect, then $\mu = \nu = \mu_r = A' \Sigma^{-1} A$ because $H_j = 0$ and
$\sigma_j^2/B_j^2 = 1$ for all $j = 1, 2, \ldots, p$. In this case, the asymptotic distribution of the MLE is the
same as that of the RRR estimator.
Fourth, the variance of the MLE depends on the adjustment vector $\alpha$, correlation matrix
$\Lambda$, kurtosis $\kappa_j$, and the magnitude of the GARCH effect, which can be represented by $H_j$
and $\sigma_j^2/B_j^2$. As $\alpha$ approaches zero, the cointegrating relationship becomes weaker and the
variance of the MLE increases to infinity. In the same way, a weak cointegrating
relationship increases the variance of the RRR estimator.
The GARCH effect magnifies the unconditional variance of the error term, which leads
to the increase in the variance of the MLE. In our model, the intercept of the volatility
equation is fixed. The unconditional variance can be invariant to the GARCH effect when
the intercept varies depending on the GARCH parameters. However, the MLE uses the
information of conditional heteroskedasticity, which lowers the variance of the MLE. The
GARCH effect increases $H_j$ and $\sigma_j^2/B_j^2$ for $j = 1, 2, \ldots, p$, which generates the gain in
relative efficiency of the MLE. On the other hand, the RRR estimator does not consider
the information of conditional heteroskedasticity. Thus, the variance of the RRR
estimator always increases in the GARCH effect.
The relative efficiency gain of the MLE compared to the RRR estimator can be
measured by $\gamma$ as follows:
$$\gamma = \nu^{-1}\mu\nu^{-1}\mu_r, \tag{21}$$
where $\Gamma = \mathrm{Var}(n\hat b_n)\, \mathrm{Var}(n\tilde b_n)^{-1} = \gamma \otimes I$.


The efficiency gain $\gamma$ depends on the magnitude of the GARCH effect, the fourth
moment $\kappa_j$, and $\sigma_j^2/B_j^2$ for $j = 1, 2, \ldots, p$. To simplify analysis, we define the partial
efficiency gain $\gamma_j$ as follows:
$$\gamma_j = \frac{\sigma_j^2/B_j^2 + (\kappa_j - 1) H_j}{\left( \sigma_j^2/B_j^2 + 2H_j \right)^2}.$$
As the GARCH effect increases, $H_j$ increases. Thus, $\gamma_j$ decreases, and the relative
efficiency of the MLE improves. If the fourth moment $\kappa_j$ is larger than 3, $\gamma_j$ increases and
lowers the efficiency gain. Thus, the efficiency of the MLE can be affected by the
specification error. The efficiency gain also depends on Jensen's ratio $\sigma_j^2/B_j^2$ because this
ratio increases in the GARCH effect.
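The partial efficiency gain can be approximated by replacing the population moments in $\gamma_j$ with sample averages from a simulated GARCH(1,1) path. The sketch below is only illustrative: the function name, the sample size, the truncation of the infinite sum in $H_j$ at a fixed number of lags, and the Gaussian innovations (so that $\kappa_j = 3$) are all assumptions made for the example, not the procedure used for Fig. 1.

```python
import random

def partial_efficiency_gain(omega, psi, phi, n=5000, lags=30, seed=0):
    """Monte Carlo approximation of gamma_j under Gaussian innovations.

    Assumes psi + phi < 1; the infinite sum in H_j is truncated at `lags`.
    """
    kappa = 3.0                                   # fourth moment of N(0, 1)
    rng = random.Random(seed)
    s2 = [omega / (1.0 - psi - phi)]              # start at the unconditional variance
    e = [s2[0] ** 0.5 * rng.gauss(0.0, 1.0)]
    for t in range(1, n):
        s2.append(omega + psi * e[t - 1] ** 2 + phi * s2[t - 1])   # Eq. (4)
        e.append(s2[t] ** 0.5 * rng.gauss(0.0, 1.0))
    idx = range(lags, n)
    count = n - lags
    sigma2 = sum(s2[t] for t in idx) / count      # estimate of sigma_j^2
    inv_b2 = sum(1.0 / s2[t] for t in idx) / count  # estimate of 1 / B_j^2
    h_sum = 0.0
    for k in range(1, lags + 1):
        h_k = psi * phi ** (k - 1)                # h_{j,k}
        xi2 = sum(e[t - k] ** 2 / s2[t] ** 2 for t in idx) / count
        h_sum += h_k * h_k * xi2
    big_h = sigma2 * h_sum                        # estimate of H_j
    m_j = sigma2 * inv_b2 + (kappa - 1.0) * big_h
    n_j = sigma2 * inv_b2 + 2.0 * big_h
    return m_j / n_j ** 2

print(partial_efficiency_gain(1.0, 0.0, 0.0))     # no GARCH effect
print(partial_efficiency_gain(0.1, 0.2, 0.7))     # persistent GARCH effect
```

Without a GARCH effect the ratio equals one, and with Gaussian innovations it falls below one as soon as $\psi_j > 0$, in line with the discussion above.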
Fig. 1 shows the partial efficiency gain of the MLE of the cointegrating vector in an
error correction model with GARCH errors. The theoretical efficiency gain is calculated as
the function of the volatility parameters $\psi_j$ and $\phi_j$. The standardized innovations $\varepsilon_{jt}$ are
assumed to follow the standard normal distribution. If the volatility parameters are not
large, the efficiency improves slowly. However, as the volatility parameters become larger,
a significant amount of efficiency gain emerges. The overall efficiency gain $\gamma$ can be larger
than the partial efficiency gain $\gamma_j$ depending on the correlation coefficient and the
adjustment coefficient. Fig. 1 is based on the asymptotic theory, but the small sample
performance will be affected by the estimation error, which increases uncertainty. As the
sample size increases, the estimation error decreases and the efficiency gain approaches the
theoretical values.

Fig. 1. GARCH effect and efficiency gain.
3.3. Statistical inference
Suppose we want to test a set of linear restrictions based on the null hypothesis as
follows:
$$H_0 : Rb = r_q,$$
where $R$ is a $q \times r(p - r)$ matrix, and $r_q$ is the $q$-dimensional vector.
The covariance matrix of the cointegrating vector estimator can be estimated by using
the Hessian matrix and the outer product of the gradient. When the model is correctly
specified, the negative Hessian matrix is equivalent to the outer product matrix. However,
our model may not be correctly specified, and in that case we use the robust covariance
estimator. Thus, we may define three t-statistics or Wald statistics according to the
covariance estimator.
The Wald statistics can be defined according to the covariance estimation method as
follows:
$$W_n^j = n(R\hat b - r_q)' \left[ R\, \widehat{\mathrm{Var}}_j(n\hat b)\, R' \right]^{-1} n(R\hat b - r_q),$$
where $j = I$ using the information matrix, $j = P$ using the inverse of the outer product
matrix, and $j = W$ using White's robust covariance estimator.
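For a single coefficient ($r(p - r) = 1$ and $R = 1$), the three Wald statistics reduce to simple scalar expressions. The sketch below uses hypothetical names and inputs: `h` stands for the normalized negative Hessian $-h_n(\hat\theta_n)$ and `p` for the normalized outer product $p_n(\hat\theta_n)$, both assumed positive. It is a simplified illustration of how the three covariance estimators differ, not the paper's simulation code.

```python
def wald_statistics(n, b_hat, r0, h, p):
    """Wald statistics for H0: b = r0 in the scalar case.

    h: normalized negative Hessian (> 0); p: normalized outer product (> 0).
    """
    dev = n * (b_hat - r0)
    var_info = 1.0 / h            # information-matrix estimator
    var_outer = 1.0 / p           # inverse outer product estimator
    var_white = p / (h * h)       # sandwich (White) estimator: h^{-1} p h^{-1}
    return {
        "I": dev * dev / var_info,
        "P": dev * dev / var_outer,
        "W": dev * dev / var_white,
    }

same = wald_statistics(n=500, b_hat=1.004, r0=1.0, h=2.0, p=2.0)
fat_tails = wald_statistics(n=500, b_hat=1.004, r0=1.0, h=2.0, p=4.0)
print(same)
print(fat_tails)
```

When `p` equals `h` the three statistics coincide. When `p` exceeds `h`, the information-matrix and outer-product versions are inflated relative to the robust version by the factors `p / h` and `(p / h) ** 2`.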
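To derive the distribution of the Wald statistics, we define a $q$-dimensional random
variable $J$ as follows:
$$J = Q^{-1/2} R \left[ \nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2'(s)\, ds \right)^{-1} \right] \int_0^1 dW_1(s) \otimes W_2(s),$$
where $Q = R\left[ \nu^{-1}\mu\nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2'(s)\, ds \right)^{-1} \right] R'$.
The random variable $J$ follows the standard normal distribution. We define $Q_\nu$ and $Q_\mu$
as follows:
$$Q_\nu = R \left[ \nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2'(s)\, ds \right)^{-1} \right] R',$$
$$Q_\mu = R \left[ \mu^{-1} \otimes \left( \int_0^1 W_2(s) W_2'(s)\, ds \right)^{-1} \right] R'.$$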

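Theorem 2. Under the null hypothesis $H_0 : Rb = r_q$ and Assumptions 1–2,
$$W_n^W \Rightarrow J'J, \tag{22}$$
$$W_n^I \Rightarrow J' Q_I J, \tag{23}$$
and
$$W_n^P \Rightarrow J' Q_P J, \tag{24}$$
where $Q_I = Q^{1/2\prime} Q_\nu^{-1} Q^{1/2}$ and $Q_P = Q^{1/2\prime} Q_\mu^{-1} Q^{1/2}$.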


The Wald statistic based on the robust covariance estimator follows the chi-squared
distribution with q degrees of freedom, where q is the number of restrictions in the null
hypothesis. If we use the information matrix, the Wald statistic follows a chi-squared
distribution up to the scale effect $Q_I$. If the cointegration rank r equals 1, $\mu$ and $\nu$ become
scalars, and thus $Q_I = (\mu/\nu) I_q$. If the covariance is estimated by the inverse of the outer
product of gradients, the distribution of the Wald statistic is also chi-squared up to the
scale effect $Q_P$. By the same token, $Q_P = (\mu^2/\nu^2) I_q$ if $r = 1$. Therefore, if the covariance
matrix is estimated by the information matrix or the outer product of gradients, excess
kurtosis tends to amplify the Wald statistics, which may lead to over-rejection of the
null hypothesis. If the distribution of the standardized innovations is normal, or if $\kappa_j = 3$
for all j, these nuisance parameters disappear since $\mu = \nu$. Thus, the scale effect disappears
given normality, or more generally, $\kappa_j = 3$. However, statistical inference using the robust
covariance estimator works properly even without normality.
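
The three Wald statistics can be illustrated with a small numerical sketch. The function name `wald_statistics` and its arguments are hypothetical and are not taken from the paper. It assumes that the estimate, the average Hessian, and the average outer product of the score are all expressed on the same scale, so the normalization by n is left implicit.

```python
import numpy as np

def wald_statistics(R, r, beta_hat, hessian, opg):
    """Wald statistics under three covariance estimators (illustrative sketch).

    hessian : Hessian of the log-likelihood at the estimate (negative definite)
    opg     : outer product of the score at the estimate (positive definite)
    """
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R @ np.asarray(beta_hat, dtype=float) - np.asarray(r, dtype=float)
    info_inv = np.linalg.inv(-np.asarray(hessian, dtype=float))   # j = I
    opg_inv = np.linalg.inv(np.asarray(opg, dtype=float))         # j = P
    sandwich = info_inv @ opg @ info_inv                          # j = W (robust)
    stats = {}
    for name, cov in (("I", info_inv), ("P", opg_inv), ("W", sandwich)):
        # quadratic form diff' [R cov R']^{-1} diff
        stats[name] = float(diff @ np.linalg.solve(R @ cov @ R.T, diff))
    return stats

if __name__ == "__main__":
    # Scalar case with nu = 2 (negative Hessian) and mu = 3 (outer product).
    s = wald_statistics([[1.0]], [0.0], [0.2], [[-2.0]], [[3.0]])
    print(s["I"] / s["W"], s["P"] / s["W"])
```

In the scalar case the ratios of the information-based and outer-product-based statistics to the robust statistic equal mu/nu and mu^2/nu^2, which is the scale effect described above for r = 1.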

4. ECM with an intercept


Suppose the nonstationary variable $x_t$ has a nonzero mean. It is natural to include an
intercept in the vector error correction model as follows:

$$\Delta x_t = \tau + \alpha \begin{pmatrix} 1 \\ \beta \end{pmatrix}' x_{t-1} + \sum_{i=1}^{l} \Gamma_i \Delta x_{t-i} + u_t.$$

The error $u_t$ follows the multivariate GARCH process as in the model without intercept.
The parameter vector is defined as $\theta = (\beta', \tau', \theta_2')'$. The likelihood function, score, and
Hessian matrix can be defined in the same way as before. We denote $\hat{\theta}_n$ as the MLE and $\tilde{\theta}_n$
as the RRR estimator.
We use the following asymptotic results:
Lemma 5. Under Assumption 1,
n1=2
n1
n3=2

n
X
ql t y0 d 0
! L Z  QN0; L0 S1=2 MS1=2 L,
qt
t1

n
X
q2 l t y0 p 0 1=2
!L S
NS1=2 L,
0
qt
qt
t1

Z 1
n
X
q2 l t y0
0 1=2
1=2
)
A
S
NS
L

W 2,
0
0
t1 qb qt

where $Z = Z(1)$ and $Q = Q(1)$ are the vector-valued random variables defined in Section 3.

Define the demeaned Brownian motion $\bar{W}_2(s) = W_2(s) - \int_0^1 W_2(s)\,ds$.

Theorem 3. Under Assumptions 1–2,

$$n(\hat{\beta}_n - \beta_0) \Rightarrow \left(\int_0^1 \bar{W}_2 \bar{W}_2'\right)^{-1} \int_0^1 \bar{W}_2\, dW_1'\, \nu^{-1}. \qquad (25)$$

If Assumptions 1(b), (c), and (e) hold,

$$n(\tilde{\beta}_n - \beta_0) \Rightarrow \left(\int_0^1 \bar{W}_2 \bar{W}_2'\right)^{-1} \int_0^1 \bar{W}_2\, dW_{1r}'\, \mu_r^{-1}. \qquad (26)$$


In the ECM with an intercept, the asymptotic distribution of the MLE is mixed
normal with variance $\nu^{-1}\mu\nu^{-1} \otimes \left(\int_0^1 \bar{W}_2 \bar{W}_2'\right)^{-1}$. Also, the RRR estimator is
asymptotically distributed as mixed normal with variance $\mu_r^{-1} \otimes \left(\int_0^1 \bar{W}_2 \bar{W}_2'\right)^{-1}$.
Thus, the cointegrating vector estimators follow the mixed normal asymptotic distribution.
Besides, the efficiency gain of the MLE depends on the magnitude of the GARCH effect, as
in the model without deterministic trends.
Our results can be extended to the ECM with deterministic trends. When the data
generating process of $x_t$ contains deterministic trends, we consider the ECM with the
corresponding trend variables. In that case, the asymptotic distribution is based on the
detrended Brownian motions. In addition, we may use detrended variables to reduce
the number of parameters to estimate. Detrending removes the deterministic
trends, and the asymptotic distribution of the cointegrating vector estimator is again based on
the detrended Brownian motions. Therefore, the cointegrating vector estimator follows the
mixed normal distribution in the ECM with deterministic trends.
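
As a finite-sample analogue of the demeaned and detrended Brownian motions, the sketch below demeans and detrends a simulated random walk. The helper names `demean` and `detrend` are hypothetical, and the least-squares detrending on a constant and a linear trend is an assumption made for illustration, not a procedure stated in the paper.

```python
import numpy as np

def demean(path):
    """Subtract the sample mean of each column (discrete analogue of W - int W)."""
    path = np.asarray(path, dtype=float)
    return path - path.mean(axis=0)

def detrend(path):
    """Remove a fitted constant and linear trend from each column by least squares."""
    path = np.asarray(path, dtype=float)
    n = path.shape[0]
    X = np.column_stack([np.ones(n), np.arange(1, n + 1, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, path, rcond=None)
    return path - X @ coef

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    random_walk = np.cumsum(rng.standard_normal((250, 2)), axis=0)
    print(demean(random_walk).mean(axis=0))   # numerically zero in each column
    print(detrend(random_walk).shape)
```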
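
5. Simulation evidence

In this section, we examine the finite sample properties of the cointegrating vector
estimators using Monte Carlo simulation. The experiments are based on a bivariate
error correction model as follows:

$$\Delta x_t = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} 1 \\ \beta \end{pmatrix}' x_{t-1} + u_t.$$

We also assume $e_t = \Lambda u_t$ and $\Sigma_t = E(e_t e_t' \mid F_t)$, where

$$\Lambda = \begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}, \qquad \Sigma_t = \begin{pmatrix} \sigma_{1t}^2 & 0 \\ 0 & \sigma_{2t}^2 \end{pmatrix},$$

$$\sigma_{jt}^2 = 1 + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2,$$

and the standardized innovation $e_{jt}/\sigma_{jt}$ is i.i.d. $(0, 1)$ for $j = 1, 2$.
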
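
The data generating process above can be sketched in a few lines. The function `simulate_ecm_garch`, the burn-in length, and the zero starting values are hypothetical choices made for illustration. The usage example sets alpha_1 = -1 as an assumption so that the error correction term is stable; with a positive alpha_1 the recursion below would be explosive.

```python
import numpy as np

def simulate_ecm_garch(n, alpha, beta, psi, phi, lam, rng, burn=200):
    """Simulate x_t from a bivariate ECM with GARCH(1,1) errors (illustrative)."""
    alpha = np.asarray(alpha, dtype=float)
    psi = np.asarray(psi, dtype=float)
    phi = np.asarray(phi, dtype=float)
    coint = np.array([1.0, beta])                      # (1, beta)'
    lam_inv = np.linalg.inv(np.array([[1.0, 0.0],
                                      [lam, 1.0]]))    # u_t = Lambda^{-1} e_t
    x = np.zeros(2)
    e = np.zeros(2)
    sig2 = np.ones(2)
    path = np.empty((n, 2))
    for t in range(n + burn):
        sig2 = 1.0 + psi * e**2 + phi * sig2           # conditional variances
        e = np.sqrt(sig2) * rng.standard_normal(2)     # Gaussian standardized innovations
        u = lam_inv @ e
        x = x + alpha * (coint @ x) + u                # x_t = x_{t-1} + alpha (1, beta) x_{t-1} + u_t
        if t >= burn:
            path[t - burn] = x
    return path

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = simulate_ecm_garch(250, alpha=(-1.0, 0.0), beta=1.0,
                           psi=(0.25, 0.25), phi=(0.7, 0.7), lam=0.0, rng=rng)
    print(x.shape)
```

With alpha = (-1, 0) and beta = 1, the combination x_1t + x_2t reduces to u_1t + u_2t, so it is stationary while each component behaves like a random walk.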

We compare the finite sample performance of the MLE of the cointegrating vector to
that of the RRR, the fully modified (FM), and the OLS estimators. The standard errors of
the MLE are calculated from the robust covariance estimator. The experiments are based
on a sample size of 250 and 1000 replications. The standardized innovations are generated by
the Gauss random number generator. The true value of $\beta$ is set at 1.
First, we study the efficiency gain of the MLE of the cointegrating vector. Table 1 shows
the root mean squared error (RMSE) and the mean absolute error (MAE) of the
cointegrating vector estimators. When there is no conditional heteroskedasticity, the MLE
is almost equivalent to the RRR estimator. As the GARCH effect increases, the RMSE
and MAE of the MLE decrease while those of the RRR estimator slowly increase. For
example, at $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.95, 0, 0.95, 0, 0)$ the RMSE and MAE of the
MLE are 75% and 60% lower than those of the RRR estimator, respectively. The RMSE
of the MLE is 30% lower than that of the RRR estimator at
$(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.25, 0.7, 0.25, 0.7, 0)$.

Table 1
Efficiency gain

                                        RMSE                                MAE
(α1, α2, ψ1, φ1, ψ2, φ2, λ)             MLE     RRR     FM      OLS         MLE     RRR     FM      OLS
(1, 0, 0, 0, 0, 0, 0)                   0.0077  0.0079  0.0147  0.0158      0.0050  0.0054  0.0108  0.0106
(1, 0, 0.25, 0, 0.25, 0, 0)             0.0069  0.0074  0.0133  0.0146      0.0047  0.0051  0.0098  0.0097
(1, 0, 0.5, 0, 0.5, 0, 0)               0.0059  0.0080  0.0138  0.0157      0.0040  0.0053  0.0101  0.0102
(1, 0, 0.75, 0, 0.75, 0, 0)             0.0050  0.0088  0.0166  0.0179      0.0033  0.0056  0.0109  0.0108
(1, 0, 0.95, 0, 0.95, 0, 0)             0.0044  0.0173  0.0250  0.0219      0.0028  0.0072  0.0133  0.0128
(1, 0, 0.25, 0.2, 0.25, 0.2, 0)         0.0074  0.0081  0.0141  0.0159      0.0049  0.0055  0.0105  0.0108
(1, 0, 0.25, 0.45, 0.25, 0.45, 0)       0.0074  0.0080  0.0141  0.0156      0.0052  0.0055  0.0104  0.0104
(1, 0, 0.25, 0.7, 0.25, 0.7, 0)         0.0066  0.0093  0.0160  0.0177      0.0044  0.0058  0.0111  0.0119
(1, 0, 0.25, 0.7, 0.25, 0.7, 0.5)       0.0057  0.0072  0.0123  0.0115      0.0036  0.0045  0.008   0.0072
(1, 0.5, 0.25, 0.7, 0.25, 0.7, 0)       0.0025  0.0031  0.0057  0.0067      0.0017  0.0021  0.0042  0.0050
(1, 0.5, 0.25, 0.7, 0.25, 0.7, 0)       0.0076  0.0093  0.0256  0.0478      0.0052  0.0066  0.0146  0.0297

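
The efficiency comparison rests on two simple loss measures and a percentage reduction. The sketch below shows how they can be computed; the helper names are hypothetical, and the numerical inputs in the usage example are the RMSE and MAE entries reported in Table 1 for the MLE and the RRR estimator.

```python
import numpy as np

def rmse(estimates, true_value):
    """Root mean squared error of Monte Carlo estimates around the true value."""
    err = np.asarray(estimates, dtype=float) - true_value
    return float(np.sqrt(np.mean(err**2)))

def mae(estimates, true_value):
    """Mean absolute error of Monte Carlo estimates around the true value."""
    err = np.asarray(estimates, dtype=float) - true_value
    return float(np.mean(np.abs(err)))

def pct_reduction(loss_a, loss_b):
    """Percentage by which loss_a is lower than loss_b."""
    return 100.0 * (1.0 - loss_a / loss_b)

if __name__ == "__main__":
    # Table 1 entries at psi_1 = psi_2 = 0.95: MLE versus RRR.
    print(round(pct_reduction(0.0044, 0.0173)))   # RMSE reduction, roughly 75
    print(round(pct_reduction(0.0028, 0.0072)))   # MAE reduction, roughly 60
```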
Table 1 shows that the impact of the parameter $\psi_j$ on the efficiency gain is larger than
that of the parameter $\phi_j$. If the parameter $\lambda$ is different from 0, the RMSE and the MAE
decrease. Thus, the efficiency of the MLE relative to the other estimators, which do not
consider heteroskedasticity, improves depending on the parameters of the volatility
processes.
As Table 1 shows, the FM estimator is less efficient than the MLE and the RRR estimator
because it considers neither the short-run dynamics nor conditional heteroskedasticity. The
RMSE and MAE of the FM estimator increase slowly as the volatility parameters increase.
Unlike the FM estimator, the OLS estimator does not correct for asymptotic bias, and its
RMSE and MAE increase with the volatility parameters.
Next, we examine the size performance of the t-statistics for the null hypothesis
$H_0: \beta = 1$.
Table 2 shows the descriptive statistics, the percentiles, and the coverage rates of the
t-statistics based on the MLE, RRR, FM, and OLS estimators. The coverage rate is
defined as $P(T < u_{0.05})$ for the lower 5% size and $P(T > u_{0.95})$ for the upper 5% size, where T
is the t-statistic and u is the critical value. The standard errors of the MLE are based on the
robust covariance matrix estimator.
The descriptive statistics of the t-statistics based on the MLE are close to the properties
of the standard normal distribution. The coverage rates are very close to the true size,
and thus statistical inference on the cointegrating vector can be based on the standard
theory.
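
The coverage rates can be computed directly from the simulated t-statistics. The function `coverage_rates` below is a hypothetical helper, and taking the critical values from the standard normal distribution is an assumption that follows the standard theory referred to above.

```python
import numpy as np
from statistics import NormalDist

def coverage_rates(t_stats, level=0.05):
    """Return (P(T < u_level), P(T > u_{1-level})) using standard normal critical values."""
    t = np.asarray(t_stats, dtype=float)
    lower_cv = NormalDist().inv_cdf(level)         # u_{0.05} when level = 0.05
    upper_cv = NormalDist().inv_cdf(1.0 - level)   # u_{0.95} when level = 0.05
    return float(np.mean(t < lower_cv)), float(np.mean(t > upper_cv))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lower, upper = coverage_rates(rng.standard_normal(1000))
    print(lower, upper)   # both should be near 0.05 for standard normal draws
```

The same function applied to t-statistics generated under a local alternative gives rejection frequencies, which is how a power table of the kind reported later can be filled in.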
Fig. 2 shows the estimated kernel density of the t-statistics of the cointegrating vector
estimators. The estimated density based on the MLE looks very close to the standard
normal distribution for most values of the GARCH parameters. Also, the t-statistics based
on the RRR and FM estimators can be closely approximated by the normal distribution.
However, as Fig. 2 shows, the OLS estimator reveals a large amount of size distortion and
asymmetry.

Table 2
Size performance of the t-statistics

        Descriptive statistics                    Percentiles                   Coverage rate
        Mean     S.D.     Skewness  Kurtosis      5        50       95          0.05     0.95

α = (1 0)', ψ1 = 0, φ1 = 0, ψ2 = 0, φ2 = 0, λ = 0
MLE     0.0484   0.9609   0.0493    3.3621        1.7115   0.0130   1.5387      0.0550   0.0390
RRR     0.0616   1.0225   0.0154    2.8984        1.7760   0.0526   1.6079      0.0650   0.0460
FM      0.0552   1.0791   0.1021    3.0048        1.7274   0.0436   1.7642      0.0610   0.0550
OLS     0.9415   0.8468   0.0786    3.0227        2.3296   0.9376   0.4512      0.2050   0.0000

α = (1 0)', ψ1 = 0.25, φ1 = 0, ψ2 = 0.25, φ2 = 0, λ = 0
MLE     0.0046   0.9958   0.0746    3.2712        1.6091   0.0390   1.6593      0.0480   0.0530
RRR     0.0106   0.9843   0.0114    3.1257        1.6140   0.0274   1.6284      0.0470   0.0500
FM      0.0706   1.0138   0.0751    3.1097        1.7647   0.0875   1.6087      0.0600   0.0450
OLS     0.8922   0.8530   0.1537    2.8878        2.3297   0.8723   0.4928      0.1890   0.0000

α = (1 0)', ψ1 = 0.5, φ1 = 0, ψ2 = 0.5, φ2 = 0, λ = 0
MLE     0.0705   0.9754   0.0058    3.2890        1.6957   0.0758   1.5011      0.0540   0.0410
RRR     0.0307   0.9609   0.0357    3.2549        1.5389   0.0387   1.5201      0.0460   0.0350
FM      0.1096   1.0009   0.0255    2.9200        1.7616   0.1311   1.5035      0.0620   0.0330
OLS     0.9032   0.8502   0.1717    3.0220        2.4042   0.8693   0.4231      0.1910   0.0020

α = (1 0)', ψ1 = 0.75, φ1 = 0, ψ2 = 0.75, φ2 = 0, λ = 0
MLE     0.0417   1.0145   0.1359    3.2862        1.7106   0.0712   1.6367      0.0590   0.0500
RRR     0.0514   1.0088   0.1007    3.0358        1.6146   0.0590   1.6125      0.0450   0.0490
FM      0.0915   1.0160   0.2077    3.8452        1.7403   0.0772   1.5578      0.0620   0.0420
OLS     0.8917   0.8879   0.2214    3.4199        2.3956   0.8632   0.5092      0.1950   0.0030

α = (1 0)', ψ1 = 0.95, φ1 = 0, ψ2 = 0.95, φ2 = 0, λ = 0
MLE     0.0012   1.0686   0.0244    3.1862        1.7797   0.0422   1.7125      0.0630   0.0570
RRR     0.0351   1.0649   0.0283    2.8242        1.7388   0.0268   1.7888      0.0620   0.0640
FM      0.1201   1.1283   1.7399    21.8168       1.9119   0.0813   1.6030      0.0660   0.0460
OLS     0.9188   0.9947   0.1727    5.6338        2.4226   0.9408   0.7530      0.2170   0.0120

α = (1 0)', ψ1 = 0.25, φ1 = 0.2, ψ2 = 0.25, φ2 = 0.2, λ = 0
MLE     0.0906   1.0056   0.0208    3.3390        1.8587   0.0429   1.6206      0.0650   0.0490
RRR     0.0808   1.0342   0.0058    2.9102        1.8149   0.0878   1.6038      0.0670   0.0460
FM      0.0792   1.0507   0.0326    2.9888        1.7867   0.0875   1.6885      0.0630   0.0520
OLS     0.9417   0.8527   0.0207    2.7293        2.3466   0.9436   0.4890      0.2160   0.0000

α = (1 0)', ψ1 = 0.25, φ1 = 0.45, ψ2 = 0.25, φ2 = 0.45, λ = 0
MLE     0.0259   1.0294   0.0217    2.8997        1.6937   0.0458   1.7371      0.0540   0.0590
RRR     0.0220   1.0079   0.0116    2.9878        1.6670   0.0185   1.7099      0.0530   0.0590
FM      0.0768   1.0339   0.0238    2.8758        1.7891   0.0739   1.6406      0.0650   0.0500
OLS     0.8899   0.8357   0.0080    3.0056        2.2193   0.8928   0.4802      0.1910   0.0020

α = (1 0)', ψ1 = 0.25, φ1 = 0.7, ψ2 = 0.25, φ2 = 0.7, λ = 0
MLE     0.0340   1.0471   0.0032    3.1390        1.7393   0.0372   1.6738      0.0630   0.0540
RRR     0.0521   1.0511   0.1534    3.3659        1.7458   0.0793   1.6762      0.0610   0.0540
FM      0.1839   1.0140   0.0089    3.2349        1.8452   0.1695   1.5007      0.0730   0.0370
OLS     0.9524   0.9199   0.1455    3.2786        2.4333   0.9934   0.5298      0.2150   0.0060

α = (1 −0.5)', ψ1 = 0.25, φ1 = 0.7, ψ2 = 0.25, φ2 = 0.7, λ = 0
MLE     0.0366   1.0451   0.0570    3.0441        1.6961   0.0280   1.7383      0.0570   0.0590
RRR     0.0048   1.0213   0.0492    3.0431        1.6389   0.0060   1.6497      0.0490   0.0520
FM      0.0077   1.0662   0.0690    3.1926        1.6852   0.0442   1.7712      0.0550   0.0610
OLS     0.8649   1.2365   0.5086    3.6378        3.0223   0.7568   0.8945      0.2330   0.0150

α = (1 0.5)', ψ1 = 0.25, φ1 = 0.7, ψ2 = 0.25, φ2 = 0.7, λ = 0
MLE     0.0435   1.0626   0.0083    3.0819        1.7816   0.0561   1.8055      0.0660   0.0660
RRR     0.0110   1.0519   0.0424    2.7972        1.7365   0.0084   1.7046      0.0610   0.0570
FM      0.2559   1.0325   0.2624    4.6975        2.0407   0.2643   1.2801      0.0810   0.0300
OLS     1.4933   1.0235   0.7144    3.9089        3.3739   1.3836   0.0030      0.3950   0.0000

Fig. 2. Kernel density estimation.

Next, we investigate the small sample power of the t-statistics by using
the local alternative hypothesis:

$$H_n: \beta_n = 1 + \frac{\delta}{n}.$$

If $\delta = 0$, then the null hypothesis holds. As the local alternative parameter $\delta$ moves away
from 0, the null hypothesis is no longer valid, and the t-statistics tend to reject the null hypothesis.
Table 3 shows the frequency of rejecting the null hypothesis at $\delta = 1, 2, 3, 4, 5$. At the local
alternative $\delta = 3$, the test based on the MLE rejects the null hypothesis in 65% of the
replications, and the test based on the RRR estimator in 66%, at the 5% size if there is no
conditional heteroskedasticity. At
$(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.95, 0, 0.95, 0, 0)$ and $\delta = 3$, the test based on the MLE
rejects in 92% of the replications, and the test based on the RRR estimator in 65%, at the 5% size.
Thus, the relative power of the t-statistic based on the MLE improves as the volatility
parameters increase. On the other hand, the power of the t-statistic based on the RRR or the
FM estimator is invariant to conditional heteroskedasticity.
As Table 3 shows, the impact of the parameter $\psi_j$ on the power is greater than that of the
parameter $\phi_j$. In the moving average representation of a GARCH(1,1) process, the
coefficients decay exponentially in $\phi_j$ and are multiplied by the parameter $\psi_j$ at each lag
of the squared innovation.
Besides, the power of the tests based on the MLE improves when the correlation parameter $\lambda$
is different from 0.
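
The different roles of the two volatility parameters can be seen by unrolling the GARCH(1,1) recursion. The derivation below is a sketch under the assumptions that the process started in the infinite past and that $0 \le \phi_j < 1$; $\omega_j$ denotes the intercept of the variance equation, which equals 1 in the simulation design.

```latex
\begin{align*}
\sigma_{jt}^2
  &= \omega_j + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2 \\
  &= \omega_j (1 + \phi_j) + \psi_j \left( e_{jt-1}^2 + \phi_j e_{jt-2}^2 \right)
     + \phi_j^2 \sigma_{jt-2}^2 \\
  &= \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{\infty} \phi_j^{k} e_{jt-k-1}^2 .
\end{align*}
```

The parameter $\psi_j$ scales the weight on every lagged squared innovation, whereas $\phi_j$ only governs how quickly those weights decay, which is in line with the larger impact of $\psi_j$ reported in the simulations.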

Table 3
Power of the t-statistics

(α1, α2, ψ1, φ1, ψ2, φ2, λ)                 δ = 1    δ = 2    δ = 3    δ = 4    δ = 5
(1, 0, 0, 0, 0, 0, 0)                 MLE   0.2050   0.4700   0.6450   0.7790   0.8520
                                      RRR   0.2140   0.4920   0.6610   0.7910   0.8540
                                      FM    0.1190   0.2320   0.3340   0.4990   0.6130
(1, 0, 0.25, 0, 0.25, 0, 0)           MLE   0.2660   0.5010   0.6860   0.8040   0.8850
                                      RRR   0.2360   0.4730   0.6510   0.7830   0.8670
                                      FM    0.1190   0.2140   0.3180   0.4540   0.6040
(1, 0, 0.5, 0, 0.5, 0, 0)             MLE   0.3000   0.5830   0.7510   0.8330   0.9080
                                      RRR   0.2340   0.4740   0.6590   0.7660   0.8520
                                      FM    0.1190   0.2320   0.3690   0.4770   0.6080
(1, 0, 0.75, 0, 0.75, 0, 0)           MLE   0.3790   0.7060   0.8420   0.9240   0.9520
                                      RRR   0.2390   0.4870   0.6370   0.8000   0.8360
                                      FM    0.1280   0.2390   0.3510   0.5070   0.5900
(1, 0, 0.95, 0, 0.95, 0, 0)           MLE   0.5280   0.7810   0.9160   0.9610   0.9800
                                      RRR   0.2720   0.4970   0.6500   0.7510   0.8020
                                      FM    0.1400   0.2610   0.3790   0.4880   0.5450
(1, 0, 0.25, 0.7, 0.25, 0.7, 0)       MLE   0.3180   0.5460   0.7430   0.8360   0.8950
                                      RRR   0.2360   0.4210   0.6600   0.7580   0.8140
                                      FM    0.1210   0.2320   0.3680   0.4900   0.5800
(1, 0, 0.25, 0.7, 0.25, 0.7, 0.5)     MLE   0.3970   0.6620   0.8100   0.9050   0.9520
                                      RRR   0.3090   0.5550   0.7360   0.8370   0.9200
                                      FM    0.1430   0.3090   0.4530   0.6170   0.7130
(1, 0.5, 0.25, 0.7, 0.25, 0.7, 0)     MLE   0.6530   0.8880   0.9650   0.9910   0.9930
                                      RRR   0.5490   0.8380   0.9390   0.9860   0.9810
                                      FM    0.2760   0.5740   0.7760   0.9030   0.9420
(1, 0.5, 0.25, 0.7, 0.25, 0.7, 0)     MLE   0.2790   0.4940   0.6930   0.7980   0.8540
                                      RRR   0.1810   0.4010   0.5880   0.7010   0.7910
                                      FM    0.1310   0.2020   0.3100   0.3980   0.5310

Many empirical studies have shown that the conditional variances of financial variables
reveal common persistence and volatility causality. Therefore, an efficiency gain and
more powerful inference on the cointegrating vector can be obtained when we use the
information in conditional heteroskedasticity. The simulation evidence indicates the
potential gain from the information contained in the volatility process.

6. Concluding remarks

In this paper, we find that the asymptotic distribution of the MLE of the cointegrating
vector depends on the conditional heteroskedasticity. This fact implies that the efficiency
of the MLE improves when the data contain conditional heteroskedasticity. Although
the RRR estimator and the regression-based estimators allow for conditional
heteroskedasticity, they do not use the information coming from it. As a result, the power
of statistical inference on the cointegrating vector
improves if we use the information in conditional heteroskedasticity.
The conventional methods of estimating the cointegrating vector are based on the mean
equation. Because the OLS and GLS estimators are asymptotically equivalent in
nonstationary cointegrated models, the volatility equation has been treated as less
important. However, Amemiya (1973) has shown that the MLE improves the efficiency
of estimators if the heteroskedasticity depends on the parameter of the mean equation in
the linear model with stationary variables. This paper extends Amemiya's result
to the nonstationary cointegrated model with conditionally heteroskedastic errors.
As many studies have shown, financial variables have time-varying variances, and the
GARCH model has been widely used to estimate volatility. There exist many other
specifications that are capable of explaining conditional heteroskedasticity. Although we
consider a multivariate GARCH model with constant coefficients of correlation, our main
results can be extended to other heteroskedastic models.
Statistical inference on the cointegration space can also be affected by conditional
heteroskedasticity. If we use the information in heteroskedastic errors, the power of the
cointegration test is expected to improve in the same way that the efficiency gain of the
cointegrating vector estimator emerges. As this topic requires more complicated analysis,
we leave it to future research.

Acknowledgments
I would like to thank Badi Baltagi, Valentina Corradi, David Drukker, Bruce Hansen,
Dennis Jansen, Qi Li, Joon Park, Peter Robinson, Pentti Saikkonen, and participants at
the 2004 North America Econometric Society Meeting and workshops at Rice University
and Texas A&M University for useful comments and suggestions. Special thanks are owed
to the co-editor and two anonymous referees, who provided detailed and extensive
comments and suggestions. The author gratefully acknowledges the research support from
Soongsil University.
Appendix A. Mathematical proofs
In the appendix, we denote $|A| = \mathrm{tr}(A'A)^{1/2}$, $\|A\|_m = (E|A|^m)^{1/m}$, and
$\Theta_\delta = \{\theta \in \Theta : |D_n(\theta - \theta_0)| \le \delta\}$. For simplicity, $\sup_t = \sup_{1 \le t \le n}$, and
$\|\cdot\| = \|\cdot\|_1$. We denote

X t y X t y0 o p 1 if, for all d40,


p

sup jX t y  X t y0 j ! 0.
y2Yd

Proof of Lemma 1. By the invariance principle of Phillips and Durlauf (1986),

$$n^{-1/2} \sum_{t=1}^{[ns]} u_t \Rightarrow U(s) = BM(\Omega).$$

(1.1) Show n1=2 xns ) C1Us.


We need to show
 !

ns


X

1=2 
P sup n
ut 4 pP
xns  C1


s20;1
t1

!
sup n

1=2

jFLuns j4

! 0.

s20;1

Note that supt kFLut k2 o1 because


kFLut k2 p

1
X

jFj jkut k2 p

j0

1 X
1
X

jC k jkut k2 p

j0 kj1

1
X

kjC k j kut k2 o1.

k1

Thus, fFLut g is uniformly square integrable, which implies


p

sup n1=2 jFLut j ! 0.


t

P
n0
(1.2) Show n1=2 ns
t1 wt ) b F1Us.
We need to show
 !

ns
ns

X
X

n0
1=2 
P sup n
wt  b F1
ut 4 pP


 t1
s20;1
t1

!
sup n

p sup jbn j
bn

k2 jC k jkut k2 o1:

n0

jb F1 Luns j4

s20;1

FL  F1=1  L.
where
P1 F1 L
2
k
jC
jo1, supy2Y jyjo1, and kut k2 o1 imply
k
k1



X
1 X
1
0


k  j  1C k ut 
kbn F1 Lut k2 p sup jbn j


n
b
j0 kj2
1
X

1=2

&

k1

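Proof of Lemma 2. The error correction model (1) can be written as follows:

$$\Delta x_t = \alpha \begin{pmatrix} I_r \\ \beta \end{pmatrix}' \begin{pmatrix} x_{1t-1} \\ x_{2t-1} \end{pmatrix} + \Gamma \Delta X_{t-1} + u_t,$$

where $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_l)$ and $\Delta X_{t-1} = \mathrm{vec}(\Delta x_{t-1}, \Delta x_{t-2}, \ldots, \Delta x_{t-l})$.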

! 0,


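Let $\Lambda_j$ be the jth row of the correlation matrix $\Lambda$. The orthogonalized innovation
$e_{jt} = \Lambda_j u_t$ follows the GARCH(1,1) process

$$\sigma_{jt}^2 = \omega_j + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2.$$

We use the fact that $\sup_t \sup_\theta 1/\sigma_{jt}^2(\theta) \le 1/\omega_j < \infty$ by Assumption 1(d). Also, we use
$0 \le \phi_j < 1$ for $j = 1, 2, \ldots, p$.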
1=2
First, we show sup
xt km o1 for 1pmp6. We prove it for m 6.
Pt t kn
Since xt C1 i1 ui FLut , we need to show




t
X

 1=2
C1
ui  o1 and sup kn1=2 FLut k6 o1,
sup n

t 
t
i1
6

where ut ut y0 , and FL CL  C1=1  L.


By using Burkholders inequality and Minkowskis inequality, kut k6 o1 implies
0 

3 11=6





t
t
X
X


 1=2

1
2
sup n
C1
ui  p sup C 1 @En
ui  A



t 
t
i1
i1
6

pC 1 sup t=nkut k6 pC 1 kut k6 o1,


t

sup kn1=2 FLut k6 pn1=2


t

1
X

kjC k jkut k6 op 1,

k1

p
where C 1 108 6=5jC1j.
Thus, supt kn1=2 xt km o1 for 1pmp6 by monotonicity.
Also, we can show that supt kDxt km o1 and supt kwt km o1 for 1pmp6 because
sup kDxt km p sup kCLut km p
t

1
X

jC k jkut km o1

k0

and
0

sup kwt km p sup kbn FLut km p sup jbn j


t

bn

1
X

kjC k jkut km o1.

k1

(2.1) Ln is stochastically equicontinuous.


(2.1.a) Show e2jt y e2jt o p 1.
Because
p
ut y ut  a nb  b0 0 n1=2 x2t1  a  a0 wt1  G  G0 DX t1 ,
ut yu0t y ut u0t o p 1 if kn1=2 x2t k2 o1, kwt k2 o1, and kDxt k2 o1.
We use the following:
ejt y Lj ut y
ejt Lj  Lj0 ut Lj ut y  ut .


Since
e2jt y e2jt Lj  Lj0 ut u0t Lj  Lj0 0 Lj ut y  ut ut y  ut 0 L0j
2ejt u0t Lj  Lj0 0 2ejt ut y  ut 0 L0j 2Lj  Lj0 ut ut y  ut 0 L0j ,
p

sup je2jt y  e2jt j ! 0,

y2Yd

for all j 1; 2; . . . ; p if kn1=2 x2t k2 o1, kwt k2 o1, and kDxt k2 o1.
(2.1.b) Show s2jt y s2jt o p 1.
Note that
t1
X
oj
cj
fkj e2jtk1 y
1  fj
k0
"
#
t1
X
o
oj0
j
2
sjt

fkj e2jtk1 y  e2jtk1 
cj
1  fj 1  fj0
k0

s2jt y

t1
X

cj fkj  cj0 fkj0 e2jtk1 .

k0

We use the following:


cj fkj  cj0 fkj0 cj  cj0 fkj cj0 fkj  fkj0
fkj  fkj0 fj  fj0 fk1
fk2
fj0    fk1
j
j
j0
k1
pfj  fj0 kf j ,

where f j maxffj ; fj0 g.


Thus, s2jt y s2jt o p 1 for all j if kn1=2 x2t k2 o1, kwt k2 o1, and kDxt k2 o1.
(2.1.c) Show e2jt y=s2jt y e2jt =s2jt o p 1.
We use the following:
e2jt y
s2jt y

e2jt
s2jt

e2jt y  e2jt 
s2jt y

2jt
s2jt y

s2jt y  s2jt .

Since e2jt y e2jt o p 1, s2jt y s2jt o p 1, and 1=s21t yp1=o1 , e2jt y=s2jt y e2jt =s2jt
o p 1 for all j if kn1=2 x2t k2 o1, kwt k2 o1, and kDxt k2 o1.
(2.1.d) Show l t y l t y0 o p 1, where
"
!#
p
X
e2jt y e2jt
2
2
log sjt y  log sjt

l t y  l t y0 0:5
,
s2jt y s2jt
j1
Since log s2jt y=s2jt ps2jt y  s2jt =s2jt , log s2jt y log s2jt o p 1.
Therefore, l t y l t y0 o p 1 if kn1=2 x2t k2 o1, kwt k2 o1, and kDxt k2 o1.


Because
!

!
n
1X
E sup jl t y  l t y0 j
P sup jLn y  Ln y0 j4 p
n t1
y2Yd
y2Yd
!
1
p sup E sup jl t y  l t y0 j ,
 t
y2Yd
Assumption 1 implies that the likelihood function Ln y is stochastically equicontinuous.
^
(2.2) Show the consistency
pof
bn .
p
d fy 2 Yj j nb  b 0 j4dg. We claim
We dene N d fy 2 Yj j nb  b 0 jpdg and N
that, for every d40,
lim Pn;y0 fLn y0 4 sup Ln yg 1.
n

d
y2N

To prove the claim, we use the following:


Ln y0  Ln y
"
#
p
n
e2jt y e2jt
1 XX
2
2

log sjt y  log sjt 2



2n j1 t1
sjt y s2jt
"
#
p
n
s2jt y s2jt y  s2jt e2jt y  e2jt e2jt  s2jt s2jt y  s2jt
1 XX
log 2 


2n j1 t1
s2jt y
s2jt y
sjt
s2jt s2jt y
"
#
p
n
s2jt
s2jt
e2jt y  e2jt e2jt  s2jt s2jt y  s2jt
1 XX

 log 2
1

.
2n j1 t1 s2jt y
sjt y
s2jt y
s2jt s2jt y
Note that
e2jt y Lj ut yu0t yL0j
s2jt y

t1
X
oj
cj
fkj e2jtk1 y,
1  fj
k0

where ut y ut  ab  b0 0 x2t1p
p
a  a0 wt1  Gp G0 DX t1 . 1=2
We use the fact that x2t Op n, supt n1=2 jDxt j ! 0, and sup
ptn 2jwt j ! 0 since Dxt
and wt are uniformly square integrable. Thus, ut y Op n, ejt y Op n, and
d.
1=s2jt y Op n1 if y 2 N
First, s2jt =s2jt y  log s2jt =s2jt y  1X0 for all s2jt =s2jt y40. Thus, limn Pn;y0 fK 1jn yX0g
P
d,
1 for all j and y 2 Y, where K 1jn y 1=n nt1 s2jt =s2jt y  log s2jt =s2jt y  1. If y 2 N
2
2
sjt =sjt y ! 0 as n ! 1, which implies limn Pn;y0 fK 1jn y40g 1.


P
d , where K 2jn y 1=n nt1
Second, show limn Pn;y0 fK 2jn yX0g 1 for all j and y 2 N
e2jt y  e2jt =s2jt y.
We use the following:
n1

n
X

e2jt y n1

t1

n
X

e2jt Lj n1

n
X

t1

ut y  ut ut y  ut 0 L0j

t1

Lj  Lj0 n1

n
X

ut u0t Lj  Lj0 0

t1

2n1

n
X

ejt u0t Lj  Lj0 0 2Lj n1

n
X

t1

Since n1
n1

Pn

n
X

t1

t1

n
X

n
X

Pn

0
t1 wt1 ut

op 1, and n1

Pn

t1

DX t1 u0t op 1,

ut y  ut ut y  ut 0 L0j op n.

t1

As Lemma 3 shows, n1


X t1 w0t1 ; DX 0t1 0 .
n1

t1

x2t1 u0t Op 1, n1

e2jt y  e2jt Lj n1

ut ut y  ut 0 L0j .

Pn

0
t1 x2t1 X t1

Op 1 and n1

ut y  ut ut y  ut 0 ab  b0 0 n1

t1

n
X

Pn

t1

X t1 X 0t1 Op 1, where

x2t1 x02t1 b  b0 a0 op n.

t1

d,
Because x2t1 x02t1 =s2jt y Op 1 for all y 2 N
n1

n e2 y  e2
X
jt
jt
t1

s2jt y

Lj ab  b0 0 n1

n
X
x2t1 x02t1
b  b0 a0 L0j op 1
2 y
s
jt
t1

! Lj ab  b0 0 M 22 yb  b0 a0 L0j ,
P
where M 22 y plim n1 nt1 x2t1 x02t1 =s2jt y.
d . If aa0 and M 22 y40 for all
Thus, limn Pn;y0 fK 2jn yX0g 1 and for all j and y 2 N
d , then limn Pn;y0 fK 2jn y40g 1.
y2N
P
P
Third, K 3jn y 1=n nt1 e2jt  s2jt s2jt y  s2jt =s2jt s2jt y 1=n nt1 2jt  11  s2jt =
p

s2jt y ! 0 for all j and y 2 Y because E2jt  11  s2jt =s2jt yjFt1  0 and k2jt  1
1  s2jt =s2jt ykm=2 pkjt k2m 11 kejt k2m =oj o1 for some m42.
Thus, limn Pn;y0 fLn y0 4supy2N d Ln yg 1. The claim implies that if Ln y^ n XLn y0 ,
it must be that y^ n 2 N d .
Therefore,
lim Pn;y0 fy^ n 2 N d gX lim Pn;y0 fLn y^ n XLn y0 g 1.
n

Next, we show that H n y is stochastically equicontinuous.


(2.3) Show H nbb y H nbb y0 o p 1, where H nbb y n1


and
q2 l t y
0
qb qb

p
X

"
A0j Aj

j1

 


Pn

t1

n1 q2 l t y=qb qb

hj Lx2t2 ejt1 yhj Lx02t2 ejt1 y


x2t1 x02t1

2
1 2Zjt y
s2jt y
s4jt y

hj Lx2t2 ejt1 yx02t1 ejt y


x2t1 ejt yhj Lx02t2 ejt1 y

2
s4jt y
s4jt y
#
hj Lx2t2 x02t2 Zjt y
.

s2jt y
2

(2.3.a) Show A0j Aj  n1 x2t1 x02t1 =s2jt y A0j0 Aj0  n1 x2t1 x02t1 =s2jt o p 1.
Because
!
!
n1 x2t1 x02t1
n1 x2t1 x02t1
0
0
Aj Aj 
Aj0 Aj0 
s2jt y
s2jt
"
!#
n1 x2t1 x02t1 2
0
2
 Aj Aj 
sjt y  sjt
s2jt s2jt y
"
#
n1 x2t1 x02t1
0
0
Aj Aj  Aj0 Aj0 
s2jt
and A0j Aj  A0j0 Aj0 Aj  Aj0 0 Aj A0j0 Aj  Aj0 , we get A0j Aj  n1 x2t1 x02t1 =s2jt y
A0j0 Aj0  n1 x2t1 x02t1 =s2jt o p 1 if kn1=2 x2t k4 o1, kwt k4 o1, and kDxt k4 o1.
(2.3.b) Show n1 hj Lx2t2 ejt1 yhj Lx02t2 ejt1 y=s4jt y n1 hj0 Lx2t2 ejt1
hj0 Lx02t2 ejt1 =s4jt o p 1, where
t1
X

hj Lx2t2 e1t1 yhj Lx02t2 ejt1 y c2j

0
2
f2k
j x2tk1 x2tk1 ejtk y
k1
X
c2j
fkj flj x2tk1 x02tl1 ejtk yejtl y.
kal

If kn1=2 x2t k6 o1, kwt k6 o1, and kDxt k6 o1, then


x2tk1 x02tk1 e2jtk y
s4jt y

x2tk1 x02tk1 e2jtk


s41j

o p 1

because
x2tk1 x02tk1 e2jtk y
s4jt y

x2tk1 x02tk1 e2jtk


s4jt


x2tk1 x02tk1 e2jtk


s2jt s2jt y

x2tk1 x02tk1 e2jtk y  e2jtk 


s4jt y
!

1
1

s2jt y  s2jt .
s2jt y s2jt


As a result, if kn1=2 x2t k6 o1, kwt k6 o1, and kDxt k6 o1, then
n1

hj Lx2t2 ejt1 yhj Lx02t2 ejt1 y


s4jt y

n1

hj0 Lx2t2 ejt1 hj0 Lx02t2 ejt1


o p 1.
s4jt

(2.3.c) Show
n1

hj Lx2t2 ejt1 yhj Lx02t2 ejt1 y


Zjt y
s4jt y

n1

hj0 Lx2t2 ejt1 hj0 Lx02t2 ejt1


Zjt o p 1,
s4jt

where Zjt e2jt =s2jt  1.


Because
x2tk1 x02tk1 e2jtk ye2jt y
s6jt y

x2tk1 x02tk1 e2jtk e2jt


s6jt


x2tk1 x02tk1 e2jtk ye2jt y  e2jtk e2jt 

x2tk1 x02tk1 e2jtk 2jt


s2jt y

s6jt y

!
1
1 1
1

s2jt y  s2jt ,
s4jt y s2jt y s2jt s4jt

we can get the desired results if kn1=2 x2t k6 o1, kwt k6 o1, and kDxt k6 o1.
(2.3.d) Show n1 A0j Aj  hj Lx2t2 ejt1 yx02t1 ejt y=s4jt y n1 A0j0 Aj0  hj0 Lx2t2
ejt1 x02t1 ejt =s4jt o p 1.
If kn1=2 x2t k5 o1, kwt k5 o1, and kDxt k5 o1, then
n1

0
x2tk1 ejtk yx02t1 ejt y
1 x2tk1 ejtk x2t1 ejt
o p 1

n
s4jt
s4jt y

because
x2tk1 ejtk yx02t1 ejt y
x2tk1 ejtk x02t1 ejt

4
s4jt
sjt y
x2tk1 x02t1 ejtk yejt y  ejtk ejt 
s4t
!
x2tk1 x02t1 ejtk jt
1
1


s2jt y  s2jt 
sjt s2jt y
s2jt y s2jt

for all kX1.


(2.3.e) In the same way, kn1=2 x2t k4 o1 and kwt k4 o1 imply that
n1

hj Lx2t2 x02t2 Zjt y


hj0 Lx2t2 x02t2 Zjt
1
o p 1.

n
s2jt y
s2jt

0

b n1 q2 l t y0 =
Therefore, Assumption 1 and kut k6 o1 imply that n1 q2 l t y=qbq
0
b o p 1.
qbq
P
(2.4) Show H naa y H naa y0 o p 1, where a veca, H naa y n1 nt1 q2 l t
y=qa qa0 and
"
p
hj Lwt2 bejt1 yhj Lw0t2 bejt1 y
q2 l t y X 0
wt1 bw0t1 b
Lj Lj  
2
0
2
qa qa
sjt y
s4jt y
j1
1 2Zjt y 2

hj Lwt2 bejt1 yw0t1 bejt y


s4jt y

#
wt1 bejt yhj Lw0t2 bejt1 y hj Lwt2 bw0t2 bZjt y
2

,
s2jt y
s4jt y
P
k
where hj Lwt2 bejt1 y cj t1
k0 fj wtk2 bejtk1 y.
Note that wt bw0t b wt w0t o p 1 if kn1=2 x2t k2 o1 and kwt k2 o1 because
p
p
wt bw0t b wt w0t nb  b0 0 n1 x2t x02t nb  b0
p
p
nb  b0 0 n1=2 x2t w0t wt n1=2 x02t nb  b0 .
(2.4.a) Show wt1 bw0t1 b=s21t y wt1 w0t1 =s2jt o p 1.
If kn1=2 x2t k4 o1 and kwt k4 o1, then wt1 bw0t b=s21t y wt1 w0t1 =s2jt o p 1
because
wt1 bw0t b wt1 w0t1 wt1 bw0t b  wt1 w0t1  wt1 w0t1 2

 2 2
s y  s2jt .
s2jt
sjt sjt y jt
s21t y
s2jt y
In the same way as (2.3.b)(2.3.e), we can show that Assumption 1 and kut k6 o1 imply
q2 l t y q2 l t y0

o p 1.
qa qa0
qa qa0

P
(2.5) Show H ngg y H ngg y0 o p 1, where H ngg y n1 nt1 q2 l t y=qg qg0 .
Because q2 l t y=qgi qgj 0 for iaj and i; j 1; 2; . . . ; p, we consider the following:
q2 l t y
1
1 2Zjt y

2
qoj
21  fj 2 s4jt y

q2 l t y

2
qcj

Pt1

k 2
2
k0 fj ejtk1 y
s4jt y

1 2Zjt y

Pt1
oj =1  fj 2 cj k1
kfk1
e2jtk1 y2
q2 l t y
j


1 2Zjt y
2s4jt y
qf2j


oj =1  fj 3
Zjt y.
s2jt y

The proof for stochastic equicontinuity in the GARCH model has been provided in Lee
and Hansen (1994) and Lumsdaine (1996), where the mean equation does not contain


regressors. However, in the same way as (2.3), we can show that


q2 l t y q2 l t y0

o p 1
qg qg0
qg qg0
if kn1=2 x2t k6 o1, kwt k6 o1, and kDxt k6 o1.
P
(2.6) Show H nll y H nll y0 o p 1, where H nll y n1 nt1 q2 l t y=ql ql0 .
Because qeit y=qlji uit y1 j4i,
c2j
q2 l t y
u2it y



2
s2jt y
ql2ji
4

Pt1

k
k0 fj uitk1 yejtk1 y
1
s4jt y

2Zjt y

P
k
cj uit yejt y t1
k0 fj uitk1 yejtk1 y

cj

s4jt y

P1

k 2
k0 fj uitk1 y
s2jt y

Zjt y.

(2.6.a) If kn1=2 x2t k4 o1, kwt k4 o1, and kDxt k4 o1, then u2it y=s2jt y u2it =s2jt o p 1
because
u2it y u2it u2it y  u2it 
u2it


s2 y  s2jt .
s2jt s2jt y jt
s2jt y s2jt
s2jt y
(2.6.b) If kn1=2 x2t k4 o1, kwt k4 o1, and kDxt k4 o1, then
c2j

Pt1

k0

fkj uitk1 yejtk1 y


s4jt y

c2j0

Pt1

k
k0 fj0 uitk1 ejtk1
s4jt

o p 1,

because
uitk1 yejtk1 y
uitk1 ejtk1 uitk1 yejtk1 y  uitk1 ejtk1 

s4jt
s4jt y
s4jt y
!
uitk1 ejtk1
1
1

2 s2jt y  s2jt .
2
2
2
sjt sjt y
sjt y sjt
(2.6.c) If kn1=2 x2t k4 o1, kwt k4 o1, and kDxt k4 o1, then
c2j

Pt1

k0

fkj uitk1 yejtk1 y


s4jt y

Zjt y

c2j0

Pt1

k
k0 fj0 uitk1 ejtk1
s4jt

Zjt o p 1.

The cross derivatives entail similar variables, and the same method can be applied to the
proof. Therefore, if Assumption 1 holds and kut k6 o1, then
H n y H n y0 o p 1:
Proof of Lemma 3. Show n3=2

&
Pn

t1 q

l t y0 =qb qy02 op 1.


Pn


l t y0 =qb qveca0 0 op 1, where


"
m
X
hj Lx2t2 ejt1 hj Lw0t2 ejt1
q2 l t y0
x2t1 w0t1
0

A
L



2
1 2Zjt
j
j
s2jt
s4jt
qb qveca0 0
j1

(3.1) We show n3=2

t1 q

hj Lx2t2 ejt1 w0t1 ejt


x2t1 ejt hj Lw0t2 ejt1
2
4
sjt
s4jt
#
hj Lx2t2 w0t2 Zjt

,
s2jt
2

P
k
where hj Lx2t2 ejt1 cj t1
k0 fj x2tk2 ejtk1 .
P
n
First, we show that n1 t1 x2t1 w0t1 Op 1.
Z
n
n
X
X
1
0
1
0
n
xt wt n
xt1 Dxt wt )
t1

C1U dU 0 F0 1bn K 1 ,

t1

where K 1 EDx0 w00P


Ev0 w01 , and vt FLut .
n
3=2
0
2
(3.1.a) Show n
t1 x2t1 wt1 =sjt op 1.
2
Note that supt kwt1 =sjt km o1 for some m42 because
0

kwt1 =s2jt km kbn FLut1 =s2jt km p1=oj sup jbn j


bn

1
X

kjC k jkut km o1.

k1

Because fwt1 =s2jt g is strong mixing


to Theorem 3.1
P from Assumption 1(f), we can appealP
of Hansen (1992) to show that n1 nt1 x2t1 w0t1 =s2jt Op 1. Thus, n3=2 nt1 x2t1 w0t1 =
s2jt op 1.
P
(3.1.b) Show n3=2 nt1 hj Lx2t2 ejt1 hj Lw0t2 ejt1 =s4jt op 1, where
hj Lx2t2 ejt1 hj Lw0t2 ejt1

t1
X

h2j;k x2tk1 w0tk1 e2jtk

k1

hj;k hj;l x2tk1 w0tl1 ejtk ejtl .

kal

First, we show that


n3=2

n X
t1
X

h2j;k x2tk1 w0tk1 e2jtk =s4jt op 1.

t1 k1

Lemma 4 of Lee and Hansen (1994) has shown that, for all j and kX1,

!
2

s

jtk
E fkj 2 Ftk1 p1 a.s.
sjt 
Thus, we can show that, for some m42,
sup kfkj wtk1 e2jtk =s4jt km p1=oj kwtk1 km kjtk k22m o1.
t


Because fwtk1 e2jtk =s4jt g is strong mixing, we can show that


n
X

n1

hj;k x2tk1 w0tk1 e2jtk =s4jt Op 1

t1

for all j and kX1.


Because hj;k cj fkj decays exponentially, the nite-dimensional convergence implies
n X
t1
X

n1

h2j;k x2tk1 w0tk1 e2jtk =s4jt Op 1.

t1 k0

Second, we show that


n3=2

n X
X

hj;k hj;l x2tk1 w0tl1 ejtl ejtk =s4jt op 1,

t1 kal

for all j.
Without loss of generality, we set kol.
n3=2

n X
X

hj;k hj;l x2tk1 w0tl1 ejtl ejtk =s4jt

t1 kal
n
X
3=2

hj;k hj;l x2tl1 Dx2tl    Dx2tk1 w0tl1 ejtl ejtk =s4jt .

t1 kal
k=2

l=2

Because kfj fj wtl1 ejtl ejtk =s4jt km p1=oj kwtl1 km kjtl km kjtk km o1 for some
m42,
n1

n X
X

hj;k hj;l x2tl1 w0tl1 ejtl ejtk =s4jt Op 1.

t1 kal

Also,
n1

n X
X

hj;k hj;l Dx2tl    Dx2tk1 w0tl1 ejtl ejtk =s4jt Op 1.

t1 kal

(3.1.c) The other parts entail fwtk2 ejtk1 ejtl1 Zjt =s4jt g, fejtk1 wt1 ejt =s4jt g, fwtk2 Zjt =
and fwtk2 ejtk1 ejt =s4jt g. These processes are Martingale difference sequences, and
thus we can applyPTheorem 2.1 of Hansen (1992) to get the desired results.
(3.2) Show n1 nt1 q2 l t y0 =qb qvecG0 0 Op 1, where
"
m
X
hj Lx2t2 ejt1 hj LDX 0t2 ejt1
q2 l t y0
x2t1 DX 0t1

A0j Lj  
2
0
0
2
sjt
s4jt
qb qvecG
j1
s2jt g,

1 2Zjt 2

hj Lx2t2 ejt1 DX 0t1 ejt


s4jt

#
x2t1 ejt hj LDX 0t2 ejt1 hj Lx2t2 DX 0t2 Zjt
2

.
s2jt
s4jt


First, we show that n1


n1

n
X

Pn

t1

xt Dx0ti n1

t1

n
X

xt Dx0ti Op 1 for i 0; 1; 2; . . . ; l  1.
xti1 Dxti    Dxt Dx0ti

t1

C1U dU 0 C10 K 2i ,

)
0

where K 2i EDx0 Dx00    Dxi Dx00 Ev0 Dx01 for i 0; 1; 2; . . . ; l  1.


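(3.2.a)-(3.2.b) Note that $\sup_t\|\Delta x_{t-i}/\sigma_{jt}^2\|_m < \infty$ for some $m > 2$ and $i = 1, 2, \ldots, l$ because
\[
\|\Delta x_{t-i}/\sigma_{jt}^2\|_m = \|C(L)u_{t-i}/\sigma_{jt}^2\|_m \le (1/\omega_j)\sum_{k=0}^{\infty}|C_k|\,\|u_t\|_m < \infty .
\]
Also,
\[
\sup_t \|\phi_j^k \Delta x_{t-k-1}e_{j,t-k}^2/\sigma_{jt}^4\|_m \le (1/\omega_j)\|\Delta x_{t-k-1}\|_m\|\varepsilon_{j,t-k}\|_{2m}^2 < \infty .
\]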

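Thus, we get the following in the same way as (3.1.a)-(3.1.b):
\[
n^{-3/2}\sum_{t=1}^n \frac{x_{2,t-1}\Delta X'_{t-1}}{\sigma_{jt}^2} = o_p(1),
\]
\[
n^{-3/2}\sum_{t=1}^n [h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)\Delta X'_{t-2}e_{j,t-1}]/\sigma_{jt}^4 = o_p(1).
\]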

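(3.2.c) The other parts entail
\[
\left\{ \frac{\Delta X_{t-k-2}e_{j,t-k-1}e_{j,t-l-1}\eta_{jt}}{\sigma_{jt}^4},\ \frac{e_{j,t-k-1}\Delta X_{t-1}e_{jt}}{\sigma_{jt}^4},\ \frac{\Delta X_{t-k-2}\eta_{jt}}{\sigma_{jt}^2},\ \frac{\Delta X_{t-k-2}e_{j,t-k-1}e_{jt}}{\sigma_{jt}^4} \right\}.
\]
These processes are martingale difference sequences, and thus we can get the desired results.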
(3.3) Let $\gamma_j = (\omega_j, \psi_j, \phi_j)'$ for $j = 1, 2, \ldots, m$.
P
Show n1 nt1 q2 l t y0 =qb qg0j Op 1, where
"
#
m
x2t1 ejt qs2jt =qg0j hj Lx2t2 ejt1 qs2jt =qg0j
q2 l t y0 X
0

Aj  

1 2Zjt
s4jt
s4jt
qb qg0
j

j1

and
0
B
B
B
B
B
B
qgj
B
@

qs2jt

1
1  fj
Pt1

k 2
k0 fj ejtk1

P
oj
k1 2
cj t1
ejtk
k1 kfj
1  fj 2

1
C
C
C
C
C.
C
C
A

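Lemma 4 of Lee and Hansen (1994) and Lemma 3 of Lumsdaine (1996) have shown that $\|(\partial\sigma_{jt}^2/\partial\gamma_j)(1/\sigma_{jt}^2)\|_m < \infty$ for some $1 \le m \le 6$. Furthermore, it can be shown that $(\partial\sigma_{jt}^2/\partial\gamma_j)(1/\sigma_{jt}^2) < \infty$ a.s. for $\gamma_j = \omega_j, \psi_j$. Thus, we show the proof for $\gamma_j = \phi_j$.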
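The derivative of the conditional variance with respect to $\phi_j$ can be checked against a finite difference. This is a minimal sketch, assuming the GARCH(1,1) recursion $\sigma_t^2 = \omega + \psi e_{t-1}^2 + \phi\sigma_{t-1}^2$ with a fixed start value and with the residuals held fixed; it uses the recursive form $\partial\sigma_t^2/\partial\phi = \sigma_{t-1}^2 + \phi\,\partial\sigma_{t-1}^2/\partial\phi$ rather than the closed-form sum in the text. All names and parameter values are hypothetical.

```python
import random

def sigma2_path(e, omega, psi, phi, sigma2_0):
    """Conditional variances from sigma2_t = omega + psi*e_{t-1}^2 + phi*sigma2_{t-1}."""
    s2 = [sigma2_0]
    for t in range(1, len(e)):
        s2.append(omega + psi * e[t - 1] ** 2 + phi * s2[t - 1])
    return s2

def dsigma2_dphi(e, omega, psi, phi, sigma2_0):
    """Derivative of sigma2_t with respect to phi, holding e and sigma2_0 fixed."""
    s2 = sigma2_path(e, omega, psi, phi, sigma2_0)
    d = [0.0]  # the start value sigma2_0 is treated as a constant
    for t in range(1, len(e)):
        d.append(s2[t - 1] + phi * d[t - 1])
    return d

# Illustrative data and parameter values (assumptions, not from the paper).
rng = random.Random(0)
e = [rng.gauss(0.0, 1.0) for _ in range(200)]
omega, psi, phi, s0, step = 0.1, 0.1, 0.8, 1.0, 1e-6

analytic = dsigma2_dphi(e, omega, psi, phi, s0)[-1]
numeric = (sigma2_path(e, omega, psi, phi + step, s0)[-1]
           - sigma2_path(e, omega, psi, phi - step, s0)[-1]) / (2 * step)
print(analytic, numeric)
```

The recursion makes the geometric weighting of past variances explicit, which is why the scaled derivative has bounded moments when $0 < \phi_j < 1$.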


P
(3.3.a) Show n1 nt1 x2t1 qs2jt =qfj ejt =s4jt Op 1, where qs2jt =qfj oj =1  fj 2
P
k1 2
cj t1
ejtk .
k1 kfj
1=2
Note that qs2jt =qfj ejt =s4jt is an MDS. Because kqs2jt =qfj ejt =s4jt k2 p1=oj kqs2jt =qfj 1=
2
sjt k2 o1, we can show that
n1

2
n x
X
2t1 qsjt =qfj ejt
t1

s4jt
Pn
3=2

Op 1.

Pt1
hj Lx2t2 ejt1 k1
kfk1
e2jtk =s4jt op 1, where
j
!
t1
t1
X
X
k1 2
3
hj Lx2t2 ejt1
kfj ejtk cj
kf2k
j x2tk1 ejtk

(3.3.b) Show n

t1

k1

k1

cj =fj

lfkj flj x2tk1 ejtk e2jtl .

kal

First, we note that vjt;k

3k=2
fj e3jtk =s4jt

sup Ev2jt;k fjvjt;k jXcgp


t

is uniformly square integrable because

1
sup E6jtk fj3jtk jXcoj g ! 0
oj t

as c ! 1 if kt k6 o1.
k=2
We apply Theorem 3.1 of Hansen (1992) and get the desired result because kfj decays
exponentially.
Second, we show that
n3=2

n X
X

lfkj flj x2tk1 ejtk e2jtl =s4jt op 1.

t1 kal
k=2

In the same way as (3.1.b), kfj flj ejtk e2jtl =s4jt km o1 for some m42 and for all kal.
k=2
Because lfj decays exponentially, we can get the desired result.
(3.3.c) The other part entails $e_{j,t-k}(\partial\sigma_{jt}^2/\partial\phi_j)\eta_{jt}/\sigma_{jt}^4$. As the process is an MDS, we can show the proof in the same way as (3.3.a).
(3.4) Show n3=2 nt1 q2 l t y0 =qb ql0 op 1, where
"
hj Lx2t2 ejt1 hj Luit1 ejt1
q2 l t y0
x2t1 uit
A0j  
2
1 2Zjt
2

s
s4jt
qb qlji
jt
#
hj Lx2t2 ejt1 uit ejt hj Lx2t2 uit1 Zjt
4

s4jt
s2jt
for ioj.
We use the following:
qejt
uit
qlji
0

if ioj
otherwise.

(3.4.a) Because uit =s2jt is an MDS and kuit =s2jt k2 p1=oj kuit k2 o1, we can get
n1

n
X
xt1 uit
Op 1.
s2jt
t1


(3.4.b) Show n3=2

Pn

4
t1 hj Lx2t2 ejt1 hj Luit1 ejt1 =sjt

hj Lx2t2 ejt1 hj Luit1 ejt1

t1
X

op 1, where

h2j;k x2tk1 uitk e2jtk

k1

hj;k hj;l x2tk1 ejtk uitl ejtl .

kal

First, because kfkj uitk e2jtk =s4jt km o1 for some m42 and fkj decays exponentially,
n3=2

n X
t1
X

h2j;k x2tk1 uitk e2jtk =s4jt op 1.

t1 k1

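Next, we can show that $n^{-3/2}\sum_{t=1}^n\sum_{k\ne l} h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}u_{i,t-l}e_{j,t-l}/\sigma_{jt}^4 = o_p(1)$ in the same way as (3.3.b).
Therefore, $n^{-3/2}\sum_{t=1}^n \partial^2 l_t(\theta_0)/\partial\beta\,\partial\theta_2' = o_p(1)$.
Lemma 2 implies that $\hat\theta_n \in \Theta_\delta$, and hence $\bar\theta \in \Theta_\delta$, where $\bar\theta \in [\hat\theta_n, \theta_0]$.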
Therefore, by appealing to Proposition 3.2 of Saikkonen (1993), H n y
H n y0 o p 1, and
Dn y^ n  y0 H n y 1 G n y0
H n y0 1 G n y0 o p 1,
where y 2 y^ n ; y0 . Furthermore, block-diagonality implies that
nb^ n  b 0 hn y0 1 gn y0 o p 1,
which completes the proof.

&

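Proof of Lemma 4. (4.1) Show $n^{-1}\sum_{t=1}^n \partial l_t(\theta_0)/\partial\beta \Rightarrow \int_0^1 W_2(s)\,dW_1'(s)$, where
\[
\frac{\partial l_t(\theta_0)}{\partial\beta} = \sum_{j=1}^m A_j' \otimes \left[ \frac{x_{2,t-1}e_{jt}}{\sigma_{jt}^2} - \frac{[h_j(L)x_{2,t-2}e_{j,t-1}]\,\eta_{jt}}{\sigma_{jt}^2} \right].
\]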
j1
We denote $z_{jt} = e_{jt}/\sigma_{jt}^2$ and $q_{jt,k} = e_{j,t-k}\eta_{jt}/\sigma_{jt}^2$. Note that $\{z_{jt}, q_{jt,k}\}$ is strictly stationary and ergodic, and an MDS.
Since $\|z_{jt}\|_2 \le (1/\omega_j)\|e_{jt}\|_2 < \infty$, we can apply Kurtz and Protter (1991) and Hansen (1992) to get
\[
n^{-1}\sum_{t=1}^n x_{2,t-1}z_{jt} \Rightarrow \int_0^1 W_2\,dZ_j .
\]
In the same way, $\|q_{jt,k}\|_2 \le (1/\omega_j)\|e_{jt}\|_2(\|\varepsilon_{jt}\|_4^2 + 1) < \infty$ for all $k \ge 1$, and hence
\[
n^{-1}\sum_{t=1}^n x_{2,t-1}q_{jt,k} \Rightarrow \int_0^1 W_2\,dQ_{j,k} .
\]
We denote $F_{jn,k} = n^{-1}\sum_{t=1}^n x_{2,t-1}q_{jt,k}$ and $F_{j,k} = \int_0^1 W_2\,dQ_{j,k}$. Now, we want to show $F_{jn} \Rightarrow F_j$, where $F_{jn} = (F_{jn,1}, F_{jn,2}, \ldots)$ and $F_j = (F_{j,1}, F_{j,2}, \ldots)$. We define a metric $d_\infty(f) = \sum_{k=1}^{\infty} k^{-r}|f_k|$, where $r > 2$ and $f \in R^\infty$. Then, the finite-dimensional convergence and the tightness of the probability measure of $F_{jn}$ in $R^\infty$, with respect to the metric $d_\infty(f)$, imply the weak convergence $F_{jn} \Rightarrow F_j$. The detailed proof is given in Hansen (1995, pp. 1127-1128).
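The continuity property used in the next step can be verified in one line. The following is a short sketch, assuming the metric in the text is $d_\infty(f) = \sum_{k\ge 1} k^{-r}|f_k|$ and that $h_{j,k} = \psi_j\phi_j^k$ with $0 < \phi_j < 1$; the constant $C_j$ is introduced here only for illustration.

```latex
\begin{align*}
\Bigl|\sum_{k=1}^{\infty} h_{j,k} f_k - \sum_{k=1}^{\infty} h_{j,k} g_k\Bigr|
  &\le \sum_{k=1}^{\infty} |h_{j,k}|\,|f_k - g_k| \\
  &\le \Bigl(\sup_{k \ge 1} k^{r}\psi_j\phi_j^{k}\Bigr) \sum_{k=1}^{\infty} k^{-r}|f_k - g_k|
   = C_j\, d_\infty(f - g),
\end{align*}
% C_j = \sup_{k \ge 1} k^r \psi_j \phi_j^k is finite because geometric decay
% dominates any polynomial growth in k.
```

Hence the linear map $f \mapsto \sum_k h_{j,k}f_k$ is Lipschitz with respect to $d_\infty$, which is what the continuous mapping theorem requires.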


Because
n1

P1

k1 hj;k f j;k

n X
1
X

is d 1 -continuous, we have the following result by using the CMT:

hj;k x2tk1 qjt;k

t1 k1

1
X

hj;k n1

k1
1
X

x2t1 qjt;k op 1

t1
1

hj;k

W 2 dQj;k
0

k1
Z 1

W2

n
X

1
X

hj;k dQj;k

W 2 dQj .
0

k1

P Pt1
show that Q jns ) Qj s, where Q jns n1=2 ns
k1 hj;k qjt;k and Qj s
t1
PAlso,
1
h
Q
.
j;k
j;k
k1
P
Pns P1
t
Because 1
kt hj;k kqjt;k k Ofj and
kt hj;k kqjt;k ko1 for all s 2 0; 1,
t1

 !
ns X
ns X
X

1
t1
X

1=2 
hj;k qjt;k 
hj;k qjt;k 4
P sup n

 t1 k1

s20;1
t1 k1
 !

ns X

X
1


P sup n1=2 
hj;k qjt;k 4


s20;1
t1 kt
!
n X
1
X
pP n1=2
hj;k jqjt;k j4
t1 kt

n1=2

Pn P1
t1

kt

hj;k kqjt;k k

! 0.

Pk
Pt1
Q jns ) Qj s.PLet x2tk1
PThus,
Pk x2t1  i1 Dx2ti for kX1. Since k k1 hj;k
k
t1
i1 Dx2ti qjt;k km=2 p k1 hj;k i1 kDx2ti km kqjt;k km o1 for some m42,
n1

n X
t1
X

hj;k x2tk1 qjt;k

t1 k1

1

n X
t1
X

hj;k x2t1 qjt;k  n

t1 k1

n1

n X
t1
X

1

n X
t1
X
t1 k1

hj;k

k
X

!
Dx2ti qjt;k

i1

hj;k x2t1 qjt;k op 1.

t1 k1

Thus,
n1

n
n X
t1
X
X
hj Lx2t2 ejt1 Zjt
1

n
hj;k x2tk1 qjt;k
s2jt
t1
t1 k1

n1
Z

n X
t1
X

hj;k x2t1 qjt;k op 1

t1 k1
1

W 2 dQj .

)
0


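Therefore, we have $n^{-1}\sum_{t=1}^n \partial l_t(\theta_0)/\partial\beta \Rightarrow \int_0^1 W_2\,dW_1'$, where $W_1(s) = A'(Z(s) - Q(s))$.
(4.2) Show $-n^{-2}\sum_{t=1}^n \partial^2 l_t(\theta_0)/\partial\beta\,\partial\beta' \Rightarrow \nu \otimes \int_0^1 W_2(s)W_2'(s)\,ds$, where
\[
-\frac{\partial^2 l_t(\theta_0)}{\partial\beta\,\partial\beta'} = \sum_{j=1}^m A_j'A_j \otimes \Biggl[ \frac{x_{2,t-1}x'_{2,t-1}}{\sigma_{jt}^2} + 2\,\frac{[h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)x'_{2,t-2}e_{j,t-1}]}{\sigma_{jt}^4}\,(1 + 2\eta_{jt})
 - 2\,x_{2,t-1}e_{jt}[h_j(L)x'_{2,t-2}e_{j,t-1}]/\sigma_{jt}^4 - 2\,[h_j(L)x_{2,t-2}e_{j,t-1}]x'_{2,t-1}e_{jt}/\sigma_{jt}^4
 - [h_j(L)x_{2,t-2}x'_{2,t-2}]\,\eta_{jt}/\sigma_{jt}^2 \Biggr].
\]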
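(4.2.a) Show $n^{-2}\sum_{t=1}^n x_{2,t-1}x'_{2,t-1}/\sigma_{jt}^2 \Rightarrow (1/B_j^2)\int_0^1 W_2W_2'$.
First, we show that $n^{-3/2}\sum_{t=1}^n x_{2,t-1}x'_{2,t-1}(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)) = O_p(1)$.
Assumption 1(a) and 1(b) imply that $\{\sigma_{jt}^2\}$ is $\beta$-mixing with exponential decay, and so is $\{1/\sigma_{jt}^2\}$. Because $\|1/\sigma_{jt}^2\|_m \le 1/\omega_j < \infty$ for some $m > 2$, we can show that $n^{-3/2}\sum_{t=1}^n x_{2,t-1}x'_{2,t-1}(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)) = O_p(1)$. Thus,
\[
n^{-2}\sum_{t=1}^n x_{2,t-1}x'_{2,t-1}/\sigma_{jt}^2 \Rightarrow E(1/\sigma_{jt}^2)\int_0^1 W_2W_2' = (1/B_j^2)\int_0^1 W_2W_2' .
\]
(4.2.b) Show $n^{-2}\sum_{t=1}^n [h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)x'_{2,t-2}e_{j,t-1}]/\sigma_{jt}^4 \Rightarrow \Xi_j\int_0^1 W_2W_2'$, where
\[
[h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)x'_{2,t-2}e_{j,t-1}] = \sum_{k=1}^{t-1} h_{j,k}^2\, x_{2,t-k-1}x'_{2,t-k-1}e_{j,t-k}^2 + \sum_{k\ne l} h_{j,k}h_{j,l}\, x_{2,t-k-1}x'_{2,t-l-1}e_{j,t-k}e_{j,t-l} .
\]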

First, we show that


n3=2

n
X

hj;k x2tk1 x02tk1 sjt;k Op 1,

t1

where sjt;k hj;k e2jtk =s4jt  Ee2jtk =s4jt for all k.


Note that sjt;k is b-mixing with exponential decay and ksjt;k km p2=oj kjt k22m o1 for some
P Pn 2
0
m42 and for all kX1. Thus, we get n3=2 t1
k1
t1 hj;k x2tk1 x2tk1 sjt;k Op 1 by
appealing to Theorem 3.1 of Hansen (1992).
Also, we get
n2

n X
t1
X

h2j;k x2tk1 x02tk1 Ee2jtk =s4jt

t1 k1

n2

n X
t1
X

h2j;k x2t1 x02t1 Ee2jtk =s4jt op 1

t1 k1

since n
Op 1.

Pn
1

Pk

t1 hj;k x2t1

i1

Dx02ti Op 1 and n1

Pn

t1

P
P
hj;k ki1 Dx2ti ki1 Dx02ti


Because n1 x2t1 x02t1 Op 1 and


n2

n X
1
X

P1

2
2
4
kt hj;k Eejtk =sjt

Oftj ,

h2j;k x2t1 x02t1 Ee2jtk =s4jt op 1.

t1 kt

Therefore,
n2

n X
t1
X

h2j;k x2tk1 x02tk1 e2jtk =s4jt

t1 k1

n2

n X
t1
X

h2j;k x2tk1 x02tk1 Ee2jtk =s4jt op 1

t1 k1

n2

n
X

x2t1 x02t1

t1
1

Z
) Xj

1
X

h2j;k Ee2jtk =s4jt op 1

k1

W 2 W 02 .

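Second, we show $n^{-2}\sum_{t=1}^n h_{j,k}h_{j,l}\,x_{2,t-k-1}x'_{2,t-l-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \ne l$.
Without loss of generality, we set $l > k$.
We have $n^{-3/2}\sum_{t=1}^n\sum_{k\ne l} h_{j,k}h_{j,l}\,x_{2,t-k-1}x'_{2,t-l-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = O_p(1)$ because $e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4$ is $\beta$-mixing and $\|\phi_j^{k/2}\phi_j^{l/2}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4\|_m \le (1/\omega_j)\|\varepsilon_t\|_m^2 < \infty$ for some $m > 2$.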
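(4.2.c) Also, $n^{-2}\sum_{t=1}^n [h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)x'_{2,t-2}e_{j,t-1}]\,\eta_{jt}/\sigma_{jt}^4 = o_p(1)$ because
\[
n^{-3/2}\sum_{t=1}^n x_{2,t-k-1}x'_{2,t-k-1}e_{j,t-k}^2\eta_{jt}/\sigma_{jt}^4 = O_p(1)
\]
and
\[
n^{-3/2}\sum_{t=1}^n x_{2,t-k-1}x'_{2,t-l-1}e_{j,t-k}e_{j,t-l}\eta_{jt}/\sigma_{jt}^4 = O_p(1) \quad \text{for all } k \ge 1 \text{ and } k \ne l .
\]
(4.2.d) In the same way, $n^{-3/2}\sum_{t=1}^n x_{2,t-k-1}x'_{2,t-1}e_{j,t-k}e_{jt}/\sigma_{jt}^4 = O_p(1)$ for all $k \ge 1$. Hence, we have $n^{-2}\sum_{t=1}^n x_{2,t-1}e_{jt}[h_j(L)x'_{2,t-2}e_{j,t-1}]/\sigma_{jt}^4 = o_p(1)$.
(4.2.e) $n^{-2}\sum_{t=1}^n [h_j(L)x_{2,t-2}x'_{2,t-2}]\,\eta_{jt}/\sigma_{jt}^2 = o_p(1)$ since $n^{-3/2}\sum_{t=1}^n x_{2,t-k-1}x'_{2,t-k-1}\eta_{jt}/\sigma_{jt}^2 = O_p(1)$ for all $k \ge 1$.
Therefore, we have $-n^{-2}\sum_{t=1}^n \partial^2 l_t(\theta_0)/\partial\beta\,\partial\beta' \Rightarrow \nu \otimes \int_0^1 W_2(s)W_2'(s)\,ds$.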
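Step (4.2.a) says that, in the limit, the stationary weight $1/\sigma_{jt}^2$ can be replaced by its mean when it multiplies the integrated regressor. The sketch below is a rough Monte Carlo illustration under simplifying assumptions that are not in the paper: a scalar random walk $x_t$ that is independent of the GARCH(1,1) errors, Gaussian innovations, and hypothetical parameter values. It compares $n^{-2}\sum x_{t-1}^2/\sigma_t^2$ with the sample mean of $1/\sigma_t^2$ times $n^{-2}\sum x_{t-1}^2$.

```python
import random

def sample_moment_ratio(n, omega, psi, phi, seed=0):
    """Ratio of n^-2 * sum(x_{t-1}^2 / sigma2_t) to
    mean(1 / sigma2_t) * n^-2 * sum(x_{t-1}^2) in one simulated sample."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - psi - phi)   # unconditional variance as start value
    e = sigma2 ** 0.5 * rng.gauss(0.0, 1.0)
    x = 0.0                              # random walk, independent of the GARCH errors
    sum_x2_inv = 0.0                     # running sum of x_{t-1}^2 / sigma2_t
    sum_x2 = 0.0                         # running sum of x_{t-1}^2
    sum_inv = 0.0                        # running sum of 1 / sigma2_t
    for _ in range(n):
        sigma2 = omega + psi * e * e + phi * sigma2
        sum_x2_inv += x * x / sigma2
        sum_x2 += x * x
        sum_inv += 1.0 / sigma2
        e = sigma2 ** 0.5 * rng.gauss(0.0, 1.0)
        x += rng.gauss(0.0, 1.0)
    return (sum_x2_inv / n ** 2) / ((sum_inv / n) * (sum_x2 / n ** 2))

# Illustrative parameter values (assumptions, not estimates from the paper).
ratio = sample_moment_ratio(n=50000, omega=0.1, psi=0.1, phi=0.8, seed=0)
print(ratio)  # close to one when the approximation in (4.2.a) is accurate
```

The gap between the ratio and one shrinks at roughly the rate $n^{-1/2}$, in line with the $O_p(1)$ bound on the $n^{-3/2}$-normalized centered sum.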
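(4.3) Show $n^{-2}\sum_{t=1}^n (\partial l_t(\theta_0)/\partial\beta)(\partial l_t(\theta_0)/\partial\beta') \Rightarrow \mu \otimes \int_0^1 W_2W_2'$, where
\[
\frac{\partial l_t(\theta_0)}{\partial\beta}\frac{\partial l_t(\theta_0)}{\partial\beta'} = \sum_{j=1}^m A_j'A_j \otimes \Bigl[ x_{2,t-1}x'_{2,t-1}e_{jt}^2/\sigma_{jt}^4 + [h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)x'_{2,t-2}e_{j,t-1}]\,\eta_{jt}^2/\sigma_{jt}^4 - 2\,[h_j(L)x_{2,t-2}e_{j,t-1}]x'_{2,t-1}e_{jt}\eta_{jt}/\sigma_{jt}^4 \Bigr].
\]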


(4.3.a) We apply Theorem 3.1 of Hansen (1992) to show that
\[
n^{-3/2}\sum_{t=1}^n x_{2,t-1}x'_{2,t-1}\bigl(e_{jt}^2/\sigma_{jt}^4 - E(e_{jt}^2/\sigma_{jt}^4)\bigr) = O_p(1).
\]


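Thus,
\[
n^{-2}\sum_{t=1}^n x_{2,t-1}x'_{2,t-1}e_{jt}^2/\sigma_{jt}^4 \Rightarrow E(e_{jt}^2/\sigma_{jt}^4)\int_0^1 W_2W_2' = (1/B_j^2)\int_0^1 W_2W_2' .
\]
(4.3.b) Since $E\eta_{jt}^2 = \kappa_j - 1$,
\[
n^{-2}\sum_{t=1}^n [h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)x'_{2,t-2}e_{j,t-1}]\,\eta_{jt}^2/\sigma_{jt}^4 \Rightarrow (\kappa_j - 1)\,\Xi_j\int_0^1 W_2W_2'
\]
in the same way as (4.2.b).
(4.3.c) Since $\{e_{j,t-k}e_{jt}\eta_{jt}/\sigma_{jt}^4\}$ is an MDS, $n^{-3/2}\sum_{t=1}^n x_{2,t-k-1}x'_{2,t-1}e_{j,t-k}e_{jt}\eta_{jt}/\sigma_{jt}^4 = O_p(1)$ for all $k \ge 1$. Thus, $n^{-2}\sum_{t=1}^n [h_j(L)x_{2,t-2}e_{j,t-1}]x'_{2,t-1}e_{jt}\eta_{jt}/\sigma_{jt}^4 = o_p(1)$ in the same way as (4.2.c).
Therefore, $n^{-2}\sum_{t=1}^n g_t(\theta_0)g_t'(\theta_0) \Rightarrow \mu \otimes \int_0^1 W_2(s)W_2'(s)\,ds$. □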
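Proof of Theorem 1. By using Lemmas 1-4, we have
\[
n(\hat\beta_n - \beta_0) = h_n(\theta_0)^{-1}g_n(\theta_0) + o_p(1)
\Rightarrow \left(\nu \otimes \int_0^1 W_2W_2'\right)^{-1}\int_0^1 dW_1 \otimes W_2
= \operatorname{vec}\left(\left(\int_0^1 W_2W_2'\right)^{-1}\int_0^1 W_2\,dW_1'\,\nu^{-1}\right).
\]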
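The passage from the Kronecker form to the vec form of the limit rests on a standard identity. A brief sketch, writing $G_{22} = \int_0^1 W_2W_2'$ and $G_{21} = \int_0^1 W_2\,dW_1'$ as shorthand introduced here only for illustration, and assuming $\nu$ is symmetric and invertible:

```latex
\begin{align*}
\operatorname{vec}(G_{21}) &= \int_0^1 dW_1 \otimes W_2,
  && \text{since } \operatorname{vec}(ab') = b \otimes a, \\
\operatorname{vec}\bigl(G_{22}^{-1} G_{21}\, \nu^{-1}\bigr)
  &= \bigl(\nu^{-1} \otimes G_{22}^{-1}\bigr)\operatorname{vec}(G_{21}),
  && \text{by } \operatorname{vec}(AXB) = (B' \otimes A)\operatorname{vec}(X), \\
  &= \bigl(\nu \otimes G_{22}\bigr)^{-1} \int_0^1 dW_1 \otimes W_2,
  && \text{since } (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}.
\end{align*}
```

The two expressions for the limit are therefore the same random vector written in stacked and in matrix form.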

The RRR estimator b~ n is based on the following likelihood function:


Ln b; U; L n1

n
X

l t b; U; L,

t1

P
Pm 2
2
2
2
where l t b; U; L 0:5 m
j1 log sj  0:5
j1 ejt b; U; L=sj , et b; U; L Lut b; U, sj
2
Esjt , and ut b; U satisfies Eq. (5).
We have the following derivatives:
m
x2t1 ejt
ql t b 0 ; U0 ; L0 X
A0j 
,

s2j
qb
j1

m
X
q l t b 0 ; U0 ; L0
x2t1 x02t1

A0j Aj 
.
0
s2j
qb qb
j1
2

By using the previous results, we can show that the RRR estimator b~ n has an asymptotic
distribution as (20). &


Proof of Theorem 2. If we use the information matrix,


Z

Z 1
1 !
I
0
0
1
0
Wn )
dW 1  W 2
W 2W 2
n 
0

"
0

R R n

1

Z


0

R n

1

Z

0

W 2 W 02

W 2 W 02

1 ! #1
R0

1 !Z


dW 1  W 2

1=2
J 0 Q1=2 Q1
J.
n Q

We can show other parts in the same way. &


P
d
Proof of Lemma 5. (5.1) Show n1=2 nt1 ql t y0 =qt ! L0 Z  QN0; L0 S1=2 MS1=2 L,
where
"
#
m
h
Le
Z
e
ql t y0 X
j
jt1
jt
jt

L0j 2 
.
qt
sjt
s2jt
j1
Let zjt ejt =s2jt , and qjt;k ejtk Zjt =s2jt . Since fzjt g and qjt;k are Martingale difference
sequences,
!
n
X
1
d
1=2
zjt ! Zj N 0; 2 ,
n
Bj
t1
n1=2

n
X

qjt;k ! Qj;k N0; kj  1x2j;k ,

t1

for each kX1.


P1
P P
Let Qjn n1=2 nt1 1
k1 hj;k qjt;k . Because kqjt;k k2 o1 and
k1 hj;k kqjt;k k2 o1,
d P1
Qjn ! k1 hj;k Qj;k Qj .
P
t
Since 1
kt hj;k kqjt;k k Ofj , we have the following result in the same way as (4.1).
Q jn n1=2

n X
t1
X

hj;k qjt;k ! Qj .

t1 k1

Therefore, n

1=2

Pn

t1

Pn
1

ql t y0 =qt ! L0 Z  Q.
p

q2 l t y0 =qtqt0 ! L0 S1=2 NS1=2 L, where


"
#
m
Zjt
hj Lejt1 2
hj Lejt1 ejt
q2 l t y0 X
1
0

Lj Lj 2 2
1 2Zjt  4
 hj 1 2 .

qt qt0
s4jt
s4jt
sjt
sjt
j1

(5.2) Show n

t1

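(5.2.a) Because $\{1/\sigma_{jt}^2\}$ is strictly stationary and ergodic, and $\|1/\sigma_{jt}^2\| \le 1/\omega_j < \infty$,
\[
n^{-1}\sum_{t=1}^n \frac{1}{\sigma_{jt}^2} \xrightarrow{p} E\left(\frac{1}{\sigma_{jt}^2}\right) = \frac{1}{B_j^2} .
\]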


(5.2.b) Show n1

Pn

hj Lejt1 2

t1

t1
X

hj Lejt1 2 =s4jt ! Xj , where


h2j;k e2jtk

hj;k hj;l ejtk ejtl .

kal

k1

Because e2jtk =s4jt is strictly stationary and ergodic, for all kX1, and k
s4jt ko1,
!
n X
1
1
X
e2jtk p X
e2jtk
2
2
1
hj;k 4 !
hj;k E
n
.
sjt
s4jt
t1 k1
k1
Also, n1
Thus,
n1

Pn P1
t1

kt

h2j;k Ee2jtk =s4jt o1 since

n X
t1 h2 e2
X
j;k jtk
t1 k1

n1

s4jt

n X
1
X

h2j;k

t1 k1
p

1
X

P1

h2j;k e2jtk
s4jt

kt

P1

2
2
k1 hj;k ejtk =

h2j;k Ee2jtk =s4jt Oftj .

op 1

h2j;k Ee2jtk =s4jt Xj .

k1

Pn
1

Pn
p
2
4
1
4
Because n
t1 hj;k hj;l ejtk ejtl =sjt op 1 for all kal, n
t1 hj Lejt1 =sjt ! Xj .
4
4
2
The other parts entail fejtk ejtl Zjt =sjt ; ejtk ejt =sjt ; Zjt =sjt g for k; lX1. These processes are
Martingale difference sequences, and their sample moments converge to zero.
P
p
Therefore, n1 nt1 q2 l t y0 =qtqt0 ! L0 S1=2 NS1=2 L.
R1
P
(5.3) Show n3=2 n q2 l t y0 =qb qt0 ) A0 S1=2 NS1=2 L  W 2 , where
t1

"

m
X
hj Lx2t2 ejt1 hj Lejt1
q2 l t y0
x2t1

A0j Lj 
2
1 2Zjt
2
0

sjt
s4jt
qb qt
j1

#
x2t1 ejt hj Lejt1
hj Lx2t2 ejt1 ejt hj Lx2t2 Zjt
2
2

.
s4jt
s2jt
s4jt

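(5.3.a) Show $n^{-3/2}\sum_{t=1}^n x_{2,t-1}/\sigma_{jt}^2 \Rightarrow (1/B_j^2)\int_0^1 W_2$.
Assumption 1 implies that $\{\sigma_{jt}^2\}$ is $\beta$-mixing with exponential decay, and so is $\{1/\sigma_{jt}^2\}$.
Because $\|1/\sigma_{jt}^2\|_m \le 1/\omega_j < \infty$ for some $m > 2$, we can show that $n^{-1}\sum_{t=1}^n x_{2,t-1}(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)) = O_p(1)$. Thus,
\[
n^{-3/2}\sum_{t=1}^n x_{2,t-1}/\sigma_{jt}^2 \Rightarrow E(1/\sigma_{jt}^2)\int_0^1 W_2 = (1/B_j^2)\int_0^1 W_2 .
\]
(5.3.b) Show $n^{-3/2}\sum_{t=1}^n [h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)e_{j,t-1}]/\sigma_{jt}^4 \Rightarrow \Xi_j\int_0^1 W_2$, where
\[
[h_j(L)x_{2,t-2}e_{j,t-1}][h_j(L)e_{j,t-1}] = \sum_{k=1}^{t-1} h_{j,k}^2\, x_{2,t-k-1}e_{j,t-k}^2 + \sum_{k\ne l} h_{j,k}h_{j,l}\, x_{2,t-k-1}e_{j,t-k}e_{j,t-l} .
\]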


First, we show that


n1

n
X

fkj x2tk1 sjt;k Op 1,

t1

where sjt;k fkj e2jtk =s4jt  Ee2jtk =s4jt for all kX1.
Note that sjt;k is b-mixing with exponential decay and ksjt;k km p2=oj kjt k22m o1 for some
P P
2
2
4
1=2
m42 and for all kX1. Also, n3=2 nt1 1
x2t1
kt hj;k x2t1 Eejtk =sjt op 1 since n
P1 2
k
2
4
Op 1 and kt hj;k Eejtk =sjt Ofj .
Therefore,
n3=2

2
n X
t1 h2 x
X
j;k 2tk1 ejtk
t1 k1

s4jt

n3=2

n X
t1
X

h2j;k x2tk1 Ee2jtk =s4jt op 1

t1 k1

n3=2

n X
t1
X

h2j;k x2t1 Ee2jtk =s4jt op 1

t1 k1

n3=2
Z

n
X

x2t1

t1
1

1
X

h2j;k Ee2jtk =s4jt op 1

k1

W 2.

) Xj
0

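Second, we show $n^{-3/2}\sum_{t=1}^n h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \ne l$.
Without loss of generality, we set $l > k$. We have
\[
n^{-1}\sum_{t=1}^n h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = O_p(1).
\]
Because $h_{j,k}h_{j,l}$ decays exponentially, $n^{-1}\sum_{l>k}\sum_{t=1}^n h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = O_p(1)$.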
The other parts entail Martingale difference sequences, and are asymptotically
negligible. Therefore,
Z 1
n
X
q2 l t y0
0 1=2
1=2
n3=2
)
A
S
NS
L

W 2:
&
0
0
t1 qb qt
Proof of Theorem 3. The parameter vector of the model with an intercept can
P be dened
0
as y b ; t0 ; y02 0 . First, show that H ny2 t y0 op 1, where H ny2 t y0 n1 nt1 q2 l t y0 =
qy2 qt0 .
P
(6.1) Show H nat y0 op 1, where H nat y0 n1 nt1 q2 l t y0 =qa qt0 and
"
p
X
hj Lwt2 ejt1 hj Lejt1
q2 l t y0
wt1
0

Lj0 Lj0   2  2
1 2Zjt
0
qa qt
sjt
s4jt
j1
#
hj Lwt2 ejt1 ejt
wt1 ejt hj Lejt1 hj Lwt2 Zjt
2
2

.
s4jt
s2jt
s4jt
(6.1.a) Because $w_{t-1}/\sigma_{jt}^2$ is strictly stationary and ergodic, and $\|w_{t-1}/\sigma_{jt}^2\| < \infty$, $n^{-1}\sum_{t=1}^n w_{t-1}/\sigma_{jt}^2 \xrightarrow{p} 0$.


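(6.1.b) Show $n^{-1}\sum_{t=1}^n [h_j(L)w_{t-2}e_{j,t-1}][h_j(L)e_{j,t-1}]/\sigma_{jt}^4 = o_p(1)$, where
\[
[h_j(L)w_{t-2}e_{j,t-1}][h_j(L)e_{j,t-1}] = \sum_{k=1}^{t-1} h_{j,k}^2\, w_{t-k-1}e_{j,t-k}^2 + \sum_{k\ne l} h_{j,k}h_{j,l}\, w_{t-k-1}e_{j,t-k}e_{j,t-l} .
\]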

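Since $\sup_t\|\phi_j^k w_{t-k-1}e_{j,t-k}^2/\sigma_{jt}^4\|_m < \infty$ for some $m > 2$,
\[
n^{-1/2}\sum_{t=1}^n \phi_j^k w_{t-k-1}e_{j,t-k}^2/\sigma_{jt}^4 = O_p(1).
\]
Because $h_{j,k}$ decays exponentially, $n^{-1/2}\sum_{t=1}^n\sum_{k=1}^{t-1} h_{j,k}^2\, w_{t-k-1}e_{j,t-k}^2/\sigma_{jt}^4 = O_p(1)$. In the same way, $n^{-1/2}\sum_{t=1}^n h_{j,k}h_{j,l}\, w_{t-k-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = O_p(1)$ for $k \ne l$.
Thus, $n^{-1}\sum_{t=1}^n [h_j(L)w_{t-2}e_{j,t-1}][h_j(L)e_{j,t-1}]/\sigma_{jt}^4 = o_p(1)$.
The other parts entail
\[
\left\{ \frac{w_{t-k-1}e_{j,t-k}e_{j,t-l}\eta_{jt}}{\sigma_{jt}^4},\ \frac{w_{t-k-1}e_{j,t-k}e_{jt}}{\sigma_{jt}^4},\ \frac{w_{t-k-1}\eta_{jt}}{\sigma_{jt}^2} \right\} \quad \text{for } k, l \ge 1 .
\]
These processes are MDS, and therefore $H_{n\alpha\tau}(\theta_0) = o_p(1)$.
In the same way, we can show that $H_{n\theta_2\tau}(\theta_0) = o_p(1)$.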
Using the block diagonality of the Hessian matrix,
0
1
R
n  W 2 W 02
nb^ n  b 0
@ p
A)@
R
L0 S1=2 NS1=2 A  W 02
n^tn  t0
R
!
dW 1  W 2

.
L0 Z  Q
0

A0 S1=2 NS1=2 L 

W2

L0 S1=2 NS1=2 L

Thus,
nb^ n  b 0 ) V 1 U,
where
Z
U


dW 1  W 2

 AS

1=2

1=2

NS

Z
L

W2

L0 S1=2 NS1=2 L1 L0 Z  Q,


 

Z
Z
n  W 2 W 02  A0 S1=2 NS1=2 L  W 2 L0 S1=2 NS1=2 L1


Z
 L0 S1=2 NS1=2 A  W 02 .

11
A


We use the following:



 Z
Z
dW 1  W 1 1  W 2  W 2
dW 1  W 2 ,

 

Z
Z
Z
Z
0
V n  W 2 W 02  n  W 2 W 02 n  W 2 W 2 ,
Z

R
where W 1 a0 L0 Z  Q and W 2 W 2  W 2 .
Therefore,

1 Z
Z
^

0

dW 1  W 2
nbn  b0 ) n  W 2 W 2
!
Z
 Z
1

vec

W 2 W 2

W 2 dW 01 n .

R
R
0
In the same way, nb~ n  b 0 ) vec W 2 W 2 1 W 2 dW 01 mr .

&

References
Ahn, S.K., Reinsel, G.C., 1988. Nested reduced-rank autoregressive models for multiple time series. Journal of the
American Statistical Association 83, 849-856.
Amemiya, T., 1973. Regression analysis when the variance of the dependent variable is proportional to the square
of its expectation. Journal of the American Statistical Association 68, 928-934.
Andrews, D., 1987. Consistency in nonlinear econometric models: a generic uniform law of large numbers.
Econometrica 55, 1465-1471.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31,
307-327.
Bollerslev, T., 1990. Modeling coherence in short-run nominal exchange rates: a multivariate generalized ARCH
approach. Review of Economics and Statistics 72, 498-505.
Bollerslev, T., Engle, R., Wooldridge, J., 1988. A capital asset pricing model with time varying covariances.
Journal of Political Economy 96, 116-131.
Bollerslev, T., Chou, R.Y., Kroner, K.F., 1992. ARCH modeling in finance: a review of the theory and empirical
evidence. Journal of Econometrics 52, 5-59.
Box, G.E.P., Tiao, G.C., 1977. A canonical analysis of multiple time series. Biometrika 64, 355-365.
Carrasco, M., Chen, X., 2002. Mixing and moment properties of various GARCH and stochastic volatility
models. Econometric Theory 18, 17-39.
Chanda, K., 1974. Strong mixing properties of linear stochastic processes. Journal of Applied Probability 11,
401-408.
Engle, R., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom
inflation. Econometrica 50, 987-1008.
Engle, R., Granger, C., 1987. Cointegration and error correction: representation, estimation, and testing.
Econometrica 55, 251-276.
Gorodetskii, V., 1977. On the strong mixing property for linear sequences. Theory of Probability and Its
Applications 22, 411-413.
Hansen, B.E., 1992. Convergence to stochastic integrals for dependent heterogeneous processes. Econometric
Theory 8, 489-500.
Hansen, B.E., 1995. Regression with nonstationary volatility. Econometrica 63, 1113-1132.
He, C., Terasvirta, T., 1999. Properties of moments of a family of GARCH processes. Journal of Econometrics
92, 173-192.
Johansen, S., 1988. Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control 12,
231-254.


Johansen, S., 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive
models. Econometrica 59, 1551-1580.
Kurtz, T., Protter, P., 1991. Weak limit theorems to stochastic integrals and stochastic differential equations.
Annals of Probability 19, 1035-1070.
Lee, S.W., Hansen, B.E., 1994. Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator.
Econometric Theory 10, 29-52.
Li, W.K., Ling, S., Wong, H., 2001. Estimation for partially nonstationary multivariate autoregressive models
with conditional heteroskedasticity. Biometrika 88, 1135-1152.
Ling, S., Li, W.K., 1998. Limiting distributions of maximum likelihood estimators for unstable autoregressive
moving-average time series with general autoregressive heteroskedastic errors. Annals of Statistics 26, 84-125.
Ling, S., Li, W.K., 2003. Asymptotic inference for unit root processes with GARCH(1,1) errors. Econometric
Theory 19, 541-564.
Ling, S., McAleer, M., 2003. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19,
280-310.
Lumsdaine, R.L., 1996. Consistency and asymptotic normality of the quasi-maximum likelihood estimator in
IGARCH (1,1) and covariance stationary GARCH (1,1) models. Econometrica 64, 575-596.
Newey, W., 1991. Uniform convergence in probability and stochastic equicontinuity. Econometrica 59,
1161-1167.
Phillips, P.C.B., 1991. Optimal inference in cointegrated system. Econometrica 59, 283-306.
Phillips, P.C.B., Durlauf, S.N., 1986. Multiple time series with integrated variables. Review of Economic Studies
53, 473-495.
Saikkonen, P., 1993. Continuous weak convergence and stochastic equicontinuity results for integrated processes
with an application to the estimation of a regression model. Econometric Theory 9, 155-188.
Saikkonen, P., 1995. Problems with the asymptotic theory of maximum likelihood estimation in integrated and
cointegrated systems. Econometric Theory 11, 888-911.
Seo, B., 1998. Tests for structural change in cointegrated systems. Econometric Theory 14, 222-259.
Seo, B., 1999. Distribution theory for unit root tests with conditional heteroskedasticity. Journal of Econometrics
91, 113-144.
Seo, B., 2001. Efficient estimation of the cointegrating vector in error correction models with conditional
heteroskedasticity. Korea Research Foundation Working Papers C00174.
White, H., 1982. Maximum likelihood estimation of misspecified models. Econometrica 50, 1-25.
Wong, H., Li, W., 1997. On a multivariate conditional heteroskedastic model. Biometrika 84, 111-123.
Wu, C., 1981. Asymptotic theory of nonlinear least squares estimation. Annals of Statistics 9, 501-513.