
4 Linear Regression Models with Time Series Data

The asymptotic theory developed in Chapter 3 applies to cross-sectional data: the IID assumption rules out time series data. Most macroeconomic and financial data are time series, so in this chapter we focus on linear regression models with time series data.

4.1 Fundamental Concepts in Time Series

4.1.1 Time Series Process

In this subsection we introduce the basic concepts in time series analysis that will be used throughout our discussion of time series. The first key concept is the stochastic process, which is just a fancy name for a sequence of random variables. If the index of the random variables represents time, the stochastic process is called a time series or time series process.

Definition 4.1 (Time series process) A time series process {Y_t, t = 1, 2, ...} is a sequence of random variables or vectors indexed by time t and governed by some probability law (Ω, F, P), where Ω is the sample space, F is a σ-field, and P is a probability measure.

In the above definition, we let the time index t run from 1 to infinity. In theory, it is sometimes more convenient to let time start at negative infinity. Frequently, we write the time series {Y_t, t = 1, 2, ...} or {Y_t, t = ..., -2, -1, 0, 1, 2, ...} simply as {Y_t}.

In the language of probability, a sample space Ω is the collection of all possible outcomes of a random experiment. An element ω of Ω is called a basic event in Ω. A random variable or vector is a measurable function from Ω to some Euclidean space, so we can write Y_t as a function of ω: Y_t(ω). For each ω ∈ Ω, we obtain a realization or sample path {y_t} of {Y_t}, where y_t = Y_t(ω). A realization {y_t} is thus a sequence of real numbers, and different ω's result in different sample paths.

The dynamics of {Y_t} are completely determined by the transition probability of Y_t, which is the conditional probability of Y_t given its past history. In the extreme case where the Y_t's are independent, this conditional probability reduces to the marginal density of Y_t.

Definition 4.2 (Random sample) A time series random sample of size n is a subset of a discrete time series process {Y_t}: Y^n = (Y_1, ..., Y_n)′. Any realization of this random sample is a dataset, denoted y^n = (y_1, ..., y_n)′.

A fundamental feature of economic time series is that each random variable Y_t has only one realization. For example, let Y_t denote the S&P 500 closing price index on day t. Then the daily records of the S&P 500 index from July 2, 1962 to December 31, 2001 constitute a time series dataset, denoted y^n = (y_1, ..., y_n)′, where n = 9987.

Consider a random sample Y^n. The joint probability density function of Y^n is

    f_{Y^n}(y^n) = ∏_{t=1}^n f(y_t | I_{t-1}),


where I_t = {Y_t, Y_{t-1}, ..., Y_1} denotes the information set available at time t. For t = 1, f(y_1 | I_0) = f(y_1), the marginal density of Y_1, since I_0 is the empty set. Thus, the transition probability density completely describes the joint probability of the random sample.
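For example, if {Y_t} is a Gaussian AR(1) process, Y_t = φY_{t-1} + ε_t with ε_t IID N(0, σ²) and |φ| < 1, then the transition density is normal,

    f(y_t | I_{t-1}) = (2πσ²)^{-1/2} exp{ -(y_t - φy_{t-1})²/(2σ²) },

so the joint density of the sample is just the product of these one-step-ahead normal densities, with f(y_1) the stationary N(0, σ²/(1 - φ²)) marginal.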

4.1.2 Weak Stationarity, Strict Stationarity, and Ergodicity

We first introduce the concept of weak stationarity.

Definition 4.3 (Weak stationarity) A time series {Y_t} is said to be weakly stationary (or covariance-stationary, or second-order stationary) if (i) E(Y_t) = μ for all t; (ii) Cov(Y_t, Y_{t-j}) = Γ(j) is finite for all t and any j.

Note that if a time series is weakly stationary, the covariance between Y_t and Y_{t-j} depends only on j, the length of time separating the observations, and not on t, the date of the observation. For j = 0, 1, 2, ..., Γ(j) ≡ Cov(Y_t, Y_{t-j}) is called the j-th order autocovariance of the process {Y_t}. It does not depend on t because of covariance stationarity. Also by covariance stationarity, Γ(j) satisfies Γ(-j) = Γ(j)′. The 0-th order autocovariance is the variance of Y_t, i.e., Γ(0) = Var(Y_t). For a scalar covariance-stationary process {Y_t}, the j-th order autocovariance is a scalar, often written γ_j, and it satisfies γ_{-j} = γ_j. The j-th order autocorrelation coefficient ρ_j is defined as

    ρ_j = γ_j/γ_0 = Cov(Y_t, Y_{t-j})/Var(Y_t),  j = 0, 1, 2, ...

As expected, for j = 0 we have ρ_0 = 1. The plot of ρ_j against j = 0, 1, 2, ... is called the correlogram.

A very important class of weakly stationary processes consists of the white noise processes, i.e., processes with zero mean and no serial correlation.

Definition 4.4 (White noise) A covariance-stationary process {Y_t} is a white noise process if (i) E(Y_t) = 0; (ii) Var(Y_t) = Γ(0) is finite; (iii) Cov(Y_t, Y_{t-j}) = Γ(j) = 0 for all j ≠ 0.

Clearly, an IID sequence with mean zero and finite variance is a special case of a white noise process. More generally, if {Y_t} is a white noise process and the Y_t's are independent across time (i.e., Y_t is independent of Y_s for all t ≠ s), then it is called an independent white noise process. If, in addition, Y_t is Gaussian, then we have a Gaussian white noise process.
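As a quick illustration, the following minimal Python sketch (NumPy only; the function name and the white noise example are illustrative, not part of the notes) computes the sample autocovariances γ̂_j and autocorrelations ρ̂_j of a scalar series; plotting ρ̂_j against j gives the sample correlogram.

    import numpy as np

    def sample_autocorr(y, max_lag):
        """Sample autocovariances gamma_hat_j and autocorrelations rho_hat_j, j = 0, ..., max_lag."""
        y = np.asarray(y, dtype=float)
        n = len(y)
        d = y - y.mean()
        gamma = np.array([np.dot(d[j:], d[:n - j]) / n for j in range(max_lag + 1)])
        rho = gamma / gamma[0]
        return gamma, rho

    # White noise: rho_hat_j should be close to 0 for every j >= 1.
    rng = np.random.default_rng(0)
    gamma_hat, rho_hat = sample_autocorr(rng.standard_normal(1000), max_lag=10)
    print(rho_hat)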


A different concept, strict stationarity, has to do with the entire distribution of {Y_t}.

Definition 4.5 (Strict stationarity) A time series {Y_t} is strictly stationary if the joint distribution of (Y_{t_1}, Y_{t_2}, ..., Y_{t_m}) is the same as the joint distribution of (Y_{t_1+k}, Y_{t_2+k}, ..., Y_{t_m+k}) for all integers t_1, t_2, ..., t_m, m and k. Alternatively, a time series is strictly stationary if, for any values of t_1, t_2, ..., t_m, the joint distribution of (Y_t, Y_{t+t_1}, Y_{t+t_2}, ..., Y_{t+t_m}) depends only on the intervals separating the dates t_1, t_2, ..., t_m and not on the date t itself.

Remarks. (a) Let I_t = {Y_t, Y_{t-1}, ...}. If {Y_t} is strictly stationary, then the conditional density f_t(y | I_{t-1}) of Y_t given I_{t-1}, if it exists, has a time-invariant form; that is, we can write f_t(y | I_{t-1}) as f(y | I_{t-1}), which does not depend on t. A nice feature of strict stationarity is that any measurable transformation of a strictly stationary process is still strictly stationary.

(b) Note that the definition of strict stationarity imposes no moment requirement. If a process is strictly stationary with finite second moments, then it must be covariance-stationary. Conversely, it is possible for a process to be covariance-stationary but not strictly stationary: even though the means and autocovariances do not vary over time, higher moments may be functions of time.

(c) A time series process {Y_t} is said to be Gaussian if the joint density f_{Y_t, Y_{t+j_1}, ..., Y_{t+j_m}}(y_t, y_{t+j_1}, ..., y_{t+j_m}) is Gaussian for any j_1, ..., j_m. Since the mean and covariance matrix are all that are needed to parametrize a multivariate Gaussian distribution completely, a covariance-stationary Gaussian process is also strictly stationary.

Definition 4.6 (Ergodicity) A strictly stationary process {Y_t} is said to be ergodic if, for any two bounded functions f : R^{k+1} → R and g : R^{l+1} → R,
    lim_{m→∞} { |E[f(Y_t, ..., Y_{t+k}) g(Y_{t+m}, ..., Y_{t+l+m})]| - |E[f(Y_t, ..., Y_{t+k})]| · |E[g(Y_{t+m}, ..., Y_{t+l+m})]| } = 0.

Note that the above definition does not require the existence of any moments of {Y_t}. Even if Y_t has no moments, the definition is valid because of the boundedness of the functions f and g. Heuristically, a strictly stationary process is ergodic if it is asymptotically independent: any two sets of random variables in the sequence are approximately independent as long as they lie far enough apart from each other. A strictly stationary process that is ergodic is called ergodic stationary or stationary ergodic. [Obviously, if {Y_t} is an IID sequence, then it is an ergodic stationary process.] Ergodic stationarity is an integral ingredient in developing the large sample theory because of the following theorem.

Theorem 4.7 (Ergodic theorem) Let {Y_t} be an ergodic stationary process with E(Y_t) = μ. Then

    Ȳ_n ≡ (1/n) Σ_{t=1}^n Y_t →_{a.s.} μ.

The ergodic theorem says that the sample mean of an ergodic stationary process converges to its population mean almost surely. It is a substantial generalization of Kolmogorov's LLN for IID random variables.
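As an informal check of the ergodic theorem, the following minimal Python sketch (NumPy only; the AR(1) specification and all parameter values are illustrative assumptions, not part of the notes) simulates a stationary AR(1) process and compares its sample mean with the population mean.

    import numpy as np

    # A stationary AR(1), Y_t = c + phi*Y_{t-1} + sigma*e_t with |phi| < 1, is ergodic stationary
    # with population mean mu = c/(1 - phi).
    rng = np.random.default_rng(0)
    c, phi, sigma = 1.0, 0.8, 1.0
    mu = c / (1 - phi)

    n = 200_000
    y = np.empty(n)
    y[0] = mu + sigma / np.sqrt(1 - phi**2) * rng.standard_normal()  # draw Y_1 from the stationary law
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + sigma * rng.standard_normal()

    print(y.mean(), mu)  # the sample mean is close to the population mean (here mu = 5)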

4.1.3 Martingales, Martingale Difference Sequences, and Random Walks

Definition 4.8 (Martingale) A vector-valued time series process {Y_t} is called a martingale if

    E(Y_t | Y_{t-1}, Y_{t-2}, ...) = Y_{t-1} a.s. for all possible values of t.

The conditioning set (Y_{t-1}, Y_{t-2}, ...) can be either (Y_{t-1}, Y_{t-2}, ..., Y_1) or (Y_{t-1}, Y_{t-2}, ...), depending on the point at which the time series starts. It is often called the information set at date t-1 and denoted I_{t-1}. In the probability language, I_{t-1} is the σ-field generated by {Y_{t-1}, Y_{t-2}, ...}.

Let Y_{tj} be an element of Y_t. The scalar process {Y_{tj}} is called a martingale with respect to {Y_t} if E(Y_{tj} | Y_{t-1}, Y_{t-2}, ...) = Y_{t-1,j} a.s.; {Y_{tj}} is called simply a martingale if the information set consists of its own past values (Y_{t-1,j}, Y_{t-2,j}, ...).

Alternatively, a vector-valued time series process {Y_t} is a martingale if Y_t = Y_{t-1} + ε_t, where E(ε_t | I_{t-1}) = 0 a.s., I_{t-1} = (Y_{t-1}, Y_{t-2}, ...) is the information set at time t-1, and {ε_t} is called a martingale difference sequence (m.d.s.) [with respect to I_t]. Note that if the process starts in the infinite past, I_{t-1} can also be defined as (ε_{t-1}, ε_{t-2}, ...), because in this case the two sets (Y_{t-1}, Y_{t-2}, ...) and (ε_{t-1}, ε_{t-2}, ...) contain the same information. If the process starts at time 1, then I_{t-1} can be defined either as (Y_{t-1}, Y_{t-2}, ..., Y_1) or as (ε_{t-1}, ε_{t-2}, ..., ε_1); the information contained in the two sets is equivalent provided one restricts Y_0 = 0.

It is easy to verify that a martingale difference sequence with a finite variance matrix is a white noise process.

Lemma 4.9 (Martingale difference sequence) If {ε_t} is a martingale difference sequence with finite variance matrix Σ_0, then it is a white noise process.

Proof. Let I_{t-1} = (ε_{t-1}, ε_{t-2}, ...). Since {ε_t} is an m.d.s. with finite variance matrix, it follows from the law of iterated expectations that E(ε_t) = E[E(ε_t | I_{t-1})] = 0, Var(ε_t) = Σ_0, and

    Cov(ε_t, ε_{t-j}) = E(ε_t ε_{t-j}′) = E[E(ε_t ε_{t-j}′ | I_{t-1})] = E[E(ε_t | I_{t-1}) ε_{t-j}′] = 0 for all j > 0.

This completes the proof.

An example of a martingale difference sequence, frequently used in analyzing asset returns, is the autoregressive conditional heteroskedastic (ARCH) process introduced by Engle (1982). See the following example.

Example 4.10 (ARCH processes) A scalar process {U_t} is said to be an ARCH process of order 1 (ARCH(1)) if it can be written as

    U_t = √(h_t) ε_t,                                  (4.1)
    h_t = α_0 + α_1 U_{t-1}²,  α_0, α_1 > 0,           (4.2)

where {ε_t} is IID with mean zero and unit variance. If U_1 is the initial value of the process, we can define I_t = (U_t, U_{t-1}, ..., U_1). Then it is easy to verify that {U_t} is an m.d.s. because

    E(U_t | I_{t-1}) = E(√(α_0 + α_1 U_{t-1}²) ε_t | I_{t-1})
                     = √(α_0 + α_1 U_{t-1}²) E(ε_t | I_{t-1})
                     = √(α_0 + α_1 U_{t-1}²) E(ε_t)    (since ε_t is independent of I_{t-1})
                     = 0                               (since E(ε_t) = 0).

Similarly, we can show that

    Var(U_t | I_{t-1}) = E(U_t² | I_{t-1}) = h_t.      (4.3)

That is, h_t is the conditional variance of U_t given the information set at time t-1, so the process exhibits conditional heteroskedasticity. Engle (1982) showed that the process is strictly stationary and ergodic with a finite second moment if |α_1| < 1 and U_1 is a draw from an appropriate distribution (or the process started in the infinite past). In this case, taking expectations on both sides of E(U_t² | I_{t-1}) = h_t gives

    E(U_t²) = α_0 + α_1 E(U_{t-1}²).                   (4.4)

Using the stationarity of {U_t}, we obtain E(U_t²) = α_0/(1 - α_1), the unconditional second moment of U_t. Note that unconditional homoskedasticity and conditional heteroskedasticity co-exist in this example.

An important example of a martingale is a random walk.

Definition 4.11 (Random walk) A time series {Y_t, t = 1, 2, ...} is a random walk if it is generated according to Y_t = Y_{t-1} + ε_t, where {ε_t, t = 1, 2, ...} is an IID sequence with zero mean and finite variance matrix, and Y_0 = 0.

Notice that a random walk {Y_t} is a sequence of cumulative sums: Y_1 = ε_1, Y_2 = ε_1 + ε_2, ..., Y_t = ε_1 + ε_2 + ··· + ε_t. Clearly, a random walk is a martingale, because the IID property of {ε_t} implies that

    E(ε_t | Y_{t-1}, ..., Y_1) = E(ε_t | ε_{t-1}, ..., ε_1) = E(ε_t) = 0.

The following theorem extends the Lindeberg-Lévy CLT for IID sequences to a CLT for ergodic stationary martingale difference sequences.

Theorem 4.12 (CLT for an ergodic stationary m.d.s.) Suppose {ε_t} is a stationary ergodic m.d.s. with E(ε_t ε_t′) = V, a finite and positive definite matrix. Then

    n^{-1/2} Σ_{t=1}^n ε_t →_d N(0, V).

In the above theorem, it is easy to verify that

    Var(n^{-1/2} Σ_{t=1}^n ε_t) = n^{-1} Σ_{t=1}^n Σ_{s=1}^n E(ε_t ε_s′) = n^{-1} Σ_{t=1}^n E(ε_t ε_t′) = V,

since the m.d.s. property implies E(ε_t ε_s′) = 0 for all t ≠ s.


4.2 Large Sample Theory for Linear Regression Models with Time Series Processes

In this section we study the large sample theory for linear regression models (LRMs) with time series processes.

4.2.1 Basic Assumptions

We make the following assumptions.

Assumption A4.1 (Ergodic stationarity and linearity) The stochastic process {Y_t, X_t} is stationary and ergodic, and Y_t = β′X_t + ε_t.

Assumption A4.2 (Correct specification of the conditional mean) E(ε_t | X_t) = 0 a.s.

Assumption A4.3 (Non-singularity) Q ≡ E(X_t X_t′) is finite and positive definite (p.d.).

Assumption A4.4 (m.d.s.) {X_t ε_t} is an m.d.s., and V ≡ E(X_t X_t′ ε_t²) is finite and p.d.

Assumption A4.5 (Conditional homoskedasticity) E(ε_t² | X_t) = σ² a.s.

Remarks.
(1) If {Y_t, X_t} is IID, we return to the case studied in Chapter 3.
(2) {Y_t, X_t} is stationary ergodic, so functions of {Y_t, X_t}, e.g., X_t X_t′, ε_t, and X_t ε_t, are all stationary ergodic.
(3) E(ε_t | X_t) = 0 allows for predetermined variables (lagged dependent variables, e.g., Y_{t-1}) in X_t. For example, consider the AR(1) process

    Y_t = β_1 + β_2 Y_{t-1} + ε_t,  t = 2, ..., n,  with X_t = (1, Y_{t-1})′.

Here E(ε_t | X_t) = 0 is implied by E(ε_t | I_{t-1}) = 0, where I_{t-1} = (ε_{t-1}, ε_{t-2}, ...). A simulation sketch of this example is given after the remarks.
(4) No normality of {ε_t} is assumed.
(5) Assumption A4.5 may or may not be required.
(6) When X_t contains an intercept, the m.d.s. assumption on {X_t ε_t} implies that {ε_t} is an m.d.s. and hence serially uncorrelated.
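As promised in remark (3), the following minimal Python sketch (NumPy only; parameter values are illustrative assumptions) simulates an AR(1) process and estimates it by OLS of Y_t on X_t = (1, Y_{t-1})′. The estimates approach the true coefficients as the sample size grows, even though the regressor is only predetermined, not strictly exogenous.

    import numpy as np

    rng = np.random.default_rng(0)
    beta1, beta2, sigma = 1.0, 0.5, 1.0        # true coefficients, |beta2| < 1

    def simulate_and_fit(n):
        y = np.zeros(n)
        y[0] = beta1 / (1 - beta2)                      # start near the stationary mean
        for t in range(1, n):
            y[t] = beta1 + beta2 * y[t - 1] + sigma * rng.standard_normal()
        X = np.column_stack([np.ones(n - 1), y[:-1]])   # X_t = (1, Y_{t-1})'
        return np.linalg.lstsq(X, y[1:], rcond=None)[0]  # OLS of Y_t on X_t, t = 2, ..., n

    for n in (200, 2000, 20000):
        print(n, simulate_and_fit(n))                    # estimates approach (1.0, 0.5)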

4.2.2 Consistency of the OLS Estimator

Theorem 4.13 Under Assumptions A4.1-A4.4, (a) β̂ →_p β, and (b) s² →_p σ².

Proof. Recall β̂ = β + (X′X)^{-1} X′ε and s² = ε̂′ε̂/(n - k).

(a) The fact that {X_t} is ergodic stationary implies that {X_t X_t′} is also ergodic stationary. By Assumption A4.3 and the ergodic theorem,

    (1/n) X′X = (1/n) Σ_{t=1}^n X_t X_t′ →_p Q = E(X_t X_t′).

Now consider (1/n) X′ε = (1/n) Σ_{t=1}^n X_t ε_t. By the ergodic stationarity of {Y_t, X_t}, ε_t and X_t ε_t are also ergodic stationary. By Assumption A4.4, the Cauchy-Schwarz inequality, and the ergodic theorem, (1/n) Σ_{t=1}^n X_t ε_t →_p E(X_t ε_t) = 0. Therefore,

    β̂ - β = ((1/n) X′X)^{-1} (1/n) X′ε →_p Q^{-1} · 0 = 0.


(b) Noting that ε̂_t = Y_t - β̂′X_t = ε_t - (β̂ - β)′X_t under Assumption A4.1, we have

    s² = (1/(n - k)) Σ_{t=1}^n ε̂_t²
       = (1/(n - k)) Σ_{t=1}^n { ε_t - (β̂ - β)′X_t }²
       = (1/(n - k)) Σ_{t=1}^n ε_t² + (β̂ - β)′ [ (1/(n - k)) Σ_{t=1}^n X_t X_t′ ] (β̂ - β) - 2 (β̂ - β)′ (1/(n - k)) Σ_{t=1}^n X_t ε_t
       →_p σ² + 0′ Q 0 - 2 · 0′ · 0 = σ².

4.2.3 Asymptotic Normality of β̂

Theorem 4.14 Suppose Assumptions A4.1-A4.4 hold. Then

    √n(β̂ - β) →_d N(0, Q^{-1} V Q^{-1}).

Proof. Note that

    √n(β̂ - β) = ( (1/n) Σ_{t=1}^n X_t X_t′ )^{-1} (1/√n) Σ_{t=1}^n X_t ε_t.

We have shown that Q̂ = (1/n) Σ_{t=1}^n X_t X_t′ →_p Q by the ergodic theorem. It suffices to show that n^{-1/2} Σ_{t=1}^n X_t ε_t →_d N(0, V). Noting that {X_t ε_t} is stationary ergodic and an m.d.s., we have E(X_t ε_t) = 0 and

    Var(n^{-1/2} Σ_{t=1}^n X_t ε_t) = n^{-1} Σ_{t=1}^n E(X_t X_t′ ε_t²) = V.

By the CLT for an ergodic stationary m.d.s.,

    n^{-1/2} Σ_{t=1}^n X_t ε_t →_d N(0, V).

It follows that √n(β̂ - β) →_d N(0, Q^{-1} V Q^{-1}). If Assumption A4.5 is also satisfied, i.e., E(ε_t² | X_t) = σ² a.s., then V = E(X_t X_t′ ε_t²) = σ² Q, and

    √n(β̂ - β) →_d N(0, σ² Q^{-1}).

4.2.4 Asymptotic Variance Estimator

The general formula for the asymptotic variance is avar(√n(β̂ - β)) = Q^{-1} V Q^{-1}.

Case 1. Conditional homoskedasticity. Under conditional homoskedasticity, avar(√n(β̂ - β)) = σ² Q^{-1}. We have shown that under Assumptions A4.1-A4.4

    Q̂ = (1/n) Σ_{t=1}^n X_t X_t′ →_p Q  and  s² →_p σ²,

so s² Q̂^{-1} →_p σ² Q^{-1} is a consistent estimator of avar(√n(β̂ - β)).

Case 2. Conditional heteroskedasticity. Now we need to estimate avar(√n(β̂ - β)) = Q^{-1} V Q^{-1}. We add the following assumption.

Assumption A4.6 E‖X_t‖⁴ < ∞.

Lemma 4.15 Under Assumptions A4.1-A4.4 and A4.6,

    V̂ = (1/n) Σ_{t=1}^n X_t X_t′ ε̂_t² →_p V,

and hence

    Q̂^{-1} V̂ Q̂^{-1} →_p Q^{-1} V Q^{-1} = avar(√n(β̂ - β)).

Proof. The proof is analogous to the IID case, with the WLLN for IID sequences replaced by the ergodic theorem.
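A minimal Python sketch of the Case 2 estimator (NumPy only; the function name is illustrative): it computes β̂ and the heteroskedasticity-robust estimate Q̂^{-1} V̂ Q̂^{-1} of avar(√n(β̂ - β)).

    import numpy as np

    def ols_hc_avar(y, X):
        """OLS estimate and the estimator Qhat^{-1} Vhat Qhat^{-1} of avar(sqrt(n)(betahat - beta)),
        with Qhat = X'X/n and Vhat = (1/n) * sum_t X_t X_t' * e_t^2 (no autocorrelation allowed)."""
        n = X.shape[0]
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ bhat
        Qinv = np.linalg.inv(X.T @ X / n)
        Xe = X * e[:, None]
        Vhat = Xe.T @ Xe / n
        return bhat, Qinv @ Vhat @ Qinv

    # Robust standard errors for betahat are then sqrt(diag(avar_hat) / n).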

4.2.5 Hypothesis Testing

Again, we are interested in the null hypothesis H_0: Rβ = r. The test relies on the asymptotic result that

    √n(Rβ̂ - r) = √n R(β̂ - β) →_d N(0, R Q^{-1} V Q^{-1} R′) under H_0.

Case 1. Conditional homoskedasticity.

Theorem 4.16 Under Assumptions A4.1-A4.5 and H_0, qF_n →_d χ²(q).

Proof. The proof is the same as in the IID case. Recall

    F_n = (1/q) (Rβ̂ - r)′ [ s² R(X′X)^{-1} R′ ]^{-1} (Rβ̂ - r),

and

    qF_n = (ε̃′ε̃ - ε̂′ε̂) / [ ε̂′ε̂/(n - k) ] →_d χ²(q),

where ε̃ denotes the restricted residual vector.

Case 2. Conditional heteroskedasticity.

Theorem 4.17 Under Assumptions A4.1-A4.4, A4.6 and H_0,

    W_n ≡ n (Rβ̂ - r)′ [ R Q̂^{-1} V̂ Q̂^{-1} R′ ]^{-1} (Rβ̂ - r) →_d χ²(q).

Proof. The proof is the same as in the IID case.

It is difficult to talk about the LR and LM tests here.
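The Wald statistic of Theorem 4.17 is straightforward to compute from the robust variance sketch above. Here is a minimal version (illustrative code; SciPy is assumed for the χ² tail probability).

    import numpy as np
    from scipy import stats

    def wald_test(bhat, avar_hat, R, r, n):
        """Wald statistic for H0: R beta = r, given an estimate of avar(sqrt(n)(betahat - beta))."""
        diff = R @ bhat - r
        W = n * diff @ np.linalg.inv(R @ avar_hat @ R.T) @ diff
        q = R.shape[0]
        return W, stats.chi2.sf(W, df=q)   # statistic and asymptotic chi^2(q) p-value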

4.2.6 Testing for Serial Correlation

Motivation.
(i) The assumption that {X_t ε_t} is an m.d.s. plays a key role in the asymptotics. When X_t contains an intercept, {ε_t} is also an m.d.s., which implies that there is no serial correlation among the ε_t's. Thus, serial uncorrelatedness is an important necessary condition for the validity of avar(√n(β̂ - β)) = Q^{-1} V Q^{-1} with V = E(X_t X_t′ ε_t²).
(ii) In time series, correct model specification usually requires that {ε_t} be an m.d.s.: E(ε_t | I_{t-1}) = 0 a.s., where I_{t-1} is the information set available at time t-1, e.g., I_{t-1} = {X_t, X_{t-1}, ...; ε_{t-1}, ε_{t-2}, ...}. Because X_t belongs to I_{t-1}, for all t > s we have by the law of iterated expectations that E(ε_t | X_t) = 0 and

    E[ε_t ε_s X_t X_s′] = E[ E(ε_t ε_s X_t X_s′ | I_{t-1}) ] = E[ E(ε_t | I_{t-1}) ε_s X_t X_s′ ] = 0.

In this subsection we study how to test for serial correlation in {ε_t}.

1. Box-Pierce and Ljung-Box tests for serial correlation

The sample j-th order autocovariance is

    γ̂_j = (1/n) Σ_{t=j+1}^n (ε_t - ε̄_n)(ε_{t-j} - ε̄_n),  j = 0, 1, 2, ...,

where ε̄_n = (1/n) Σ_{t=1}^n ε_t. The sample j-th order autocorrelation is

    ρ̂_j = γ̂_j / γ̂_0,  j = 1, 2, ...

Theorem 4.18 (Box-Pierce Q statistic) Suppose {ε_t} is a stationary m.d.s. and E(ε_t² | ε_{t-1}, ε_{t-2}, ...) = σ² < ∞. Then the Box-Pierce (1970) Q statistic satisfies

    Q_n ≡ n Σ_{j=1}^p ρ̂_j² = Σ_{j=1}^p (√n ρ̂_j)² →_d χ²(p).

Remarks.
(1) Let γ̂ = (γ̂_1, ..., γ̂_p)′ and ρ̂ = (ρ̂_1, ..., ρ̂_p)′. Then we can show that √n γ̂ →_d N(0, σ⁴ I_p) and √n ρ̂ →_d N(0, I_p). The result in the above theorem then follows by the continuous mapping theorem (CMT).
(2) The Box-Pierce Q statistic tests whether a group of autocorrelations are simultaneously zero. It offers no clear guide on how to choose p, so it is common to check for serial correlation at several values of p, e.g., p = 1, 2, 3, 4, ...
(3) Modification: the Ljung-Box (1978) modified Q statistic is

    Q_n^{(1)} ≡ n(n + 2) Σ_{j=1}^p ρ̂_j²/(n - j) = Σ_{j=1}^p [(n + 2)/(n - j)] (√n ρ̂_j)² →_d χ²(p).

The modification often provides a better approximation to χ²(p) in moderate sample sizes.
(4) In practice, {ε_t} is not observable. We can calculate the above statistics from the OLS residuals ε̂_t = Y_t - β̂′X_t and obtain the same asymptotic distribution provided E(ε_t | X_1, ..., X_n) = 0, i.e., {X_t} is strictly exogenous. See Hayashi (2000, pp. 145-146).

(5) If X_t is predetermined but not strictly exogenous, and the errors are conditionally homoskedastic in the sense that

    E(ε_t | ε_{t-1}, ε_{t-2}, ...; X_t, X_{t-1}, ...) = 0        (predetermined),
    E(ε_t² | ε_{t-1}, ε_{t-2}, ...; X_t, X_{t-1}, ...) = σ² a.s.  (conditionally homoskedastic),

then

    √n γ̂ →_d N(0, σ⁴ (I_p - Φ))  and  √n ρ̂ →_d N(0, I_p - Φ),

where the elements of γ̂ and ρ̂ are now computed from the residuals, ρ̂_j = γ̂_j/γ̂_0 with γ̂_j = (1/n) Σ_{t=j+1}^n ε̂_t ε̂_{t-j}, and Φ is a p × p matrix with typical (j, l)-th element

    φ_{jl} = E(X_t ε_{t-j})′ E(X_t X_t′)^{-1} E(X_t ε_{t-l}) / σ².

Let Φ̂ ≡ (φ̂_{jl}), where

    φ̂_{jl} = μ̂_j′ S_xx^{-1} μ̂_l / s²,

with s² = (1/(n - k)) Σ_{t=1}^n ε̂_t², μ̂_j = (1/n) Σ_{t=j+1}^n X_t ε̂_{t-j}, and S_xx = (1/n) X′X. The modified Box-Pierce Q statistic is then defined as

    Q_n^{(2)} ≡ n ρ̂′ (I_p - Φ̂)^{-1} ρ̂ →_d χ²(p).
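A minimal Python sketch of the Box-Pierce and Ljung-Box statistics (NumPy and SciPy; the function name is illustrative). Packaged implementations also exist, e.g., acorr_ljungbox in statsmodels. Keep in mind that when the statistics are computed from OLS residuals, the χ²(p) limit requires the conditions discussed in remarks (4) and (5) above.

    import numpy as np
    from scipy import stats

    def q_statistics(e, p):
        """Box-Pierce and Ljung-Box statistics based on lags 1, ..., p of the series e."""
        e = np.asarray(e, dtype=float)
        n = len(e)
        d = e - e.mean()
        gamma0 = np.dot(d, d) / n
        rho = np.array([np.dot(d[j:], d[:n - j]) / n / gamma0 for j in range(1, p + 1)])
        q_bp = n * np.sum(rho**2)
        q_lb = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, p + 1)))
        return q_bp, q_lb, stats.chi2.sf(np.array([q_bp, q_lb]), df=p)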

2. Breusch-Godfrey test for serial correlation

Let ε_t = Y_t - β′X_t. The null hypothesis is H_0: E(ε_t | I_{t-1}) = 0. The maintained assumption is that E(ε_t² | X_t) = σ² a.s.

If ε_t were observed, we could consider the auxiliary regression

    ε_t = Σ_{j=1}^p φ_j ε_{t-j} + u_t,  t = p + 1, ..., n   (an AR(p) model).

Under H_0 one expects that φ_1 = ··· = φ_p = 0, and thus

    (n - p) R²_uc →_d χ²(p),

where R²_uc is the uncentered R² in the above auxiliary regression.

In practice, ε_t is not observed, and we have to rely on the OLS residuals ε̂_t = Y_t - β̂′X_t. Consider the auxiliary regression

    ε̂_t = δ′X_t + Σ_{j=1}^p φ_j ε̂_{t-j} + u_t,  t = 1, 2, ..., n   (by setting ε̂_0 = ε̂_{-1} = ··· = ε̂_{-p+1} = 0),

or

    ε̂_t = δ′X_t + Σ_{j=1}^p φ_j ε̂_{t-j} + u_t,  t = p + 1, ..., n.


Then under some regularity conditions, we can show that

    (n - p) R²_uc →_d χ²(p).

Remarks.
(1) Run the above regression for t = 1, 2, ..., n and test φ_1 = φ_2 = ··· = φ_p = 0. Under conditional homoskedasticity and serial uncorrelatedness, pF_n →_d χ²(p) and pF_n = (n - p) R²_uc + o_p(1).
(2) The restricted regression is

    (a)  ε̂_t = δ′X_t + u_t,

and the unrestricted regression is

    (b)  ε̂_t = δ′X_t + Σ_{j=1}^p φ_j ε̂_{t-j} + u_t.

In the restricted model, X_t has no explanatory power for ε̂_t, because X′ε̂ = 0 by the normal equations of the original OLS regression of Y_t on X_t. So if we use all observations {(ε̂_t, X_t), t = 1, ..., n} in (a), the restricted estimator is δ̂ = 0, so that RSS_r = ε̂′ε̂ = TSS in this case. It follows that

    pF_n = (RSS_r - RSS_ur) / [ RSS_ur/(n - k - p) ]
         = (n - k - p) (1 - RSS_ur/RSS_r) / (RSS_ur/RSS_r)
         = (n - k - p) R²_uc / (1 - R²_uc).

Consequently,

    (n - p) R²_uc = [(n - p)/(n - k - p)] · pF_n / [1 + pF_n/(n - k - p)] →_d χ²(p).

(3) To test for serial correlation under conditional heteroskedasticity, we need to use a different procedure.
(4) In undergraduate textbooks, it is standard to introduce the Durbin-Watson (DW, 1950) test for omitted serial correlation, which was once very popular and is still routinely reported by conventional regression packages. [Recall that DW_n = Σ_{t=2}^n (ε̂_t - ε̂_{t-1})² / Σ_{t=1}^n ε̂_t² ≈ 2(1 - ρ̂), where ρ̂ denotes the sample correlation between ε̂_t and ε̂_{t-1}.] Nevertheless, the DW test is appropriate only when the regression Y_t = β′X_t + ε_t contains no lagged dependent variables on the right-hand side and ε_t is IID N(0, σ²) (no heteroskedasticity, no serial correlation, plus normality); otherwise, it is invalid. See Hansen (2011, p. 233).
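A minimal Python sketch of the feasible Breusch-Godfrey procedure (NumPy and SciPy; the function name is illustrative), using the auxiliary regression over t = p+1, ..., n. A packaged version is available, e.g., acorr_breusch_godfrey in statsmodels.

    import numpy as np
    from scipy import stats

    def breusch_godfrey(y, X, p):
        """(n - p)*R^2_uc version of the BG test, auxiliary regression over t = p+1, ..., n."""
        n = X.shape[0]
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]           # OLS residuals
        lags = np.column_stack([e[p - j:n - j] for j in range(1, p + 1)])
        Z = np.column_stack([X[p:], lags])                          # regressors: X_t and p lagged residuals
        eh = e[p:]
        u = eh - Z @ np.linalg.lstsq(Z, eh, rcond=None)[0]
        r2_uc = 1.0 - np.dot(u, u) / np.dot(eh, eh)                 # uncentered R^2
        stat = (n - p) * r2_uc
        return stat, stats.chi2.sf(stat, df=p)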

4.3 Large Sample Theory for Linear Regression Models under Both Conditional Heteroskedasticity and Autocorrelation

4.3.1 Basic Assumptions

We replace Assumption A4.4 with the following condition, which is also adopted in Hayashi (2000, p. 405).


Assumption A4.4* (Gordin's condition on ergodic stationary processes)
(i) Let Γ(j) = Cov(X_t ε_t, X_{t-j} ε_{t-j}) = E(X_t ε_t ε_{t-j} X_{t-j}′). Then V ≡ Σ_{j=-∞}^{∞} Γ(j) is finite and p.d.
(ii) E(X_t ε_t | X_{t-j}, ε_{t-j}, X_{t-j-1}, ε_{t-j-1}, ...) → 0 in L² (mean square) as j → ∞.
(iii) Σ_{j=0}^{∞} [E(r_{tj}′ r_{tj})]^{1/2} < ∞, where

    r_{tj} = E(X_t ε_t | X_{t-j}, ε_{t-j}, X_{t-j-1}, ε_{t-j-1}, ...) - E(X_t ε_t | X_{t-j-1}, ε_{t-j-1}, X_{t-j-2}, ε_{t-j-2}, ...).

Remarks.
(1) Assumption A4.4* allows for both conditional heteroskedasticity and autocorrelation of unknown form in {ε_t}; in particular, we do not assume that {X_t ε_t} is an m.d.s.
(2) No normality assumption is imposed.

4.3.2 Long-run Variance Estimator

Recall that

    √n(β̂ - β) = Q̂^{-1} n^{-1/2} Σ_{t=1}^n X_t ε_t.

Suppose

    n^{-1/2} Σ_{t=1}^n X_t ε_t →_d N(0, V),

where V is the asymptotic (long-run) variance, i.e., V = avar(n^{-1/2} Σ_{t=1}^n X_t ε_t). Then

    √n(β̂ - β) →_d N(0, Q^{-1} V Q^{-1})

under some regularity conditions.

Let g_t = X_t ε_t. Recall Γ(j) = E(g_t g_{t-j}′) and Γ(-j) = E(g_{t-j} g_t′) = Γ(j)′. Noting that, among the n² pairs (t, s) with t, s = 1, ..., n, there are exactly n - |j| pairs with t - s = j for each j = 0, ±1, ..., ±(n - 1),


we have

    Var(n^{-1/2} Σ_{t=1}^n g_t)
      = n^{-1} Σ_{t=1}^n Σ_{s=1}^n E[g_t g_s′]
      = n^{-1} { Σ_{t=1}^n E[g_t g_t′] + Σ_{t=2}^n Σ_{s=1}^{t-1} E[g_t g_s′] + Σ_{t=1}^{n-1} Σ_{s=t+1}^n E[g_t g_s′] }
      = n^{-1} [ n Γ(0) + (n-1) Γ(1) + (n-2) Γ(2) + ··· + Γ(n-1) + (n-1) Γ(-1) + (n-2) Γ(-2) + ··· + Γ(-(n-1)) ]
      = Σ_{j=-(n-1)}^{n-1} (1 - |j|/n) Γ(j)
      → Σ_{j=-∞}^{∞} Γ(j) = V  as n → ∞.

We may conjecture to estimate V by

    V̂_0 = Σ_{j=-(n-1)}^{n-1} Γ̂(j),

where

    Γ̂(j) = (1/n) Σ_{t=j+1}^n X_t ε̂_t ε̂_{t-j} X_{t-j}′,  j = 0, 1, 2, ..., n - 1,

and Γ̂(-j) = Γ̂(j)′. Unfortunately, although Γ̂(j) →_p Γ(j) for each fixed value of j, V̂_0 does not converge in probability to V, because there are too many estimated terms in the summation and the estimation errors accumulate. To ensure consistency of the variance estimate, we can use instead

    V̂_1 = Σ_{j=-p_n}^{p_n} Γ̂(j),

where p_n → ∞ and p_n/n → 0 as n → ∞ (e.g., p_n = ⌊n^{1/3}⌋). Nevertheless, V̂_1 may not be p.d. for every n. To ensure positive definiteness of the variance estimate, define

    V̂ = Σ_{j=-p_n}^{p_n} k(j/p_n) Γ̂(j),

where k(·) is called a kernel function.

(1) If one uses the Bartlett kernel k(z) = (1 - |z|) 1(|z| ≤ 1), one obtains the Newey-West (1987) HAC variance estimator. In this case, the best rate for p_n is n^{1/3}, and n^{1/3}(V̂ - V) = O_p(1), which is the fastest rate of consistency attainable here. HAC stands for "heteroskedasticity and autocorrelation consistent."

(2) If one uses the Quadratic Spectral kernel

    k(z) = [25/(12π²z²)] [ sin(6πz/5)/(6πz/5) - cos(6πz/5) ],

one obtains Andrews' (1991) HAC variance estimator. In this case, the best rate for p_n is n^{1/5}, and n^{2/5}(V̂ - V) = O_p(1). This kernel is preferred to the Bartlett kernel.

(3) V̂_1 is obtained by using the truncation kernel k(z) = 1 if |z| ≤ 1 and k(z) = 0 if |z| > 1.

Not all kernel functions yield a p.d. matrix estimator V̂, but many of them do. Under certain conditions on {Y_t, X_t}, k(·) and p_n, one can prove that V̂ →_p V.
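A minimal Python sketch of the Newey-West estimator (NumPy only). The lag truncation p and the weighting 1 - j/(p+1), the common Newey-West convention, are choices of this sketch; the Bartlett kernel in the text with bandwidth p_n gives weights 1 - j/p_n instead.

    import numpy as np

    def newey_west_V(X, e, p):
        """Bartlett-kernel (Newey-West) estimate of the long-run variance V of g_t = X_t*eps_t,
        built from OLS residuals e."""
        n = X.shape[0]
        g = X * e[:, None]                       # row t is g_t'
        V = g.T @ g / n                          # Gamma_hat(0)
        for j in range(1, p + 1):
            Gj = g[j:].T @ g[:-j] / n            # Gamma_hat(j)
            V += (1.0 - j / (p + 1)) * (Gj + Gj.T)
        return V

    # The HAC estimate of avar(sqrt(n)(betahat - beta)) is then Qinv @ newey_west_V(X, e, p) @ Qinv,
    # with Qinv = inv(X'X/n).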

4.3.3 Consistency of the OLS Estimator

Theorem 4.19 Under Assumptions A4.1-A4.3 and A4.4*(i), β̂ →_p β.

Proof. Note that

    β̂ - β = Q̂^{-1} (1/n) Σ_{t=1}^n X_t ε_t.

By Assumptions A4.1 and A4.3 and the ergodic theorem, Q̂ →_p Q. By Assumptions A4.2 and A4.4*(i) and the ergodic theorem, (1/n) Σ_{t=1}^n X_t ε_t →_p E(X_t ε_t) = 0. The result follows.

4.3.4 Asymptotic Normality of β̂

Theorem 4.20 Under Assumptions A4.1-A4.3 and A4.4*,

    √n(β̂ - β) →_d N(0, Q^{-1} V Q^{-1}).

Proof. √n(β̂ - β) = Q̂^{-1} n^{-1/2} Σ_{t=1}^n X_t ε_t. By Assumptions A4.1-A4.3 and A4.4*, and the CLT for zero-mean ergodic stationary processes below, we have

    n^{-1/2} Σ_{t=1}^n X_t ε_t →_d N(0, V).

Also, Q̂ →_p Q. Then √n(β̂ - β) →_d N(0, Q^{-1} V Q^{-1}).

Theorem 4.21 (CLT for zero-mean ergodic stationary processes; White, 2001, Theorem 5.16; Hayashi, 2000, p. 405) Suppose {Z_t} is a stationary ergodic process with
(i) E(Z_t) = 0;
(ii) Ω ≡ Σ_{j=-∞}^{∞} Γ(j) is finite and p.d., where Γ(j) = E(Z_t Z_{t-j}′);
(iii) E(Z_t | Z_{t-j}, Z_{t-j-1}, ...) → 0 in L² as j → ∞;
(iv) Σ_{j=0}^{∞} [E(r_{tj}′ r_{tj})]^{1/2} < ∞, where r_{tj} = E(Z_t | Z_{t-j}, Z_{t-j-1}, ...) - E(Z_t | Z_{t-j-1}, Z_{t-j-2}, ...).
Then

    n^{-1/2} Σ_{t=1}^n Z_t →_d N(0, Ω).


4.3.5 Hypothesis Testing

As before, we consider testing the null hypothesis H_0: Rβ = r, where R is a q × k matrix with rank q ≤ k and r is a q × 1 vector. Under H_0,

    √n(Rβ̂ - r) →_d N(0, R Q^{-1} V Q^{-1} R′).

So we can construct the Wald statistic

    W_n = n (Rβ̂ - r)′ [ R Q̂^{-1} V̂ Q̂^{-1} R′ ]^{-1} (Rβ̂ - r)
        = n^{-1} (Rβ̂ - r)′ [ R (X′X)^{-1} V̂ (X′X)^{-1} R′ ]^{-1} (Rβ̂ - r).

Theorem 4.22 Under Assumptions A4.1-A4.3 and A4.4*, if V̂ →_p V, then under H_0, W_n →_d χ²(q).

Proof. Same as before.

Remarks.
(1) In the presence of both conditional heteroskedasticity and serial correlation, the standard t and F test statistics cannot be used.
(2) If we use

    V̂ = Γ̂(0) = n^{-1} Σ_{t=1}^n X_t ε̂_t² X_t′,

we obtain White's heteroskedasticity-consistent variance-covariance estimator, which ignores the autocorrelation terms.
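Putting the pieces together, here is a minimal sketch of the HAC-based Wald test (illustrative; it reuses the newey_west_V and wald_test sketches given earlier in this chapter). In practice, packaged implementations can be used instead, e.g., statsmodels' OLS fit with a HAC covariance option.

    import numpy as np

    def hac_wald(y, X, R, r, p):
        """HAC-based Wald test of H0: R beta = r, assuming newey_west_V and wald_test as above."""
        n = X.shape[0]
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ bhat
        Qinv = np.linalg.inv(X.T @ X / n)
        avar_hat = Qinv @ newey_west_V(X, e, p) @ Qinv    # Qhat^{-1} Vhat Qhat^{-1}
        return wald_test(bhat, avar_hat, R, r, n)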

