
This part discusses a new theory for regression with nonstationary unit root processes.

Casual inspection of most economic time series data, such as GNP and prices,

reveals that these series are nonstationary. We can characterize some of the


3. Any shock to a series displays a high degree of persistence. For ex-

not returning to its previous level until the mid-1980s. Overall the general

4. Some series share co-movements with other series. For example, short-

A note on notation: the unit root process is widely called an I(1) process.¹

¹ In this regard we can define an I(d) process, where d is the number of differencings required to render

the series stationary.


1.2 Spurious Regression

$$y_t = \beta x_t + \mathrm{error}.$$

But, as will be shown below, this is not the case. This phenomenon originated

Example 1 There are some famous examples of spurious correlation. One

all marriages and the mortality rate over the period 1866-1911. Yet another

² This relation proved resilient to many econometric diagnostic tests and was humorously

advanced as a new theory of inflation.


As we have come to understand in recent years, it is the commonality of

relations. What makes the phenomenon dramatic is that it occurs even when

and the regressors. Using Monte Carlo simulations, Granger and Newbold

$$y_t = y_{t-1} + \varepsilon_t,$$
$$x_t = x_{t-1} + e_t,$$

where $\varepsilon_t \sim iid(0, \sigma_\varepsilon^2)$ and $e_t \sim iid(0, \sigma_e^2)$, and $\varepsilon_t$ and $e_t$ are independent of each other. Consider the regression,

$$y_t = \beta x_t + u_t. \qquad (1)$$

Then, as $T \to \infty$,

(a) The OLS estimator of $\beta$ obtained from (1), denoted $\hat\beta$, does not converge

In sum, in the case of spurious regression, $\hat\beta$ takes any value randomly, and

or cointegration (see below), one useful guideline is that we are likely to be faced

with a spurious relation when we find a highly significant t-ratio combined

with a rather low value of $R^2$ and a low value of the Durbin-Watson statistic.
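These properties are easy to reproduce numerically. Below is a minimal numpy sketch (the sample size, number of replications, and 1.96 cutoff are our own choices) that regresses one pure random walk on another, independent one, and records how often the slope's t-ratio looks "significant":

```python
import numpy as np

def spurious_mc(n_sims=200, T=100, seed=0):
    """Regress one pure random walk on another, independent one,
    and collect the OLS t-ratio of the slope for each simulation."""
    rng = np.random.default_rng(seed)
    t_stats = np.empty(n_sims)
    for s in range(n_sims):
        y = np.cumsum(rng.standard_normal(T))   # y_t = y_{t-1} + eps_t
        x = np.cumsum(rng.standard_normal(T))   # x_t = x_{t-1} + e_t, independent of y
        X = np.column_stack([np.ones(T), x])    # intercept + regressor
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        s2 = resid @ resid / (T - 2)            # error variance estimate
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        t_stats[s] = beta[1] / se
    return t_stats

t = spurious_mc()
rejection_rate = np.mean(np.abs(t) > 1.96)  # how often the slope looks "significant"
```

Although the two walks are independent by construction, the nominal 5% t-test rejects far more than 5% of the time, which is exactly the spurious regression phenomenon.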

1.3 Cointegration


consideration may drift away from equilibrium for a while, economic forces

Example 2 Consider the market for tomatoes in two parts of a country, the

north and the south, with prices $p_{nt}$ and $p_{st}$ respectively. If these prices are

the prices are unequal it will be possible to make a profit by buying tomatoes

in one region and selling them in the other. This trading mechanism will

tend to equate prices again, raising prices in the buying region and

consumption and income exhibit a unit root, over the long run consumption tends

Purchasing Power Parity (PPP). This theory holds that, apart from transportation costs, goods


should sell for the same effective price in two countries. Let $P_t$ denote the

index of the price level in the US (in dollars per good), $P_t^*$ denote the price

index for the UK (in pounds per good), and $S_t$ the rate of exchange between the

currencies (in dollars per pound). Then PPP holds that $P_t = S_t P_t^*$;

quality prevent PPP from holding exactly at every date $t$. A weaker version of

how to estimate the cointegrating parameters and the second is how to test

$$y_t = \beta' x_t + u_t, \qquad (2)$$

³ Deterministic regressors such as an intercept and a linear time trend can easily be

accommodated in the regression without changing the results in what follows.


where $x_t$ is an I(1) variable given by

$$x_t = x_{t-1} + e_t. \qquad (3)$$

Assumption 2.1. $u_t$ is an iid process with mean zero and variance $\sigma_u^2$.

random variable.


This theorem clearly indicates that most estimation and inference results

$$y_t = \beta' x_t + u_t,$$
$$x_t = x_{t-1} + e_t,$$

Violation of Assumption 2.3 means that there may be more than one


a multiple regression with stationary regressors.

In this more general case the OLS estimator of $\beta$ is also consistent, and

can be tested in a standard way using the Wald statistic, which is asymptotically $\chi^2$ distributed.

One of the most popular tests for (a single) cointegration has been suggested by Engle and Granger. The test is based on the OLS regression:

$$y_t = \beta' x_t + u_t, \quad t = 1, \ldots, T, \qquad (4)$$

The Engle and Granger cointegration test is carried out in two steps:⁴

⁴ Of course you need to pretest the individual variables for unit roots. By definition of


1. Run the OLS regression of (4) and obtain the residuals by

$$\hat u_t = y_t - \hat\beta' x_t, \quad t = 1, \ldots, T;$$

2. Apply the DF unit root test⁵ to $\hat u_t$.

the test of no-cointegration, because the null of a unit root in $\hat u_t$ implies that

also non-standard, but more importantly different from that of the univariate

DF unit root test. The main difference stems from the fact that one needs to allow

cointegration you need to ascertain that all the variables involved are I(1).

⁵ Since $\hat u_t$ is a zero-mean residual process, there is no need to include an intercept term

here.


for estimation uncertainty through $\hat\beta$ in the first step. The resulting test

As in the case of the univariate unit root test, there are three specifications:

$$y_t = \beta' x_t + v_t, \qquad (6)$$
$$y_t = a_0 + \beta' x_t + v_t, \qquad (7)$$
$$y_t = a_0 + a_1 t + \beta' x_t + v_t, \qquad (8)$$

where $a_0$ is an intercept and $a_1$ is the linear trend coefficient. The three sets of

critical values of the EGDF tests have been provided by Engle and Yoo (1987,

$$\Delta \hat u_t = \varphi\, \hat u_{t-1} + \sum_{i=1}^{p-1} \psi_i\, \Delta \hat u_{t-i} + \varepsilon_t,$$

⁶ For example, Microfit provides the corresponding critical values when estimating and

testing.


and do the t-test for $\varphi = 0$.

Notice that all the problems that afflict the unit root tests also afflict

provide only weak evidence that two or more variables are not cointegrated.
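A minimal numpy sketch of the two-step procedure on simulated cointegrated data (the DGP below, with $\beta = 2$ and iid errors, is our own choice; the resulting t-ratio must be compared with the EGDF critical values, not the standard DF ones):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500

# Simulate a cointegrated pair: x is a random walk, y = 2*x + u with u iid,
# so u_t is stationary and (y, x) are cointegrated with beta = 2.
x = np.cumsum(rng.standard_normal(T))
y = 2.0 * x + rng.standard_normal(T)

# Step 1: OLS of y on x (no intercept for simplicity); save residuals u_hat.
beta_hat = (x @ y) / (x @ x)
u_hat = y - beta_hat * x

# Step 2: DF regression on the residuals (no intercept, since u_hat is a
# zero-mean residual process): Delta u_hat_t = phi * u_hat_{t-1} + eps_t.
du = np.diff(u_hat)
lag = u_hat[:-1]
phi_hat = (lag @ du) / (lag @ lag)
resid = du - phi_hat * lag
se_phi = np.sqrt((resid @ resid) / (len(du) - 1) / (lag @ lag))
t_phi = phi_hat / se_phi   # compare with the EGDF critical values
```

Because the simulated equilibrium error is iid, the residual-based test statistic here is strongly negative, i.e. the null of no cointegration would be rejected.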

The cointegrating regression so far considers only the long-run property of the

model, and does not deal with the short-run dynamics explicitly.⁷ Clearly,

good time series modelling should describe both short-run dynamics and

an error correction model (ECM). Although the ECM has been popularized after

Engle and Granger, it has a long tradition in time series econometrics dating

⁷ Here the long-run relationship measures any relation between the levels of the variables

under consideration, while the short-run dynamics measure any dynamic adjustments be-

tween the first-differences of the variables.


tradition.

$$\varepsilon_t = y_t - \beta x_t,$$
$$\Delta y_t = \alpha \varepsilon_{t-1} + \gamma \Delta x_t + u_t, \qquad (9)$$

where $u_t$ is iid. The ECM equation (9) simply says that $\Delta y_t$ can be explained

which means that $y_{t-1}$ is too high above its equilibrium value, so in order to

the error correction coefficient $\alpha$ must be negative such that (9) is dynamically

stable. In other words, if $y_{t-1}$ is above its equilibrium, then it will start falling

in the next period and the equilibrium error will be corrected in the model,


hence the term error correction model.
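The error-correcting behavior can be illustrated with a deterministic sketch of (9); the parameter values ($\beta = 1$, $\alpha = -0.5$) are our own, and shocks are switched off so the adjustment is visible:

```python
import numpy as np

# Deterministic sketch of the error-correction mechanism in (9):
# with beta = 1, alpha = -0.5, x held fixed (so Delta x_t = 0) and no shocks,
# a y that starts above its equilibrium value beta*x halves the gap each period.
beta, alpha = 1.0, -0.5
x = 10.0                      # keep x constant
y = 15.0                      # start 5 units above equilibrium beta*x = 10
gaps = []
for _ in range(10):
    eps = y - beta * x        # equilibrium error eps_{t-1}
    y = y + alpha * eps       # Delta y_t = alpha * eps_{t-1}
    gaps.append(abs(y - beta * x))
```

The gap shrinks geometrically toward zero, which is exactly the dynamic stability that a negative $\alpha$ delivers; a positive $\alpha$ would instead push $y$ further away from equilibrium.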

Notice that $\beta$ is called the long-run parameter, and $\alpha$ and $\gamma$ are called

short-run parameters. Thus the ECM has both long-run and short-run properties

built into it. The former property is embedded in the error correction

term $\varepsilon_{t-1}$, and the short-run behavior is partially but crucially captured by the

error correction coefficient, $\alpha$. All the variables in the ECM are stationary,

$$y_t - y_{t-1} = -(y_{t-1} - \beta x_{t-1}) + \beta x_t - \beta x_{t-1} + u_t,$$
$$\Delta y_t = -(y_{t-1} - \beta x_{t-1}) + \beta \Delta x_t + u_t.$$
$$\Delta y_t = -\varepsilon_{t-1} + \beta \Delta x_t + u_t,$$

where the error correction coefficient is $-1$ by construction, meaning the per-


restrictive and unlikely to happen in practice.

$$\hat\varepsilon_t = y_t - \hat\beta x_t.$$
$$\Delta y_t = \alpha \hat\varepsilon_{t-1} + \gamma \Delta x_t + u_t.$$

Example 7 The ECM is also used in the Present Value (PV) model for

stocks. A PV model relates the price of a stock to the discounted sum of its

expected future dividends. It is noted that (1) stock prices and dividends must

have much larger effects on price than transitory movements. Thus an ap-

ECM model. This is also called the dynamic Gordon model in finance.

Remark 1 We only cover single-equation modelling and testing for one cointegration relationship. When there are more than 2 variables, there could

system analysis which could solve the two problems. Namely, we use a VAR

(Vector Autoregressive) model and use Johansen's test to test the reduced

Autoregressive Distributed Lag (ARDL) Analysis

In time series analysis the explanatory variable may influence the dependent

variable with a time lag. This often necessitates the inclusion of lags of the

may be correlated with lags of itself, suggesting that lags of the dependent


motivate the commonly used ARDL(p, q) model defined as follows:⁸

$$y_t = \varphi_1 y_{t-1} + \cdots + \varphi_p y_{t-p} + \beta_0 x_t + \beta_1 x_{t-1} + \cdots + \beta_q x_{t-q} + u_t, \qquad (10)$$

where $u_t \sim iid(0, \sigma^2)$.

In the case where the variables of interest are trend stationary, the general

practice has been to de-trend the series and to model the de-trended series as

Estimation and inference concerning the properties of the model are then

However, the analysis becomes more complicated when the variables are

cerned with the analysis of the long run relations between I(1) variables,

and its basic premise is, at least implicitly, that in the presence of I(1) vari-

Pesaran and Shin (1999) recently re-examined the traditional ARDL approach for the analysis of a long-run relationship when the underlying variables are I(1).

⁸ For convenience we do not include deterministic regressors such as a constant and a

linear time trend.

⁹ Consequently, a large number of alternative estimation and hypothesis testing proce-

dures have been specifically developed for the analysis of I(1) variables. See Phillips and

Hansen (1990, Review of Economic Studies) and Phillips and Loretan (1991, Review of

Economic Studies).

1. The ARDL-based estimators of the long-run coefficients are also consistent.

2. Inference on the long-run coefficients can be made using the standard asymptotic distribution.

regressor,

$$y_t = \varphi y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + u_t, \quad t = 1, \ldots, T, \qquad (11)$$

where $x_t$ is generated by

$$x_t = x_{t-1} + e_t, \qquad (12)$$

A1. $u_t \sim iid(0, \sigma_u^2)$.

A3. $u_t$ and $e_t$ are uncorrelated at all lags, so that $x_t$ is strictly exogenous

with respect to $u_t$.


A4. (Stability Condition) $|\varphi| < 1$, so that the model is dynamically stable.

process and implies that there exists a stable long-run relationship between

$y_t$ and $x_t$.¹⁰

$$y^* = \varphi y^* + \beta_0 x^* + \beta_1 x^*,$$

and therefore,

$$y^* = \theta x^*,$$

where $\theta = \dfrac{\beta_0 + \beta_1}{1 - \varphi}$ is called the long-run multiplier.

$$\Delta y_t = -(1 - \varphi)\, y_{t-1} + (\beta_0 + \beta_1)\, x_{t-1} + \beta_0 \Delta x_t + u_t \qquad (13)$$
$$\phantom{\Delta y_t} = \alpha y_{t-1} + \delta x_{t-1} + \gamma \Delta x_t + u_t,$$

¹⁰ If $\varphi = 1$, then there would be no long-run relationship.


Assumption A4, we further obtain

$$\Delta y_t = \alpha (y_{t-1} - \theta x_{t-1}) + \gamma \Delta x_t + u_t, \qquad (14)$$

ter. This shows that the ARDL specification is indeed useful for characterizing

both the long-run and the short-run behavior. Notice that (11), (13) and (14)

are the same but represented differently. We may call (13) and (14) the unrestricted ECM and the (restricted) ECM, respectively. From (13) the long-run

$$\theta = -\frac{\delta}{\alpha}. \qquad (15)$$

normal distribution.

tion.


We also find that the OLS estimators of all of the short-run parameters,

$\alpha$, $\gamma$, are $\sqrt{T}$-consistent and have the normal distribution.

In the more general case where $u_t$ and $e_t$ in (13) are serially correlated, the

and inference is carried out: that is, we now consider the ARDL(p, q) or

$$\Delta y_t = \alpha y_{t-1} + \delta x_{t-1} + \sum_{j=1}^{p-1} \psi_j \Delta y_{t-j} + \sum_{j=0}^{q-1} \xi_j \Delta x_{t-j} + u_t. \qquad (16)$$

1. Use model selection criteria such as the AIC and SBC, and choose the

error correction term by $\hat\varepsilon_t = y_t - \hat\theta x_t$, where $\hat\theta = -\hat\delta/\hat\alpha$, and $\hat\delta$ and $\hat\alpha$ are

3. Run an ECM regression of $\Delta y$ on $\hat\varepsilon_{t-1}$, $\Delta x$, $p-1$ lagged $\Delta y$'s and $q-1$

lagged $\Delta x$'s,

$$\Delta y_t = \alpha \hat\varepsilon_{t-1} + \sum_{j=1}^{p-1} \psi_j \Delta y_{t-j} + \sum_{j=0}^{q-1} \xi_j \Delta x_{t-j} + u_t.$$
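A minimal numpy sketch of recovering the long-run multiplier $\hat\theta = -\hat\delta/\hat\alpha$ from the unrestricted ECM (13), using simulated ARDL(1,1) data; the coefficient values and the DGP are our own choices:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 4000

# Simulate the ARDL(1,1) model (11): y_t = phi*y_{t-1} + b0*x_t + b1*x_{t-1} + u_t,
# with x_t a random walk, so the true long-run multiplier is
# theta = (b0 + b1) / (1 - phi) = 3.
phi, b0, b1 = 0.5, 1.0, 0.5
theta_true = (b0 + b1) / (1 - phi)
x = np.cumsum(rng.standard_normal(T))
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t-1] + b0 * x[t] + b1 * x[t-1] + rng.standard_normal()

# Estimate the unrestricted ECM (13):
# Delta y_t = alpha*y_{t-1} + delta*x_{t-1} + gamma*Delta x_t + u_t,
# and recover theta_hat = -delta_hat / alpha_hat as in (15).
dy = np.diff(y)
X = np.column_stack([y[:-1], x[:-1], np.diff(x)])
alpha_hat, delta_hat, gamma_hat = np.linalg.lstsq(X, dy, rcond=None)[0]
theta_hat = -delta_hat / alpha_hat
```

Here the true ECM coefficients are $\alpha = -(1-\varphi) = -0.5$, $\delta = \beta_0 + \beta_1 = 1.5$ and $\gamma = \beta_0 = 1$, so $\hat\theta$ should sit close to the true multiplier of 3.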

1. Given $y_t = \begin{bmatrix} x_t \\ z_t \end{bmatrix}$ and $e_t = \begin{bmatrix} u_t \\ v_t \end{bmatrix} \sim iid\,N(0, \Sigma)$,

$$y_t = \Phi y_{t-1} + e_t \;\Longrightarrow\; \begin{bmatrix} x_t \\ z_t \end{bmatrix} = \begin{bmatrix} \varphi_{xx} & \varphi_{xz} \\ \varphi_{zx} & \varphi_{zz} \end{bmatrix} \begin{bmatrix} x_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} u_t \\ v_t \end{bmatrix}$$

In general, $A(L) y_t = B(L) e_t$ where

$$A(L) = I - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p,$$
$$B(L) = I + \Theta_1 L + \Theta_2 L^2 + \cdots + \Theta_p L^p,$$
$$\Phi_j = \begin{bmatrix} \varphi_{j,xx} & \varphi_{j,xz} \\ \varphi_{j,zx} & \varphi_{j,zz} \end{bmatrix}$$

Given an AR(1) representation,

$$A(L) y_t = e_t \;\Longrightarrow\; y_t = A(L)^{-1} e_t \;\Longrightarrow\; \begin{bmatrix} a_{xx}(L) & a_{xz}(L) \\ a_{zx}(L) & a_{zz}(L) \end{bmatrix} \begin{bmatrix} x_t \\ z_t \end{bmatrix} = \begin{bmatrix} u_t \\ v_t \end{bmatrix}$$

$$\Longrightarrow\; \begin{bmatrix} x_t \\ z_t \end{bmatrix} = \left[ a_{xx}(L) a_{zz}(L) - a_{xz}(L) a_{zx}(L) \right]^{-1} \begin{bmatrix} a_{zz}(L) & -a_{xz}(L) \\ -a_{zx}(L) & a_{xx}(L) \end{bmatrix} \begin{bmatrix} u_t \\ v_t \end{bmatrix}$$

Similarly a process like an ARMA(2,1) such as $y_t = \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + u_t + \theta_1 u_{t-1}$ can be represented as:

$$\begin{bmatrix} y_t \\ y_{t-1} \\ u_t \end{bmatrix} = \begin{bmatrix} \varphi_1 & \varphi_2 & \theta_1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ u_{t-1} \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} w_t \;\Longrightarrow\; x_t = A x_{t-1} + C w_t$$

It is sometimes convenient to define the C matrix in such a way that the variance-covariance matrix of the shocks is the identity matrix. Then

$$C = \begin{bmatrix} \sigma_e \\ 0 \\ \sigma_e \end{bmatrix} \quad \text{and} \quad E(w_t w_t') = I$$

2.2.1 Forecast

We start with a vector AR(1) representation, which has the MA($\infty$) representation:

$$y_t = \sum_{j=0}^{\infty} A^j C w_{t-j}$$

$$E_t(y_{t+k}) = A^k y_t$$

$$y_{t+1} - E_t(y_{t+1}) = C w_{t+1}$$
$$V_t(y_{t+1}) = CC'$$
$$V_t(y_{t+2}) = CC' + A CC' A'$$
$$V_t(y_{t+k}) = \sum_{j=0}^{k-1} A^j CC' A^{j\prime}$$
$$V_t(y_{t+k}) = CC' + A\, V_t(y_{t+k-1})\, A'$$
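The equivalence of the closed-form sum and the recursion for the forecast error variance can be checked numerically; the matrices A and C below are made up for the check:

```python
import numpy as np

# V_t(y_{t+k}) = sum_{j=0}^{k-1} A^j C C' (A^j)'  (closed form)
# must equal the recursion V_t(y_{t+k}) = C C' + A V_t(y_{t+k-1}) A'.
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])   # a stable 2x2 transition matrix (made up)
C = np.array([[1.0, 0.0],
              [0.3, 1.0]])   # a shock-loading matrix (made up)

def V_sum(k):
    V = np.zeros((2, 2))
    for j in range(k):
        Aj = np.linalg.matrix_power(A, j)
        V += Aj @ C @ C.T @ Aj.T
    return V

def V_rec(k):
    V = C @ C.T                  # k = 1 term
    for _ in range(k - 1):
        V = C @ C.T + A @ V @ A.T
    return V

ok = np.allclose(V_sum(5), V_rec(5))
```

The recursion is the form one typically implements, since it avoids recomputing matrix powers at every horizon.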

Suppose $y_t = \begin{bmatrix} x_t \\ z_t \end{bmatrix}$; then the AR(1) mapping of the above equation gives:

$$\begin{bmatrix} x_t \\ z_t \\ x_{t-1} \\ z_{t-1} \\ \vdots \end{bmatrix} = \begin{bmatrix} \varphi_{xx1} & \varphi_{xz1} & \varphi_{xx2} & \varphi_{xz2} & \cdots \\ \varphi_{zx1} & \varphi_{zz1} & \varphi_{zx2} & \varphi_{zz2} & \cdots \\ 1 & 0 & 0 & 0 & \cdots \\ 0 & 1 & 0 & 0 & \cdots \\ \vdots & & & & \end{bmatrix} \begin{bmatrix} x_{t-1} \\ z_{t-1} \\ x_{t-2} \\ z_{t-2} \\ \vdots \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ \vdots \end{bmatrix} \begin{bmatrix} e_{xt} \\ e_{zt} \end{bmatrix}$$

$$\Longrightarrow\; y_t = A y_{t-1} + e_t$$


We can also start with $y_t = \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + e_t$:

$$\begin{bmatrix} y_t \\ y_{t-1} \\ y_{t-2} \\ \vdots \end{bmatrix} = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \cdots \\ I & 0 & \cdots & \cdots \\ 0 & I & \cdots & \cdots \\ \vdots & & & \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ y_{t-3} \\ \vdots \end{bmatrix} + \begin{bmatrix} I \\ 0 \\ 0 \\ \vdots \end{bmatrix} [e_t]$$

Given the above AR(1) representation, we can forecast both x and z. We can also add a small change to the above formulation by choosing the C matrix such that the shocks are orthogonal to each other. This implies that $E(ee') = I$.

think about 'cause' and 'effect'. For example, we may want to compute the

Rate VAR model and interpret the result as the effect of the interest rate on

the stock price.

AR(1) Model: $y_t = \varphi y_{t-1} + e_t \;\Longrightarrow\; y_t = \sum_{j=0}^{\infty} \varphi^j e_{t-j}$

$e_t$: 0 0 1 0 0 0 0

$y_t$: 0 0 1 $\varphi$ $\varphi^2$ $\varphi^3$ $\varphi^4$

MA($\infty$) Model: $y_t = \sum_{j=0}^{\infty} \theta_j u_{t-j}$

$u_t$: 0 0 1 0 0 0 0

$y_t$: 0 0 1 $\theta_1$ $\theta_2$ $\theta_3$ $\theta_4$
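The scalar AR(1) impulse response above can be generated directly; the value $\varphi = 0.6$ is an arbitrary choice:

```python
import numpy as np

# Impulse response of y_t = phi*y_{t-1} + e_t to a unit shock at t = 0:
# the response at horizon j should be phi**j, mirroring the
# "0 0 1 phi phi^2 ..." pattern above.
phi = 0.6
horizon = 5
e = np.zeros(horizon + 1)
e[0] = 1.0                      # unit shock at t = 0, nothing afterwards
y = np.zeros(horizon + 1)
for t in range(horizon + 1):
    y[t] = (phi * y[t-1] if t > 0 else 0.0) + e[t]
irf = y
```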

Given the above formulation, the vector process works in the same way. $\{\Theta_0, \Theta_1, \ldots\}$ define the impulse-response functions.

$$\Theta(L) = \begin{bmatrix} \theta_{xx}(L) & \theta_{xz}(L) \\ \theta_{zx}(L) & \theta_{zz}(L) \end{bmatrix}$$

Therefore, $\theta_{xx}(L)$ gives the response of $x_{t+k}$ to a unit shock in $u_{xt}$, and $\theta_{xz}(L)$ gives the response of $x_{t+k}$ to a unit shock in $u_{zt}$. In practice, however, the responses are computed directly from the state-space form as $C, AC, A^2C, \ldots, A^kC, \ldots$

2.2.3 Orthogonalization


$$A(L) y_t = e_t, \quad A(0) = I, \quad E(e_t e_t') = \Sigma \qquad (18)$$
$$y_t = B(L) e_t, \quad B(0) = I, \quad E(e_t e_t') = \Sigma, \quad \text{where } B(L) = A(L)^{-1}$$

shock as $\eta_{1t} = e_{xt}$ and $\eta_{2t} = e_{zt} + 0.3\, e_{xt}$.

We define $\eta_t = Q e_t$ where $Q = \begin{bmatrix} 1 & 0 \\ 0.3 & 1 \end{bmatrix}$. Then we get the following:

$C(L)\eta_t$ are observationally equivalent, since they produce the same series $y_t$.

We then need to decide which linear combination is the most interesting.

orthogonal. Let us start with the example: 'effect of the interest rate on stock

prices'. If the interest rate shock is correlated with the stock price shock,

dividends) that happen to come at the same time as the interest rate shock

(it may be because the Reserve Bank of India sees the announcement shock

so that they have a unit variance. Thus we want to choose Q such that

$E(\eta_t \eta_t') = I$. To do this we need a Q such that $Q^{-1} Q^{-1\prime} = \Sigma$, so that

$E(\eta_t \eta_t') = E(Q e_t e_t' Q') = Q \Sigma Q' = I$. However, there may be many different Q's that

can act as square root matrices for $\Sigma$. Given one such Q, any $Q^* = MQ$ with

$MM' = I$ also works, since $Q^* \Sigma Q^{*\prime} = M Q \Sigma Q' M' = M M' = I$. Since $C(L) = B(L) Q^{-1}$, we specify a

Q by imposing properties of C(0), which gives the instantaneous response of each variable to each

orthogonalized shock $\eta$. Sims suggested a lower triangular C(0):

$$\begin{bmatrix} x_t \\ z_t \end{bmatrix} = \begin{bmatrix} C_{0,xx} & 0 \\ C_{0,zx} & C_{0,zz} \end{bmatrix} \eta_t + C_1 \eta_{t-1} + \cdots$$

The above specification implies that $\eta_{2t}$ does not affect the first variable,

matrix C(0) implies that contemporaneous $x_t$ appears in the $z_t$ equation, but

triangular.

$$\begin{bmatrix} D_{0,xx} & 0 \\ D_{0,zx} & D_{0,zz} \end{bmatrix} \begin{bmatrix} x_t \\ z_t \end{bmatrix} + D_1 y_{t-1} + \cdots = \eta_t$$

Assuming an AR(1) process we get:

with contemporaneous $x_t$ in the $z_t$ equation, but not vice versa, and then

scaling each equation so that the error variance is one. Recall that OLS

estimates produce residuals that are uncorrelated with the RHS variables by


$$x_t = a_{1xx} x_{t-1} + \cdots + a_{1xz} z_{t-1} + \eta_{xt} \qquad (20)$$

with each other. The term $a_{0zx}$ captures all of the contemporaneous correla-

the orthogonalized system: what percent of the k-step ahead forecast error

$$y_t = C(L) \eta_t, \quad E(\eta_t \eta_t') = I$$

$$C(L) = B(L) Q^{-1}$$
$$C(0) = B(0) Q^{-1}$$

This implies that the Choleski decomposition produces a lower triangular Q.¹¹
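A small numpy check of this construction; the covariance matrix $\Sigma$ below is made up. Taking Q as the inverse of the Choleski factor of $\Sigma$ gives a lower triangular Q with $Q \Sigma Q' = I$:

```python
import numpy as np

# If Sigma = P P' with P lower triangular (Choleski), then Q = P^{-1}
# is lower triangular and satisfies Q Sigma Q' = I, i.e. the rotated
# shocks eta_t = Q e_t are orthonormal.
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])     # an example shock covariance matrix (made up)
P = np.linalg.cholesky(Sigma)      # lower triangular square root of Sigma
Q = np.linalg.inv(P)
check = Q @ Sigma @ Q.T            # should equal the identity matrix
```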


$$e_t = y_{t+1} - E_t(y_{t+1}) = C(0) \eta_{t+1} = \begin{bmatrix} c_{xx,0} & c_{xz,0} \\ c_{zx,0} & c_{zz,0} \end{bmatrix} \begin{bmatrix} \eta_{x,t+1} \\ \eta_{z,t+1} \end{bmatrix}$$

where $C(L) = C(0) + C(1)L + C(2)L^2 + \cdots$ and the elements of C(L) are $c_{xx,0} + c_{xx,1} L + c_{xx,2} L^2 + \cdots$, etc. Since the $\eta$'s are uncorrelated and have unit variance:

$$V_t(x_{t+1}) = c_{xx,0}^2\, \sigma^2(\eta_x) + c_{xz,0}^2\, \sigma^2(\eta_z) = c_{xx,0}^2 + c_{xz,0}^2,$$

and similarly for z. Thus $c_{xx,0}^2$ gives the amount of 1-step ahead forecast error variance of x due to the x shock and $c_{xz,0}^2$ that due to the z shock. One in general reports them as fractions:

$$\frac{c_{xx,0}^2}{c_{xx,0}^2 + c_{xz,0}^2}, \qquad \frac{c_{xz,0}^2}{c_{xx,0}^2 + c_{xz,0}^2}.$$

Hence, we can write:

$$V_t(y_{t+1}) = C_0 C_0', \qquad I_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad I_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

Thus, the part of the 1-step ahead forecast error variance due to the first

(x) shock is $C_0 I_1 C_0'$ and the part due to the second (z) shock is $C_0 I_2 C_0'$. We

also get $C_0 C_0' = C_0 I_1 C_0' + C_0 I_2 C_0'$. If we think of $I_m$ as a new covariance

matrix in which all shocks but the m-th are turned off, then the total variance

of forecast errors must be equal to the part due to the m-th shock, given by

$C_0 I_m C_0'$. Therefore,

$$V_t(y_{t+k}) = C_0 C_0' + C_1 C_1' + \cdots + C_{k-1} C_{k-1}' \;\Longrightarrow\; V_{k,m} = \sum_{j=0}^{k-1} C_j I_m C_j'$$

is the variance of the k-step ahead forecast due to the m-th shock, and the total

variance is the sum of these components, $V_t(y_{t+k}) = \sum_m V_{k,m}$. On the other hand, note

$$y_t = \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + e_t$$

$$\begin{bmatrix} y_t \\ y_{t-1} \\ y_{t-2} \\ \vdots \end{bmatrix} = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \cdots \\ I & 0 & \cdots & \cdots \\ 0 & I & \cdots & \cdots \\ \vdots & & & \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ y_{t-3} \\ \vdots \end{bmatrix} + \begin{bmatrix} Q^{-1} \\ 0 \\ 0 \\ \vdots \end{bmatrix} [\eta_t]$$

$$\Longrightarrow\; y_t = A y_{t-1} + C \eta_t, \quad E(\eta_t \eta_t') = I$$

$V_{k,m}$ can then be computed recursively from:

$$V_t(y_{t+k}) = \sum_{j=0}^{k-1} A^j C C' A^{j\prime} \;\Longrightarrow\; V_{k,m} = \sum_{j=0}^{k-1} A^j C I_m C' A^{j\prime}$$

¹¹ If Q is lower triangular, then only the 1st shock affects the 1st variable.

definition of cause: "cause should precede effects". However, this may not be

true in time series.


$x_t$ Granger causes $y_t$ if $x_t$ helps to forecast $y_t$, given past $y_t$. Consider

the following:

$$y_t = a(L) y_{t-1} + b(L) x_{t-1} + u_t$$
$$x_t = c(L) y_{t-1} + d(L) x_{t-1} + v_t$$

Our definition implies that $x_t$ does not Granger cause $y_t$ if $b(L) = 0$, i.e.

$$y_t = a(L) y_{t-1} + u_t$$
$$x_t = c(L) y_{t-1} + d(L) x_{t-1} + v_t$$
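A minimal numpy sketch: simulate a system in which $b(L) = 0$, so x does not Granger cause y while y does cause x, and check that OLS on the y equation finds essentially no weight on lagged x (all coefficient values are our own):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2000

# y_t = 0.5*y_{t-1} + u_t            (no lagged x: x does not cause y)
# x_t = 0.4*y_{t-1} + 0.3*x_{t-1} + v_t   (y does cause x)
y = np.zeros(T)
x = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t-1] + rng.standard_normal()
    x[t] = 0.4 * y[t-1] + 0.3 * x[t-1] + rng.standard_normal()

# y equation: regress y_t on y_{t-1} and x_{t-1}; b_hat should be ~ 0.
X = np.column_stack([y[:-1], x[:-1]])
a_hat, b_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
```

In a real application one would test $b(L) = 0$ formally with an F- or Wald test on all lags of x in the y equation rather than eyeballing a single coefficient.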

Alternatively,

$$\begin{bmatrix} y_t \\ x_t \end{bmatrix} = \begin{bmatrix} a(L) & b(L) \\ c(L) & d(L) \end{bmatrix} \begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix} + \begin{bmatrix} u_t \\ v_t \end{bmatrix}$$

$$\Longrightarrow\; \begin{bmatrix} I - L a(L) & -L b(L) \\ -L c(L) & I - L d(L) \end{bmatrix} \begin{bmatrix} y_t \\ x_t \end{bmatrix} = \begin{bmatrix} u_t \\ v_t \end{bmatrix}$$

$$\Longrightarrow\; \begin{bmatrix} a^*(L) & b^*(L) \\ c^*(L) & d^*(L) \end{bmatrix} \begin{bmatrix} y_t \\ x_t \end{bmatrix} = \begin{bmatrix} u_t \\ v_t \end{bmatrix}$$

Thus x does not Granger cause y iff $b^*(L) = 0$.

causality.

$$\begin{bmatrix} y_t \\ x_t \end{bmatrix} = \frac{1}{a^*(L) d^*(L) - b^*(L) c^*(L)} \begin{bmatrix} d^*(L) & -b^*(L) \\ -c^*(L) & a^*(L) \end{bmatrix} \begin{bmatrix} u_t \\ v_t \end{bmatrix}$$

This implies that x does not Granger cause y if the Wold moving average

then y is a function of its own shocks only, and does not respond to x shocks.

x, on the other hand, is a function of both y shocks and x shocks. This also

implies:

x does not Granger cause y if y's bivariate Wold representation is the

of y on past y alone.

Suppose,


Suppose $y_t = e(L) \xi_t$ where $\xi_t = y_t - P(y_t \mid y_{t-1}, y_{t-2}, \ldots)$

for all $j > 0$. This implies that the univariate innovations of $x_t$ are un-

to innovations, not the level of the series. If $x_t$ does not Granger cause $y_t$,

then

$$x_t = c(L) u_t + d(L) v_t$$


Thus, $\xi_t$ is a linear combination of current and past $u_t$ and $v_t$. Since $u_t$

thus the past does not help forecasting y given past y. Since $x_t = f(L) \xi_t$, this

Facts

on past y and past x, the coefficients of the past x are all zero.

tions in y.
