
David Ruppert

Statistics and Finance: An Introduction

Solutions Manual

July 9, 2004

Springer

Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo

2 Probability and Statistical Models

1. (a)

E(0.1X + 0.9Y) = 1.

Var(0.1X + 0.9Y) = (0.1²)(2) + 2(0.1)(0.9)(1) + (0.9²)(3) = 2.63.

(b)

Var{wX + (1 − w)Y} = 3w² − 4w + 3.

(c)

The derivative of this expression is 6w − 4. Setting this derivative equal to 0 gives us w = 2/3. The second derivative is positive, so the solution must be a minimum. In this problem, assets X and Y have the same expected return. This means that regardless of the choice of w, that is, the asset allocation, the expected return of the portfolio doesn't change. So by minimizing the variance, we can reduce the risk without reducing the return. Thus the weight w = 2/3 corresponds to the optimal portfolio.

2. (a)

Use (2.54) with w_1 = (1 1)^T and w_2 = (1 1)^T.

(b)

Use part (a) and the facts that Cov(α_1 X, α_2 X) = α_1 α_2 σ_X², Cov(α_1 X, Z) = 0, Cov(Y, α_2 X) = 0, and Cov(Y, Z) = 0.

(c)

Using (2.54) with w_1 and w_2 the appropriately-sized vectors of ones, it can be shown that

Cov( Σ_{i=1}^{n_1} X_i , Σ_{j=1}^{n_2} Y_j ) = Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} Cov(X_i, Y_j).
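The variance minimization in Problem 1 can be checked numerically. A minimal sketch in Python (the variances 2 and 3 and the covariance 1 are the values used in the solution; the grid search is just an illustration):

```python
import numpy as np

# Problem 1: Var(X) = 2, Var(Y) = 3, Cov(X, Y) = 1.
cov = np.array([[2.0, 1.0], [1.0, 3.0]])

def port_var(w):
    """Variance of the portfolio wX + (1 - w)Y."""
    weights = np.array([w, 1.0 - w])
    return float(weights @ cov @ weights)

# Part (a): Var(0.1 X + 0.9 Y), and part (c): the minimizing weight.
grid = np.linspace(0.0, 1.0, 3001)
w_min = grid[np.argmin([port_var(w) for w in grid])]
print(round(port_var(0.1), 2))  # 2.63
print(round(w_min, 3))          # 0.667, i.e. w = 2/3
```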

3. The likelihood is

L(σ²) = Π_{i=1}^n (2πσ²)^{−1/2} exp{ −(Y_i − µ)²/(2σ²) }.

Therefore the log-likelihood is

log L(σ²) = −(1/(2σ²)) Σ_{i=1}^n (Y_i − µ)² − n log(σ²)/2 + H,

where H consists of all terms that do not depend on σ. Differentiating the log-likelihood with respect to σ² and setting the derivative equal to zero,¹ we get

(1/(2(σ²)²)) Σ_{i=1}^n (Y_i − µ)² − n/(2σ²) = 0,

whose solution is

σ² = (1/n) Σ_{i=1}^n (Y_i − µ)².
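The closed-form MLE can be verified against a brute-force maximization of the log-likelihood. A sketch with simulated data (µ = 0 and the sample below are illustrative, not from the text):

```python
import numpy as np

# Brute-force check of the MLE sigma^2 = (1/n) sum (Y_i - mu)^2, with mu known.
rng = np.random.default_rng(0)
mu = 0.0
y = rng.normal(mu, 2.0, size=500)

sigma2_mle = np.mean((y - mu) ** 2)  # the closed-form solution

def loglik(s2):
    """Log-likelihood up to the constant H in the solution."""
    return -0.5 * np.sum((y - mu) ** 2) / s2 - 0.5 * len(y) * np.log(s2)

grid = np.linspace(1.0, 9.0, 8001)
s2_hat = grid[np.argmax([loglik(s2) for s2 in grid])]
print(abs(s2_hat - sigma2_mle) < 1e-2)  # True: the grid maximizer matches
```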

4. Rearranging the first equation, we get

β_0 = E(Y) − β_1 E(X).   (2.1)

Substituting this into the second equation and rearranging, we get

E(XY) − E(X)E(Y) = β_1 {E(X²) − E(X)²}.

Then using

σ_XY = E(XY) − E(X)E(Y)

and

σ_X² = E(X²) − E(X)²

we get

β_1 = σ_XY / σ_X²,

and substituting this into (2.1) we get

β_0 = E(Y) − (σ_XY / σ_X²) E(X).

5.

E(w^T X) = E( Σ_{i=1}^N w_i X_i ) = Σ_{i=1}^N w_i E(X_i) = w^T {E(X)}.

Next

Var(w^T X) = E{w^T X − E(w^T X)}² = E[ Σ_{i=1}^N w_i {X_i − E(X_i)} ]²

= Σ_{i=1}^N Σ_{j=1}^N E[ w_i w_j {X_i − E(X_i)}{X_j − E(X_j)} ] = Σ_{i=1}^N Σ_{j=1}^N w_i w_j Cov(X_i, X_j).

One can easily check that for any N × 1 vector X and N × N matrix A

X^T A X = Σ_{i=1}^N Σ_{j=1}^N X_i X_j A_ij,

whence

w^T COV(X) w = Σ_{i=1}^N Σ_{j=1}^N w_i w_j Cov(X_i, X_j).

¹ The solution to this problem is algebraically simpler if we treat σ² rather than σ as the parameter.

6. Since
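The identity w^T COV(X) w = ΣΣ w_i w_j Cov(X_i, X_j) is easy to confirm numerically; the 3 × 3 covariance matrix and the weights below are made up for illustration:

```python
import numpy as np

# Check w^T COV(X) w = sum_i sum_j w_i w_j Cov(X_i, X_j).
cov = np.array([[2.0, 0.5, 0.3],
                [0.5, 1.0, 0.2],
                [0.3, 0.2, 1.5]])
w = np.array([0.5, 0.3, 0.2])

quad_form = float(w @ cov @ w)
double_sum = sum(w[i] * w[j] * cov[i, j] for i in range(3) for j in range(3))
print(abs(quad_form - double_sum) < 1e-12)  # True
```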

log{L(µ, σ²)} = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σ_{i=1}^n (Y_i − µ)²

and

Σ_{i=1}^n (Y_i − Ȳ)² / σ²_ML = n,

it follows that

log{L(Ȳ, σ²_ML)} = −(n/2){1 + log(2π) + log(σ²_ML)}.

Next, the solution to

0 = (∂/∂σ²) log{L(0, σ²)} = (∂/∂σ²) [ −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σ_{i=1}^n Y_i² ]

= −n/(2σ²) + (1/(2(σ²)²)) Σ_{i=1}^n Y_i²,

so that

σ²_{0,ML} = (1/n) Σ_{i=1}^n Y_i².

7. (a)

E{X − E(X)} = E(X) − E{E(X)} = E(X) − E(X) = 0.

(b) By independence of X − E(X) and Y − E(Y) we have

E[{X − E(X)}{Y − E(Y)}] = E{X − E(X)} E{Y − E(Y)} = 0 · 0 = 0.


8. (a) Since

Ŷ = E(Y) + (σ_XY/σ_X²) {X − E(X)}

and E{X − E(X)} = 0 by Problem 7, it follows that E(Ŷ) = E(Y), so that E(Ŷ − Y) = E(Ŷ) − E(Y) = 0.

(b)

E(Y − Ŷ)² = E[ {Y − E(Y)} − (σ_XY/σ_X²){X − E(X)} ]²

= σ_Y² − 2(σ_XY/σ_X²) E[{Y − E(Y)}{X − E(X)}] + (σ_XY²/σ_X⁴) E{X − E(X)}²

= σ_Y² − 2σ_XY²/σ_X² + σ_XY²/σ_X²

= σ_Y² − σ_XY²/σ_X² = σ_Y² (1 − ρ_XY²).

9.

E(XY) = E(X³) = ∫_{−a}^{a} x³/(2a) dx = [x⁴/(8a)]_{−a}^{a} = 0

and E(X) = 0, so that σ_XY = E(XY) − E(X)E(Y) = 0 − 0 = 0. Since Y is determined by X, one suspects that X and Y are not independent. This can be proved by finding sets A_1 and A_2 such that P{X ∈ A_1 and Y ∈ A_2} ≠ P{X ∈ A_1}P{Y ∈ A_2}. This is easy. For example,

1/2 = P{|X| > a/2 and Y > (a/2)²}

≠ P{|X| > a/2} P{Y > (a/2)²} = (1/2)(1/2).

10. There is an error on page 55. The MAP estimator is 4/5, not 5/6. This can be shown by finding the value of θ that maximizes f(θ|3) = 30 θ⁴ (1 − θ). Thus, one solves

0 = (d/dθ) 30 θ⁴ (1 − θ) = 30 θ³ (4 − 5θ),

whose solution is θ = 4/5.

11. (a) Since the kurtosis of a N(µ, σ²) random variable is 3, E(X − µ)⁴ = 3σ⁴. Therefore, for a random variable X that is 95% N(0, 1) and 5% N(0, 10), we have

E(X⁴) = (0.95)(3)(1⁴) + (0.05)(3)(10⁴) = 1502.85

and

E(X²) = (0.95)(1²) + (0.05)(10²) = 5.95.

Therefore, the kurtosis is

1502.85 / 5.95² = 42.45.


(b)

One has that E(X⁴) = 3p + 3(1 − p)σ⁴ and E(X²) = p + (1 − p)σ², so that the kurtosis is

3{p + (1 − p)σ⁴} / {p² + 2p(1 − p)σ² + (1 − p)²σ⁴}.

(c)

For any fixed value of p less than 1,

lim_{σ→∞} 3{p + (1 − p)σ⁴} / {p² + 2p(1 − p)σ² + (1 − p)²σ⁴} = 3/(1 − p).

(d)

Therefore, by letting σ get very large and p get close to 1, the kurtosis can be made arbitrarily large. Suitable σ and p such that the kurtosis is greater than 10,000 can be found by fixing p such that 3/(1 − p) > 10,000 and then increasing σ until the kurtosis exceeds 10,000.

There is an error in the second sentence of part (d). The sentence should be "Show that for any p_0 < 1, no matter how close to 1, there is a p > p_0 and a σ, such that the normal mixture with these values of p and σ has a kurtosis at least M." This result is similar to part (c). One can always find a p > p_0 such that 3/(1 − p) > M, and with this value of p the kurtosis will converge to a value greater than M as σ increases to ∞.
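The kurtosis formulas in parts (a)-(c) can be checked numerically, reading N(0, 10) as having standard deviation 10 (as the solution's use of 10⁴ implies):

```python
# Kurtosis of the normal mixture that is p N(0,1) and (1-p) N(0, sigma^2),
# using the moment formulas from part (b).
def mixture_kurtosis(p, sigma):
    ex4 = 3.0 * p + 3.0 * (1.0 - p) * sigma ** 4   # E(X^4)
    ex2 = p + (1.0 - p) * sigma ** 2               # E(X^2)
    return ex4 / ex2 ** 2

print(round(mixture_kurtosis(0.95, 10.0), 2))  # part (a): 42.45
# Part (c): for fixed p the kurtosis approaches 3/(1 - p) as sigma grows.
print(round(mixture_kurtosis(0.95, 1e6), 1))   # 60.0, i.e. 3/0.05
```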

12. The conditional CDF is

P(X ≤ x | X > c) = P(c < X ≤ x) / P(X > c)

= {P(X ≤ x) − P(X ≤ c)} / {1 − P(X ≤ c)} = {Φ(x/σ) − Φ(c/σ)} / {1 − Φ(c/σ)}.

The conditional PDF is

(d/dx) {Φ(x/σ) − Φ(c/σ)} / {1 − Φ(c/σ)} = φ(x/σ) / [σ{1 − Φ(c/σ)}].

If one substitutes x = 0.25, c = 0.25, and σ = 0.3113 into this formula, the PDF is 4.400. If one substitutes a = 1.1, x = 0.25, and c = 0.25 into the Pareto PDF, that is, into a c^a/x^{a+1}, the result is 4.400. Thus, the two PDFs are equal to two decimals.
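The numerical comparison in Problem 12 can be reproduced with the standard library (values from the solution: σ = 0.3113, c = x = 0.25, Pareto index a = 1.1):

```python
from math import exp, pi, sqrt
from statistics import NormalDist

# Truncated-normal PDF at x = c versus the Pareto PDF a c^a / x^(a+1).
sigma, c, x, a = 0.3113, 0.25, 0.25, 1.1

phi = exp(-0.5 * (x / sigma) ** 2) / sqrt(2.0 * pi)      # standard normal density
trunc_pdf = phi / (sigma * (1.0 - NormalDist().cdf(c / sigma)))
pareto_pdf = a * c ** a / x ** (a + 1)                   # equals a / c when x = c

print(round(trunc_pdf, 2), round(pareto_pdf, 2))  # both are 4.4 to two decimals
```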

13. The likelihood is

L(θ) = Π_{i=1}^n θ^{−1} e^{−X_i/θ},

so the log-likelihood is

log{L(θ)} = −n log(θ) − θ^{−1} Σ_{i=1}^n X_i.

Thus the MLE solves

0 = (d/dθ) { −n log(θ) − θ^{−1} Σ_{i=1}^n X_i } = −n/θ + θ^{−2} Σ_{i=1}^n X_i,

whose solution is X̄.
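As a check that the maximum of the exponential log-likelihood sits at the sample mean (simulated data, not from the text):

```python
import numpy as np

# The exponential(theta) MLE is the sample mean X-bar; grid-search check.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1000)

def negloglik(theta):
    """Negative log-likelihood n log(theta) + sum(X_i)/theta."""
    return len(x) * np.log(theta) + x.sum() / theta

grid = np.linspace(0.5, 5.0, 4501)
theta_hat = grid[np.argmin([negloglik(t) for t in grid])]
print(abs(theta_hat - x.mean()) < 1e-3)  # True
```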

14. (a) The CDF is

P(Y ≤ y) = P(3X + 2 ≤ y) = P{X ≤ (y − 2)/3} = (y − 2)/3

for 2 < y < 5. For y ≤ 2, the CDF is 0 and for y > 5 the CDF is 1. The PDF is 1/3 for 2 < y < 5 and 0 elsewhere. The median is the solution to 1/2 = (y − 2)/3, that is, 3.5.

(b) The CDF is √y for 0 < y < 1, 0 for y ≤ 0, and 1 for y ≥ 1. The PDF is (1/2)y^{−1/2} for 0 < y < 1 and 0 elsewhere. The 1st quartile is (1/4)² and the third quartile is (3/4)².

15. (a) The CDF is (x − 1)/4 for 1 < x < 5, so the 0.1-quantile solves 0.1 = (x − 1)/4, so that x = 1.4.

(b) The 0.1-quantile of X^{−1} solves 0.1 = P(X^{−1} ≤ x) = P(X ≥ x^{−1}) = 1 − (x^{−1} − 1)/4, or 0.9 = (x^{−1} − 1)/4. Thus, the 0.1-quantile of X^{−1} is 1/4.6.

16. There is an inconsistency in the notation: σ_XY should be changed to s_XY.

(a)

s_d² = (1/(n − 1)) Σ_{i=1}^n {(X_{i,1} − X̄_1) − (X_{i,2} − X̄_2)}²

= s_1² + s_2² − (2/(n − 1)) Σ_{i=1}^n (X_{i,1} − X̄_1)(X_{i,2} − X̄_2) = s_1² + s_2² − 2s_{1,2}.

(b) The numerators of the t-statistics are both equal to X̄_1 − X̄_2 − 0.

Since s_{1,2} = 0 and n_1 = n_2 = n,

√(1/n_1 + 1/n_2) s_pool = √(2/n) √{(s_1² + s_2²)/2} = √{(s_1² + s_2²)/n}.

If s_{1,2} = 0, then s_d/√n = √{(s_1² + s_2²)/n}. Thus, the denominators of the two t-statistics are also equal.

3 Returns

1. (a)

R_2 = (P_2 + D_2)/P_1 − 1 = 56.2/51 − 1 = 0.102.

(b)

R_4(3) = {(P_4 + D_4)/P_3}{(P_3 + D_3)/P_2}{(P_2 + D_2)/P_1} − 1 = (58.25/53)(53.25/56)(56.2/51) − 1 = 0.152.

(c)

r_3 = log{(P_3 + D_3)/P_2} = log(53.25/56) = −0.0504.

2. (a)

N(0.3, 1.8).

(b)

Φ{(2 − 0.3)/√1.8} = 0.897.

(c)

Var(r_2) = 0.6.

(d)

Given r_{t−2}, r_t(3) is distributed N{r_{t−2} + (2)(0.1), (2)(0.6)}, so given r_{t−2} = 0.8, r_t(3) is N(1.0, 1.2).

3. The expected value and standard deviation of the 20-year log-return are (0.1)(20) = 2 and (0.2)(20) = 4, respectively. According to the geometric random walk model, the log-returns are normally distributed, so the 0.05-quantile of the 20-year log-return is 2 − (1.645)(4) = −4.58 and therefore the 0.05-quantile of the 20-year return is e^{−4.58} = 0.0103. The 0.95-quantile of the log-return is 2 + (1.645)(4) = 8.58 and the 0.95-quantile of the return is e^{8.58} = 5321.
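The quantile calculations in Problem 3 can be reproduced with the standard library (the mean 2 and standard deviation 4 of the 20-year log-return are taken from the solution above):

```python
from math import exp
from statistics import NormalDist

# 20-year log-return treated as N(2, 4^2), as in the solution.
mu20, sd20 = 2.0, 4.0
q05 = NormalDist(mu20, sd20).inv_cdf(0.05)
q95 = NormalDist(mu20, sd20).inv_cdf(0.95)

print(round(q05, 2), round(exp(q05), 4))  # -4.58 0.0103
print(round(q95, 2))                      # 8.58
```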


4. (a) Two examples of measurable uncertainty are random sampling from a population and computer simulation. Of course, there are many more. In random sampling, each possible sample has a known probability of being chosen. In computer simulation, random variables have known distributions specified by the programmer.

(b) Again, there is an almost infinite choice of examples. One example is the uncertainty about whether it will rain tomorrow. In this case, meteorologists estimate the probability from past data and models of the atmosphere.

5. We have X_k = X_0 exp(Z_1 + · · · + Z_k) where Z_1, …, Z_k are iid N(µ, σ²). Therefore, X_k is lognormal; in particular, X_k = X_0 exp(W) where W is N(kµ, kσ²). [Note: It is assumed here that X_0 is a fixed constant. If instead X_0 is a random variable, then the expectations and quantiles being found here are conditional expectations and quantiles given X_0. The unconditional expectations and quantiles cannot be found since the distribution of X_0 has not been specified.]

(a)

It follows that E(X_k) = X_0 exp{kµ + (kσ²)/2} by the formula for the mean of a lognormal distribution.

(b)

W has a N(kµ, kσ²) density which is

f_W(w) = (1/√(2πkσ²)) exp{ −(w − kµ)²/(2kσ²) }.

First note that X_k = g(W) where g(w) = X_0 exp(w), so that W = h(X_k) where h(y) = log(y/X_0) is the inverse of g. Also, h′(y) = 1/y, so by the change of variables formula (2.10) the density of X_k is

f_{X_k}(x) = (1/(x√(2πkσ²))) exp( −[log(x) − {kµ + log(X_0)}]²/(2kσ²) ).

(c)

The third quartile of X_k is the solution x to

0.75 = P{X_k ≤ x} = P{W ≤ log(x/X_0)} = Φ[ {log(x/X_0) − kµ}/(σ√k) ].

Therefore,

{log(x/X_0) − kµ}/(σ√k) = Φ^{−1}(0.75) = 0.6745.

(In MATLAB, norminv(.75) = 0.6745.) Therefore,

x = X_0 exp{σ√k (0.6745) + kµ}.

[Another method of solution is to notice that since W is N(kµ, kσ²), the third quartile of W is kµ + 0.6745 σ√k (see Section 2.8.2, subsection "normal quantiles"). Then since X_k = X_0 exp(W) is a monotonically increasing function g(w) = X_0 exp(w) of W, the third quartile of X_k is x = g(kµ + 0.6745 σ√k) = X_0 exp{σ√k (0.6745) + kµ}.]
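The closed-form third quartile in part (c) can be checked against a direct quantile computation; the parameter values below are illustrative, not from the text:

```python
from math import exp, sqrt
from statistics import NormalDist

# Third quartile of X_k = X_0 exp(W), W ~ N(k mu, k sigma^2).
x0, mu, sigma, k = 100.0, 0.05, 0.2, 4

q3_closed = x0 * exp(sqrt(k) * sigma * 0.6745 + k * mu)   # formula from (c)
w_q3 = NormalDist(k * mu, sigma * sqrt(k)).inv_cdf(0.75)  # quartile of W
q3_direct = x0 * exp(w_q3)

print(abs(q3_closed - q3_direct) < 0.01)  # True (0.6745 is norminv(.75) rounded)
```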


6. Data before 1998 should not be used to test the Super Bowl theory, since these data were already used to formulate the hypothesis. To test the hypothesis one would need to collect data for a number of years past 1998. Once this is done, there are a number of ways to test the Super Bowl theory. One way would be the independent-samples t-test, the first sample being the years when a former NFL team wins the Super Bowl and the second sample being the years when a former AFL team wins. The null hypothesis would be H_0: µ_1 = µ_2 and the alternative would be H_1: µ_1 > µ_2, since the Super Bowl theory predicts that µ_1 > µ_2. Defining bull and bear markets can be tricky, as Malkiel discusses. He mentions the possibility that the market could be down for most of the year but then recovers at the end of the year. Is this a bull or bear market? One possibility is to define a bull (bear) market to occur if the net return for the year is positive (negative). Let p_1 and p_2 be the probabilities of bull markets in years when a former NFL, respectively, former AFL team wins. To test if a former NFL team winning predicts a bull market, we could test the null hypothesis H_0: p_1 = 1/2 versus H_1: p_1 > 1/2. Similarly, to test whether a former AFL team winning predicts a bear market we could test H_0: p_2 = 1/2 versus H_1: p_2 < 1/2. Another set of hypotheses to test is the null hypothesis H_0: p_1 = p_2 versus the alternative H_1: p_1 > p_2. These could be easily tested by a likelihood ratio test.

4 Time Series Models

1. (a) The process is stationary since φ = 0.7, so that |φ| < 1. Strictly speaking, the process is only stationary if it started in the stationary distribution. If the process has been operating for a reasonably long time (say 10 time periods) we can assume that it has converged to the stationary distribution. In the remaining parts of the problem we will assume that it is in the stationary distribution.

(b)

µ = 5/(1 − 0.7).

(c)

γ(0) = 2/(1 − 0.7²).

(d)

γ(h) = 2(0.7^{|h|})/(1 − 0.7²).

2. (a) Var(Y_1) = γ(0) = σ²/(1 − φ²) = 2/(1 − (0.3)²) = 2.198.

(b)

Cov(Y_1, Y_3) = γ(2) = σ² φ²/(1 − φ²) = 2(0.3)²/(1 − (0.3)²) = 0.1978.

(c)

Var{(Y_1 + Y_3)/2} = (1/4)Var(Y_1) + (1/4)Var(Y_3) + 2(1/2)(1/2)Cov(Y_1, Y_3) = (1/2)(2.198) + (1/2)(0.1978) = 1.1978. (Note: Var(Y_1) = Var(Y_3) because the process is stationary.)

3.

ŷ_{n+1} = 102 + 0.5(99 − 102) + 0.2(102 − 102) + 0.1(101 − 102) = 100.4.

ŷ_{n+2} = 102 + 0.5(100.4 − 102) + 0.2(99 − 102) + 0.1(102 − 102) = 100.6.
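The two forecasts in Problem 3 can be reproduced by iterating the plug-in forecast (model coefficients 0.5, 0.2, 0.1 and mean 102 from the problem):

```python
# Forecasts for y_t = 102 + 0.5(y_{t-1}-102) + 0.2(y_{t-2}-102) + 0.1(y_{t-3}-102) + e_t
# with observed y_{n-2} = 101, y_{n-1} = 102, y_n = 99.
mu = 102.0
phi = [0.5, 0.2, 0.1]
history = [101.0, 102.0, 99.0]  # oldest first

def forecast_next(vals):
    """Plug-in forecast from the last three values (newest last)."""
    return mu + sum(p * (v - mu) for p, v in zip(phi, reversed(vals[-3:])))

y1 = forecast_next(history)          # forecast of y_{n+1}
y2 = forecast_next(history + [y1])   # forecast of y_{n+2}, using y1 as y_{n+1}
print(round(y1, 1), round(y2, 1))    # 100.4 100.6
```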


4. ARIMA(0,1,0) can be written as ∆y_t = y_t − y_{t−1} = ε_t, so that

y_t = y_{t−1} + ε_t = y_{t−2} + ε_{t−1} + ε_t = · · · = y_0 + ε_1 + ε_2 + · · · + ε_t,

which is the definition of a random walk.

5.

Without loss of generality, we can assume that µ = 0, since the covariances do not depend on µ. Since Y_t = ε_t − θ_1 ε_{t−1} − θ_2 ε_{t−2},

γ(0) = Var(ε_t) + θ_1² Var(ε_{t−1}) + θ_2² Var(ε_{t−2}) = (1 + θ_1² + θ_2²) σ_ε².

Similarly, γ(1) = (−θ_1 + θ_1 θ_2) σ_ε², γ(2) = −θ_2 σ_ε², and γ(k) = 0 for k ≥ 3.

The autocorrelation function is ρ(0) = 1, ρ(1) = (−θ_1 + θ_1 θ_2)/(1 + θ_1² + θ_2²), ρ(2) = −θ_2/(1 + θ_1² + θ_2²), and ρ(k) = 0 for k ≥ 3.
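The MA(2) autocovariances can be verified from the moving-average representation, assuming the sign convention Y_t = ε_t − θ_1 ε_{t−1} − θ_2 ε_{t−2} used above (the θ values and σ² below are illustrative):

```python
import numpy as np

# Autocovariances of Y_t = e_t - th1 e_{t-1} - th2 e_{t-2}.
th1, th2, sig2 = 0.6, 0.3, 2.0
psi = np.array([1.0, -th1, -th2])  # coefficients on e_t, e_{t-1}, e_{t-2}

def gamma(k):
    """gamma(k) = sig2 * sum_i psi_i psi_{i+|k|}."""
    k = abs(k)
    return sig2 * sum(psi[i] * psi[i + k] for i in range(len(psi) - k))

g0 = (1 + th1**2 + th2**2) * sig2   # closed forms from the solution
g1 = (-th1 + th1 * th2) * sig2
g2 = -th2 * sig2
print(all(abs(gamma(k) - g) < 1e-12 for k, g in [(0, g0), (1, g1), (2, g2)]))  # True
print(gamma(3) == 0.0)  # True: gamma(k) = 0 for k >= 3
```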

6. (a)

Again, assume that µ = 0. Then using the hint,

γ(k) = Cov(y_t, y_{t−k}) = Cov(φ_1 y_{t−1} + φ_2 y_{t−2} + ε_t, y_{t−k})

= φ_1 Cov(y_{t−1}, y_{t−k}) + φ_2 Cov(y_{t−2}, y_{t−k}) = φ_1 γ(k − 1) + φ_2 γ(k − 2)

for any k > 0. (If k = 0, then there is a non-zero covariance between ε_t and y_{t−k}, which adds another term to the equation.)

(b)

Using the result in (a) with k = 1, we get ρ(1) = φ_1 ρ(0) + φ_2 ρ(1) = φ_1 + φ_2 ρ(1). Using this result with k = 2 we get ρ(2) = φ_1 ρ(1) + φ_2 ρ(0) = φ_1 ρ(1) + φ_2.

(c)

Then

(φ_1, φ_2)^T = [1 0.4; 0.4 1]^{−1} (0.4, 0.2)^T = (0.3810, 0.0476)^T,

and

ρ(3) = φ_1 ρ(2) + φ_2 ρ(1) = (0.3810)(0.2) + (0.0476)(0.4) = 0.0952.
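Part (c) amounts to solving the Yule-Walker equations and then extending the ACF by the recursion from part (a):

```python
import numpy as np

# Yule-Walker for AR(2) with rho(1) = 0.4, rho(2) = 0.2, then
# rho(3) = phi1 rho(2) + phi2 rho(1).
R = np.array([[1.0, 0.4], [0.4, 1.0]])
r = np.array([0.4, 0.2])
phi1, phi2 = np.linalg.solve(R, r)

rho3 = phi1 * 0.2 + phi2 * 0.4
print(round(phi1, 4), round(phi2, 4), round(rho3, 4))  # 0.381 0.0476 0.0952
```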

7. The covariance between ε_{t−i} and ε_{t+h−j} is σ_ε² if j = i + h and is zero otherwise. Therefore,

Cov( Σ_{i=0}^∞ φ^i ε_{t−i} , Σ_{j=0}^∞ φ^j ε_{t+h−j} ) = Σ_{i=0}^∞ σ_ε² φ^i φ^{i+h} = σ_ε² φ^{|h|}/(1 − φ²).

8.

∆w_t = (w_{t_0} + Y_{t_0} + · · · + Y_{t−1} + Y_t) − (w_{t_0} + Y_{t_0} + · · · + Y_{t−1}) = Y_t.

9. (a) The series in the bottom panel is such that its first difference is nonstationary and tends to wander. In fact, its first difference is the series in the middle panel. We can see that the first difference spends long periods where it is always positive and also long periods where it is always negative. During a period when the first difference is positive the series is constantly moving upward. Similarly, when the first difference is negative the series is constantly moving downward. This is why the series in the bottom panel shows momentum. In contrast, the first difference of the series in the middle panel is the series in the top panel, which is stationary with mean 0 and only short-term correlation. The series in the top panel never stays positive or negative for a long period. Thus, the series in the middle panel does not move in a constant direction for long periods.

(b) The series in the bottom panel would not be a good model for a stock price. Under such a model it is possible to predict the direction of price movement, which would allow one to make nearly certain short-term profits. Such market conditions would not last very long since traders would rush in to take advantage of this opportunity.

5 Portfolios

1. (a)

0.03 = (0.02)w + (0.05)(1 − w) = 0.05 − 0.03w, so w = 2/3.

(b) We need to find w that solves

(√5/100)² = w²(√6/100)² + (√11/100)²(1 − w)² + (2)(0.1)(√6/100)(√11/100) w(1 − w),

or

15.3752 w² − 20.3752 w + 6 = 0.

The solutions are 0.8835 and 0.4417. We see from the equation in part (a) that the expected return is decreasing in w, so the smaller w, that is, 0.4417, gives the higher expected return.
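The quadratic in part (b) can be solved numerically; the cross-check below assumes the correlation 0.1 that the quadratic's coefficients imply (with σ_1² = 6/100² and σ_2² = 11/100²):

```python
import numpy as np

# Roots of 15.3752 w^2 - 20.3752 w + 6 = 0 from part (b).
roots = np.roots([15.3752, -20.3752, 6.0])
print(np.round(np.sort(roots), 4))  # [0.4417 0.8835]

# Cross-check against the variance equation (correlation 0.1 inferred).
s1, s2, rho = np.sqrt(6.0) / 100, np.sqrt(11.0) / 100, 0.1
for w in roots:
    var = w**2 * s1**2 + (1 - w)**2 * s2**2 + 2 * rho * s1 * s2 * w * (1 - w)
    assert abs(var - 5.0 / 100**2) < 1e-9
```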

2. 2/7 in the risk-free asset, 3/7 in C, and 2/7 in D.

3. (a)

w = (85)(300) / {(85)(300) + (35)(100)}.

(b)

1 − w = (35)(100) / {(85)(300) + (35)(100)}.

4. The equation

R_P = w_1 R_1 + · · · + w_N R_N,  where  w_j = n_j P_j / Σ_{k=1}^N n_k P_k,   (5.1)

is true if R_P is a net or gross return, but (5.1) is not in general true if R_P is a log return. However, if all the net returns are small in absolute value,

then the log returns are approximately equal to the net returns and (5.1) will hold approximately. Let us go through an example first. Suppose that N = 3 and the initial portfolio has \$500 in asset 1, \$300 in asset 2, and \$200 in asset 3, so the initial price of the portfolio is \$1000. Then the weights are w_1 = 0.5, w_2 = 0.3, and w_3 = 0.2. (Note that the number of shares being held of each asset and the price per share are irrelevant. For example, it is immaterial whether asset 1 is \$5/share and 100 shares are held, \$10/share and 50 shares are held, or the price per share and number of shares are any other values that multiply to \$500.) Suppose the gross returns are 2, 1, and 0.5. Then the price of the portfolio at the end of the holding period is

500(2) + 300(1) + 200(0.5) = 1400

and the gross return on the portfolio is 1.4 = 1400/1000. Note that

1.4 = w_1(2) + w_2(1) + w_3(0.5) = (0.5)(2) + (0.3)(1) + (0.2)(0.5),

so (5.1) holds for gross returns. Since a net return is simply the gross return minus 1, if (5.1) holds for gross returns then it holds for net returns, and vice versa. The log returns in this example are log(2) = 0.693, log(1) = 0, and log(0.5) = −log(2) = −0.693. Thus, the right-hand side of (5.1) when R_1, R_2, R_3 are log returns is

(0.5 − 0.2)(0.693) = 0.208,

but the log return on the portfolio is log(1.4) = 0.336, so (5.1) does not hold for log returns. In this example, equation (5.1) is not even a good approximation because two of the three net returns have large absolute values. Now let us show that (5.1) holds in general for gross returns and hence

for net returns. Let P_1, …, P_N be the prices of assets 1 through N in the portfolio. (As in the example, P_j is the price per share of the jth asset times the number of shares in the portfolio.) Let R_1, …, R_N be the net returns on these assets. The jth weight is equal to the ratio of the price of the jth asset in the portfolio to the total price of the portfolio, which is

w_j = P_j / Σ_{i=1}^N P_i.

At the end of the holding period, the price of the jth asset in the portfolio has changed from P_j to P_j(1 + R_j), so that the gross return on the portfolio is

Σ_{j=1}^N P_j(1 + R_j) / Σ_{i=1}^N P_i = Σ_{j=1}^N { P_j / Σ_{i=1}^N P_i } (1 + R_j) = Σ_{j=1}^N w_j (1 + R_j),

which proves (5.1) for gross returns.
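The worked example can be verified directly; (5.1) holds exactly for gross returns but fails for log returns:

```python
import math

# Weights 0.5/0.3/0.2 and gross returns 2, 1, 0.5 from the example above.
weights = [0.5, 0.3, 0.2]
gross = [2.0, 1.0, 0.5]

port_gross = sum(w * g for w, g in zip(weights, gross))
print(round(port_gross, 4))  # 1.4, matching 1400/1000

rhs_log = sum(w * math.log(g) for w, g in zip(weights, gross))
lhs_log = math.log(port_gross)
print(round(rhs_log, 3), round(lhs_log, 3))  # 0.208 vs 0.336: (5.1) fails for logs
```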


5. Use the formula

[a b; c d]^{−1} = {1/(ad − bc)} [d −b; −c a],

where [a b; c d] denotes the matrix with rows (a, b) and (c, d). Substituting the determinant

ad − bc = σ_1² σ_2² (1 − ρ_12²)

and

[a b; c d] = [σ_1²  ρ_12 σ_1 σ_2 ; ρ_12 σ_1 σ_2  σ_2²]

into this formula and simplifying gives (5.34).

6 Regression

1. (a)

E(Y_i | X_i = 1) = 3. σ_{Y_i | X_i = 1} = √0.6.

(b)

P(Y_i ≤ 3 | X_i = 1) = Φ{(3 − 3)/√0.6} = 1/2.

2. The likelihood is

L(β_0, β_1, σ²) = Π_{i=1}^n (2πσ²)^{−1/2} exp{ −(Y_i − β_0 − β_1 X_i)²/(2σ²) }

= (2π)^{−n/2} σ^{−n} exp{ −(1/(2σ²)) Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)² }.

Therefore, L(β_0, β_1, σ²) is maximized over (β_0, β_1) by minimizing

Σ_{i=1}^n (Y_i − β_0 − β_1 X_i)²,

so the least-squares estimator is the maximum likelihood estimator.

3. First

β̂_1 − E(β̂_1) = Σ_{i=1}^n w_i ε_i.

Since the ε_i are independent of the X_i by (6.2), and since we are conditioning on the X_i, we can treat the w_i as fixed weights. Therefore, by (2.55) and the independence of the ε_i (by (6.1)) we have that

Var(β̂_1) = Σ_{i=1}^n w_i² Var(ε_i).

Since Var(ε_i) = σ²,

Var(β̂_1) = σ² Σ_{i=1}^n w_i² = σ² [Σ_{i=1}^n (X_i − X̄)²] / [Σ_{i=1}^n (X_i − X̄)²]² = σ² / Σ_{i=1}^n (X_i − X̄)².

4. The X_i are 1 + 9(i − 1)/29, i = 1, …, 30. Therefore, Σ_{i=1}^n X_i = 165, Σ_{i=1}^n X_i² = 1124, Σ_{i=1}^n X_i³ = 8562.9, and Σ_{i=1}^n X_i⁴ = 69,548.

(a) It follows that

Corr(X_i, X_i²) = [ (1/n) Σ X_i³ − {(1/n) Σ X_i}{(1/n) Σ X_i²} ] / ( [ (1/n) Σ X_i² − {(1/n) Σ X_i}² ]^{1/2} [ (1/n) Σ X_i⁴ − {(1/n) Σ X_i²}² ]^{1/2} ) = 0.9770.

The VIFs are both 1/(1 − 0.9770²) = 21.99.
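The sums and the correlation in part (a) can be recomputed directly from the design X_i = 1 + 9(i − 1)/29:

```python
import numpy as np

# Design X_i = 1 + 9(i-1)/29, i = 1..30, and the correlation of X with X^2.
x = 1 + 9 * np.arange(30) / 29
x2 = x ** 2

# The four sums quoted in the solution.
print(round(x.sum(), 1), round((x**2).sum(), 0),
      round((x**3).sum(), 1), round((x**4).sum(), 0))

r = float(np.corrcoef(x, x2)[0, 1])
vif = 1.0 / (1.0 - r ** 2)   # VIF = 1/(1 - r^2), about 22
print(round(r, 3), round(vif, 1))
```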

(b)

Since Σ_{i=1}^n (X_i − X̄)³ = 0 and Σ_{i=1}^n (X_i − X̄) = 0,

Cov((X_i − X̄)