
Simple Linear Regression Maximum Likelihood Estimation
January 20, 2010
Tiejun (Ty) Tong
Department of Applied Mathematics
Simple Linear Regression
A simple linear regression model is defined as
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
where
$Y_i$ are the response values,
$x_i$ are the predictor values,
$\beta_0$ is the intercept,
$\beta_1$ is the slope,
$\varepsilon_i$ are i.i.d. random variables from $N(0, \sigma^2)$.
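For ease of notation, denote
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \quad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \text{and} \quad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}).$$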
Least Squares Estimation
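The LS estimates of $\beta_0$ and $\beta_1$ are defined to be the values $\hat{\beta}_0$ and $\hat{\beta}_1$ such that the line $\hat{\beta}_0 + \hat{\beta}_1 x$ minimizes the residual sum of squares (RSS):
$$(\hat{\beta}_0, \hat{\beta}_1) = \operatorname*{argmin}_{c,d} \sum_{i=1}^{n} \bigl(Y_i - (c + d x_i)\bigr)^2.$$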
The LS estimators of $\beta_0$ and $\beta_1$ are
$$\hat{\beta}_1 = S_{xy}/S_{xx}, \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}.$$
Given $\hat{\beta}_0$ and $\hat{\beta}_1$, the fitted linear regression model is
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$
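As an illustration, the closed-form least squares formulas (slope = S_xy / S_xx, intercept = mean of Y minus slope times mean of x) can be sketched in a few lines of NumPy. The function name `least_squares_fit` and the data values are made up for this example and are not part of the original slides.

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (intercept, slope) from the closed-form LS formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)            # S_xx
    s_xy = np.sum((x - x_bar) * (y - y_bar))   # S_xy
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Illustrative data (hypothetical, chosen only for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = least_squares_fit(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

For this data set the result should agree with a generic degree-1 polynomial fit such as `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.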
The difference between the observed value $Y_i$ and the fitted value $\hat{Y}_i$ is called a residual. We denote it as
$$e_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i), \quad i = 1, \dots, n.$$
An unbiased estimator of $\sigma^2$ is given as
$$\hat{\sigma}^2 = \frac{RSS}{n-2} = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2.$$
The coefficient of determination, denoted by $r^2$, is given by
$$r^2 = 1 - \frac{RSS}{SST} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}.$$
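The residuals, the unbiased variance estimate RSS / (n - 2), and the coefficient of determination 1 - RSS / SST can all be computed from one fit. The following is a minimal sketch; the function name `fit_summary` and the data are hypothetical.

```python
import numpy as np

def fit_summary(x, y):
    """Return (residuals, unbiased variance estimate, r^2) for a simple LS fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)
    slope = np.sum((x - x_bar) * (y - y_bar)) / s_xx
    intercept = y_bar - slope * x_bar
    residuals = y - (intercept + slope * x)   # e_i = Y_i - fitted value
    rss = np.sum(residuals ** 2)              # residual sum of squares
    sst = np.sum((y - y_bar) ** 2)            # total sum of squares
    sigma2_hat = rss / (n - 2)                # unbiased estimator of the error variance
    r2 = 1.0 - rss / sst                      # coefficient of determination
    return residuals, sigma2_hat, r2

# Illustrative data (hypothetical)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

residuals, sigma2_hat, r2 = fit_summary(x, y)
print(residuals, sigma2_hat, r2)
```

Because the fitted line includes an intercept, the residuals sum to zero up to rounding error, which is a quick sanity check on any implementation.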
Maximum Likelihood Estimation
The least squares method can be used to estimate $\beta_0$ and $\beta_1$ regardless of the distributional form of the error term $\varepsilon$ (normal or non-normal).
For inference problems such as hypothesis testing and confidence interval construction, we need to assume that the distribution of the errors is known.
For a simple linear regression model, we assume that
$$\varepsilon_i \stackrel{\text{i.i.d.}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n.$$
Thus for fixed design points $x_i$, the observations $Y_i$ are independent random variables with distribution
$$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2), \quad i = 1, \dots, n.$$
Under the normal errors assumption, the joint pdf of $Y_1, \dots, Y_n$ is
$$f(Y_1, \dots, Y_n \mid \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} f(Y_i \mid \beta_0, \beta_1, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2 \right\}.$$
The log-likelihood function is
$$\log L(\beta_0, \beta_1, \sigma^2 \mid Y_1, \dots, Y_n) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2.$$
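To make the log-likelihood concrete, it can be evaluated numerically for any candidate intercept, slope, and error variance. The sketch below is one possible implementation; the function name `log_likelihood`, the data, and the parameter values tried are all hypothetical.

```python
import numpy as np

def log_likelihood(beta0, beta1, sigma2, x, y):
    """Normal-errors log-likelihood of a simple linear regression model."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    ss = np.sum((y - beta0 - beta1 * x) ** 2)   # sum of squared deviations
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - ss / (2.0 * sigma2))

# Illustrative data (hypothetical)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Parameters close to the fitted line versus parameters far from it
ll_near = log_likelihood(0.05, 1.99, 0.0214, x, y)
ll_far = log_likelihood(1.0, 1.0, 0.0214, x, y)
print(ll_near, ll_far)
```

Parameter values that track the data closely give a larger log-likelihood than values that do not, which is the property the maximization exploits.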
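Taking the first partial derivatives of the log-likelihood function with respect to $\beta_0$, $\beta_1$ and $\sigma^2$, and setting them equal to zero, we have
$$\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i) = 0,$$
$$\sum_{i=1}^{n} x_i (Y_i - \beta_0 - \beta_1 x_i) = 0,$$
$$\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2 = n\sigma^2.$$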
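For reference, the three partial derivatives behind these equations can be written out explicitly. This is a worked step added for clarity, with $\beta_0$, $\beta_1$ denoting the intercept and slope and $\sigma^2$ the error variance; each equation above is the corresponding derivative set to zero and simplified.

```latex
\frac{\partial \log L}{\partial \beta_0}
  = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i),
\qquad
\frac{\partial \log L}{\partial \beta_1}
  = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i (Y_i - \beta_0 - \beta_1 x_i),
\qquad
\frac{\partial \log L}{\partial \sigma^2}
  = -\frac{n}{2\sigma^2}
    + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2 .
```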
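Solving the above equations leads to
$$\hat{\beta}_{1,ML} = S_{xy}/S_{xx}, \quad \hat{\beta}_{0,ML} = \bar{Y} - \hat{\beta}_{1,ML}\,\bar{x}, \quad \text{and} \quad \hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} e_i^2.$$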
Note that the ML estimators of $\beta_0$ and $\beta_1$ are identical to the LS estimators of $\beta_0$ and $\beta_1$.
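A quick numerical check can confirm that the closed-form ML estimates satisfy the three likelihood equations, and it also shows how the ML variance estimate (divisor n) relates to the unbiased one (divisor n - 2). This is a sketch under made-up data; the function name `ml_estimates` is hypothetical.

```python
import numpy as np

def ml_estimates(x, y):
    """Closed-form ML estimates: (intercept, slope, error variance)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)
    s_xy = np.sum((x - x_bar) * (y - y_bar))
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    resid = y - beta0 - beta1 * x
    sigma2_ml = np.mean(resid ** 2)   # RSS / n, the ML (biased) estimate
    return beta0, beta1, sigma2_ml

# Illustrative data (hypothetical)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = x.size

b0, b1, s2_ml = ml_estimates(x, y)
resid = y - b0 - b1 * x

# Left-hand sides of the three likelihood equations; all should be ~0
score_b0 = resid.sum()
score_b1 = (x * resid).sum()
score_s2 = (resid ** 2).sum() - n * s2_ml

# Rescaling the ML estimate by n / (n - 2) gives the unbiased estimate
s2_unbiased = s2_ml * n / (n - 2)
print(score_b0, score_b1, score_s2, s2_ml, s2_unbiased)
```

The ML variance estimate is always smaller than the unbiased one by the factor (n - 2) / n, so the two differ noticeably only for small samples.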
Properties of $\hat{\beta}_0$ and $\hat{\beta}_1$
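First, $\hat{\beta}_0$ and $\hat{\beta}_1$ can be represented as linear combinations of the observations $Y_i$:
$$\hat{\beta}_1 = \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^{n} c_i Y_i,$$
$$\hat{\beta}_0 = \frac{1}{n} \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} c_i \bar{x} Y_i = \sum_{i=1}^{n} \left( \frac{1}{n} - c_i \bar{x} \right) Y_i,$$
where $c_i = (x_i - \bar{x})/S_{xx}$.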
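Second, $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators of $\beta_0$ and $\beta_1$, respectively. For example,
$$E(\hat{\beta}_1) = \sum_{i=1}^{n} c_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum_{i=1}^{n} c_i + \beta_1 \sum_{i=1}^{n} c_i x_i = \beta_1,$$
since $\sum_{i=1}^{n} c_i = 0$ and $\sum_{i=1}^{n} c_i x_i = 1$.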
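The two identities used in the last step follow directly from the definition of the weights $c_i = (x_i - \bar{x})/S_{xx}$. Spelled out as a short derivation (added here as a worked step), using the fact that $\sum_{i=1}^{n} (x_i - \bar{x})\,\bar{x} = 0$:

```latex
\sum_{i=1}^{n} c_i
  = \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) = 0,
\qquad
\sum_{i=1}^{n} c_i x_i
  = \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})\, x_i
  = \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})
  = \frac{S_{xx}}{S_{xx}} = 1 .
```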
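The variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ are
$$\mathrm{Var}(\hat{\beta}_1) = \sum_{i=1}^{n} c_i^2 \,\mathrm{Var}(Y_i) = \frac{\sigma^2}{S_{xx}},$$
$$\mathrm{Var}(\hat{\beta}_0) = \mathrm{Var}(\bar{Y}) + \bar{x}^2 \,\mathrm{Var}(\hat{\beta}_1) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right),$$
where $\sum_{i=1}^{n} c_i^2 = 1/S_{xx}$, and the covariance of $\bar{Y}$ and $\hat{\beta}_1$ is zero.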
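The variance formulas can be checked numerically by writing each estimator as a weighted sum of the observations and summing the squared weights. The sketch below does this for a small design; the design points and the error variance `sigma2` are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical fixed design points and error variance
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sigma2 = 2.0

n = x.size
x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)

c = (x - x_bar) / s_xx        # weights of the slope estimator
w0 = 1.0 / n - c * x_bar      # weights of the intercept estimator

# Variance of a weighted sum of independent Y_i with common variance sigma2
var_b1_weights = sigma2 * np.sum(c ** 2)
var_b0_weights = sigma2 * np.sum(w0 ** 2)

# Closed-form expressions
var_b1_formula = sigma2 / s_xx
var_b0_formula = sigma2 * (1.0 / n + x_bar ** 2 / s_xx)

print(np.sum(c), np.sum(c * x), np.sum(c ** 2) * s_xx)   # approximately 0, 1, 1
print(var_b1_weights, var_b1_formula)
print(var_b0_weights, var_b0_formula)
```

Both routes give the same numbers, which reflects that the closed forms are just the sums of squared weights simplified with the identities for the weights.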
Lastly, it can be shown that $\hat{\beta}_0$ and $\hat{\beta}_1$ are the Best Linear Unbiased Estimators (BLUE) of $\beta_0$ and $\beta_1$, where "best" means minimum variance. This result is called the Gauss-Markov Theorem.
Distributions of the Estimators
Theorem: Let $Z_1, \dots, Z_n$ be mutually independent random variables with $Z_i \sim N(\mu_i, \sigma_i^2)$. Let $a_1, \dots, a_n$ and $b_1, \dots, b_n$ be fixed constants. Then
$$Z = \sum_{i=1}^{n} (a_i Z_i + b_i) \sim N\left( \sum_{i=1}^{n} (a_i \mu_i + b_i), \; \sum_{i=1}^{n} a_i^2 \sigma_i^2 \right).$$
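The distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ are
$$\hat{\beta}_0 \sim N\left( \beta_0, \; \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \right), \quad \hat{\beta}_1 \sim N\left( \beta_1, \; \frac{\sigma^2}{S_{xx}} \right).$$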
Furthermore, $(\hat{\beta}_0, \hat{\beta}_1)$ and $\hat{\sigma}^2$ (the unbiased estimator) are independent, and
$$\frac{(n-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}.$$
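These distributional results can be explored with a small Monte Carlo simulation: repeatedly generate responses from the model with normal errors, refit, and compare the empirical moments of the estimates with the theoretical ones. The sketch below is one way to do this; the true parameter values, design points, seed, and number of replications are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters and fixed design
beta0, beta1, sigma = 1.0, 2.0, 0.5
x = np.linspace(0.0, 1.0, 10)
n = x.size
n_rep = 20000

x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)

# Each row is one simulated data set from the model with normal errors
eps = rng.normal(0.0, sigma, size=(n_rep, n))
y = beta0 + beta1 * x + eps

# Closed-form estimates for every simulated data set
y_bar = y.mean(axis=1)
b1 = ((x - x_bar) * (y - y_bar[:, None])).sum(axis=1) / s_xx
b0 = y_bar - b1 * x_bar
resid = y - b0[:, None] - b1[:, None] * x
sigma2_hat = (resid ** 2).sum(axis=1) / (n - 2)

# Scaled variance estimate; theory says chi-square with n - 2 degrees of freedom
scaled = (n - 2) * sigma2_hat / sigma ** 2

print("mean of slope estimates:", b1.mean(), "theory:", beta1)
print("var of slope estimates: ", b1.var(), "theory:", sigma ** 2 / s_xx)
print("mean of scaled variance:", scaled.mean(), "theory:", n - 2)
print("var of scaled variance: ", scaled.var(), "theory:", 2 * (n - 2))
```

A chi-square variable with k degrees of freedom has mean k and variance 2k, so the last two printed lines give a rough check of the chi-square claim; a histogram or Q-Q plot would be a natural next step.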