
Multiple linear regression

Objectives of the topic:

Fitting multiple linear regression models to data.
Applying the method of least squares to estimate regression model parameters.
Assessing the adequacy of the regression model.
Testing hypotheses and constructing confidence intervals on regression model parameters.
Predicting future values and constructing prediction intervals.

Dr Muhammad Al Salamah, Industrial Engineering, KFUPM

Multiple linear regression models
Regression models having more than one regressor variable are called multiple linear regression models.
The response Y may be related to k regressor variables by the model:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$$
The parameters $\beta_j$ are called the regression coefficients.
The term linear is used because the model is a linear function of the unknown parameters $\beta_j$.
The parameter $\beta_j$ represents the expected change in the response Y per unit change in $x_j$ when all the remaining regressors are held constant.


Least squares estimation
Assume n > k observations are available.
Let $x_{ij}$ be the ith observation of variable $x_j$.
The ith observation can be written as
$(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i)$
It is customary to present the data for multiple regression in a table like:

y      x1     x2     ...    xk
y1     x11    x12    ...    x1k
y2     x21    x22    ...    x2k
...    ...    ...    ...    ...
yn     xn1    xn2    ...    xnk


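Each observation satisfies the multiple regression model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \epsilon_i, \quad i = 1, \ldots, n$$

The least squares function is
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$$

The least squares estimates of the regression coefficients minimize L, so they must satisfy
$$\left. \frac{\partial L}{\partial \beta_0} \right|_{\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j x_{ij} \right) = 0$$
and
$$\left. \frac{\partial L}{\partial \beta_j} \right|_{\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j x_{ij} \right) x_{ij} = 0, \quad j = 1, \ldots, k$$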

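Simplification leads to the normal equations:
$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1}^2 + \hat{\beta}_2 \sum_{i=1}^{n} x_{i1} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{i1} x_{ik} = \sum_{i=1}^{n} x_{i1} y_i$$
$$\vdots$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_{ik} + \hat{\beta}_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{n} x_{ik} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}^2 = \sum_{i=1}^{n} x_{ik} y_i$$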



Example
See Example 12-1 and the solution in the textbook.
A multiple regression model will be fitted to the data:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

Observation   Pull strength (y)   Wire length (x1)   Die height (x2)
1             9.95                2                  50
2             24.45               8                  110
3             31.75               11                 120
4             35.00               10                 550
5             25.02               8                  295
6             16.86               4                  200
7             14.38               2                  375
8             9.60                2                  52
9             24.35               9                  100
10            27.50               8                  300
11            17.08               4                  412
12            37.00               11                 400
13            41.95               12                 500
14            11.66               2                  360
15            21.65               4                  205
16            17.89               4                  400
17            69.00               20                 600
18            10.30               1                  585
19            34.93               10                 540
20            46.59               15                 250
21            44.88               15                 290
22            54.12               16                 510
23            56.63               17                 590
24            22.13               6                  100
25            21.15               5                  400

The normal equations for two regressor variables are
$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{n} x_{i2} = \sum_{i=1}^{n} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1}^2 + \hat{\beta}_2 \sum_{i=1}^{n} x_{i1} x_{i2} = \sum_{i=1}^{n} x_{i1} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_{i2} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i2} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{n} x_{i2}^2 = \sum_{i=1}^{n} x_{i2} y_i$$


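From the data, the essential summations are calculated as
$$n = 25, \quad \sum_{i=1}^{25} y_i = 725.82, \quad \sum_{i=1}^{25} x_{i1} = 206, \quad \sum_{i=1}^{25} x_{i2} = 8{,}294$$
$$\sum_{i=1}^{25} x_{i1}^2 = 2{,}396, \quad \sum_{i=1}^{25} x_{i2}^2 = 3{,}531{,}848$$
$$\sum_{i=1}^{25} x_{i1} x_{i2} = 77{,}177, \quad \sum_{i=1}^{25} x_{i1} y_i = 8{,}008.47, \quad \sum_{i=1}^{25} x_{i2} y_i = 274{,}816.71$$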


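The normal equations can be written specifically as
$$25 \hat{\beta}_0 + 206 \hat{\beta}_1 + 8{,}294 \hat{\beta}_2 = 725.82$$
$$206 \hat{\beta}_0 + 2{,}396 \hat{\beta}_1 + 77{,}177 \hat{\beta}_2 = 8{,}008.47$$
$$8{,}294 \hat{\beta}_0 + 77{,}177 \hat{\beta}_1 + 3{,}531{,}848 \hat{\beta}_2 = 274{,}816.71$$
The solution of this system of linear equations is
$$\hat{\beta}_0 = 2.27, \quad \hat{\beta}_1 = 2.74, \quad \hat{\beta}_2 = 0.01$$
Hence, the fitted multiple regression model is
$$\hat{y} = 2.27 + 2.74 x_1 + 0.01 x_2$$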
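
The three normal equations can also be solved numerically. The following is a minimal sketch, assuming NumPy is available; the variable names (`XtX`, `Xty`, `beta_hat`) are illustrative and not from the slides. The result should be close to the rounded coefficients quoted above, with more decimal places.

```python
import numpy as np

# Coefficient matrix and right-hand side of the normal equations,
# copied from the summations computed from the data.
XtX = np.array([
    [25.0, 206.0, 8294.0],
    [206.0, 2396.0, 77177.0],
    [8294.0, 77177.0, 3531848.0],
])
Xty = np.array([725.82, 8008.47, 274816.71])

# Solve the 3x3 linear system for the least squares estimates.
beta_hat = np.linalg.solve(XtX, Xty)
print(np.round(beta_hat, 4))
```

Solving the system directly is generally preferable to inverting the matrix by hand, since the entries of the coefficient matrix differ by several orders of magnitude.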


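Matrix notation
It is much more convenient to write the multiple regression model in matrix notation.
The regression model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, \ldots, n$$
can be expressed in matrix notation as
$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
The least squares function (the sum of squared errors) is
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \boldsymbol{\epsilon}' \boldsymbol{\epsilon} = (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})' (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})$$
The normal equations are
$$\mathbf{X}' \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X}' \mathbf{y}$$
The solution of these equations leads to the least squares estimate of $\boldsymbol{\beta}$:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}' \mathbf{X})^{-1} \mathbf{X}' \mathbf{y}$$
The fitted regression model is
$$\hat{y} = \mathbf{x}' \hat{\boldsymbol{\beta}}$$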

Example
See Example 12-2 and the solution in the textbook.
A multiple regression model will be fitted to the pull strength data of the previous example (25 observations of pull strength Y, wire length x1, and die height x2):
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$


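The X matrix and the y vector are
$$\mathbf{X} = \begin{bmatrix} 1 & 2 & 50 \\ 1 & 8 & 110 \\ 1 & 11 & 120 \\ \vdots & \vdots & \vdots \\ 1 & 5 & 400 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 9.95 \\ 24.45 \\ 31.75 \\ \vdots \\ 21.15 \end{bmatrix}$$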


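The values of the matrix products are
$$\mathbf{X}' \mathbf{X} = \begin{bmatrix} 25 & 206 & 8{,}294 \\ 206 & 2{,}396 & 77{,}177 \\ 8{,}294 & 77{,}177 & 3{,}531{,}848 \end{bmatrix}, \quad \mathbf{X}' \mathbf{y} = \begin{bmatrix} 725.82 \\ 8{,}008.47 \\ 274{,}816.71 \end{bmatrix}$$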


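The least squares estimates are
$$\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} 25 & 206 & 8{,}294 \\ 206 & 2{,}396 & 77{,}177 \\ 8{,}294 & 77{,}177 & 3{,}531{,}848 \end{bmatrix}^{-1} \begin{bmatrix} 725.82 \\ 8{,}008.47 \\ 274{,}816.71 \end{bmatrix} = \begin{bmatrix} 2.27 \\ 2.74 \\ 0.01 \end{bmatrix}$$

The regression model can be written as
$$\hat{y} = \begin{bmatrix} 1 & x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2.27 \\ 2.74 \\ 0.01 \end{bmatrix}$$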
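
The matrix computation can be reproduced from the raw data. The sketch below assumes NumPy is available and uses `np.linalg.lstsq`, which solves the least squares problem without forming the inverse explicitly; the array names are illustrative and not from the slides.

```python
import numpy as np

# Pull strength (y), wire length (x1), and die height (x2), 25 observations.
y = np.array([9.95, 24.45, 31.75, 35.00, 25.02, 16.86, 14.38, 9.60, 24.35,
              27.50, 17.08, 37.00, 41.95, 11.66, 21.65, 17.89, 69.00, 10.30,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12, 2, 4, 4, 20, 1,
               10, 15, 15, 16, 17, 6, 5], dtype=float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400,
               500, 360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100,
               400], dtype=float)

# Design matrix: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares solution of y = X beta + error.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(X.T @ X)                 # should reproduce the X'X matrix above
print(X.T @ y)                 # should reproduce the X'y vector above
print(np.round(beta_hat, 4))   # close to the rounded estimates above
```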

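Estimating the variance
The unbiased estimator of the variance $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_E}{n - p}$$
where n is the number of observations, p = k + 1 is the number of regression coefficients, and $e_i = y_i - \hat{y}_i$ is the ith residual.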


Properties of the least squares estimators
It is assumed that the errors $\epsilon_i$ are statistically independent with mean zero and variance $\sigma^2$.
Under this assumption, the least squares estimators are unbiased estimators of the true regression coefficients.
The inverse of $\mathbf{X}' \mathbf{X}$ times $\sigma^2$ represents the covariance matrix of the regression coefficient estimates.
Define the matrix C as
$$\mathbf{C} = (\mathbf{X}' \mathbf{X})^{-1} = \begin{bmatrix} C_{00} & C_{01} & \cdots & C_{0k} \\ C_{10} & C_{11} & \cdots & C_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ C_{k0} & C_{k1} & \cdots & C_{kk} \end{bmatrix}$$


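The matrix C is symmetric; $C_{10} = C_{01}$, $C_{20} = C_{02}$, etc.
The diagonal elements of $\sigma^2 \mathbf{C}$ are the variances of the coefficient estimates, and the off-diagonal elements are the covariances of the coefficient estimates:
$$V(\hat{\beta}_j) = \sigma^2 C_{jj}$$
$$\operatorname{cov}(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2 C_{ij}, \quad i \ne j$$
The estimated standard error of $\hat{\beta}_j$ is
$$se(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 C_{jj}}$$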


Example
The regression model
$$\hat{y} = 2.27 + 2.74 x_1 + 0.01 x_2$$
has been fitted to the data.
The error sum of squares is
$SS_E = 139.19$
Then, the estimate of the variance is
$$\hat{\sigma}^2 = \frac{139.19}{25 - 3} = 6.33$$

Observation   Pull strength (y)   Wire length (x1)   Die height (x2)   Fitted value   Squared residual
1             9.95                2                  50                8.25           2.89
2             24.45               8                  110               25.29          0.71
3             31.75               11                 120               33.61          3.46
4             35.00               10                 550               35.17          0.03
5             25.02               8                  295               27.14          4.49
6             16.86               4                  200               15.23          2.66
7             14.38               2                  375               11.50          8.29
8             9.60                2                  52                8.27           1.77
9             24.35               9                  100               27.93          12.82
10            27.50               8                  300               27.19          0.10
11            17.08               4                  412               17.35          0.07
12            37.00               11                 400               36.41          0.35
13            41.95               12                 500               40.15          3.24
14            11.66               2                  360               11.35          0.10
15            21.65               4                  205               15.28          40.58
16            17.89               4                  400               17.23          0.44
17            69.00               20                 600               63.07          35.16
18            10.30               1                  585               10.86          0.31
19            34.93               10                 540               35.07          0.02
20            46.59               15                 250               45.87          0.52
21            44.88               15                 290               46.27          1.93
22            54.12               16                 510               51.21          8.47
23            56.63               17                 590               54.75          3.53
24            22.13               6                  100               19.71          5.86
25            21.15               5                  400               19.97          1.39
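
The variance estimate can be checked by summing the squared residuals column. This is a minimal sketch in plain Python with illustrative variable names. Note that the tabulated fitted values were computed with the coefficients rounded to two decimals, which is why this error sum of squares is larger than the 115.17 obtained in the analysis of variance example.

```python
# Squared residuals (y_i - yhat_i)^2 as tabulated in the example.
squared_residuals = [
    2.89, 0.71, 3.46, 0.03, 4.49, 2.66, 8.29, 1.77, 12.82, 0.10,
    0.07, 0.35, 3.24, 0.10, 40.58, 0.44, 35.16, 0.31, 0.02, 0.52,
    1.93, 8.47, 3.53, 5.86, 1.39,
]

n = len(squared_residuals)  # 25 observations
p = 3                       # coefficients: beta_0, beta_1, beta_2

ss_e = sum(squared_residuals)   # error sum of squares
sigma2_hat = ss_e / (n - p)     # unbiased variance estimate

print(round(ss_e, 2), round(sigma2_hat, 2))  # 139.19 6.33
```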

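The C matrix is equal to
$$\mathbf{C} = (\mathbf{X}' \mathbf{X})^{-1} = \begin{bmatrix} 25 & 206 & 8{,}294 \\ 206 & 2{,}396 & 77{,}177 \\ 8{,}294 & 77{,}177 & 3{,}531{,}848 \end{bmatrix}^{-1} = \begin{bmatrix} 0.21 & -0.01 & 0 \\ -0.01 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
The covariance matrix is
$$\operatorname{cov}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 (\mathbf{X}' \mathbf{X})^{-1} = \begin{bmatrix} 1.33 & -0.06 & 0 \\ -0.06 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
Entries are rounded to two decimal places, so small nonzero elements appear as 0.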
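
Because the two-decimal rounding hides the small elements of C, it can help to compute the inverse numerically. The following sketch assumes NumPy is available; the names `C`, `cov_beta`, and `std_err` are illustrative, and the variance estimate 6.33 is taken from the example above.

```python
import numpy as np

# X'X from the example data.
XtX = np.array([
    [25.0, 206.0, 8294.0],
    [206.0, 2396.0, 77177.0],
    [8294.0, 77177.0, 3531848.0],
])
sigma2_hat = 6.33  # variance estimate used in the example

C = np.linalg.inv(XtX)                 # C = (X'X)^{-1}
cov_beta = sigma2_hat * C              # estimated covariance matrix
std_err = np.sqrt(np.diag(cov_beta))   # standard errors of the estimates

np.set_printoptions(precision=7, suppress=True)
print(C)
print(std_err)
```

The diagonal elements $C_{11}$ and $C_{22}$ printed here are the ones used in the later examples on individual coefficients.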


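From the covariance matrix:
$$V(\hat{\beta}_0) = 1.33, \quad V(\hat{\beta}_1) \approx 0, \quad V(\hat{\beta}_2) \approx 0$$
$$\operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = -0.06, \quad \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_2) \approx 0, \quad \operatorname{cov}(\hat{\beta}_1, \hat{\beta}_2) \approx 0$$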


The predicted response for $x_1 = 4$ and $x_2 = 201$ is
$$\hat{y} = \begin{bmatrix} 1 & 4 & 201 \end{bmatrix} \begin{bmatrix} 2.27 \\ 2.74 \\ 0.01 \end{bmatrix} = 15.24$$


Significance of regression
The test for significance of regression is stated as
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1: \beta_j \ne 0$ for at least one j
Rejection of $H_0$ implies that at least one of the regressor variables contributes significantly to the model.
The test statistic is
$$F_0 = \frac{SS_R / k}{SS_E / (n - p)} = \frac{MS_R}{MS_E}$$
The null hypothesis $H_0$ should be rejected if $f_0 > f_{\alpha, k, n-p}$.


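The computations are organized in the analysis of variance table:

Source of variation   Sum of squares   Degrees of freedom   Mean square   F0
Regression            SSR              k                    MSR           MSR/MSE
Error                 SSE              n - p                MSE
Total                 SST              n - 1

where
$$SS_T = \mathbf{y}' \mathbf{y} - \frac{\left( \sum_{i=1}^{n} y_i \right)^2}{n}$$
$$SS_R = \hat{\boldsymbol{\beta}}' \mathbf{X}' \mathbf{y} - \frac{\left( \sum_{i=1}^{n} y_i \right)^2}{n}$$
$$SS_E = SS_T - SS_R$$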

Example
Refer to the given data.
We want to test the significance of regression.
The total sum of squares is
$$SS_T = 27{,}178.53 - \frac{725.82^2}{25} = 6105.94$$
The regression sum of squares is
$SS_R = 5990.77$
The error sum of squares is
$SS_E = 6105.94 - 5990.77 = 115.17$


The analysis of variance table is

Source of variation   Sum of squares   Degrees of freedom   Mean square   f0
Regression            5990.77          2                    2995.39       571.64
Error                 115.17           22                   5.24
Total                 6105.94          24

Since $f_0 = 571.64 > f_{0.05,2,22} = 3.44$, $H_0$ should be rejected.
The conclusion is that x1, x2, or both explain the variability in the response.
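
The entries of the table follow from the sums of squares with a few divisions. Below is a minimal sketch in plain Python; the variable names are illustrative, and the critical value 3.44 is simply copied from the example rather than computed. Without rounding the error mean square to 5.24 first, the statistic comes out slightly above the tabulated 571.64.

```python
# Sums of squares and dimensions from the example.
ss_t = 6105.94
ss_r = 5990.77
n, k = 25, 2
p = k + 1  # number of regression coefficients

ss_e = ss_t - ss_r       # error sum of squares
ms_r = ss_r / k          # regression mean square
ms_e = ss_e / (n - p)    # error mean square
f0 = ms_r / ms_e         # test statistic

f_crit = 3.44            # f_{0.05,2,22}, taken from the example
reject_h0 = f0 > f_crit

print(round(ms_r, 2), round(ms_e, 3), round(f0, 1), reject_h0)
```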


Coefficient of multiple determination
The coefficient of multiple determination is
$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$
When $R^2 = 0.98$, it can be said that the regression model accounts for about 98% of the variability in the response.
The adjusted $R^2$ is computed as
$$R^2_{adj} = 1 - \frac{SS_E / (n - p)}{SS_T / (n - 1)}$$
The adjusted $R^2$ penalizes for adding terms to the model, so it guards against overfitting.
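
As an illustration, both quantities can be evaluated for the pull strength example using the sums of squares from the analysis of variance table. This is a small sketch in plain Python with illustrative names; it is not taken from the slides.

```python
# Sums of squares from the analysis of variance example.
ss_t = 6105.94
ss_e = 115.17
n, p = 25, 3

r2 = 1.0 - ss_e / ss_t
r2_adj = 1.0 - (ss_e / (n - p)) / (ss_t / (n - 1))

print(round(r2, 4), round(r2_adj, 4))
```

The adjusted value is slightly smaller than $R^2$ because the error sum of squares is divided by fewer degrees of freedom than the total sum of squares.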


Hypotheses on individual coefficients
To test the potential usefulness of one of the regressor variables, the following hypotheses are considered:
$H_0: \beta_j = \beta_{j0}$
$H_1: \beta_j \ne \beta_{j0}$
The test statistic is
$$T_0 = \frac{\hat{\beta}_j - \beta_{j0}}{\sqrt{\hat{\sigma}^2 C_{jj}}}$$
The null hypothesis is rejected if $|t_0| > t_{\alpha/2, n-p}$.


Example
The regression model
$$\hat{y} = 2.27 + 2.74 x_1 + 0.01 x_2$$
has been fitted to the data.
The estimate of the variance is 6.33.
We want to test the significance of the regressor variable $x_2$.
The relevant hypotheses are
$H_0: \beta_2 = 0$
$H_1: \beta_2 \ne 0$


From the C matrix, $C_{22} = 1.5 \times 10^{-6}$.
The test statistic is
$$t_0 = \frac{0.01}{\sqrt{(6.33)(1.5 \times 10^{-6})}} = 3.24$$
From the t tables, $t_{0.025,22} = 2.074$.
Since $t_0 > t_{0.025,22}$, $H_0$ should be rejected, and it is concluded that die height can explain the variability in pull strength.
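
The arithmetic of this test is short enough to check directly. Here is a minimal sketch in plain Python; all inputs are the rounded values quoted in the example, and the variable names are illustrative.

```python
import math

beta_hat_2 = 0.01   # estimate of beta_2 (rounded)
beta_20 = 0.0       # hypothesized value under H0
sigma2_hat = 6.33   # variance estimate
c_22 = 1.5e-6       # diagonal element of C for beta_2
t_crit = 2.074      # t_{0.025,22} from the t tables

std_err = math.sqrt(sigma2_hat * c_22)   # se(beta_hat_2)
t0 = (beta_hat_2 - beta_20) / std_err    # test statistic
reject_h0 = abs(t0) > t_crit

print(round(t0, 3), reject_h0)
```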


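Confidence interval on the regression coefficients
The random variable
$$T = \frac{\hat{\beta}_j - \beta_j}{\sqrt{\hat{\sigma}^2 C_{jj}}}$$
has a t distribution with $n - p$ degrees of freedom.
A $100(1 - \alpha)\%$ confidence interval on the regression coefficient $\beta_j$ is given by
$$\hat{\beta}_j - t_{\alpha/2, n-p} \sqrt{\hat{\sigma}^2 C_{jj}} \le \beta_j \le \hat{\beta}_j + t_{\alpha/2, n-p} \sqrt{\hat{\sigma}^2 C_{jj}}$$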


Example
The regression model
$$\hat{y} = 2.27 + 2.74 x_1 + 0.01 x_2$$
has been fitted to the data.
The estimate of the variance is 6.33.
We want to construct a 95% confidence interval on $\beta_1$.
From the C matrix, $C_{11} = 1.7 \times 10^{-3}$.


The 95% confidence interval on $\beta_1$ is
$$2.74 - 2.07 \sqrt{6.33 (1.7 \times 10^{-3})} \le \beta_1 \le 2.74 + 2.07 \sqrt{6.33 (1.7 \times 10^{-3})}$$
$$2.53 \le \beta_1 \le 2.95$$
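
The interval limits can be reproduced with a small helper. This is a sketch in plain Python; the function name `coefficient_ci` is hypothetical, and the critical value 2.07 is the rounded table value used in the example.

```python
import math

def coefficient_ci(beta_hat_j, sigma2_hat, c_jj, t_crit):
    """Return (lower, upper) confidence limits for one regression coefficient."""
    half_width = t_crit * math.sqrt(sigma2_hat * c_jj)
    return beta_hat_j - half_width, beta_hat_j + half_width

lower, upper = coefficient_ci(beta_hat_j=2.74, sigma2_hat=6.33,
                              c_jj=1.7e-3, t_crit=2.07)
print(round(lower, 2), round(upper, 2))  # 2.53 2.95
```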

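Confidence interval on the mean response
Define the vector $\mathbf{x}_0$:
$$\mathbf{x}_0 = \begin{bmatrix} 1 \\ x_{01} \\ x_{02} \\ \vdots \\ x_{0k} \end{bmatrix}$$
The mean response at $\mathbf{x}_0$ is
$$E(Y \mid \mathbf{x}_0) = \mu_{Y|\mathbf{x}_0} = \mathbf{x}_0' \boldsymbol{\beta}$$
which is estimated by
$$\hat{\mu}_{Y|\mathbf{x}_0} = \mathbf{x}_0' \hat{\boldsymbol{\beta}}$$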

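The variance of the estimated mean response is
$$V(\hat{\mu}_{Y|\mathbf{x}_0}) = \sigma^2 \mathbf{x}_0' \mathbf{C} \mathbf{x}_0$$
A $100(1 - \alpha)\%$ confidence interval on the mean response at $\mathbf{x}_0$ is given by
$$\hat{\mu}_{Y|\mathbf{x}_0} - t_{\alpha/2, n-p} \sqrt{\hat{\sigma}^2 \mathbf{x}_0' \mathbf{C} \mathbf{x}_0} \le \mu_{Y|\mathbf{x}_0} \le \hat{\mu}_{Y|\mathbf{x}_0} + t_{\alpha/2, n-p} \sqrt{\hat{\sigma}^2 \mathbf{x}_0' \mathbf{C} \mathbf{x}_0}$$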


Example
The regression model
$$\hat{y} = 2.27 + 2.74 x_1 + 0.01 x_2$$
has been fitted to the data.
The estimate of the variance is 6.33.
We want to construct a 95% confidence interval on the mean response for
$$\mathbf{x}_0 = \begin{bmatrix} 1 \\ 8 \\ 275 \end{bmatrix}$$

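The estimated mean response is
$$\hat{\mu}_{Y|\mathbf{x}_0} = \begin{bmatrix} 1 & 8 & 275 \end{bmatrix} \begin{bmatrix} 2.27 \\ 2.74 \\ 0.01 \end{bmatrix} = 26.94$$

The variance of the estimated mean response is given by
$$V(\hat{\mu}_{Y|\mathbf{x}_0}) = 6.33 \begin{bmatrix} 1 & 8 & 275 \end{bmatrix} \begin{bmatrix} 0.21 & -0.01 & 0 \\ -0.01 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 8 \\ 275 \end{bmatrix} = 0.32$$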


The 95% confidence interval on the mean response is
$$26.94 - 2.07 \sqrt{0.32} \le \mu_{Y|\mathbf{x}_0} \le 26.94 + 2.07 \sqrt{0.32}$$
$$25.77 \le \mu_{Y|\mathbf{x}_0} \le 28.11$$
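
The quadratic form $\mathbf{x}_0' \mathbf{C} \mathbf{x}_0$ can also be evaluated from the unrounded $\mathbf{X}'\mathbf{X}$ matrix. The sketch below assumes NumPy is available and uses illustrative names. With C rounded to two decimals the quadratic form is 0.21 - 2(8)(0.01) = 0.05, which gives the variance 0.32 used above; the unrounded matrix gives a somewhat smaller value and therefore a slightly narrower interval.

```python
import numpy as np

XtX = np.array([
    [25.0, 206.0, 8294.0],
    [206.0, 2396.0, 77177.0],
    [8294.0, 77177.0, 3531848.0],
])
beta_hat = np.array([2.27, 2.74, 0.01])  # rounded coefficients from the example
sigma2_hat = 6.33                        # variance estimate from the example
t_crit = 2.07                            # t_{0.025,22} as used in the example
x0 = np.array([1.0, 8.0, 275.0])

mu_hat = x0 @ beta_hat                   # estimated mean response

# Quadratic form x0' (X'X)^{-1} x0 via a linear solve, avoiding an explicit inverse.
q = x0 @ np.linalg.solve(XtX, x0)

half_width = t_crit * np.sqrt(sigma2_hat * q)
ci = (mu_hat - half_width, mu_hat + half_width)

print(round(float(mu_hat), 2), round(float(q), 4), ci)
```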

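Confidence interval on predictions
For $\mathbf{x}_0' = [1, x_{01}, x_{02}, \ldots, x_{0k}]$, the point estimator of the observation $Y_0$ is
$$\hat{y}_0 = \mathbf{x}_0' \hat{\boldsymbol{\beta}}$$

The variance of the prediction error $Y_0 - \hat{y}_0$ is
$$V(Y_0 - \hat{y}_0) = \sigma^2 \left( 1 + \mathbf{x}_0' \mathbf{C} \mathbf{x}_0 \right)$$

A $100(1 - \alpha)\%$ confidence interval for the prediction $Y_0$ at $\mathbf{x}_0$ (a prediction interval) is given by
$$\hat{y}_0 - t_{\alpha/2, n-p} \sqrt{\hat{\sigma}^2 \left( 1 + \mathbf{x}_0' \mathbf{C} \mathbf{x}_0 \right)} \le Y_0 \le \hat{y}_0 + t_{\alpha/2, n-p} \sqrt{\hat{\sigma}^2 \left( 1 + \mathbf{x}_0' \mathbf{C} \mathbf{x}_0 \right)}$$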


Example
The regression model
$$\hat{y} = 2.27 + 2.74 x_1 + 0.01 x_2$$
has been fitted to the data.
The estimate of the variance is 6.33.
We want to construct a 95% confidence interval on the prediction for
$$\mathbf{x}_0 = \begin{bmatrix} 1 \\ 8 \\ 275 \end{bmatrix}$$

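The predicted value is
$$\hat{y}_0 = \begin{bmatrix} 1 & 8 & 275 \end{bmatrix} \begin{bmatrix} 2.27 \\ 2.74 \\ 0.01 \end{bmatrix} = 26.94$$
The variance of the prediction error is given by
$$V(Y_0 - \hat{y}_0) = 6.33 \left( 1 + \begin{bmatrix} 1 & 8 & 275 \end{bmatrix} \begin{bmatrix} 0.21 & -0.01 & 0 \\ -0.01 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 8 \\ 275 \end{bmatrix} \right) = 6.65$$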


The 95% confidence interval on the prediction is
$$26.94 - 2.07 \sqrt{6.65} \le Y_0 \le 26.94 + 2.07 \sqrt{6.65}$$
$$21.60 \le Y_0 \le 32.28$$
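
To see why this interval is so much wider than the one on the mean response, the two half-widths can be compared side by side. This is a minimal sketch in plain Python; the function names are hypothetical, and `q` stands for the quadratic form $\mathbf{x}_0' \mathbf{C} \mathbf{x}_0$, set here to the value 0.05 implied by the rounded C matrix in the example.

```python
import math

def mean_response_interval(y0_hat, sigma2_hat, q, t_crit):
    """Confidence limits for the mean response; q is x0' C x0."""
    half_width = t_crit * math.sqrt(sigma2_hat * q)
    return y0_hat - half_width, y0_hat + half_width

def prediction_interval(y0_hat, sigma2_hat, q, t_crit):
    """Limits for a single future observation; the extra 1 is the error variance."""
    half_width = t_crit * math.sqrt(sigma2_hat * (1.0 + q))
    return y0_hat - half_width, y0_hat + half_width

y0_hat, sigma2_hat, q, t_crit = 26.94, 6.33, 0.05, 2.07

mean_lo, mean_hi = mean_response_interval(y0_hat, sigma2_hat, q, t_crit)
pred_lo, pred_hi = prediction_interval(y0_hat, sigma2_hat, q, t_crit)

print(round(pred_lo, 2), round(pred_hi, 2))  # 21.6 32.28
print(pred_hi - pred_lo > mean_hi - mean_lo)  # True
```

The prediction interval accounts for both the uncertainty in the fitted model and the variability of a new observation, so it is always wider than the confidence interval on the mean response at the same point.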
