Multiple Linear Regression

Objectives of the topic:
- Building multiple linear regression models for data.
- Applying the method of least squares to estimate regression model parameters.
- Assessing the adequacy of the regression model.
- Testing hypotheses and constructing confidence intervals on regression model parameters.
- Predicting future values and constructing prediction intervals.

Dr Muhammad Al-Salamah, Industrial Engineering, KFUPM
Multiple linear regression models

Regression models having more than one regressor variable are called multiple linear regression models.

The response Y may be related to k regressor variables by the model:

Y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε

The parameters βj are called the regression coefficients. The term "linear" is used because the model is a linear function of the unknown parameters βj.

The parameter βj represents the expected change in the response Y per unit change in xj when all the remaining regressors are held constant.
Least squares estimation

Assume n > k observations are available. Let xij be the ith observation of variable xj. The ith observation can be written as (xi1, xi2, ..., xik, yi).

It is customary to present the data for multiple regression in a table like:

y     x1    x2    ...   xk
y1    x11   x12   ...   x1k
y2    x21   x22   ...   x2k
...
yn    xn1   xn2   ...   xnk
Each observation satisfies the multiple regression model

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi
   = β0 + Σ_{j=1}^{k} βj xij + εi

The least squares function is

L = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} ( yi − β0 − Σ_{j=1}^{k} βj xij )²

The least squares estimates of the regression coefficients minimize L, and they must satisfy

∂L/∂β0 evaluated at β̂0, β̂1, ..., β̂k:
−2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) = 0

and

∂L/∂βj evaluated at β̂0, β̂1, ..., β̂k:
−2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) xij = 0,   j = 1, 2, ..., k

Simplification leads to the normal equations (all sums over i = 1, ..., n):

n β̂0      + β̂1 Σ xi1      + β̂2 Σ xi2      + ... + β̂k Σ xik      = Σ yi
β̂0 Σ xi1  + β̂1 Σ xi1²     + β̂2 Σ xi1 xi2  + ... + β̂k Σ xi1 xik  = Σ xi1 yi
  ⋮
β̂0 Σ xik  + β̂1 Σ xik xi1  + β̂2 Σ xik xi2  + ... + β̂k Σ xik²     = Σ xik yi
Example

See Example 12-1 and the solution in the textbook. A multiple regression model will be fitted to the data:

Y = β0 + β1 x1 + β2 x2 + ε

The normal equations for two regressor variables are (all sums over i = 1, ..., n):

n β̂0      + β̂1 Σ xi1      + β̂2 Σ xi2      = Σ yi
β̂0 Σ xi1  + β̂1 Σ xi1²     + β̂2 Σ xi1 xi2  = Σ xi1 yi
β̂0 Σ xi2  + β̂1 Σ xi2 xi1  + β̂2 Σ xi2²     = Σ xi2 yi

Observation   Pull strength Y   Wire length x1   Die height x2
 1             9.95              2                 50
 2            24.45              8                110
 3            31.75             11                120
 4            35.00             10                550
 5            25.02              8                295
 6            16.86              4                200
 7            14.38              2                375
 8             9.60              2                 52
 9            24.35              9                100
10            27.50              8                300
11            17.08              4                412
12            37.00             11                400
13            41.95             12                500
14            11.66              2                360
15            21.65              4                205
16            17.89              4                400
17            69.00             20                600
18            10.30              1                585
19            34.93             10                540
20            46.59             15                250
21            44.88             15                290
22            54.12             16                510
23            56.63             17                590
24            22.13              6                100
25            21.15              5                400
From the data, the essential summations are calculated as (all sums over i = 1, ..., 25):

n = 25,   Σ xi1 = 206,   Σ xi2 = 8,294,   Σ yi = 725.82
Σ xi1² = 2,396,   Σ xi2² = 3,531,848,   Σ xi1 xi2 = 77,177
Σ xi1 yi = 8,008.47,   Σ xi2 yi = 274,816.71
Substituting these summations into the normal equations and solving gives

β̂0 = 2.27,   β̂1 = 2.74,   β̂2 = 0.01

Hence, the fitted multiple regression model is

ŷ = 2.27 + 2.74 x1 + 0.01 x2
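The three normal equations form a small linear system in β̂0, β̂1, β̂2, so they can be solved numerically. A minimal sketch, assuming numpy is available and using the summations computed above:

```python
import numpy as np

# Coefficient matrix of the normal equations, built from the summations
# on the previous slide: n, Σx1, Σx2, Σx1², Σx1·x2, Σx2².
A = np.array([[25.0,      206.0,      8294.0],
              [206.0,    2396.0,     77177.0],
              [8294.0,  77177.0,   3531848.0]])

# Right-hand side: Σy, Σx1·y, Σx2·y.
b = np.array([725.82, 8008.47, 274816.71])

beta = np.linalg.solve(A, b)   # [beta0_hat, beta1_hat, beta2_hat]
print(beta)
```

Full precision gives roughly (2.26, 2.74, 0.0125); the slides report the rounded values 2.27, 2.74, and 0.01.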
Matrix notation

It is much more convenient to write the multiple regression model in matrix notation. The regression model is

y = Xβ + ε

where

    | 1  x11  x12  ...  x1k |        | β0 |        | ε1 |
X = | 1  x21  x22  ...  x2k |    β = | β1 |    ε = | ε2 |
    | ⋮   ⋮    ⋮    ⋱    ⋮  |        |  ⋮ |        |  ⋮ |
    | 1  xn1  xn2  ...  xnk |        | βk |        | εn |
The sum of squared residuals is

L = Σ_{i=1}^{n} εi² = ε'ε = (y − Xβ)'(y − Xβ)

The normal equations are

X'X β̂ = X'y

The solution of these equations leads to the least squares estimate of β:

β̂ = (X'X)⁻¹ X'y

The fitted regression model is

ŷ = x'β̂
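The matrix formula β̂ = (X'X)⁻¹X'y can be checked directly on the wire bond data. A sketch assuming numpy; np.linalg.lstsq is the numerically preferred route and is shown for comparison:

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])

# Model matrix: a column of 1s for the intercept, then x1 and x2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal-equation solution, and the more stable least-squares solver.
beta = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```

Both routes agree to numerical precision; forming (X'X)⁻¹ explicitly is avoided in practice because X'X can be ill-conditioned.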
Example

See Example 12-2 and the solution in the textbook. A multiple regression model will be fitted to the wire bond data of Example 12-1:

Y = β0 + β1 x1 + β2 x2 + ε
The X matrix and the y vector are formed from the data: X is the 25 × 3 matrix whose columns are a column of 1s, the wire lengths x1, and the die heights x2; y is the vector of the 25 pull strength observations.
The values of the multiplications are

       |    25     206      8,294 |           |    725.82  |
X'X =  |   206   2,396     77,177 |    X'y =  |  8,008.47  |
       | 8,294  77,177  3,531,848 |           | 274,816.71 |
The least squares estimates are

| β̂0 |   |    25     206      8,294 |⁻¹ |    725.82  |   | 2.27 |
| β̂1 | = |   206   2,396     77,177 |   |  8,008.47  | = | 2.74 |
| β̂2 |   | 8,294  77,177  3,531,848 |   | 274,816.71 |   | 0.01 |

The regression model can be written as

ŷ = [1  x1  x2] [2.27  2.74  0.01]' = 2.27 + 2.74 x1 + 0.01 x2
Estimating the variance

The unbiased estimator of the variance σ² is

σ̂² = SSE / (n − p) = Σ_{i=1}^{n} ei² / (n − p)

where n is the number of observations and p is the number of regression coefficients.
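A sketch of the variance estimate for the wire bond fit, assuming numpy. With full-precision coefficients this gives about 5.24, which matches the MSE in the ANOVA table later; the worked covariance example's value of 6.33 comes from using the rounded coefficients 2.27, 2.74, 0.01:

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

e = y - X @ beta            # residuals e_i = y_i - yhat_i
n, p = X.shape              # n = 25 observations, p = 3 coefficients
sigma2 = (e @ e) / (n - p)  # SSE / (n - p)
print(sigma2)
```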
Properties of the least squares estimators

It is assumed that the errors εi are statistically independent with mean zero and variance σ². Under this assumption, the least squares estimators are unbiased estimators of the true regression coefficients.

The inverse of X'X times σ² represents the covariance matrix of the regression coefficient estimates. Define the matrix C as

              | C00  C01  ...  C0k |
C = (X'X)⁻¹ = | C10  C11  ...  C1k |
              |  ⋮    ⋮    ⋱    ⋮  |
              | Ck0  Ck1  ...  Ckk |

Then

V(β̂j) = σ² Cjj
cov(β̂i, β̂j) = σ² Cij,   i ≠ j

The estimated standard error is

se(β̂j) = √(σ̂² Cjj)
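The standard errors follow from the diagonal of C = (X'X)⁻¹. A sketch for the wire bond data, assuming numpy:

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape
e = y - X @ beta
sigma2 = (e @ e) / (n - p)          # unbiased variance estimate

C = np.linalg.inv(X.T @ X)          # the C matrix of this slide
se = np.sqrt(sigma2 * np.diag(C))   # se(beta_j) = sqrt(sigma2 * Cjj)
print(se)
```

With full precision the standard errors are roughly 1.06, 0.094, and 0.0028 for β̂0, β̂1, β̂2; the slides' rounded C matrix hides the two small diagonal entries.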
Example

The fitted regression model is ŷ = 2.27 + 2.74 x1 + 0.01 x2. The fitted values and squared residuals for the wire bond data are:

Observation   Pull strength Y   Wire length x1   Die height x2   ŷi      (yi − ŷi)²
 1             9.95              2                 50             8.25     2.89
 2            24.45              8                110            25.29     0.71
 3            31.75             11                120            33.61     3.46
 4            35.00             10                550            35.17     0.03
 5            25.02              8                295            27.14     4.49
 6            16.86              4                200            15.23     2.66
 7            14.38              2                375            11.50     8.29
 8             9.60              2                 52             8.27     1.77
 9            24.35              9                100            27.93    12.82
10            27.50              8                300            27.19     0.10
11            17.08              4                412            17.35     0.07
12            37.00             11                400            36.41     0.35
13            41.95             12                500            40.15     3.24
14            11.66              2                360            11.35     0.10
15            21.65              4                205            15.28    40.58
16            17.89              4                400            17.23     0.44
17            69.00             20                600            63.07    35.16
18            10.30              1                585            10.86     0.31
19            34.93             10                540            35.07     0.02
20            46.59             15                250            45.87     0.52
21            44.88             15                290            46.27     1.93
22            54.12             16                510            51.21     8.47
23            56.63             17                590            54.75     3.53
24            22.13              6                100            19.71     5.86
25            21.15              5                400            19.97     1.39

From the table, Σ (yi − ŷi)² = 139.19, so σ̂² = 139.19 / 22 = 6.33.
The C matrix is equal to

              |    25     206      8,294 |⁻¹   |  0.21  −0.01   0 |
C = (X'X)⁻¹ = |   206   2,396     77,177 |   = | −0.01   0      0 |
              | 8,294  77,177  3,531,848 |     |  0      0      0 |

(The entries shown as 0 are rounded; they are small but nonzero, e.g. C11 = 1.7 × 10⁻³.)

The covariance matrix is

cov(β̂) = σ̂² (X'X)⁻¹ = |  1.33  −0.06   0 |
                       | −0.06   0      0 |
                       |  0      0      0 |
From the covariance matrix:

V(β̂0) = 1.33
V(β̂1) ≈ 0
V(β̂2) ≈ 0
cov(β̂0, β̂1) = −0.06
cov(β̂0, β̂2) ≈ 0
cov(β̂1, β̂2) ≈ 0
For example, the fitted value at x1 = 4 and x2 = 201 is

ŷ = [1  4  201] [2.27  2.74  0.01]' = 15.24
Significance of regression

The test for significance of regression is stated as

H0: β1 = β2 = ... = βk = 0
H1: βj ≠ 0 for at least one j

Rejection of H0 implies that at least one of the regressor variables contributes significantly to the model.

The test statistic is

F0 = (SSR / k) / (SSE / (n − p)) = MSR / MSE

The null hypothesis H0 should be rejected if f0 > f_{α, k, n−p}.
The computations are organized in the analysis of variance table:

Source of variation   Sum of squares   Degrees of freedom   Mean square   F0
Regression            SSR              k                    MSR           MSR/MSE
Error                 SSE              n − p                MSE
Total                 SST              n − 1

where

SST = y'y − (Σ yi)² / n
SSR = β̂'X'y − (Σ yi)² / n
SSE = SST − SSR
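The ANOVA quantities can be computed directly from these formulas. A numpy sketch for the wire bond data:

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape
k = p - 1                                # number of regressors
correction = y.sum() ** 2 / n            # (Σy)² / n

SST = y @ y - correction                 # total sum of squares
SSR = beta @ (X.T @ y) - correction      # regression sum of squares
SSE = SST - SSR                          # error sum of squares
F0 = (SSR / k) / (SSE / (n - p))         # MSR / MSE
print(SST, SSR, SSE, F0)
```

This reproduces the example values SST ≈ 6105.94, SSR ≈ 5990.77, SSE ≈ 115.17 and an F statistic of about 572.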
Example

Refer to the given data. We want to test significance of regression.

The total sum of squares is

SST = 27,178.53 − 725.82² / 25 = 6,105.94

The regression sum of squares is

SSR = 5,990.77

The error sum of squares is

SSE = 6,105.94 − 5,990.77 = 115.17
The analysis of variance table is

Source of variation   Sum of squares   Degrees of freedom   Mean square   f0
Regression            5,990.77         2                    2,995.39      571.64
Error                 115.17           22                   5.24
Total                 6,105.94         24

Since f0 = 571.64 > f_{0.05, 2, 22} = 3.44, H0 should be rejected. The conclusion is that x1 or x2 or both explain the variability in the response.
Coefficient of multiple determination

The coefficient of multiple determination is

R² = SSR / SST = 1 − SSE / SST

When R² = 0.98, it can be said that the regression model accounts for about 98% of the variability in the response.

The adjusted R² is computed as

R²adj = 1 − [SSE / (n − p)] / [SST / (n − 1)]

The adjusted R² penalizes for adding terms to the model, so it guards against overfitting.
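Continuing the numpy sketch, R² and adjusted R² for the wire bond fit:

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape
e = y - X @ beta
SSE = e @ e
SST = y @ y - y.sum() ** 2 / n

R2 = 1 - SSE / SST                               # coefficient of determination
R2_adj = 1 - (SSE / (n - p)) / (SST / (n - 1))   # adjusted for model size
print(R2, R2_adj)
```

For this fit R² ≈ 0.981 and R²adj ≈ 0.979, so the model accounts for about 98% of the variability in pull strength.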
Hypotheses on individual coefficients

To test the potential usefulness of one of the regressor variables, the following hypothesis is considered:

H0: βj = βj0
H1: βj ≠ βj0

The test statistic is

T0 = (β̂j − βj0) / √(σ̂² Cjj)

The null hypothesis is rejected if |t0| > t_{α/2, n−p}.
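A sketch of the t statistics for H0: βj = 0 on each wire bond coefficient, assuming numpy and using full-precision values (the slides' rounded inputs shift the numbers slightly):

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape
e = y - X @ beta
sigma2 = (e @ e) / (n - p)
C = np.linalg.inv(X.T @ X)

# T0 = (beta_j - 0) / sqrt(sigma2 * Cjj), one statistic per coefficient.
t0 = beta / np.sqrt(sigma2 * np.diag(C))
print(t0)
```

Both regressor statistics are well beyond t_{0.025,22} = 2.07 (wire length by far the largest), so each coefficient tests as significant.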
Example

The regression model is ŷ = 2.27 + 2.74 x1 + 0.01 x2 (wire bond data; see the table in Example 12-1). As an illustration, testing H0: β1 = 0 with σ̂² = 6.33 and C11 = 1.7 × 10⁻³ gives

t0 = 2.74 / √(6.33 × 1.7 × 10⁻³) ≈ 26.4

Since |t0| > t_{0.025, 22} = 2.07, wire length x1 contributes significantly to the model.
Confidence interval on the regression coefficients

The random variable

T0 = (β̂j − βj) / √(σ̂² Cjj)

has a t distribution with n − p degrees of freedom. A 100(1 − α)% confidence interval on the regression coefficient βj is given by

β̂j − t_{α/2, n−p} √(σ̂² Cjj) ≤ βj ≤ β̂j + t_{α/2, n−p} √(σ̂² Cjj)
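A sketch of the 95% confidence intervals on the coefficients. The critical value t_{0.025,22} = 2.074 is hard-coded here from a t table (scipy.stats.t.ppf(0.975, 22) would compute it); full-precision arithmetic gives an interval on β1 of about (2.55, 2.94), close to the slides' (2.53, 2.95):

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape
e = y - X @ beta
sigma2 = (e @ e) / (n - p)
C = np.linalg.inv(X.T @ X)
se = np.sqrt(sigma2 * np.diag(C))

t_crit = 2.074                    # t_{0.025, 22}, from a t table
lower = beta - t_crit * se
upper = beta + t_crit * se
print(lower[1], upper[1])         # 95% CI on beta1 (wire length)
```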
Example

The regression model is ŷ = 2.27 + 2.74 x1 + 0.01 x2 (wire bond data; see the table in Example 12-1). A confidence interval on β1 is required.
The 95% confidence interval on β1 is

2.74 − 2.07 √(6.33 × 1.7 × 10⁻³) ≤ β1 ≤ 2.74 + 2.07 √(6.33 × 1.7 × 10⁻³)

2.53 ≤ β1 ≤ 2.95
Confidence interval on the mean response

Define the vector x0:

     |  1  |
     | x01 |
x0 = | x02 |
     |  ⋮  |
     | x0k |

The mean response at x0 is

E(Y | x0) = μ_{Y|x0} = x0'β

which has the estimate

μ̂_{Y|x0} = x0'β̂
The variance of the estimated mean response is

V(μ̂_{Y|x0}) = σ² x0'C x0

A 100(1 − α)% confidence interval on the mean response at x0 is given by

μ̂_{Y|x0} − t_{α/2, n−p} √(σ̂² x0'C x0) ≤ μ_{Y|x0} ≤ μ̂_{Y|x0} + t_{α/2, n−p} √(σ̂² x0'C x0)
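A sketch of the mean-response interval at x1 = 8, x2 = 275, assuming numpy and a hard-coded t_{0.025,22} = 2.074. With full-precision coefficients the point estimate is about 27.7, rather than the 26.94 the slides obtain from the rounded coefficients:

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape
e = y - X @ beta
sigma2 = (e @ e) / (n - p)
C = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 8.0, 275.0])                 # point of interest
mu_hat = x0 @ beta                               # estimated mean response
half = 2.074 * np.sqrt(sigma2 * (x0 @ C @ x0))   # t_{0.025,22} * se(mu_hat)
lo, hi = mu_hat - half, mu_hat + half
print(lo, mu_hat, hi)
```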
Example

The regression model is ŷ = 2.27 + 2.74 x1 + 0.01 x2 (wire bond data; see the table in Example 12-1). A confidence interval on the mean response is required at x1 = 8, x2 = 275.
The estimated mean response is

μ̂_{Y|x0} = [1  8  275] [2.27  2.74  0.01]' = 26.94

The variance of the estimated mean response is given by

                        |  0.21  −0.01   0 | |  1  |
V(μ̂_{Y|x0}) = 6.33 [1  8  275] | −0.01   0      0 | |  8  | = 0.32
                        |  0      0      0 | | 275 |
The 95% confidence interval on the mean response is

26.94 − 2.07 √0.32 ≤ μ_{Y|x0} ≤ 26.94 + 2.07 √0.32

25.77 ≤ μ_{Y|x0} ≤ 28.11
Prediction interval on a future observation

For x0' = [1, x01, x02, ..., x0k], the point estimator of the future observation Y0 is

ŷ0 = x0'β̂

The variance of the prediction error is

V(ŷ0 − Y0) = σ² (1 + x0'C x0)

A 100(1 − α)% prediction interval for Y0 at x0 is given by

ŷ0 − t_{α/2, n−p} √(σ̂² (1 + x0'C x0)) ≤ Y0 ≤ ŷ0 + t_{α/2, n−p} √(σ̂² (1 + x0'C x0))
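The same sketch extended to a 95% prediction interval on a new observation at x1 = 8, x2 = 275 (again with t_{0.025,22} = 2.074 hard-coded); the extra "1 +" term makes the interval much wider than the mean-response interval:

```python
import numpy as np

# Wire bond data from the table in Example 12-1.
x1 = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
               2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], float)
x2 = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
               360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], float)
y = np.array([9.95, 24.45, 31.75, 35.0, 25.02, 16.86, 14.38, 9.6, 24.35,
              27.5, 17.08, 37.0, 41.95, 11.66, 21.65, 17.89, 69.0, 10.3,
              34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape
e = y - X @ beta
sigma2 = (e @ e) / (n - p)
C = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 8.0, 275.0])
y0_hat = x0 @ beta                                       # point prediction
half_ci = 2.074 * np.sqrt(sigma2 * (x0 @ C @ x0))        # mean-response half-width
half_pi = 2.074 * np.sqrt(sigma2 * (1 + x0 @ C @ x0))    # prediction half-width
lo, hi = y0_hat - half_pi, y0_hat + half_pi
print(lo, hi)
```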
Example

The regression model is ŷ = 2.27 + 2.74 x1 + 0.01 x2 (wire bond data; see the table in Example 12-1). A prediction interval is required at x1 = 8, x2 = 275.
The predicted value is

ŷ0 = [1  8  275] [2.27  2.74  0.01]' = 26.94

The variance of the prediction error is given by

                            |  0.21  −0.01   0 | |  1  |
V(ŷ0 − Y0) = 6.33 (1 + [1  8  275] | −0.01   0      0 | |  8  |) = 6.65
                            |  0      0      0 | | 275 |
The 95% prediction interval is

26.94 − 2.07 √6.65 ≤ Y0 ≤ 26.94 + 2.07 √6.65

21.60 ≤ Y0 ≤ 32.28