Polynomial Regression Models: Possible Models For When The Response Function Is "Curved"

Polynomial regression models
Possible models for when the response function is curved
Uses of polynomial models

When the true response function really is a polynomial function.
(Very common!) When the true response function is unknown or complex, but a polynomial function approximates the true function well.
Example
What is the impact of exercise on the human immune system?
Is the amount of immunoglobulin in blood (y) related to maximal oxygen uptake (x) in a curved manner?
[Scatter plot: immunoglobulin (mg) on the vertical axis versus maximal oxygen uptake (ml/kg) on the horizontal axis]
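A quadratic polynomial regression function

$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$

where:
$Y_i$ = amount of immunoglobulin in blood (mg)
$X_i$ = maximal oxygen uptake (ml/kg)
$\varepsilon_i$ satisfy the typical assumptions about error terms (INE: independent, normally distributed, equal variance)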
Estimated quadratic function

Regression plot (fitted quadratic curve of igg versus oxygen):
igg = -1464.40 + 88.3071 oxygen - 0.536247 oxygen**2
S = 106.427   R-Sq = 93.8%   R-Sq(adj) = 93.3%
Interpretation of the regression coefficients

If 0 is a possible x value, then b0 is the predicted response at x = 0. Otherwise, b0 has no meaningful interpretation.
b1 does not have a very helpful interpretation. It is the slope of the tangent line at x = 0.
b2 indicates the up/down direction of the curve:
b2 < 0 means the curve is concave down
b2 > 0 means the curve is concave up
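The interpretations above can be checked numerically. The following is a minimal sketch, assuming the coefficients printed on the regression plot slide; the function names are hypothetical and chosen only for illustration.

```python
# Estimated coefficients copied from the regression plot
# (igg = -1464.40 + 88.3071 oxygen - 0.536247 oxygen**2).
B0, B1, B2 = -1464.40, 88.3071, -0.536247


def fitted_igg(oxygen):
    """Fitted quadratic: b0 + b1*x + b2*x**2."""
    return B0 + B1 * oxygen + B2 * oxygen ** 2


def tangent_slope(oxygen):
    """Derivative of the fitted quadratic: b1 + 2*b2*x."""
    return B1 + 2 * B2 * oxygen


def concavity(b2):
    """Direction of the curve implied by the sign of b2."""
    if b2 < 0:
        return "concave down"
    if b2 > 0:
        return "concave up"
    return "straight line"


print("slope of tangent at x = 0:", tangent_slope(0))   # equals b1
print("direction of curve:", concavity(B2))
print("fitted igg at oxygen = 50:", fitted_igg(50))
```

Because the derivative is b1 + 2*b2*x, the slope at x = 0 reduces to b1, which is why b1 alone says little about the curve over the observed range of oxygen.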
The regression equation is
igg = - 1464 + 88.3 oxygen - 0.536 oxygensq

Predictor       Coef   SE Coef       T       P    VIF
Constant     -1464.4     411.4   -3.56   0.001
oxygen         88.31     16.47    5.36   0.000   99.9
oxygensq     -0.5362    0.1582   -3.39   0.002   99.9

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source           DF        SS        MS        F       P
Regression        2   4602211   2301105   203.16   0.000
Residual Error   27    305818     11327
Total            29   4908029

Source      DF    Seq SS
oxygen       1   4472047
oxygensq     1    130164
A multicollinearity problem
[Scatter plot: oxygensq versus oxygen]
Pearson correlation of oxygen and oxygensq = 0.995
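To connect this correlation with the VIF column in the output: when a model has exactly two predictors, each VIF equals 1 / (1 - r^2), where r is the Pearson correlation between them. The sketch below applies that identity to the rounded correlation on this slide; the function name is hypothetical, and because r is rounded to three decimals the result only approximates the reported VIF of 99.9.

```python
def vif_two_predictors(r):
    """VIF shared by both predictors in a two-predictor model.

    r is the Pearson correlation between the two predictors.
    """
    return 1.0 / (1.0 - r ** 2)


# Correlation of oxygen and oxygensq as reported on the slide (rounded).
vif_raw = vif_two_predictors(0.995)
print(f"approximate VIF for oxygen and oxygensq: {vif_raw:.1f}")
```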
Center the predictors

Mean of oxygen = 50.637

oxygen    oxcent   oxcentsq
  34.6   -16.037    257.185
  45.0    -5.637     31.776
  62.3    11.663    136.026
  58.9     8.263     68.277
  42.5    -8.137     66.211
  44.3    -6.337     40.158
  67.9    17.263    298.011
  58.5     7.863     61.827
  35.6   -15.037    226.111
  49.6    -1.037      1.075
  33.0   -17.637    311.064

oxcent = oxygen - 50.637
oxcentsq = (oxygen - 50.637)^2
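The centering step can be reproduced in a few lines. This is a minimal sketch using only the oxygen values listed on the slide and the reported mean of 50.637; the variable names mirror the column names above.

```python
# Sample mean reported on the slide.
MEAN_OXYGEN = 50.637

# Oxygen values listed on the slide.
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]

# Centered predictor and its square.
oxcent = [x - MEAN_OXYGEN for x in oxygen]
oxcentsq = [c ** 2 for c in oxcent]

print(f"{'oxygen':>8}{'oxcent':>10}{'oxcentsq':>10}")
for x, c, c2 in zip(oxygen, oxcent, oxcentsq):
    print(f"{x:8.1f}{c:10.3f}{c2:10.3f}")
```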
Does it really work?

[Scatter plot: oxcentsq versus oxcent]
Pearson correlation of oxcent and oxcentsq = 0.219
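The effect of centering on the correlation can be sketched as follows. Only the 11 oxygen values listed on the centering slide are used here, so the printed correlations will not reproduce the reported 0.995 and 0.219, which are based on the full data set. The `pearson` helper is a hypothetical name written out by hand to keep the example self-contained.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


MEAN_OXYGEN = 50.637  # mean reported on the slide
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]

oxygensq = [x ** 2 for x in oxygen]
oxcent = [x - MEAN_OXYGEN for x in oxygen]
oxcentsq = [c ** 2 for c in oxcent]

r_raw = pearson(oxygen, oxygensq)
r_centered = pearson(oxcent, oxcentsq)
print(f"correlation before centering: {r_raw:.3f}")
print(f"correlation after centering:  {r_centered:.3f}")
```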
A better quadratic polynomial regression function

$$Y_i = \beta_0^* + \beta_1^* x_i + \beta_{11}^* x_i^2 + \varepsilon_i$$

where $x_i = X_i - \bar{X}$ denotes the centered predictor, and
$\beta_0^*$ = mean response at the predictor mean
$\beta_1^*$ = linear effect coefficient
$\beta_{11}^*$ = quadratic effect coefficient

The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

Predictor       Coef   SE Coef       T       P    VIF
Constant     1632.20     29.35   55.61   0.000
oxcent        34.000     1.689   20.13   0.000    1.1
oxcentsq     -0.5362    0.1582   -3.39   0.002    1.1

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance
Source           DF        SS        MS        F       P
Regression        2   4602211   2301105   203.16   0.000
Residual Error   27    305818     11327
Total            29   4908029

Source      DF    Seq SS
oxcent       1   4472047
oxcentsq     1    130164
Interpretation of the regression coefficients

b0 is the predicted response at the predictor mean.
b1 is the estimated slope of the tangent line at the predictor mean and, typically, is also close to the estimated slope in the simple (first-order) model.
b2 indicates the up/down direction of the curve:
b2 < 0 means the curve is concave down
b2 > 0 means the curve is concave up
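As a quick check of the tangent-line interpretation, the slope of the uncentered fitted curve evaluated at the predictor mean should agree with the linear coefficient of the centered fit (about 34.0). The sketch below assumes the coefficients from the uncentered regression plot and the reported mean of oxygen; the names are hypothetical.

```python
# Coefficients of the uncentered fit: igg = b0 + b1*oxygen + b2*oxygen**2.
B1, B2 = 88.3071, -0.536247
MEAN_OXYGEN = 50.637


def tangent_slope(oxygen):
    """Slope of the fitted quadratic at a given oxygen value."""
    return B1 + 2 * B2 * oxygen


slope_at_mean = tangent_slope(MEAN_OXYGEN)
print(f"slope of tangent at the predictor mean: {slope_at_mean:.4f}")
```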
Estimated regression function

Regression plot (fitted quadratic curve of igg versus oxcent):
igg = 1632.20 + 33.9995 oxcent - 0.536247 oxcent**2
S = 106.427   R-Sq = 93.8%   R-Sq(adj) = 93.3%
Similar estimates
Regression plot (fitted line of igg versus oxcent):
igg = 1557.63 + 32.7427 oxcent
S = 124.783   R-Sq = 91.1%   R-Sq(adj) = 90.8%
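The relationship between the two forms of the model

Original model:
$$\hat{Y}_i = b_0 + b_1 X_i + b_{11} X_i^2$$
Centered model:
$$\hat{Y}_i = b_0^* + b_1^* x_i + b_{11}^* x_i^2$$
Where:
$$b_0 = b_0^* - b_1^* \bar{X} + b_{11}^* \bar{X}^2$$
$$b_1 = b_1^* - 2 b_{11}^* \bar{X}$$
$$b_{11} = b_{11}^*$$

Centered fit:
$$\hat{Y}_i = 1632.2 + 34.0 x_i - 0.5362 x_i^2$$
Mean of oxygen = 50.637
$$b_0 = 1632.2 - 34(50.637) - 0.5362(50.637)^2 = -1464.3$$
$$b_1 = 34 - 2(-0.5362)(50.637) = 88.3$$
$$b_{11} = -0.5362$$
Original fit:
$$\hat{Y}_i = -1464.4 + 88.31 X_i - 0.536 X_i^2$$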
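The conversion between the two forms can be sketched in code. This assumes the centered coefficients printed on the centered regression plot and the reported mean of oxygen; `uncenter_quadratic` is a hypothetical helper name.

```python
def uncenter_quadratic(b0_star, b1_star, b11_star, x_bar):
    """Convert centered quadratic coefficients to the uncentered form.

    Expands b0* + b1*(X - x_bar) + b11*(X - x_bar)**2 and collects
    the constant, X, and X**2 terms.
    """
    b0 = b0_star - b1_star * x_bar + b11_star * x_bar ** 2
    b1 = b1_star - 2 * b11_star * x_bar
    b11 = b11_star
    return b0, b1, b11


# Centered estimates and predictor mean taken from the slides.
b0, b1, b11 = uncenter_quadratic(1632.20, 33.9995, -0.536247, 50.637)
print(f"b0 = {b0:.1f}, b1 = {b1:.2f}, b11 = {b11:.4f}")
```

Up to rounding of the inputs, the result agrees with the uncentered fit reported earlier.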
[Plot: Residuals versus the fitted values (response is igg)]
[Plot: Normal probability plot of the residuals (response is igg)]
What is the predicted IgG if maximal oxygen uptake is 90?

Predicted Values for New Observations
New Obs      Fit   SE Fit           95.0% CI           95.0% PI
1         2139.6    219.2   (1689.8, 2589.5)   (1639.6, 2639.7) XX
X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations
New Obs   oxcent   oxcentsq
1           39.4       1549
There is an even greater danger in extrapolation when modeling data with a polynomial function, because the fitted curve can change direction outside the range of the data.
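The change in direction can be located directly from the fitted coefficients. The sketch below assumes the uncentered coefficients from the regression plot; the turning point of a quadratic is at -b1 / (2*b2). The names are hypothetical.

```python
# Uncentered fit: igg = b0 + b1*oxygen + b2*oxygen**2.
B0, B1, B2 = -1464.40, 88.3071, -0.536247


def fitted_igg(oxygen):
    """Fitted quadratic evaluated at a given oxygen value."""
    return B0 + B1 * oxygen + B2 * oxygen ** 2


# Oxygen value where the fitted curve stops rising and starts falling.
turning_point = -B1 / (2 * B2)

print(f"turning point of fitted curve: oxygen = {turning_point:.1f}")
print(f"fitted igg at turning point:   {fitted_igg(turning_point):.1f}")
print(f"fitted igg at oxygen = 90:     {fitted_igg(90):.1f}")
```

The plotted data span roughly 30 to 70, while the fitted curve peaks at about 82.3 and declines afterwards. The prediction at 90 therefore sits on a downward branch of the curve that no observed data support.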
It is possible to overfit the data with polynomial models.

Regression plot (fitted cubic curve):
y = -38.4 + 34.9762 x - 8.64286 x**2 + 0.666667 x**3
S = 2.62950   R-Sq = 64.0%   R-Sq(adj) = 0.0%
It is even theoretically possible to fit the data perfectly.

If you have n data points with distinct x values, then a polynomial of order n-1 will fit the data perfectly, that is, it will pass through each data point.
But good statistical software will keep an unsuspecting user from fitting such a model:
** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
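The perfect-fit claim can be illustrated with Lagrange interpolation, which constructs the unique polynomial of order n-1 through n points with distinct x values. This is a sketch of that swapped-in technique rather than a least-squares fit, and the five data points are made up purely for illustration.

```python
def lagrange_polynomial(xs, ys):
    """Return the polynomial of order len(xs) - 1 through all (x, y) points.

    Requires the x values to be distinct.
    """
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


# Hypothetical data: n = 5 points, so the polynomial has order 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, -1.0, 4.0, 1.0, 5.0]

p = lagrange_polynomial(xs, ys)
residuals = [y - p(x) for x, y in zip(xs, ys)]
print("residuals:", residuals)
```

Every residual is zero, so there are no degrees of freedom left to estimate the error variance, which is why software refuses to fit such a model.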
The hierarchical approach to model fitting

A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate.
$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i$$
Is a first-order linear model (line) adequate?
$$H_0: \beta_{11} = \beta_{111} = 0$$
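A hypothesis of this kind is usually tested with a general linear (partial) F-test that compares the error sums of squares of the reduced and full models. The slides give no data for a cubic fit, so the sketch below illustrates the same statistic on the immunoglobulin example, comparing the first-order model with the quadratic model using the sums of squares from the Analysis of Variance output. The function name is hypothetical.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic comparing a reduced model against a full model.

    sse_* are error sums of squares; df_* are error degrees of freedom.
    """
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Values from the Analysis of Variance table for the igg example.
SSTO = 4908029           # total sum of squares, 29 df
SEQ_SS_LINEAR = 4472047  # sequential SS for the linear term
SSE_QUADRATIC = 305818   # residual error of the quadratic model, 27 df

# Error sum of squares of the first-order model (28 error df).
sse_linear = SSTO - SEQ_SS_LINEAR

f_stat = general_linear_f(sse_linear, 28, SSE_QUADRATIC, 27)
print(f"F = {f_stat:.2f}, sqrt(F) = {f_stat ** 0.5:.2f}")
```

With a single term under test, the square root of F matches the magnitude of the T value reported for the quadratic term (3.39). Testing two terms at once, as in the hypothesis above, uses the same formula with two numerator degrees of freedom.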
The hierarchical approach to model fitting

But if a polynomial term of a given order is retained, then all related lower-order terms are also retained.
That is, if a quadratic term was significant, you would use this regression function:
$$E(Y_i) = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2$$
and not this one:
$$E(Y_i) = \beta_0 + \beta_{11} x_i^2$$
Example
Quality of a product (y): a score between 0 and 100
Temperature (x1): degrees Fahrenheit
Pressure (x2): pounds per square inch
[Matrix plot of quality, temp, and pressure]
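A two-predictor, second-order polynomial regression function

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2 + \beta_{12} X_{i1} X_{i2} + \varepsilon_i$$

where:
$Y_i$ = quality
$X_{i1}$ = temperature
$X_{i2}$ = pressure
$\beta_{12}$ = interaction effect coefficient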

The regression equation is
quality = - 5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp

Predictor        Coef    SE Coef        T       P      VIF
Constant      -5127.9      110.3   -46.49   0.000
temp           31.096      1.344    23.13   0.000   1154.5
pressure      139.747      3.140    44.50   0.000   1574.5
tempsq      -0.133389   0.006853   -19.46   0.000    973.0
presssq      -1.14422    0.02741   -41.74   0.000   1453.0
tp          -0.145500   0.009692   -15.01   0.000    304.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%
Again, some correlation

            quality     temp   pressure   tempsq   presssq
temp         -0.423
pressure      0.182    0.000
tempsq       -0.434    0.999      0.000
presssq       0.162    0.000      1.000   -0.000
tp           -0.227    0.773      0.632    0.772     0.632

Cell Contents: Pearson correlation
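A better two-predictor, second-order polynomial regression function

$$Y_i = \beta_0^* + \beta_1^* x_{i1} + \beta_2^* x_{i2} + \beta_{11}^* x_{i1}^2 + \beta_{22}^* x_{i2}^2 + \beta_{12}^* x_{i1} x_{i2} + \varepsilon_i$$

where:
$Y_i$ = quality
$x_{i1}$ = centered temperature
$x_{i2}$ = centered pressure
$\beta_{12}^*$ = interaction effect coefficient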
Reduced correlation

            quality    tcent    pcent   tpcent   tcentsq
tcent        -0.423
pcent         0.182    0.000
tpcent       -0.274    0.000    0.000
tcentsq      -0.355   -0.000    0.000    0.000
pcentsq      -0.762    0.000    0.000    0.000    -0.000

Cell Contents: Pearson correlation

The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq

Predictor        Coef    SE Coef        T       P   VIF
Constant      94.9259     0.7224   131.40   0.000
tcent        -0.91611    0.03957   -23.15   0.000   1.0
pcent         0.78778    0.07913     9.95   0.000   1.0
tpcent      -0.145500   0.009692   -15.01   0.000   1.0
tcentsq     -0.133389   0.006853   -19.46   0.000   1.0
pcentsq      -1.14422    0.02741   -41.74   0.000   1.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%
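The centered and uncentered two-predictor fits describe the same surface, which can be checked by expanding the centered terms. The slides do not state the predictor means; the sketch below assumes a mean temperature of 90 and a mean pressure of 55, values chosen because they reproduce the uncentered coefficients. Treat them as an assumption, and the helper name as hypothetical.

```python
def uncenter_second_order(b0s, b1s, b2s, b11s, b22s, b12s, m1, m2):
    """Convert centered second-order coefficients to the uncentered form.

    Expands the centered model in (X1 - m1) and (X2 - m2) and collects
    terms; the squared and interaction coefficients are unchanged.
    """
    b0 = (b0s - b1s * m1 - b2s * m2
          + b11s * m1 ** 2 + b22s * m2 ** 2 + b12s * m1 * m2)
    b1 = b1s - 2 * b11s * m1 - b12s * m2
    b2 = b2s - 2 * b22s * m2 - b12s * m1
    return b0, b1, b2, b11s, b22s, b12s


# Centered estimates from the output above.
# ASSUMPTION: predictor means of 90 (temp) and 55 (pressure), not stated in the slides.
b0, b1, b2, b11, b22, b12 = uncenter_second_order(
    94.9259, -0.91611, 0.78778, -0.133389, -1.14422, -0.145500,
    m1=90.0, m2=55.0,
)
print(f"Constant = {b0:.1f}, temp = {b1:.3f}, pressure = {b2:.3f}")
```

Under these assumed means, the recovered constant and linear coefficients agree with the uncentered output (-5127.9, 31.096, 139.747), while the VIFs drop from the hundreds and thousands to 1.0.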
[Plot: Residuals versus the fitted values (response is quality)]
[Plot: Normal probability plot of the residuals (response is quality)]
Predicted Values for New Observations
New Obs      Fit   SE Fit           95.0% CI           95.0% PI
1         94.926    0.722   (93.424, 96.428)   (91.125, 98.726)

Values of Predictors for New Observations
New Obs    tcent    pcent   tpcent   tcentsq   pcentsq
1         0.0000   0.0000   0.0000    0.0000    0.0000
