
Stat 697R Homework 8

9.6

Forward stepwise regression

Step 1. The stepwise regression routine first fits a simple linear regression model for each of the P - 1 potential X variables. For each simple linear regression model, the t* statistic (2.17) for testing whether or not the slope is zero is obtained:

$$t_k^* = \frac{b_k}{s\{b_k\}}$$

The X variable with the largest t* value is the candidate for first addition. If this t* value exceeds a predetermined level, or if the corresponding P-value is less than a predetermined α-limit, the X variable is added. Otherwise, the program terminates with no X variable considered sufficiently helpful to enter the regression model.

Since the degrees of freedom associated with MSE vary depending on the number of X variables in the model, and since repeated tests on the same data are undertaken, fixed t* limits for adding or deleting a variable have no precise probabilistic meaning. For this reason, software programs often favor the use of predetermined α-limits.

Step 2. Assume X7 is the variable entered at step 1. The stepwise regression routine now fits all regression models with two X variables, where X7 is one of the pair. For each such regression model, the t* test statistic corresponding to the newly added predictor Xk is obtained. This is the statistic for testing whether or not $\beta_k = 0$ when X7 and Xk are the variables in the model. The X variable with the largest t* value (or, equivalently, the smallest P-value) is the candidate for addition at the second stage. If this t* value exceeds a predetermined level (i.e., the P-value falls below a predetermined level), the second X variable is added. Otherwise, the program terminates.

Step 3. Suppose X3 is added at the second stage. Now the stepwise regression routine examines whether any of the other X variables already in the model should be dropped. For our illustration, there is at this stage only one other X variable in the model, X7, so only one t* test statistic is obtained, namely the one for testing X7 given that X3 is in the model. At later stages there would be a number of these t* statistics, one for each of the variables in the model besides the one last added. The variable for which this t* value is smallest (or, equivalently, the variable for which the P-value is largest) is the candidate for deletion. If this t* value falls below a predetermined limit (or the P-value exceeds one), the variable is dropped from the model; otherwise, it is retained.

Step 4. Suppose X7 is retained so that both X3 and X7 are now in the model. The stepwise regression
routine now examines which X variable is the next candidate for addition, then examines whether any
of the variables already in the model should now be dropped, and so on until no further X variables can
either be added or deleted, at which point the search terminates.

Note that the stepwise regression algorithm allows an X variable, brought into the model at an earlier
stage, to be dropped subsequently if it is no longer helpful in conjunction with variables added at later
stages.
Forward Selection. The forward selection search procedure is a simplified version of forward stepwise
regression, omitting the test whether a variable once entered into the model should be dropped.

Backward Elimination. The backward elimination search procedure is the opposite of forward selection. It begins with the model containing all potential X variables and identifies the one with the largest P-value. If that maximum P-value exceeds a predetermined limit, the X variable is dropped. The model with the remaining P - 2 X variables is then fitted, and the next candidate for dropping is identified. This process continues until no further X variables can be dropped.
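
In SAS, all three search procedures are available through the SELECTION= option of PROC REG, with SLENTRY= and SLSTAY= supplying the predetermined α-limits described above. A minimal sketch (the dataset name demo, the response Y, the predictors X1-X4, and the particular α-limits are all placeholders):

/* Forward stepwise: a variable enters when its P-value is below SLENTRY;
   a variable already in the model is dropped when its P-value rises
   above SLSTAY. */
proc reg data=demo;
  model Y = X1 X2 X3 X4 / selection=stepwise slentry=0.10 slstay=0.15;
run;

/* Forward selection: entry tests only, no deletion step. */
proc reg data=demo;
  model Y = X1 X2 X3 X4 / selection=forward slentry=0.10;
run;

/* Backward elimination: start from the full model and repeatedly drop the
   variable with the largest P-value while that P-value exceeds SLSTAY. */
proc reg data=demo;
  model Y = X1 X2 X3 X4 / selection=backward slstay=0.10;
run;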

9.9 a>

 Number in            Adjusted
     Model  R-Square  R-Square      C(p)       AIC  Variables in Model

         1    0.6190    0.6103    8.3536  220.5294  X1
         1    0.4155    0.4022   35.2456  240.2137  X3
         1    0.3635    0.3491   42.1123  244.1312  X2

         2    0.6761    0.6610    2.8072  215.0607  X1 X3
         2    0.6550    0.6389    5.5997  217.9676  X1 X2
         2    0.4685    0.4437   30.2471  237.8450  X2 X3

         3    0.6822    0.6595    4.0000  216.1850  X1 X2 X3

Obs  VarsInModel    Press

  1  X1           5569.56
  2  X2           9254.49
  3  X3           8451.43
  4  X1 X2        5235.19
  5  X1 X3        4902.75
  6  X2 X3        8115.91
  7  X1 X2 X3     5057.89

According to the notes, ideally we want to maximize adjusted R-square while minimizing Cp, AIC, and PRESS. Here the subset X1 X3 meets all of these criteria at once (adjusted R-square 0.6610, Cp 2.8072, AIC 215.0607, PRESS 4902.75), and the plots of the criteria against the number of parameters support this pair as well.
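
For reference, the criteria for a subset model with p parameters (n cases, SSE_p the subset's error sum of squares) are, in the textbook's notation,

$$R^2_{a,p} = 1 - \frac{n-1}{n-p}\cdot\frac{SSE_p}{SSTO}, \qquad C_p = \frac{SSE_p}{MSE(X_1,\ldots,X_{P-1})} - (n-2p),$$

$$AIC_p = n\ln SSE_p - n\ln n + 2p, \qquad PRESS_p = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_{i(i)}\right)^2,$$

where $\hat{Y}_{i(i)}$ is the fitted value for case i when that case is left out of the fit. A quick sanity check on the table: for the full model, C_p equals p by construction, which matches C(p) = 4.0000 for X1 X2 X3 (p = 4 parameters, counting the intercept).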

b> As seen in part a>, the answer is yes: all four criteria point to the same subset. As discussed in class, it is rare for every criterion to agree on a single model.
c> Forward stepwise regression is advantageous when the pool of candidate predictors is large, because it avoids fitting every possible subset. With only three predictors here there are just 2^3 - 1 = 7 candidate subsets (all shown in part a>), so the all-possible-regressions search is cheap and the stepwise procedure offers no real advantage.
9.10

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4       8718.02248    2179.50562    129.74   <.0001
Error             20        335.97752      16.79888
Corrected Total   24       9054.00000

Root MSE          4.09864    R-Square   0.9629
Dependent Mean   92.20000    Adj R-Sq   0.9555
Coeff Var         4.44538
Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -124.38182          9.94106    -12.51     <.0001
X1           1              0.29573          0.04397      6.73     <.0001
X2           1              0.04829          0.05662      0.85     0.4038
X3           1              1.30601          0.16409      7.96     <.0001
X4           1              0.51982          0.13194      3.94     0.0008

Pearson Correlation Coefficients, N = 25
Prob > |r| under H0: Rho=0

              proficiency        X1        X2        X3        X4

proficiency       1.00000   0.51441   0.49701   0.89706   0.86939
                             0.0085    0.0115    <.0001    <.0001

X1                0.51441   1.00000   0.10227   0.18077   0.32666
                   0.0085              0.6267    0.3872    0.1110

X2                0.49701   0.10227   1.00000   0.51904   0.39671
                   0.0115    0.6267              0.0078    0.0496

X3                0.89706   0.18077   0.51904   1.00000   0.78204
                   <.0001    0.3872    0.0078              <.0001

X4                0.86939   0.32666   0.39671   0.78204   1.00000
                   <.0001    0.1110    0.0496    <.0001

The overall F value (129.74, p < .0001) tells us that at least some predictors should be kept, while X2 should be dropped based on its t-test P-value (0.4038). The correlation matrix shows that X2 and X3 are significantly correlated (r = 0.519, p = 0.0078) and that X3 and X4 are highly correlated (r = 0.782, p < .0001), so the X3, X4 correlation needs to be kept in mind in what follows.
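
One way to quantify the multicollinearity flagged above is to request variance inflation factors in the full-model fit; a minimal sketch, reusing the hw9_910 dataset from the code at the end of this writeup:

/* VIF_k = 1/(1 - R^2_k), where R^2_k comes from regressing X_k on the
   other three predictors; values much larger than 1 signal collinearity. */
proc reg data = hw9_910;
  model proficiency = X1 X2 X3 X4 / vif;
run;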
9.11 a>

 Number in            Adjusted
     Model  R-Square  R-Square       C(p)       AIC        SBC  Variables in Model

         1    0.8047    0.7962    84.2465  110.4685  112.90629  X3
         1    0.7558    0.7452   110.5974  116.0546  118.49234  X4
         1    0.2646    0.2326   375.3447  143.6180  146.05576  X1
         1    0.2470    0.2143   384.8325  144.2094  146.64717  X2

         2    0.9330    0.9269    17.1130   85.7272   89.38384  X1 X3
         2    0.8773    0.8661    47.1540  100.8605  104.51716  X3 X4
         2    0.8153    0.7985    80.5653  111.0812  114.73788  X1 X4
         2    0.8061    0.7884    85.5196  112.2953  115.95191  X2 X3
         2    0.7833    0.7636    97.7978  115.0720  118.72864  X2 X4
         2    0.4642    0.4155   269.7800  137.7025  141.35916  X1 X2

         3    0.9615    0.9560     3.7274   73.8473   78.72282  X1 X3 X4
         3    0.9341    0.9247    18.5215   87.3143   92.18984  X1 X2 X3
         3    0.8790    0.8617    48.2310  102.5093  107.38479  X2 X3 X4
         3    0.8454    0.8233    66.3465  108.6361  113.51157  X1 X2 X4

         4    0.9629    0.9555     5.0000   74.9542   81.04859  X1 X2 X3 X4

According to the table above, the best four subsets are (X1 X3 X4), (X1 X2 X3 X4), (X1 X3), and (X1 X2 X3), because they have the largest adjusted R-square values (0.9560, 0.9555, 0.9269, and 0.9247, respectively).
b> Yes, these criteria are useful: C(p), AIC, and SBC are all minimized by the same subset, X1 X3 X4 (C(p) 3.7274, AIC 73.8473, SBC 78.72282), which narrows the four candidates from part a> down to a single model.
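
The two information criteria differ only in how heavily they penalize model size; in the textbook's notation,

$$AIC_p = n\ln SSE_p - n\ln n + 2p, \qquad SBC_p = n\ln SSE_p - n\ln n + p\ln n.$$

With n = 25 we have ln 25 ≈ 3.22 > 2, so SBC charges more per extra parameter than AIC; consistent with this, the SBC minus AIC gap in the table grows by about ln 25 - 2 ≈ 1.22 per additional parameter, and both criteria still select X1 X3 X4.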

9.18 a>
Summary of Stepwise Selection

        Variable   Variable   Number    Partial      Model
Step    Entered    Removed    Vars In   R-Square   R-Square      C(p)   F Value   Pr > F

   1    X3                          1     0.8047     0.8047   84.2465     94.78   <.0001
   2    X1                          2     0.1283     0.9330   17.1130     42.12   <.0001
   3    X4                          3     0.0285     0.9615    3.7274     15.59   0.0007

So according to the table above, the best subset is X1 X3 X4. As a check, the partial R-square values add up to the final model R-square: 0.8047 + 0.1283 + 0.0285 = 0.9615.

b> We can see from 9.11 a> that X1 X3 X4 has the largest adjusted R^2, so the stepwise result agrees with that criterion.

Code:
options ls = 80 nodate;
title 'problem 9.9';
data patient;
  infile 'C:\Reg\hw9\9.9.txt';
  input Y X1 X2 X3;
run;

proc reg data=patient;
  /* Assumption: temp100, used in the merge below, is the all-subsets
     selection summary (VarsInModel, MSE, ...) captured here via ODS. */
  ods output SubsetSelSummary=temp100;
  model Y = X1 X2 X3 / selection = rsquare adjrsq cp aic mse;
  plot adjrsq.*np.;
  plot cp.*np.;
  plot aic.*np.;
run;

/* Refit every subset with the PRESS option; OUTEST= tp1 gets one row of
   estimates per model, including _MSE_ and _PRESS_. */
proc reg data = patient outest = tp1 covout noprint;
  model Y = X1 / press mse;
  model Y = X2 / press mse;
  model Y = X3 / press mse;
  model Y = X1 X2 / press mse;
  model Y = X1 X3 / press mse;
  model Y = X2 X3 / press mse;
  model Y = X1 X2 X3 / press mse;
run;

/* Build a numeric key from the MSE column (truncated to 6 characters). */
data temp100;
  set temp100;
  mse1 = input(mse, 6.);
run;

/* Keep only the coefficient rows of the OUTEST data; same truncated key. */
data temp120;
  set tp1;
  if _type_ = 'PARMS';
  mse1 = input(_mse_, 6.);
run;

/* Join the two sources on the MSE key to pair each model's variable list
   with its PRESS value. */
proc sql;
  create table temp300 as
  select *
  from temp120, temp100
  where temp120.mse1 = temp100.mse1;
quit;

proc print data=temp300 (rename=(_p_=p _edf_=df _press_=Press));
  var varsinmodel Press;
run;

options ls = 80 nodate;
title 'problem 9.10';
data hw9_910;
  infile 'C:\Reg\hw9\9.10.txt';
  input proficiency X1 X2 X3 X4;
run;

proc print data = hw9_910;
run;
/* Full-model fit (the ANOVA and parameter estimates reported above). */
proc reg data = hw9_910;
  model proficiency = X1 X2 X3 X4;
run;
proc corr data = hw9_910;
  var proficiency X1 X2 X3 X4;
run;
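
The code for 9.11 and 9.18 is not listed above. Since the full-model R-square in 9.11 (0.9629) matches the 9.10 fit, both problems appear to use the same job proficiency data; a minimal sketch under that assumption, reusing the hw9_910 dataset:

title 'problem 9.11';
proc reg data = hw9_910;
  /* all-possible-regressions summary with the criteria reported in 9.11 a> */
  model proficiency = X1 X2 X3 X4 / selection = rsquare adjrsq cp aic sbc;
run;

title 'problem 9.18';
proc reg data = hw9_910;
  /* forward stepwise search reported in 9.18 a> */
  model proficiency = X1 X2 X3 X4 / selection = stepwise;
run;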
