1. Introduction
A problem that must be considered in
almost all multiple regression analyses is that
of multicollinearity among the regressor variables. Many authors even suggest that an
examination for the existence of multicollinearity should be routinely performed as an
initial step in regression analysis (cf.
Mansfield and Helms, 1982). Simply determining
the existence of multicollinearity is often not
enough to obtain effective remedies. The
nature of the multicollinearity must often be
closely examined.
ing situation where the experimenter has limited training in statistics, usually requires an
in-depth exploration of the relationship between
regressor variables. This exploration is
easily accomplished through various procedures
in SAS, especially PROC MATRIX. This paper is
primarily concerned with the exploration techniques which can be used to detect the nature
of the multicollinearity using SAS. The general objective is tutorial, and several examples
are presented that the author has found effective in classroom situations.
2. Detection of Multicollinearity
The presence of multicollinearity is first
felt (sometimes to the surprise of the investigator) when examining the statistics concerning
the regression coefficient estimates. Since
the least squares estimates have inflated variances in the presence of multicollinearity,
some unusual results can occur. For example,
a significant overall regression equation with
a large R² may have none of the individual
coefficients significant. Many numerical
examples exist in the literature; one is a
four-variable problem given by A. Hald on page 647
of his book Statistical Theory with Engineering
Applications, published by Wiley, New York, in
1952 and used extensively in Draper and Smith
(1981). This problem will be used throughout
this portion of the paper, referred to as
the Hald data, and is included in section 3.
The paradox discussed above is apparent from
the PROC REG output in Table I.
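The paradox can be checked by hand from the figures printed in Table I. The following is a minimal Python sketch (Python is used here only for illustration; the paper itself works in SAS) that recomputes the overall F statistic and the individual t statistics from the printed mean squares, estimates, and standard errors. The variable names are illustrative assumptions, not SAS output names.

```python
# Values copied from the PROC REG output for the Hald data (Table I).
ms_model = 666.975      # mean square for MODEL (4 df)
ms_error = 5.982955     # mean square for ERROR (8 df)

# (parameter estimate, standard error) for each term, as printed in Table I.
coefficients = {
    "INTERCEP": (62.405369, 70.070959),
    "X1": (1.551103, 0.744770),
    "X2": (0.510168, 0.723788),
    "X3": (0.101909, 0.754709),
    "X4": (-0.144061, 0.709052),
}

# Overall F statistic: ratio of the model and error mean squares.
f_value = ms_model / ms_error

# Individual t statistics: each estimate divided by its standard error.
t_values = {name: est / se for name, (est, se) in coefficients.items()}

print(f"F = {f_value:.3f}")
for name, t in t_values.items():
    print(f"{name:8s} t = {t:7.3f}")
```

The overall F is very large, yet the largest t statistic in absolute value belongs to X1 and corresponds to the PROB > |T| of 0.0708 printed in Table I, so no single coefficient is significant at the 0.05 level.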
In addition to those options already discussed in PROC REG, the COLLIN option also
displays the proportion of the variance of the
estimate accounted for by each principal component. This not only aids in detecting multicollinearity, but helps if a principal component regression is to be performed. A multicollinearity problem exists when a component
associated with a high condition index contributes strongly to the variance of two or more
variables. All the statistics discussed above

reparameterization or centering of
the model.

3. Examples

Example 1: Hald Data

OBS    X1    X2    X3    X4        Y
  1     7    26     6    60     78.5
  2     1    29    15    52     74.3
  3    11    56     8    20    104.3
  4    11    31     8    47     87.6
  5     7    52     6    33     95.9
  6    11    55     9    22    109.2
  7     3    71    17     6    102.7
  8     1    31    22    44     72.5
  9     2    54    18    22     93.1
 10    21    47     4    26    115.9
 11     1    40    23    34     83.8
 12    11    66     9    12    113.3
 13    10    68     8    12    109.4

Example 2:

OBS    X1    X2     Y
  1     4     2    42
  2     4     2    39
  3     4     3    48
  4     4     3    51
  5     6     2    49
  6     6     2    53
  7     6     3    61
  8     6     3    60

Example 3:

4. References:
Belsley, D. A., Kuh, E., and Welsch, R. E., 1980, Regression Diagnostics, John Wiley and Sons, Inc., New York.
New York.
Freund, R. J. and Minton, P. D., 1979, Regression Methods, Marcel Dekker, Inc., New York.
pp 249-260.

DEP VARIABLE: Y

                      SUM OF         MEAN
SOURCE     DF        SQUARES       SQUARE    F VALUE    PROB>F
MODEL       4       2667.899      666.975    111.479    0.0001
ERROR       8      47.863639     5.982955
C TOTAL    12       2715.763

    ROOT MSE     2.446008    R-SQUARE    0.9824
    DEP MEAN    95.423077    ADJ R-SQ    0.9736
    C.V.          2.56333

               PARAMETER     STANDARD     T FOR H0:
VARIABLE  DF    ESTIMATE        ERROR   PARAMETER=0   PROB > |T|
INTERCEP   1   62.405369    70.070959         0.891       0.3991
X1         1    1.551103     0.744770         2.083       0.0708
X2         1    0.510168     0.723788         0.705       0.5009
X3         1    0.101909     0.754709         0.135       0.8959
X4         1   -0.144061     0.709052        -0.203       0.8441

CORRELATION OF ESTIMATES

CORRB       INTERCEP         X1         X2         X3         X4
INTERCEP      1.0000    -0.9678    -0.9978    -0.9769    -0.9983
X1           -0.9678     1.0000     0.9510     0.9861     0.9568
X2           -0.9978     0.9510     1.0000     0.9624     0.9979
X3           -0.9769     0.9861     0.9624     1.0000     0.9659
X4           -0.9983     0.9568     0.9979     0.9659     1.0000

TABLE I
PROC REG Output for Hald Data

N = 13

             X1          X2          X3          X4
X1      1.00000     0.22858    -0.82413    -0.24545
         0.0000      0.4526      0.0005      0.4189
X2      0.22858     1.00000    -0.13924    -0.97295
         0.4526      0.0000      0.6501      0.0001
X3     -0.82413    -0.13924     1.00000     0.02954
         0.0005      0.6501      0.0000      0.9237
X4     -0.24545    -0.97295     0.02954     1.00000
         0.4189      0.0001      0.9237      0.0000

TABLE II
Correlations for the Hald Data
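The correlations in Table II can be reproduced directly from the Hald data listed in section 3. The sketch below is an illustrative Python version (the paper uses SAS procedures instead); the helper functions `correlation` and `determinant` are assumptions introduced here, not part of any SAS procedure. It builds the correlation matrix C of the four regressors and its determinant |C|.

```python
import math

# Hald data regressors as listed in section 3 (13 observations).
x1 = [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10]
x2 = [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68]
x3 = [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8]
x4 = [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12]
regressors = [x1, x2, x3, x4]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


# Correlation matrix of the regressors and its determinant.
C = [[correlation(a, b) for b in regressors] for a in regressors]
det_C = determinant(C)

for row in C:
    print("  ".join(f"{r:8.5f}" for r in row))
print(f"|C| = {det_C:.8f}")
```

A determinant close to zero indicates that the regressors are nearly linearly dependent; here the strongest pairwise correlation is the one between X2 and X4.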

(a)  |C| = 0.00106766

(b)  Eigenvalue                    Eigenvector
                         X1          X2          X3          X4
     2.2357         .475955      .56387    -.394067    -.547931
     1.57607        .508979    -.413931    -.604969     .451235
     0.186606        -.6755      .31442    -.637691     .195421
     0.00162375     .241052     .641756     .268466     .676734

(c)
(d)
(e)
|C|^(1/2) = .0326751

from PROC REG
VARIANCE PROPORTIONS

 PORTION    PORTION    PORTION    PORTION    PORTION
INTERCEP         X1         X2         X3         X4
  0.0000     0.0004     0.0000     0.0002     0.0000
  0.0000     0.0100     0.0000     0.0027     0.0001
  0.0000     0.0006     0.0003     0.0016     0.0017
  0.0001     0.0574     0.0028     0.0457     0.0009
  0.9999     0.9316     0.9969     0.9498     0.9973

TABLE III
Multicollinearity Statistics for the Hald Data
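The determinant and the eigenvalues reported in Table III are linked: the determinant of a symmetric matrix is the product of its eigenvalues, and the eigenvalues of a correlation matrix sum to the number of variables. The following Python sketch (an illustration only, using values copied from Table III; the variable names are assumptions) checks both relationships and recovers the square root of the determinant.

```python
import math

# Eigenvalues of the Hald correlation matrix, as printed in Table III.
eigenvalues = [2.2357, 1.57607, 0.186606, 0.00162375]

# Determinant of C as the product of its eigenvalues.
det_C = math.prod(eigenvalues)

# Square root of the determinant, also reported in Table III.
sqrt_det_C = math.sqrt(det_C)

# The trace of a correlation matrix equals the number of variables (4 here).
trace_C = sum(eigenvalues)

print(f"|C|       = {det_C:.8f}")
print(f"|C|^(1/2) = {sqrt_det_C:.7f}")
print(f"trace     = {trace_C:.5f}")
```

Because the smallest eigenvalue is so close to zero, it alone drives the determinant toward zero even though the other three eigenvalues are unremarkable.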
PROC REG:

DEP VARIABLE: Y

                      SUM OF         MEAN
SOURCE     DF        SQUARES       SQUARE    F VALUE    PROB>F
MODEL       2        402.250      201.125     57.057    0.0004
ERROR       5      17.625000     3.525000
C TOTAL     7        419.875

    ROOT MSE     1.877498    R-SQUARE    0.9580
    DEP MEAN    50.375000    ADJ R-SQ    0.9412
    C.V.         3.727044

               PARAMETER     STANDARD     T FOR H0:                               VARIANCE
VARIABLE  DF    ESTIMATE        ERROR   PARAMETER=0   PROB > |T|   TOLERANCE     INFLATION
INTERCEP   1    0.375000     4.740451         0.079       0.9400                  0.000000
X1         1    5.375000     0.663796         8.097       0.0005    1.000000      1.000000
X2         1    9.250000     1.327592         6.968       0.0009    1.000000      1.000000

|C| = 1.0000

Eigenvalues     Eigenvector
                 X1    X2
                  1     0
                  0     1

COLLINEARITY DIAGNOSTICS
                                         VARIANCE PROPORTIONS
                       CONDITION     PORTION    PORTION    PORTION
NUMBER   EIGENVALUE        INDEX    INTERCEP         X1         X2
     1        2.948        1.000      0.0022     0.0043     0.0043
     2     0.038462        8.756      0.0000     0.5000     0.5000
     3     0.013044       15.034      0.9978     0.4957     0.4957

|C|^(1/2) = 1.000

TABLE IV
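The tolerance and variance inflation values of 1.000000 in Table IV follow from the design of Example 2, in which X1 and X2 are uncorrelated. The sketch below, written in Python purely as an illustration (the `correlation` helper is an assumption introduced here), computes the correlation from the Example 2 data in section 3 and then applies the two-regressor relationships tolerance = 1 - r² and variance inflation = 1 / tolerance.

```python
import math

# Example 2 regressors as listed in section 3 (8 observations).
x1 = [4, 4, 4, 4, 6, 6, 6, 6]
x2 = [2, 2, 3, 3, 2, 2, 3, 3]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


r = correlation(x1, x2)

# With only two regressors, each one's tolerance is 1 - r**2 and its
# variance inflation factor is the reciprocal of the tolerance.
tolerance = 1.0 - r ** 2
vif = 1.0 / tolerance

print(f"r(X1, X2) = {r:.4f}")
print(f"tolerance = {tolerance:.6f}")
print(f"VIF       = {vif:.6f}")
```

Note that the COLLIN output in Table IV still shows a condition index above 15, with most of the intercept's variance on that component, even though the two regressors themselves are orthogonal.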
PROC REG:

DEP VARIABLE:

                      SUM OF         MEAN
SOURCE     DF        SQUARES       SQUARE    F VALUE    PROB>F
MODEL       3       6336.529     2112.176   4039.168    0.0001
ERROR       6       3.137541     0.522924
C TOTAL     9       6339.666

    ROOT MSE     0.723135    R-SQUARE    0.9995
    DEP MEAN     4.063061    ADJ R-SQ    0.9993
    C.V.         17.79777

               PARAMETER     STANDARD     T FOR H0:                               VARIANCE
VARIABLE  DF    ESTIMATE        ERROR   PARAMETER=0   PROB > |T|   TOLERANCE     INFLATION
INTERCEP   1    9.525137     0.280618        33.943       0.0001                  0.000000
X1         1    4.591704     1.192333         3.851       0.0084    0.059089     16.923739
X2         1   -3.042167     0.292608       -10.397       0.0001    0.094653     10.564858
X3         1    3.916058     0.158279        24.742       0.0001    0.061284     16.317445

C =
             X1          X2          X3
X1      1.00000    -0.40291    -0.67650
         0.0000      0.2483      0.0317
X2     -0.40291     1.00000    -0.36223
         0.2483      0.0000      0.3037
X3     -0.67650    -0.36223     1.00000
         0.0317      0.3037      0.0000

|C| = .0513354

Eigenvalues                  Eigenvectors
                      X1           X2           X3
1.67829         0.723058   -0.0620897    -0.687991
1.29815        -0.295539     0.872397    -0.389334
0.0235627      -0.624375    -0.484839    -0.612444

COLLINEARITY DIAGNOSTICS
                                         VARIANCE PROPORTIONS
                       CONDITION     PORTION    PORTION    PORTION    PORTION
NUMBER   EIGENVALUE        INDEX    INTERCEP         X1         X2         X3
     1        2.285        1.000      0.0776     0.0069     0.0034     0.0054
     2        1.170        1.397      0.0035     0.0000     0.0468     0.0151
     3     0.526483        2.083      0.9166     0.0115     0.0104     0.0055
     4     0.018175       11.212      0.0023     0.9815     0.9394     0.9741

|C|^(1/2) = .226573

TABLE V
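The rule stated in section 2, that a problem exists when a component with a high condition index contributes strongly to the variance of two or more variables, can be applied mechanically to the COLLIN output in Table V. The Python sketch below is an illustration only. The two cutoffs are assumptions chosen for this example, since the paper gives no numeric thresholds, and the condition index is taken as the square root of the ratio of the largest eigenvalue to each eigenvalue.

```python
import math

# Eigenvalues and variance proportions copied from the COLLINEARITY
# DIAGNOSTICS portion of Table V (Example 3). Each row of `proportions`
# is one component; columns follow `names`.
names = ["INTERCEP", "X1", "X2", "X3"]
eigenvalues = [2.285, 1.170, 0.526483, 0.018175]
proportions = [
    [0.0776, 0.0069, 0.0034, 0.0054],
    [0.0035, 0.0000, 0.0468, 0.0151],
    [0.9166, 0.0115, 0.0104, 0.0055],
    [0.0023, 0.9815, 0.9394, 0.9741],
]

# Illustrative cutoffs (assumptions, not values given in the paper).
INDEX_CUTOFF = 10.0
PROPORTION_CUTOFF = 0.5

# Condition index: sqrt(largest eigenvalue / eigenvalue of the component).
largest = max(eigenvalues)
condition_indices = [math.sqrt(largest / ev) for ev in eigenvalues]

# Flag components with a high index that load on two or more variables.
flagged = {}
for number, (index, row) in enumerate(
    zip(condition_indices, proportions), start=1
):
    involved = [n for n, p in zip(names, row) if p >= PROPORTION_CUTOFF]
    if index >= INDEX_CUTOFF and len(involved) >= 2:
        flagged[number] = involved

for number, index in enumerate(condition_indices, start=1):
    print(f"component {number}: condition index {index:.3f}")
print("flagged:", flagged)
```

Under these assumed cutoffs only the fourth component is flagged, and it involves X1, X2, and X3 together, which agrees with the variance inflation values above 10 reported for all three regressors in Table V.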