The ANOVA table of a regression model has the following layout (with p predictors and n observations):

Source                DF          Sum of Squares   Mean Square   F Value       Pr > F
Model or Regression   p           A                D = A/p       F = MSR/MSE   <.0001
Error                 n - p - 1
Total                 n - 1
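The arithmetic behind this table can be sketched in a few lines; the sums of squares used below are made-up numbers for illustration, not values from any output in these notes.

```python
# Sketch of the ANOVA computations: MSR = SSR/p, MSE = SSE/(n - p - 1),
# and F = MSR / MSE. The ssr/sse values here are illustrative only.
def anova_f(ssr, sse, p, n):
    msr = ssr / p              # Mean Square (Regression), DF = p
    mse = sse / (n - p - 1)    # Mean Square (Error), DF = n - p - 1
    return msr / mse           # F = MSR / MSE

f_value = anova_f(ssr=900.0, sse=100.0, p=2, n=53)
```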
For the kth predictor, regress it on the remaining predictors and let Rk^2 denote the resulting coefficient of determination. Then

Tolerance(k) = 1 - Rk^2
VIF(k) = 1 / (1 - Rk^2) = 1 / Tolerance(k)
Interpretation: If Rk^2 is large, then the tolerance is very small, which implies that the VIF is very large. Hence the kth variable is correlated with the other predictors if its VIF is large. As a rule of thumb, we say there is significant multicollinearity due to a variable if its VIF is larger than 10 (i.e. its tolerance is less than 0.10, which means Rk^2 is larger than 0.90).
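The tolerance/VIF rule of thumb follows directly from the definitions; a minimal sketch:

```python
# Tolerance and VIF for the kth predictor, given Rk^2 from regressing it
# on the remaining predictors.
def tolerance(r2_k):
    return 1.0 - r2_k

def vif(r2_k):
    return 1.0 / (1.0 - r2_k)

# Rule of thumb: Rk^2 > 0.90  ->  tolerance < 0.10  ->  VIF > 10
flagged = vif(0.95) > 10
```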
Example:

Parameter Estimates

Variable    DF   Parameter   Standard   t Value   Pr > |t|   Standardized   Tolerance   Variance
                 Estimate    Error                           Estimate                   Inflation
Intercept   1    -8.62347    5.90982    -1.46     0.1785     0              .           0
x2          1    0.09251     0.03912    2.36      0.0423     0.24063        0.98511     1.01511
x1          1    3.24751     0.36993    8.78      <.0001     0.89323        0.98511     1.01511
Also note that the model does not suffer from multicollinearity, as the Variance Inflation Factors (VIF) of both variables are quite small (less than 10).
Important Note: In regression, the dependent variable is metric, whereas the independent variables can be of any type (metric / categorical).
Univariate Test Statistics (one-way ANOVA of each variable across the two groups)

Variable               R-Square   R-Square/(1-RSq)   F Value   Pr > F
Product Quality        0.2960     0.4204             83.24     <.0001
Complaint Resolution   0.0000     0.0000             0.00      0.9591
Advertising            0.0499     0.0525             10.39     0.0015
Sales force Image      0.1436     0.1677             33.21     <.0001
Competitive Pricing    0.3472     0.5319             105.32    <.0001
Warranty and Claims    0.0003     0.0003             0.06      0.8094
We note that all variables except Complaint Resolution and Warranty & Claims have means that differ significantly between the two groups. That is, there exists a significant difference in the means of Product Quality in group 1 and group 2, and similarly for the other significant variables. Hence Product Quality, Advertising, Sales force Image and Competitive Pricing can be used to discriminate between the two groups, and we need to drop Complaint Resolution and Warranty & Claims from the discriminant model. Also note that each row in the above table summarises the ANOVA table of the respective variable.
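Each row of such a table comes from a one-way ANOVA of one variable across the two groups. A minimal sketch (the two sample groups below are made-up numbers, not the shopper data):

```python
# One-way ANOVA F for one variable measured in two groups:
# F = (between-groups SS / 1) / (within-groups SS / (n1 + n2 - 2))
def f_two_groups(g1, g2):
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    grand = (sum(g1) + sum(g2)) / (n1 + n2)
    ssb = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2   # between, DF = 1
    ssw = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    return ssb / (ssw / (n1 + n2 - 2))

f_value = f_two_groups([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A large F (small Pr > F) for a variable means its group means differ, so it is a candidate discriminator.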
Step2: Compute Wilks' Lambda to assess the significance of the model in discriminating the groups.
This is very similar to the ANOVA of a regression model. A small Wilks' Lambda indicates that the discriminant model is significant. Equivalently, if the p-value of Wilks' Lambda is very small, the discriminant model is significant.
Example:
Statistic        Pr > F
Wilks' Lambda    <.0001
In the above table we observe that Wilks' Lambda is significant. Therefore, the discriminant model we developed is useful in discriminating between the groups.
Step3: Classification using the discriminant model.
We classify an observation by supplying the values of the independent variables as inputs to the discriminant functions. If we have two groups, then we get two discriminant functions. By comparing the two values, we classify the new observation into the group whose discriminant function yields the higher value.
Example:
We have two discriminant functions, one for each of the two groups, National Brand and Private Label. A new observation with values for Product Quality, Advertising, Sales force Image and Competitive Pricing is supplied to the two functions. If the discriminant function for Private Label yields a higher classification score than the one for National Brand, then we classify the new observation into Private Label.
Table 4: Linear Discriminant Function for Brand Shoppers

Variable                  National Brand   Private Label
Constant                  -75.74192        -79.47109
Product Quality (P)       6.56466          5.54115
Advertising (A)           1.27597          1.30033
Sales force Image (S)     1.15046          2.10374
Competitive Pricing (C)   5.37316          6.37743
Group sizes: National Brand = 81, Private Label = 119; Total n = 200.
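The classification rule can be sketched with the Table 4 coefficients; the new observation's values for (P, A, S, C) below are invented for illustration:

```python
# Linear discriminant classification using the functions from Table 4.
FUNCS = {
    "National Brand": {"const": -75.74192, "P": 6.56466, "A": 1.27597,
                       "S": 1.15046, "C": 5.37316},
    "Private Label":  {"const": -79.47109, "P": 5.54115, "A": 1.30033,
                       "S": 2.10374, "C": 6.37743},
}

def classify(obs):
    # score the observation with each group's function, pick the higher score
    scores = {g: c["const"] + sum(c[k] * obs[k] for k in ("P", "A", "S", "C"))
              for g, c in FUNCS.items()}
    return max(scores, key=scores.get)

# hypothetical shopper: relatively high Sales force Image and Competitive Pricing
group = classify({"P": 6.0, "A": 5.0, "S": 5.0, "C": 7.0})
```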
Kaiser's Measure of Sampling Adequacy (MSA) for each variable:

X1      X2      X3      X4      X5      X6      X7      X8      X9      X10     X11     X12     X13
0.712   0.748   0.511   0.588   0.484   0.725   0.512   0.473   0.730   0.645   0.758   0.429   0.414
As the overall KMO = 0.64 is greater than 0.6, one can proceed with factor analysis. However, variables X5, X8, X12 and X13 have poor Kaiser's measures. If we are using 5 factors, then there should be at least 15 variables, as we need a minimum of 3 variables for each factor. If too few variables satisfy the Kaiser criterion, we need to increase the sample size or include more variables.
Step2: Interpretation of communalities / variance explained by the factors
'Communalities' tell us how much of the variance in each of the original variables is explained by the extracted factors. Higher communalities are desirable. If the communality for a variable is less than 50%, it is a candidate for exclusion from the analysis, because the factor solution then captures less than half of the variance in the original variable, and the explanatory power of that variable might be better represented by the individual variable. If we exclude a variable for a low communality (less than 0.50), we should re-run the factor analysis without that variable before proceeding.
Example:
Variance Explained by Each Factor

Factor1   Factor2   Factor3   Factor4   Factor5
3.247     2.103     1.643     1.185     1.103
The above values, 3.246307, 2.103337, 1.644186, 1.185304 and 1.102832, are also the first five eigenvalues extracted by the principal component method. Mathematically, the number of factors equals the number of variables (in this example we have 13 variables); however, we analyse only the few factors selected by the criterion of a minimum eigenvalue larger than 1, or by the scree plot.
In the above table, the variance explained by the first factor is 24.97% (= 3.2463/13 x 100), and the variance explained by the five factors together is 71.4% (= (3.2463 + 2.1033 + 1.6442 + 1.1853 + 1.1028)/13 x 100). This can also be obtained from the following table. Note that the cumulative value in row 5 of the table below is 0.714, or 71.4%, and the variance explained by the first factor is 25%, as the proportion in the first row is 0.25, which is close to the value we computed above, 24.97%.
      Eigenvalue   Difference   Proportion   Cumulative
 1    3.247        1.144        0.250        0.250
 2    2.103        0.460        0.162        0.411
 3    1.643        0.458        0.126        0.538
 4    1.185        0.0828       0.091        0.629
 5    1.103        0.252        0.085        0.714
 6    0.850        0.140        0.065        0.779
 7    0.711        0.149        0.055        0.834
 8    0.562        0.067        0.043        0.877
 9    0.495        0.088        0.038        0.915
10    0.407        0.104        0.031        0.947
11    0.303        0.082        0.023        0.970
12    0.220        0.049        0.017        0.987
13    0.171                     0.013        1.000
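The proportion and cumulative columns follow directly from the eigenvalues (the total variance of 13 standardized variables is 13):

```python
# Proportion of variance per factor = eigenvalue / number of variables.
eigenvalues = [3.2463, 2.1033, 1.6442, 1.1853, 1.1028]   # first five factors
proportions = [e / 13 for e in eigenvalues]
cumulative_5 = sum(proportions)   # variance explained by the five factors
```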
From the table below on the initial factor solution, we note that the communality of variable X1 (i.e. the total variance in X1 captured by the five factors) is 0.860643, or 86% (the sum of squares of the first row, i.e. the row for X1, in the table below). Further, if we add the communalities of all the variables we get 9.281 (i.e. 0.860643 + ...).
Also note that each entry in the table below is a factor loading, which is also the correlation between a variable and a factor. For example, as the first four variables are highly correlated with Factor1 (0.888, 0.788, 0.774 and 0.770), they are affected strongly by Factor1. Also note that the correlation between any two factors is always zero, as we extract orthogonal factors (the other method being oblique).
Factor Pattern

       Factor1   Factor2   Factor3   Factor4   Factor5
X1      0.888    -0.105    -0.203    -0.037     0.136
X2      0.788     0.156     0.140     0.019    -0.204
X3      0.774     0.218     0.077     0.192    -0.139
X4      0.770    -0.136    -0.312    -0.035    -0.098
X5      0.581    -0.469     0.210    -0.056     0.418
X6     -0.101     0.837    -0.001    -0.281    -0.023
X7     -0.049     0.600     0.500     0.298    -0.064
X8      0.256     0.536    -0.528     0.336    -0.122
X9      0.402     0.418     0.392    -0.103     0.028
X10     0.141    -0.041     0.625     0.293     0.403
X11    -0.003     0.235    -0.571     0.014     0.472
X12     0.119     0.360     0.067    -0.753     0.348
X13    -0.183     0.270    -0.180     0.445     0.559
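The communality arithmetic described earlier can be checked against the first row of this table:

```python
# Communality of X1 = sum of squared loadings across the five factors
# (row X1 of the Factor Pattern table).
x1_loadings = [0.888, -0.105, -0.203, -0.037, 0.136]
communality_x1 = sum(l * l for l in x1_loadings)   # about 0.86, i.e. 86%
```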
Note the cross loadings of variables on different factors. For example, variable X5 has loadings on Factor1, Factor2 and Factor5, while variable X13 has high loadings on Factor4 and Factor5. This creates a dilemma about the right group of variables under each factor, which can be resolved to a great extent by factor rotation.
Step3: Factor Rotation
The idea of rotation is to reduce the number of factors on which the variables under investigation have high loadings (i.e. cross loadings). Rotation does not actually change anything mathematically, but it makes the interpretation of the analysis easier. In other words, rotation helps us to classify each variable under a single factor with much more ease.
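That rotation leaves the solution intact can be seen in a two-factor toy example: an orthogonal rotation mixes a variable's loadings across the factors, but its communality (sum of squared loadings) is unchanged. The loading pair and the rotation angle below are made up for illustration:

```python
import math

def rotate2(l1, l2, theta):
    # orthogonal rotation of a variable's loadings on two factors
    c, s = math.cos(theta), math.sin(theta)
    return l1 * c + l2 * s, -l1 * s + l2 * c

before = (0.60, 0.55)            # hypothetical loadings on Factor1, Factor2
after = rotate2(*before, 0.4)    # rotate by an arbitrary angle (radians)
# the sum of squared loadings (communality) is the same before and after
```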
For a better idea on classifying the variables under each factor, compare the two tables
Factor Pattern given above and Rotated Factor Pattern given below.
Rotated Factor Pattern

       Factor1   Factor2   Factor3   Factor4   Factor5
X1      0.875    -0.192     0.191     0.107     0.096
X2      0.803    -0.271    -0.022    -0.007    -0.012
X3      0.784     0.301     0.005    -0.021    -0.050
X4      0.781     0.243     0.025    -0.188     0.054
X5     -0.068     0.834    -0.057     0.025    -0.002
X6      0.343     0.510     0.096    -0.097     0.325
X7      0.456    -0.210     0.725    -0.014     0.056
X8      0.008     0.455     0.665     0.069    -0.081
X9      0.402     0.162    -0.578     0.474    -0.112
X10    -0.170     0.183     0.097     0.753    -0.102
X11     0.048    -0.231    -0.148     0.694     0.210
X12     0.034     0.004     0.050     0.021     0.912
X13    -0.069     0.467    -0.477     0.150     0.564
Based on the initial factor pattern and rotated factor pattern, we list the variables for each
factor as follows:
Based on Factor Pattern table
F2: X5, X6
F4: X12
F5: X13
Observe the loadings of variable X5 in the initial factor pattern and in the rotated factor pattern to get a good idea of cross loadings.
Also note that the total variance explained by the five factors remains the same, at 71.4% (i.e. (3.164 + 1.798 + 1.613 + 1.359 + 1.347) x 100 / 13). Also note the diminishing importance of the factors given in the table below.
Variance Explained by Each Factor (after rotation)

Factor1   Factor2   Factor3   Factor4   Factor5
3.164     1.798     1.613     1.359     1.347
Final Communality Estimates

X1      X2      X3      X4      X5      X6      X7      X8      X9      X10     X11     X12     X13
0.860   0.708   0.790   0.649   0.759   0.708   0.704   0.604   0.502   0.780   0.720   0.661   0.836
Observe from the table of final communality estimates that the total communality remains at 9.281 (0.860 + 0.708 + ... + 0.836). Compare these with the communalities obtained from the loadings given in the earlier Factor Pattern table.
Cluster History

NCL   Clusters Joined   FREQ   SPRSQ    RSQ     Tie
19    OB2    OB3        2      0.0074   0.993   T
18    CL19   OB11       3      0.0091   0.984
17    OB7    OB8        2      0.0099   0.974   T
16    OB13   OB17       2      0.0099   0.964
15    OB14   OB15       2      0.0148   0.949
14    CL17   OB9        3      0.0165   0.932
13    OB19   OB20       2      0.0173   0.915
12    CL18   OB5        4      0.0193   0.896
11    OB16   OB18       2      0.0198   0.876
10    CL16   CL15       4      0.0247   0.851
 9    CL12   OB4        5      0.0274   0.824
 8    OB6    CL14       4      0.0280   0.796
 7    CL10   CL11       6      0.0461   0.750
 6    CL7    CL13       8      0.0533   0.697
 5    OB1    OB12       2      0.0568   0.640   T
 4    CL9    CL8        9      0.0603   0.580
 3    CL5    OB10       3      0.1309   0.449
 2    CL4    CL6        17     0.1593   0.289
 1    CL3    CL2        20     0.2893   0.000
Thus, the SPRSQ value should be small, implying that we are merging two homogeneous groups.
The number of clusters is identified by reading the values of SPRSQ. Intuitively, SPRSQ jumps to a high value when we combine two or more heterogeneous groups. Therefore, we need to observe the jumps in the SPRSQ column. We notice jumps at number of clusters (NCL) Seven (from 0.028 to 0.0461), Three (from 0.0603 to 0.1309) and One (from 0.1593 to 0.2893).
Therefore, we have two choices for the number of clusters: three clusters or seven clusters. Ideally we group the observations into 3 or 4 clusters, so we go with clustering the data into THREE clusters in this example. This can also be observed in the dendrogram given below.
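The "look for jumps in SPRSQ" reading can be mechanised; the threshold of 1.5 times the previous merge's SPRSQ below is an arbitrary illustrative choice, not a standard cut-off:

```python
# SPRSQ at each merge, read top-to-bottom from the cluster history
# (NCL = 19 down to 1).
sprsq = [0.0074, 0.0091, 0.0099, 0.0099, 0.0148, 0.0165, 0.0173, 0.0193,
         0.0198, 0.0247, 0.0274, 0.0280, 0.0461, 0.0533, 0.0568, 0.0603,
         0.1309, 0.1593, 0.2893]
ncls = list(range(19, 0, -1))

# flag a merge as a "jump" if its SPRSQ is at least 1.5x the previous one
jumps = [ncl for prev, cur, ncl in zip(sprsq, sprsq[1:], ncls[1:])
         if cur >= 1.5 * prev]
# this flags NCL = 7, 3 and 1, matching the reading above
```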
Reading from the left of the above Dendrogram, we list the observations in each cluster as
follows.
Cluster No.   Observations
1             OB13, OB17, OB14, OB15, OB16, OB18, OB19 and OB20
2             OB2, OB3, OB11, OB5, OB4, OB6, OB7, OB8 and OB9
3             OB1, OB12 and OB10
E: Economy Model
S: Standard Model
D: Delux Model

Max 63E + 95S + 135D
s.t.
1E + 1S + 1D <= 200      (Fan Motors)
1E + 2S + 4D <= 320      (Cooling Coils)
8E + 12S + 14D <= 2400   (Manufacturing Time)
E, S, D >= 0
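Because the optimum of this particular LP happens to be integral, it can be double-checked by a brute-force search over integer production plans (a verification sketch, not a general LP method):

```python
# Enumerate integer plans (E, S, D) satisfying the three resource
# constraints and keep the one with the highest profit.
best_profit, best_plan = -1, None
for e in range(201):                      # E + S + D <= 200 bounds E
    for s in range(161):                  # E + 2S <= 320 bounds S
        if e + s > 200 or e + 2 * s > 320:
            continue
        for d in range(81):               # 4D <= 320 bounds D
            if (e + s + d <= 200 and e + 2 * s + 4 * d <= 320
                    and 8 * e + 12 * s + 14 * d <= 2400):
                profit = 63 * e + 95 * s + 135 * d
                if profit > best_profit:
                    best_profit, best_plan = profit, (e, s, d)
```

The search recovers the plan reported in the sensitivity output below: 80 Economy, 120 Standard, 0 Delux, for a profit of $16440.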
Adjustable Cells

Cell    Name                                    Final   Reduced   Objective     Allowable   Allowable
                                                Value   Cost      Coefficient   Increase    Decrease
$B$8    No. of AirConditioners Economy Model    80      0         63            12          15.5
$C$8    No. of AirConditioners Standard Model   120     0         95            31          8
$D$8    No. of AirConditioners Delux Model      0       -24       135           24          1E+30

Constraints

Cell    Name                               Final   Shadow   Constraint   Allowable   Allowable
                                           Value   Price    R.H. Side    Increase    Decrease
$B$11   Fan Motors Quantity Used           200     31       200          80          40
$B$12   Cooling coils Quantity Used        320     32       320          80          120
$B$13   Manufacturing time Quantity Used   2080    0        2400         1E+30       320
A) Current Optimal Solution to the problem is in the column Final Value: #Economy models = 80, #Standard models = 120, and #Delux
models = 0.
B) Maximum Profit = Sumproduct(Final Value, Objective Coefficient) = $16440
C) Note that the decision variable $D$8 is not used in the optimal solution, because its Final Value = 0.
D) The current solution remains optimal as long as the objective coefficients stay in the ranges: 47.5 (= 63 - 15.5) <= E-coefficient <= 75 (= 63 + 12); 87 <= S-coefficient <= 126; D-coefficient <= 159. Note that there is no lower limit for the objective coefficient of the Delux model, as #Delux models = 0 in the final solution. The allowable changes in a coefficient are valid provided all other coefficients remain fixed at their current values, and the presence of a 0 in any Allowable Increase or Decrease indicates that alternative optimal solutions exist.
E) The current solution will still be optimal for these combinations of objective coefficients: (48, 95, 135), (70, 95, 135), (63, 100, 135) or (63, 95, 150). Note that each changes one objective coefficient at a time while satisfying the ranges given above.
F) If we wish to make simultaneous changes to the objective coefficients, we use the 100% rule. For example, consider the objective coefficients (70, 100, 135). The % change in the coefficient of E is 58.33% [= (70 - 63) x 100 / (allowable increase = 12)] and the % change in the coefficient of S is 16.13% [= (100 - 95) x 100 / (allowable increase = 31)], implying a total change of 74.46% (= 58.33% + 16.13%), which is less than 100%, so the current solution stays optimal. Similarly you can try simultaneous decreases/increases in the objective coefficients subject to a total change of less than 100%.
G) The Reduced Cost tells you by how much the profit margin of a variable would have to improve for it to become optimal to use that variable. Here it is $24/unit.
H) Reduced Cost: if the objective coefficient of decision variable $D$8 improved to $159 [= 135 - (-24)], then this variable would be included in the optimal solution, i.e. its Final Value would be > 0. Also note that you do not have the option of decreasing its profit per unit, as that will not affect its inclusion in the final solution. Note that a negative reduced cost means an increase of the value, as minus of minus is plus.
I) Shadow Price: measures by how much the optimal objective value (here, total profit) would change if the constraint's right-hand side changed by one unit. For example, if the number of Fan Motors is increased to 201, then the profit increases to $16471 (= 16440 + 31). Similarly, if the number of Cooling Coils available is decreased to 300, then the total profit goes down by 32 x 20 = $640.
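The Reduced Cost and Shadow Price figures in the report are tied together by LP duality; this can be cross-checked in a few lines using the constraint coefficients from the formulation above:

```python
# Dual (shadow) prices from the report: fan motors, cooling coils, mfg time.
y = (31, 32, 0)
# resource usage per unit of each model: (fan motors, cooling coils, hours)
usage = {"E": (1, 1, 8), "S": (1, 2, 12), "D": (1, 4, 14)}
profit = {"E": 63, "S": 95, "D": 135}

# reduced cost = unit profit minus the dual value of the resources consumed
reduced = {m: profit[m] - sum(p * a for p, a in zip(y, usage[m]))
           for m in usage}

# total resource value at the shadow prices equals the optimal profit
total_dual_value = 31 * 200 + 32 * 320 + 0 * 2400
```

The basic variables E and S price out exactly (reduced cost 0), D falls short by 24 (the report's -24), and the resources valued at the shadow prices total $16440, the maximum profit.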