Prof U Dinesh Kumar, IIMB
[Figure: six-step model-building process; step 5: Evaluate Model]
Multiple Linear Regression Model

Y_i = β0 + β1 X_1i + β2 X_2i + … + βk X_ki + ε_i

where Y_i is the dependent (response) variable, X_1i, …, X_ki are the independent (explanatory) variables, β1, …, βk are the population slopes, and ε_i is the random error.
All Rights Reserved, Indian Institute of Management Bangalore
Prediction Model

Ŷ = b0 + b1 X_1i + b2 X_2i + … + bk X_ki

[Figure: response plane over (X1, X2). The observed point (X1i, X2i, Yi) satisfies Yi = β0 + β1 X1i + β2 X2i + εi; the plane gives Y_X = β0 + β1 X1i + β2 X2i. Example: Yi = Treatment Cost, X2i = HR Pulse.]
Population model: Y = β0 + β1 x1 + … + βk xk + ε

Estimated model: ŷ = b0 + b1 x1 + … + bk xk

The coefficient estimates minimize the sum of squared errors:

SSE = Σ_{i=1}^{n} [ y_i − b0 − Σ_{j=1}^{k} bj x_ij ]²

Setting the partial derivatives to zero gives the normal equations:

∂SSE/∂b0 = −2 Σ_{i=1}^{n} [ y_i − b0 − Σ_{j=1}^{k} bj x_ij ] = 0

and

∂SSE/∂bj = −2 Σ_{i=1}^{n} [ y_i − b0 − Σ_{j=1}^{k} bj x_ij ] x_ij = 0,  j = 1, …, k
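The normal equations above can be solved directly with NumPy — a minimal sketch with made-up data (the variable names echo the DAD example; the numbers are invented):

```python
import numpy as np

# Hypothetical data: treatment cost explained by body weight and HR pulse
# (the numbers here are made up, not from the DAD data set).
X = np.array([[62.0, 72.0],
              [55.0, 80.0],
              [70.0, 68.0],
              [48.0, 90.0],
              [66.0, 75.0]])
y = np.array([210.0, 180.0, 250.0, 150.0, 230.0])

# Add the intercept column, then solve the normal equations
# (X'X) b = X'y that come from setting the SSE derivatives to zero.
Xd = np.column_stack([np.ones(len(y)), X])
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

y_hat = Xd @ b
sse = np.sum((y - y_hat) ** 2)
print(b)     # b0, b1, b2
print(sse)
```

The normal equations say exactly that the residual vector is orthogonal to the intercept column and to every predictor column.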
DAD Example
MODEL
Interpretation of Coefficients
The beta weights are the regression coefficients for standardized data. The standardized beta is the average change in the dependent variable, in standard-deviation units, when the independent variable is increased by one standard deviation and the other independent variables are held constant.
Beta Weights

Standardized Beta_i = b_i × (S_x / S_y)

where S_x = standard deviation of X_i and S_y = standard deviation of Y.
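A quick numeric illustration of the beta-weight formula (all values below are hypothetical, not from the DAD example):

```python
# Hypothetical unstandardized slope and standard deviations.
b1 = 2.5       # slope of X1 in original units
s_x1 = 8.0     # standard deviation of X1
s_y = 40.0     # standard deviation of Y

# Standardized beta: rescale the slope into standard-deviation units.
beta1 = b1 * s_x1 / s_y
print(beta1)   # 0.5: Y rises 0.5 SDs per 1-SD rise in X1
```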
Partial Correlation
The partial correlation coefficient measures the relationship between two variables (say Y and X1) when the influence of all other variables (say X2, X3, …, Xk) connected with these two variables is removed from both of them.

The semi-partial (or part) correlation coefficient measures the relationship between two variables (say Y and X1) when the influence of all other variables (X2, X3, …, Xk) is removed from only one of the variables (X1).
Only the predictor is residualized. The denominator of the coefficient (the total variance
of the response variable, Y) remains the same no matter which predictor is being
examined.
[Venn diagram: variance of Y overlapping with X1 and X2]

Variance in Y = A + B + C + D
Variance in Y explained by the model = A + B + C
Partial Correlation

r_{12.3} = (r_{12} − r_{13} r_{23}) / √[(1 − r_{13}²)(1 − r_{23}²)]

(and similarly r_{13.2}, r_{23.1})

The semi-partial (or part) correlation sr_{12.3} is the correlation between y1 and x2 when the influence of x3 is partialled out of x2 (not out of y1):

sr_{12.3} = (r_{12} − r_{13} r_{23}) / √(1 − r_{23}²)

Example: part correlation 0.348, so squared part correlation 0.348² = 0.121.
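The two formulas can be checked numerically with assumed pairwise correlations (the r values below are invented, not from the DAD data):

```python
import math

# Hypothetical pairwise correlations among variables 1 (y), 2 (x2), 3 (x3).
r12, r13, r23 = 0.50, 0.40, 0.30

# Partial correlation: x3 removed from both variables.
partial = (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Semi-partial (part) correlation: x3 removed from x2 only.
part = (r12 - r13 * r23) / math.sqrt(1 - r23**2)

print(round(partial, 3), round(part, 3))
```

Because the part correlation divides by the full variance of y, it is never larger in absolute value than the partial correlation.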
Change in R²

[Venn diagram: variance of Y partitioned across X1 and X2]
Variance specific to X1 (part A)
Variance common to X1 and X2 (part C)
Variance specific to X2 (part B)
Left over: 1 − R²
R² = SSR/SST = 1 − SSE/SST = Σ_i (ŷ_i − ȳ)² / Σ_i (y_i − ȳ)²
R² is 0.173, i.e., 17.3% of the variation in Y is explained by Bodyweight and HR Pulse.
Adjusted R2
Inclusion of an additional explanatory variable can only increase (never decrease) the R² value: introducing an additional explanatory variable increases the numerator (SSR) of the expression for R² while the denominator (SST) remains the same. To correct this defect, we adjust R² by taking into account the degrees of freedom.
Adjusted R²

R²_A = 1 − [ SSE/(n − k − 1) ] / [ SST/(n − 1) ]

where R²_A = adjusted R-square, n = number of observations, k = number of explanatory variables.
Adjusted R2 is 0.167
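An equivalent form computes the adjustment directly from R²; the n and k below are assumptions for illustration, not the DAD sample size:

```python
# Adjusted R-square from R-square: 1 - (1 - R^2)(n - 1)/(n - k - 1).
# n and k here are assumed values for illustration only.
r2 = 0.173
n, k = 100, 2

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))   # 0.156 with these assumed n and k
```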
H0: β1 = β2 = … = βk = 0
HA: not all βj values are zero

Test for overall significance of the multiple regression model: it checks whether there is a statistically significant relationship between Y and any of the explanatory variables (X1, X2, …, Xk).

F Statistic

F = MSR/MSE = [ R²/k ] / [ (1 − R²)/(n − k − 1) ]
F = 3.216 × 10^11 / 12,524,679,898 ≈ 25.68
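The R²-based form of the F statistic can be evaluated directly (n below is an assumed sample size, not taken from the DAD data):

```python
# Overall F statistic computed from R-square alone.
# n is an assumed sample size for illustration.
r2, n, k = 0.173, 100, 2

f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f_stat, 2))
```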
t-test:

t_i = b_i / Se(b_i)

By rejecting the null hypothesis H0: βi = 0, we can claim that there is a statistically significant relationship between the response variable Y and the explanatory variable Xi.
Full Model:
Treatment cost = β0 + β1 Bodyweight + β2 Bodyheight + β3 HRpulse + β4 RR

Reduced Model:
Treatment cost = β0 + β1 Bodyweight + β2 HRpulse
Partial F statistic = 0.9172, p-value = 0.4700; since p > 0.05, we fail to reject the reduced model (Bodyheight and RR do not add significant explanatory power).
Dummy variable
When there is more than one qualitative variable, use (n − 1) dummy variables for each qualitative variable (where n is its number of levels), along with the intercept. Use of n dummy variables along with the intercept will result in multicollinearity, known as the dummy variable trap.
1. Gender (2 levels)
2. Marital Status (2 levels)
3. Key Complaints (6 levels)
Y = β0 + β1 × Female
Y = β0 + β1 × Female + β2 × Body Weight

Males (Female = 0):   Y = β0 + β2 X2
Females (Female = 1): Y = (β0 + β1) + β2 X2

[Plot: two parallel regression lines of Y against X2, the female line shifted by β1]
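A dummy-coding sketch in pandas, using made-up patient records; `drop_first=True` keeps n − 1 dummies per qualitative variable and so avoids the dummy variable trap:

```python
import pandas as pd

# Hypothetical patient records (DAD-style fields, made-up values).
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "BodyWeight": [62.0, 75.0, 55.0, 80.0],
})

# drop_first=True drops one level per variable, leaving n-1 dummies.
coded = pd.get_dummies(df, columns=["Gender"], drop_first=True)
print(coded.columns.tolist())   # ['BodyWeight', 'Gender_M']
```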
Key Complaints
1. ACHD
2. CAD
3. NONE
4. OS-ASD
5. OTHER
6. RHD
Derived variables may help to explain the variation in the response variable. For example, consider a credit-rating problem used for approving housing loans: a bank may use factors such as the loan amount and the value of the property.
Derived Variables

Customer | Loan Amount | Value of Property | Loan / Value
1        | 250,000     | 300,000           | 0.83
2        | 350,000     | 500,000           | 0.70
3        | 750,000     | 1,000,000         | 0.75
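The loan-to-value column can be derived directly from the table above in pandas:

```python
import pandas as pd

# The credit-rating example table.
loans = pd.DataFrame({
    "LoanAmount":      [250_000, 350_000, 750_000],
    "ValueOfProperty": [300_000, 500_000, 1_000_000],
})

# Derived variable: loan-to-value ratio.
loans["LoanToValue"] = (loans["LoanAmount"] / loans["ValueOfProperty"]).round(2)
print(loans["LoanToValue"].tolist())   # [0.83, 0.7, 0.75]
```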
Interaction Variables
An interaction variable is usually the product of a quantitative variable and a qualitative variable.
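A minimal sketch of constructing such an interaction column (values hypothetical); including it in the regression lets the quantitative variable's slope differ across the qualitative groups:

```python
import pandas as pd

# Interaction between a qualitative variable (gender dummy) and a
# quantitative one (body weight); the values are made up.
df = pd.DataFrame({
    "Female":     [1, 0, 1, 0],
    "BodyWeight": [62.0, 75.0, 55.0, 80.0],
})

# Product column: nonzero only for the Female = 1 group.
df["Female_x_BodyWeight"] = df["Female"] * df["BodyWeight"]
print(df["Female_x_BodyWeight"].tolist())   # [62.0, 0.0, 55.0, 0.0]
```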
Interaction Variable
Multicollinearity

Symptoms of Multicollinearity
F-test rejects the null hypothesis, but none of the individual t-tests are
rejected.
Effects of Multicollinearity

Y = β0 + β1 X1 + β2 X2

Var(b1) = Se² / [ SS_X1 (1 − r12²) ]

Var(b2) = Se² / [ SS_X2 (1 − r12²) ]

Variance Inflation Factor:

VIF_j = 1 / (1 − R_j²)
R_j  | Tolerance (1 − R_j²) | VIF  | Impact on Se (√VIF)
0.0  | 1.00                 | 1.00 | 1.00
0.4  | 0.84                 | 1.19 | 1.09
0.6  | 0.64                 | 1.56 | 1.25
0.8  | 0.36                 | 2.78 | 1.67
0.87 | 0.25                 | 4.00 | 2.00
0.9  | 0.19                 | 5.26 | 2.29
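VIF_j can be computed by regressing each predictor on the others and applying 1/(1 − R_j²) — a sketch on simulated (not DAD) data:

```python
import numpy as np

# Two deliberately correlated predictors (simulated data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=200)
X = np.column_stack([x1, x2])

def vif(X, j):
    # R_j^2 from regressing column j on the remaining columns (+ intercept),
    # then VIF_j = 1 / (1 - R_j^2).
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ b
    r2 = 1 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 2) for j in range(2)])
```

With only two predictors, both VIFs are equal, since R_j² is just the squared correlation r12² in either direction.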
DAD CASELET
Forward Selection
A procedure in which variables are sequentially entered into the model. The first variable considered for entry is the one with the smallest p-value based on the F-test.
Backward Elimination
A variable selection procedure in which all variables are entered into the equation and then sequentially removed, starting with the most non-significant variable.
Stepwise Regression
The first variable considered for entry into the equation is the one with the largest F value
or smallest p-value based on F-test.
At each step, the independent variable not in the equation that has the smallest probability
of F is entered. Variables already in the regression equation are removed if their
probability of F becomes sufficiently large.
+ 33998.332 OTHERS
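The forward-selection loop described above can be sketched as follows; for simplicity this version enters the variable that most reduces SSE at each step (the procedure in the text uses F-test p-values, which rank candidates the same way at a given step), on simulated data:

```python
import numpy as np

def fit_sse(cols, y):
    # SSE of an OLS fit with intercept on the given predictor columns.
    A = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ b
    return float(r @ r)

def forward_select(X, y, n_steps):
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(n_steps):
        # Enter the candidate whose addition gives the smallest SSE.
        best = min(remaining,
                   key=lambda j: fit_sse([X[:, c] for c in chosen] + [X[:, j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Simulated data: column 0 is the true driver of y.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=100)
print(forward_select(X, y, 2))   # column 0 should enter first
```

A full stepwise implementation would also re-test variables already in the equation and remove any whose F-probability becomes too large.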