Regression diagnostics ask three questions: Are the assumptions of multiple regression satisfied? Is the model adequate? Is there anything unusual about any of the data points?
Linearity of the relationship between each X and Y can be checked with a scatter plot of Y against each X. Normality of the errors can be checked by plotting a histogram of the residuals. Independence of the explanatory variables from each other can be checked with a scatter plot matrix and the Variance Inflation Factor. Independence of the errors (absence of autocorrelation) can be checked with the Durbin-Watson statistic.
Diagnosis of Multi-collinearity
Check the correlation matrix of the predictors. Look for a significant F-test combined with non-significant t-ratios. Look for large changes in the regression coefficients when variables are added or deleted. Compute the Variance Inflation Factor (VIF): VIF > 4 or 5 suggests multicollinearity; VIF > 10 is strong evidence that collinearity is affecting the regression coefficients. The Durbin-Watson statistic is not a check for collinearity: it tests for autocorrelation in the residuals. It ranges from 0 to 4, and values near 2 indicate no autocorrelation.
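As an illustration of the VIF check, the sketch below computes VIFs from the diagonal of the inverse of the predictor correlation matrix, which equals 1 / (1 - R_j^2) for each predictor j. The function names (correlation_matrix, invert, vif) and the data are hypothetical, chosen only for this example; a statistics package would normally do this for you.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length predictor columns."""
    centered = []
    for col in columns:
        m = sum(col) / len(col)
        centered.append([v - m for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] becomes [I | A^-1].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Made-up data: x2 is roughly 2 * x1, so both should show a large VIF.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
x3 = [5, 1, 4, 2, 8, 3, 7, 6]
vif_values = vif([x1, x2, x3])
for name, value in zip(["x1", "x2", "x3"], vif_values):
    print(name, round(value, 1))
```

With only two predictors the VIF reduces to 1 / (1 - r^2), where r is their correlation, so uncorrelated predictors give a VIF of exactly 1.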
Fitted values (Fits) are the estimates of Y as determined by the regression equation. Residuals (Resids) are the differences between each observed value and the corresponding fitted value.
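A minimal sketch of fits and residuals follows. To keep it short it assumes a single predictor rather than a multiple regression; the function name fit_line and the data are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)

# Fits: the estimates of Y from the regression equation.
fits = [intercept + slope * xi for xi in x]
# Resids: observed value minus the corresponding fitted value.
resids = [yi - fi for yi, fi in zip(y, fits)]

for xi, yi, fi, ei in zip(x, y, fits, resids):
    print(xi, yi, round(fi, 2), round(ei, 2))
```

When the model includes a constant, the residuals sum to zero (up to rounding), which is a quick sanity check on the arithmetic.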
Residual Plots
[Figure: Histogram of the Residuals (response is Crimrate), with further residual plots. Axis labels: Frequency, Normal Score, Residual, Fitted Value.]
[Figure: residual plots, panels a) to d).]
Figures a) and b) suggest a non-linear relationship between X and Y. Figure c) suggests autocorrelation. Figure d) suggests that the variance is not constant, since the spread of Y values is far greater for larger values of X.
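Autocorrelation of the kind suggested by Figure c) can also be checked numerically with the Durbin-Watson statistic, d = sum of (e_t - e_(t-1))^2 divided by the sum of e_t^2, computed on the residuals in time order. The sketch below uses made-up residuals and a hypothetical function name.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Residuals that keep the same sign in runs (positive autocorrelation).
runs = [1.0, 1.0, -1.0, -1.0]
# Residuals that flip sign every step (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]

print(durbin_watson(runs))
print(durbin_watson(alternating))
```

The statistic lies between 0 and 4. Values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.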
Check for outliers that lie a long distance from the rest of the data. Such points exercise leverage, which is measured by h_i. Leverage is considered large if h_i exceeds 3p/n, where p is the number of predictors including the constant and n is the number of observations; such points are flagged by X in the printout. Cook's distance measures the influence of a data point on the regression equation. Cook's D > 1 requires careful checking; D > 4 suggests potentially serious outliers.
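The leverage and Cook's distance checks can be sketched as follows. This is a simplified illustration that assumes a single predictor plus a constant (p = 2), for which h_i = 1/n + (x_i - x_bar)^2 / Sxx; the function name and the data are hypothetical, and the 3p/n cutoff is the one quoted above.

```python
def influence_measures(x, y):
    """Leverage h_i and Cook's D for a one-predictor regression with constant."""
    n = len(x)
    p = 2  # slope plus the constant
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    resids = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resids) / (n - p)  # residual mean square

    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks_d = [
        (e * e / (p * s2)) * (h / (1.0 - h) ** 2)
        for e, h in zip(resids, leverage)
    ]
    cutoff = 3.0 * p / n
    return leverage, cooks_d, cutoff


# Made-up data: the last point is far from the others in X and off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 18.1, 40.0]

leverage, cooks_d, cutoff = influence_measures(x, y)
flagged = [i for i, h in enumerate(leverage) if h > cutoff]
print("high-leverage rows:", flagged)
print("largest Cook's D at row:", cooks_d.index(max(cooks_d)))
```

The leverages always sum to p, so with few observations the 3p/n cutoff can exceed 1 and flag nothing; the rule is only informative when n is reasonably large relative to p.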
Patterns of Outliers
a) Outlier is extreme in both X and Y but not in pattern. Removal is unlikely to alter the regression line.
b) Outlier is extreme in both X and Y as well as in the overall pattern. Inclusion will strongly influence the regression line.
c) Outlier is extreme in X but nearly average in Y.
d) Outlier is extreme in Y but not in X.
e) Outlier is extreme in pattern, but not in X or Y.