
Regression Diagnostics

Regression diagnostics ask three questions: Are the assumptions of multiple regression met? Is the model adequate? Is there anything unusual about any of the data points?

Checking That the Assumptions Are Met

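Linearity of the relationship between each X and Y can be checked with a scatter plot of Y against each X. Normality can be checked by plotting a histogram of the residuals, since the normality assumption concerns the random variation of Y about the regression line. Independence of the explanatory variables from each other can be checked with a scatter matrix and the Variance Inflation Factor. Independence of the residuals from one another (no autocorrelation) can be checked with the Durbin-Watson statistic.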
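The Durbin-Watson statistic is computed from the residuals taken in observation order, as d = Σ(e_t − e_{t−1})² / Σ e_t². The sketch below assumes NumPy and two made-up residual series. The function name `durbin_watson` is our own. Residuals that flip sign at every step push d towards 4, which indicates negative autocorrelation. Long runs of the same sign push it towards 0, which indicates positive autocorrelation. A value near 2 indicates no first-order autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2), residuals in observation order."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residual series, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every step
trending = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]     # long runs of the same sign

print(round(durbin_watson(alternating), 3))  # 3.333: towards 4, negative autocorrelation
print(round(durbin_watson(trending), 3))     # 0.286: towards 0, positive autocorrelation
```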

Diagnosis of Multicollinearity

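Check for multicollinearity by means of the correlation matrix. Warning signs include a significant F but non-significant t-ratios, inflated variances of the coefficient estimates, and large changes in the regression coefficients when variables are added or deleted. A Variance Inflation Factor (VIF) > 4 or 5 suggests multicollinearity; VIF > 10 is strong evidence that collinearity is affecting the regression coefficients. The Durbin-Watson statistic is not a check for collinearity. It tests the residuals for autocorrelation, takes values between 0 and 4, and a value near 2 indicates no first-order autocorrelation.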
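The VIF of predictor j is 1 / (1 − R_j²). Here R_j² is the R-squared from regressing predictor j on all the other predictors. The sketch below assumes NumPy and simulated predictors. The function name `vif` is our own and not a library API. One predictor is built as a near-copy of another, so both should show VIFs far above the threshold of 10. An unrelated predictor should stay close to 1.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (predictors only, no constant).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
    column j on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Hypothetical predictors: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
v = vif(np.column_stack([x1, x2, x3]))
print(np.round(v, 1))  # x1 and x2 far above 10, x3 close to 1
```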

Diagnosis of Violation of Assumptions


Residual plots are used to check whether the variance is constant across the explanatory variables, whether the fitted relationship is linear, and whether the random variation has a Normal distribution.

Fitted Values and Residuals

Fitted values (Fits) are the estimates of Y as determined by the regression equation. Residuals (Resids) are the differences between each observed value and the corresponding fitted value.
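To make the two definitions concrete, the sketch below fits a straight line by least squares to five hypothetical (X, Y) pairs using NumPy. It then computes the fits and the residuals. The data are invented for illustration. With a constant in the model, the residuals sum to zero.

```python
import numpy as np

# Hypothetical data: one explanatory variable X and a response Y.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Design matrix with a constant column, then least-squares coefficients.
A = np.column_stack([np.ones_like(X), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)

fits = A @ b        # fitted values: estimates of Y from the regression equation
resids = Y - fits   # residuals: observed value minus fitted value

print(b)       # intercept 2.2, slope 0.6
print(fits)    # 2.8, 3.4, 4.0, 4.6, 5.2
print(resids)  # -0.8, 0.6, 1.0, -0.6, -0.2
```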

Residual Plots

Figure: four residual plots for the regression (response is Crimrate): Histogram of the Residuals (Frequency against Residual); Normal Probability Plot of the Residuals (Normal Score against Residual); Residuals Versus the Fitted Values (Residual against Fitted Value); Residuals Versus the Order of the Data (Residual against Observation Order).

Abnormal Patterns in Residual Plots

Figures (a) and (b) suggest a non-linear relationship between X and Y. Figure (c) suggests autocorrelation. Figure (d) suggests that the variance is not constant, since the spread of the Y values about the fitted line is far greater for larger values of X.
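Pattern (d) can also be given a rough numeric companion. This is our own simplified, Goldfeld-Quandt-style comparison, not a method from these notes and not a formal test. It sorts the residuals by X and divides the residual variance of the larger-X half by that of the smaller-X half. The sketch assumes NumPy and simulated data, and the helper names are hypothetical. With constant variance the ratio should be near 1. With a fan-shaped spread it should be well above 1.

```python
import numpy as np

def ols_residuals(x, y):
    """Residuals from a straight-line least-squares fit of y on x."""
    A = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ b

def spread_ratio(x, resids):
    """Residual variance for the larger-X half divided by that for the
    smaller-X half. A rough, Goldfeld-Quandt-style comparison, not a formal test."""
    e = np.asarray(resids, dtype=float)[np.argsort(x)]
    half = len(e) // 2
    return np.var(e[-half:]) / np.var(e[:half])

# Simulated data: same straight line, two different error structures.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 200)
y_const = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=x.size)    # constant spread
y_fan = 2.0 + 3.0 * x + rng.normal(scale=0.3 * x, size=x.size)  # spread grows with x

r_const = spread_ratio(x, ols_residuals(x, y_const))
r_fan = spread_ratio(x, ols_residuals(x, y_fan))
print(round(r_const, 2), round(r_fan, 2))  # first near 1, second well above 1
```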

Checking Unusual Data Points

Check for outliers, that is, data points lying a long distance away from the rest of the data. Points that are far out in X exercise leverage, which is measured by the leverage value h_i. Leverage is considered large if h_i > 3p/n, where p is the number of predictors including the constant and n is the number of observations. Such points are flagged by X in the printout. Cook's distance (Cook's D) measures the influence of a data point on the regression equation. A Cook's D > 1 requires careful checking; a value > 4 suggests a potentially serious outlier.
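Leverage and Cook's distance can be computed directly from the hat matrix H = A(AᵀA)⁻¹Aᵀ. Here A is the design matrix including the constant column. The sketch below assumes NumPy, a single predictor and hypothetical data, and the function name `leverage_and_cooks` is our own. It uses the standard formula D_i = e_i² / (p s²) · h_i / (1 − h_i)², where s² is the residual mean square. It then applies the 3p/n leverage rule from the text.

```python
import numpy as np

def leverage_and_cooks(x, y):
    """Leverage h_i and Cook's distance D_i for a least-squares fit with a constant.

    h_i is the i-th diagonal element of the hat matrix H = A (A'A)^-1 A'.
    D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, with s^2 the residual mean square.
    p counts the coefficients, i.e. the predictors including the constant.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), np.asarray(x, dtype=float).reshape(n, -1)])
    p = A.shape[1]
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    h = np.diag(H)
    e = y - H @ y                 # residuals
    s2 = (e @ e) / (n - p)        # residual mean square
    D = e ** 2 / (p * s2) * h / (1 - h) ** 2
    return h, D, p

# Hypothetical data: nine points near a line, plus one far out in X and off the line.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2, 18.1, 25.0])
h, D, p = leverage_and_cooks(x, y)
n = len(y)
flagged = np.where(h > 3 * p / n)[0]  # large leverage by the 3p/n rule
print(flagged)                        # [9], the observation with x = 20
print(np.round(D, 2))                 # Cook's distances
```

In this example the point at x = 20 has a leverage of about 0.79, above 3p/n = 0.6. Its Cook's distance is also well above 4.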

Patterns of Outliers

a) The outlier is extreme in both X and Y but not in the pattern. Removing it is unlikely to alter the regression line.
b) The outlier is extreme in both X and Y as well as in the overall pattern. Including it will strongly influence the regression line.
c) The outlier is extreme in X but nearly average in Y.
d) The outlier is extreme in Y but not in X.
e) The outlier is extreme in the pattern, but not in X or Y.
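The contrast between patterns (a) and (b) can be seen by refitting with and without the unusual point. The sketch below assumes NumPy and hypothetical data: nine points lying exactly on y = 2x + 1, plus one extra point at x = 20. It covers only (a) and (b). In case (a) the extra point (20, 41) lies on the same line, so the slope stays at 2. In case (b) the point (20, 60) is extreme in Y and breaks the pattern, and it pulls the slope up to about 2.98.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y on x (straight line with a constant)."""
    return np.polyfit(x, y, 1)[0]

# Hypothetical base data: nine points lying exactly on y = 2x + 1.
x = np.arange(1.0, 10.0)
y = 2.0 * x + 1.0

# One extra point per case, both far out in X (x = 20):
#   (a) (20, 41) follows the pattern, since 2*20 + 1 = 41;
#   (b) (20, 60) is extreme in Y and lies far above the line.
extra = {"a": (20.0, 41.0), "b": (20.0, 60.0)}

base = slope(x, y)
with_point = {k: slope(np.append(x, px), np.append(y, py)) for k, (px, py) in extra.items()}

print("slope without the extra point:", round(base, 3))
for case, s in with_point.items():
    print(f"case ({case}) slope with the extra point:", round(s, 3))
```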
