
Regression Issues

Normality of Residuals
Probability Plots
X     Cum. frequency   Cum. probability (Expected)   Cum. probability (Observed)
0     1                0.006738                      0.005
1     3                0.040428                      0.015
2     13               0.124652                      0.065
3     42               0.265026                      0.210
4     76               0.440493                      0.380
5     123              0.615961                      0.615
6     150              0.762183                      0.750
7     165              0.866628                      0.825
8     186              0.931906                      0.930
9     196              0.968172                      0.980
10    200              0.986305                      1.000
>10   200              1.000000                      1.000
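A minimal sketch of how this probability plot can be built. The assumptions are mine: the "Expected" column looks like the CDF of a Poisson distribution with mean 5, and the cumulative frequencies are out of n = 200 observations; both are inferred from the values in the table, not stated in the source.

# Minimal sketch of the probability plot behind the table above.
# Assumption: the "Expected" column is the CDF of a Poisson(5) distribution
# and the cumulative frequencies are out of n = 200 observations.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

x = np.arange(0, 11)
cum_freq = np.array([1, 3, 13, 42, 76, 123, 150, 165, 186, 196, 200])

cum_obs = cum_freq / 200.0          # Cum (Observed): observed cumulative proportions
cum_exp = poisson.cdf(x, mu=5)      # Cum (Expected): expected cumulative probabilities

plt.scatter(cum_exp, cum_obs)       # points near the 45-degree line => good fit
plt.plot([0, 1], [0, 1])
plt.xlabel("Expected cumulative probability")
plt.ylabel("Observed cumulative probability")
plt.show()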
Probability Plots

[Figures: cumulative probabilities plotted against X, and observed vs expected cumulative probabilities]
Heteroscedasticity

• One of the assumptions of the classical linear regression model (CLRM) is that the variance of u_i, the error term, is constant, i.e., homoscedastic.
• Under heteroscedasticity, the variance of the residuals differs across observations:
  Var(e_i) = σ_i²
Heteroscedasticity
• If the width of the scatter of the residuals increases or decreases across observations, the variance of the error term is not constant. This problem is called heteroscedasticity.
• When heteroscedasticity exists, we cannot rely on the ordinary least-squares method for estimating the regression and should use a more general method called generalized least squares.
No Pattern Is a Good Pattern
[Figure slides: residual scatter plots illustrating heteroscedasticity; United Technologies — normal distribution; QQ-plot]
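As a rough visual check for the widening (or narrowing) scatter described above, the residuals can be plotted against the fitted values. A minimal sketch on synthetic, hypothetical data whose error spread deliberately grows with x:

# Minimal sketch: visual check for heteroscedasticity via a residuals-vs-fitted plot.
# The data are synthetic; the error standard deviation grows with x on purpose.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x)    # heteroscedastic errors: sd proportional to x

fit = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()                          # a widening band signals non-constant variance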
Park’s Test
• A simple test is the Park test.
• Fit the regression equation: ln(e_i²) = a + b·ln(X_i)
• If b is statistically significant, heteroscedasticity is present.
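A minimal sketch of the Park test, reusing the synthetic x, y, and fit objects from the residual-plot sketch above (hypothetical data; in practice use your own regressor and residuals):

# Park test: regress ln(e_i^2) on ln(X_i); a significant slope b indicates heteroscedasticity.
import numpy as np
import statsmodels.api as sm

log_e2 = np.log(fit.resid ** 2)                       # ln of squared OLS residuals
park = sm.OLS(log_e2, sm.add_constant(np.log(x))).fit()
print(park.params)                                    # intercept a and slope b
print(park.pvalues)                                   # is b statistically significant?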

Other Tests
Breusch-Pagan (BP) Test
White’s Test of Heteroscedasticity
Other tests such as Glejser, Spearman’s rank correlation, and Goldfeld-Quandt tests of
heteroscedasticity
Breusch-Pagan (BP) Test
• Estimate the OLS regression and obtain the squared residuals.
• Regress the squared residuals on the k regressors included in the model.
• Other regressors can also be used if they have some bearing on the error variance.
• Set up H0 that the error variance is homoscedastic, i.e., that all the slope coefficients in this auxiliary regression are simultaneously equal to zero.
• Use the F statistic from this auxiliary regression to test H0.
• If the F test is significant, reject H0 and conclude that heteroscedasticity is present.
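statsmodels implements the BP test directly; a minimal sketch, again reusing the fit and x objects from the earlier heteroscedasticity sketch:

# Breusch-Pagan test: regress squared residuals on the regressors and test the slopes jointly.
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(fit.resid, sm.add_constant(x))
print(f"F = {f_stat:.2f}, p-value = {f_pval:.4f}")    # small p-value => reject homoscedasticity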
White’s Test
• Obtain the squared residuals from the OLS regression.
• Regress the squared residuals on the regressors, the squared terms of these regressors, and the pair-wise cross-product terms of the regressors.
• Multiply the R² value of this auxiliary regression by n.
• Under the null hypothesis of homoscedasticity, this product follows the chi-square distribution with df equal to the number of coefficients estimated.
• The White test is more general and more flexible than the BP test.
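A minimal sketch of White's test with statsmodels, reusing fit and x from the earlier sketch (the auxiliary regression with squares and cross-products is built internally):

# White's test: n*R^2 from the auxiliary regression is compared against a chi-square distribution.
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

lm_stat, lm_pval, f_stat, f_pval = het_white(fit.resid, sm.add_constant(x))
print(f"n*R^2 = {lm_stat:.2f}, p-value = {lm_pval:.4f}")   # small p-value => heteroscedasticity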
Possible Reasons
• The presence of outliers in the data
• Incorrect functional form of the regression model
• Incorrect transformation of data
• Mixing observations with different measures of scale (such as mixing
observations from different groups with different variances)
Consequences

The OLS estimators are still unbiased and consistent, but they are less efficient (their variances are not necessarily the smallest).
Statistical inference becomes less reliable.
The estimators are only linear unbiased estimators (LUE), not BLUE.
Remedial Measures
• Use the method of Weighted Least Squares (WLS), as sketched below
• Take the natural log of the dependent variable
• Use deflators
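A minimal sketch of WLS, reusing the synthetic x and y from the earlier heteroscedasticity sketch and assuming (for illustration only) that the error standard deviation is proportional to x, so the weights are 1/x²:

# Weighted Least Squares: observations with larger error variance get smaller weight.
import statsmodels.api as sm

wls_fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / x**2).fit()
print(wls_fit.params, wls_fit.bse)   # coefficients and their (more reliable) standard errors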
MULTICOLLINEARITY
 One of the assumptions of the classical linear regression model (CLRM) is that there is no exact linear relationship among the regressors.
 High correlation between the independent variables (X variables) is called multicollinearity.
 In the presence of a strong linear relationship, the regression coefficients measure a combined effect.
 This leads to unstable coefficients, depending on which X variables are included in the model.
Multicollinearity
 Multicollinearity always exists to some degree.
 Perfect collinearity: an exact linear relationship between the variables exists. Not very common, but it can happen.
 Imperfect collinearity: the regressors are highly (but not perfectly) collinear.
When Collinearity Is Not Perfect, But High
The OLS estimators are still BLUE, but one or more regression coefficients have large standard errors relative to the values of the coefficients, and the t ratios are small, leading to the conclusion that the true values of these coefficients are not different from zero.
The R² value may be very high, yet some regression coefficients are statistically insignificant.
The regression coefficients become very sensitive to small changes in the data; this happens especially when the sample is relatively small.
Impact of Multicollinearity
• The variances of regression coefficient estimators are inflated.
• The magnitudes of regression coefficient estimates may be different.
• Adding and removing variables produce large changes in the coefficient
estimates.
• Removing a data point causes large change in the coefficient estimate.
• In some cases, the F ratio is significant, but none of the t ratios are.
Variance Inflation Factor
Consider the regression model:

Y_i = B1 + B2·X_2i + B3·X_3i + u_i

Then

var(b2) = σ² / [Σ x_2i² (1 − r_23²)] = (σ² / Σ x_2i²) · VIF

var(b3) = σ² / [Σ x_3i² (1 − r_23²)] = (σ² / Σ x_3i²) · VIF

• where σ² is the variance of the error term u_i, and r_23 is the coefficient of correlation between X2 and X3.
Variance Inflation Factor

VIF = 1 / (1 − r_23²)

is the variance-inflating factor.
• VIF is a measure of the degree to which the variance of the
OLS estimator is inflated because of collinearity.
• A VIF greater than 10 indicates that collinearity is a problem.
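A minimal sketch of computing VIFs with statsmodels; X2 and X3 are hypothetical, deliberately collinear regressors:

# Variance Inflation Factor for each column of the design matrix.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
X2 = rng.normal(size=200)
X3 = 0.9 * X2 + 0.1 * rng.normal(size=200)     # strongly correlated with X2

exog = sm.add_constant(np.column_stack([X2, X3]))
for i, name in enumerate(["const", "X2", "X3"]):
    print(name, round(variance_inflation_factor(exog, i), 2))   # VIF > 10 flags a problem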
Effect of r on the Variance of b2
r      VIF = 1/(1 − r²)   Var(b2)
0      1.00               A
0.5    1.33               1.33 × A
0.7    1.96               1.96 × A
0.8    2.78               2.78 × A
0.9    5.26               5.26 × A
0.95   10.26              10.26 × A
0.97   16.92              16.92 × A
0.98   25.25              25.25 × A
0.99   50.25              50.25 × A
0.995  100.25             100.25 × A
0.999  500.25             500.25 × A
(A = σ²/Σ x_2i², the variance of b2 when r = 0.)
Detection Of Multicollinearity
• High R2 but few significant t ratios
• High pair-wise correlations among explanatory variables
• High partial correlation coefficients
• Significant F test for auxiliary regressions (regressions of each regressor
on the remaining regressors)
• High Variance Inflation Factor (VIF) and low Tolerance Factor
(Tolerance Factor is the inverse of VIF)
Remedial Measures
• Do nothing! We often have no control over the data.
• Redefining the model by excluding variables may solve the problem, but make sure not to omit relevant variables.
• Try non-linear (different) functional forms.
• Use PCA (principal components analysis), as in the sketch below:
  • Construct artificial variables from the regressors such that they are orthogonal to one another.
  • These principal components become the regressors in the model.
  • Interpretation of the coefficients on the principal components can be difficult.
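A minimal sketch of principal-components regression on hypothetical collinear data (scikit-learn for the PCA step, statsmodels for the regression; all variable names and the choice of two components are assumptions for illustration):

# Principal-components regression: replace collinear regressors with orthogonal components.
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=200)            # make X3 nearly equal to X1
y = 1 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=200)

components = PCA(n_components=2).fit_transform(X)           # orthogonal artificial regressors
pcr_fit = sm.OLS(y, sm.add_constant(components)).fit()
print(pcr_fit.params)                                        # coefficients on components, not on X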
Autocorrelation
• In regression, it is assumed that successive error terms (residuals) are independent, i.e., the covariance between u_i and u_j (i ≠ j) is zero.
• Autocorrelation is usually defined as correlation between members of a series of observations ordered in time.
• Reasons for autocorrelation include the possibly strong correlation between the value at time t and the value at time t+1.
• It is most commonly found in time-series data.
Consequences

The OLS estimators are still unbiased and consistent, and they are normally distributed in large samples.
However, they are no longer efficient, meaning they are no longer BLUE.
In most cases the standard errors are underestimated.
Thus, the hypothesis-testing procedure becomes suspect, since the estimated standard errors are not reliable, even in large samples.
Detection Of Autocorrelation
• Graphical method: plot the residuals e_t in their original temporal order; if a discernible pattern is detected, autocorrelation is likely a problem.
• Use the Durbin-Watson test.
• Use the Breusch-Godfrey (BG) test.
Durbin Watson Statistic
d = Σ(i=2 to n) (e_i − e_(i−1))² / Σ(i=1 to n) e_i²

d ≈ 2(1 − ρ̂),  where ρ̂ = Σ e_i·e_(i−1) / Σ e_i²

It can be shown that the value of d lies between 0 and 4.
There is no unique critical value of d that leads to rejection or acceptance of the null hypothesis; instead, lower and upper bounds dL and dU are used.
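A minimal sketch of computing d with statsmodels on hypothetical time-series data that has AR(1) errors built in:

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 100
t = np.arange(n, dtype=float)
u = np.zeros(n)
for i in range(1, n):
    u[i] = 0.7 * u[i - 1] + rng.normal()        # AR(1) errors with rho = 0.7
y = 5 + 0.3 * t + u

fit_ts = sm.OLS(y, sm.add_constant(t)).fit()
print(durbin_watson(fit_ts.resid))              # well below 2 => positive autocorrelation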
Assumptions for D W Test
1. The regression model includes an intercept term.
2. The X values are fixed.
3. The error term follows the first-order autoregressive (AR(1)) scheme:
   u_t = ρ·u_(t−1) + v_t
   where ρ (rho) is the coefficient of autocorrelation, a value between −1 and 1.
4. The error term is normally distributed.
5. The regressors do not include lagged value(s) of the dependent variable, Y_(t−1).
Critical Regions of Durbin Watson test

0 < d < dL: positive autocorrelation
dL < d < dU: inconclusive
dU < d < 4 − dU: no autocorrelation
4 − dU < d < 4 − dL: inconclusive
4 − dL < d < 4: negative autocorrelation
Durbin-Watson Decision Rules
Null hypothesis                               Decision        If
No positive autocorrelation                   Reject          0 < d < dL
No positive autocorrelation                   No decision     dL < d < dU
No negative autocorrelation                   Reject          4 − dL < d < 4
No negative autocorrelation                   No decision     4 − dU < d < 4 − dL
No autocorrelation, positive or negative      Do not reject   dU < d < 4 − dU
Breusch-Godfrey (BG) Test
• This test allows for:
  • Lagged values of the dependent variable to be included as regressors
  • Higher-order autoregressive schemes, such as AR(2), AR(3), etc.
  • Moving-average terms of the error term, such as u_(t−1), u_(t−2), etc.
• The error term in the main equation follows the AR(p) autoregressive structure:

  u_t = ρ_1·u_(t−1) + ρ_2·u_(t−2) + ... + ρ_p·u_(t−p) + v_t

• The null hypothesis of no serial correlation is:

  H0: ρ_1 = ρ_2 = ... = ρ_p = 0
Breusch-Godfrey (BG) Test
• The BG test involves the following steps:
  • Regress e_t, the residuals from the main regression, on the regressors in the model and the p autoregressive terms given in the equation on the previous slide, and obtain R² from this auxiliary regression.
  • If the sample size is large, Breusch and Godfrey have shown that (n − p)·R² ~ χ²_p.
  • Test H0 using the F value obtained from the auxiliary regression. This F value has (p, n − k − p) degrees of freedom in the numerator and denominator, respectively, where k is the number of parameters in the auxiliary regression (including the intercept term).
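A minimal sketch using statsmodels, reusing the fit_ts object from the Durbin-Watson sketch above; the choice of p = 2 lags is an illustrative assumption:

# Breusch-Godfrey test with an AR(2) alternative (nlags = p = 2).
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(fit_ts, nlags=2)
print(f"LM = {lm_stat:.2f} (p = {lm_pval:.4f}), F = {f_stat:.2f} (p = {f_pval:.4f})")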
Remedial Measures
First-Difference Transformation
If the autocorrelation is of the AR(1) type, then u_t − ρ·u_(t−1) = v_t.
Assume ρ = 1 and run the first-difference model (taking the first difference of the dependent variable and of all regressors).

Generalized Transformation
Estimate ρ through a regression of the residuals on the lagged residuals, and use that value to run the transformed (quasi-differenced) regression, as in the sketch below.
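A minimal sketch of this generalized (quasi-differencing) transformation, reusing y, t, and fit_ts from the Durbin-Watson sketch; the single regressor t is an illustrative assumption:

# Estimate rho from a regression of e_t on e_{t-1}, then run OLS on quasi-differenced data.
import numpy as np
import statsmodels.api as sm

e = np.asarray(fit_ts.resid)
rho = sm.OLS(e[1:], e[:-1]).fit().params[0]           # estimated autocorrelation coefficient

y_star = y[1:] - rho * y[:-1]                          # quasi-differenced dependent variable
t_star = t[1:] - rho * t[:-1]                          # quasi-differenced regressor
gls_fit = sm.OLS(y_star, sm.add_constant(t_star)).fit()
print(rho, gls_fit.params)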

Model Evaluation
Outliers
• Mahalanobis Distance: a measure of how much an observation's values on the independent variables differ from the averages of all observations.
• Cook's Distance: a measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients.
• Leverage Values: measure the influence of a point on the fit of the regression. The (centered) leverage value ranges from 0 to (n − 1)/n.
• DFFIT and DFBETA values
Mahalanobis Distance
• Identifies influential observations.
• It is the distance between a specific observation and the centroid of all observations of the explanatory variables:

  D_i² = (X_i − μ)ᵀ S⁻¹ (X_i − μ)

• where D_i² is the squared Mahalanobis distance of point X_i, μ is the vector of means of the explanatory variables, and S⁻¹ is the inverse of their covariance matrix.
• D_i² should be less than the critical chi-square value with degrees of freedom equal to the number of independent variables in the model.
• A simple rule of thumb is to treat any observation with D_i² larger than 10 as an influential observation.
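A minimal sketch of the computation on a hypothetical matrix X of explanatory variables:

# Squared Mahalanobis distance of each observation from the centroid of X.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))                         # hypothetical explanatory variables

diff = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))        # inverse covariance matrix
d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)      # D_i^2 for every row

cutoff = chi2.ppf(1 - 0.001, df=X.shape[1])            # critical value, alpha = .001
print(np.where(d2 > cutoff)[0])                        # indices of potential outliers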
Mahalanobis Distance
• Mahalanobis distance is used to check whether a sample point is an outlier.
• If the Mahalanobis distance is greater than the critical value, the sample point is flagged as an outlier.
• As a rule of thumb, the maximum Mahalanobis distance should not exceed the critical chi-square value with degrees of freedom equal to the number of predictors and alpha = .001; otherwise, outliers may be a problem in the data.
• The minimum, maximum, and mean Mahalanobis distances are displayed by SPSS in the "Residuals Statistics" table when "Casewise diagnostics" is checked under the Statistics button in the Regression dialog.
Cook’s Distance
• Measures the change in the regression parameters.
• Measures how much the predicted values of the dependent variable change, for all observations in the sample, when a particular observation is excluded from the sample:

  D_i = Σ_j (Ŷ_j − Ŷ_j(i))² / [(k + 1)·s²]

• where D_i is Cook's distance for the i-th observation, Ŷ_j is the predicted value of the j-th observation with the i-th observation included in the model, Ŷ_j(i) is the predicted value of the j-th observation with the i-th observation excluded from the model, k is the number of independent variables in the model, and s² is the mean squared error of the regression.
• A Cook's distance of more than 1 indicates an influential observation. Sometimes a value of 4/(n − k − 1) is also used as a cut-off.
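With statsmodels, Cook's distances come from the influence diagnostics of any fitted OLS model; a minimal sketch reusing the fit object from the earlier heteroscedasticity sketch (any OLS fit works):

# Cook's distance for each observation, with the common 4/(n - k - 1) cut-off.
import numpy as np

influence = fit.get_influence()
cooks_d, _ = influence.cooks_distance                  # one distance per observation
n, k = int(fit.nobs), int(fit.df_model)                # k = number of regressors
cutoff = 4.0 / (n - k - 1)
print(np.where(cooks_d > cutoff)[0])                   # indices of influential observations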
Leverage
• If a point has a large leverage, then the slope of the regression line follows
more closely the slope of the line between that point and the mean point.
• Points with small leverage may not have much influence on the regression
coefficients.
Leverage Value

• Recall the hat matrix:

  H = X(XᵀX)⁻¹Xᵀ

• The leverage value of the i-th observation is the i-th diagonal element of H:

  h_ii = X_i(XᵀX)⁻¹X_iᵀ

  where X_i is the i-th row of the design matrix X.
• A leverage value greater than 2(k + 1)/n is treated as marking an influential observation.
• A leverage value greater than 3(k + 1)/n is treated as marking a highly influential observation.
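A minimal sketch of the leverage values (the diagonal of the hat matrix) from the same statsmodels influence diagnostics used above:

# Leverage (hat) values h_ii and the 2(k+1)/n and 3(k+1)/n thumb rules.
import numpy as np

influence = fit.get_influence()
h = influence.hat_matrix_diag                          # h_ii for each observation
n, k = int(fit.nobs), int(fit.df_model)
print(np.where(h > 2 * (k + 1) / n)[0])                # influential observations
print(np.where(h > 3 * (k + 1) / n)[0])                # highly influential observations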
Questions??
