
PRESENTATION

ON
MULTICOLLINEARITY

DUSHYANT
2018MBA011
MULTICOLLINEARITY

• Multicollinearity occurs when two or more independent
variables in a regression model are highly correlated with
each other.

• The standard error of an OLS parameter estimate will be
higher the more highly the corresponding independent
variable is correlated with the other independent variables
in the model.
THE NATURE OF MULTICOLLINEARITY
Originally, multicollinearity meant the existence of a “perfect,” or
exact, linear relationship among some or all explanatory variables of a
regression model. For the k-variable regression involving explanatory
variables X1, X2, . . . , Xk (where X1 = 1 for all observations to allow for
the intercept term), an exact linear relationship is said to exist if the
following condition is satisfied:
λ1X1 + λ2X2 + · · · + λkXk = 0
where λ1, λ2, . . . , λk are constants such that not all of them are zero
simultaneously.
Today, however, the term multicollinearity is used to include the case
where the X variables are intercorrelated but not perfectly so, as
follows:
λ1X1 + λ2X2 + · · · + λkXk + vi = 0
where vi is a stochastic error term.
PERFECT MULTICOLLINEARITY

• Perfect multicollinearity occurs when there is an exact
linear relationship between two or more independent
variables.

• It also occurs when an independent variable takes a
constant value in all observations.
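To make this concrete, here is a minimal NumPy sketch (variable names
are hypothetical) showing that an exact linear dependence among the
regressors makes X'X singular, so the OLS normal equations have no
unique solution:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = 2 * x1 - 3 * x2               # exact linear dependence: 2*x1 - 3*x2 - x3 = 0

    # Design matrix with an intercept column (X1 = 1 in the notation above)
    X = np.column_stack([np.ones(n), x1, x2, x3])
    XtX = X.T @ X

    print(np.linalg.matrix_rank(XtX))  # 3 rather than 4: X'X is singular
    print(np.linalg.det(XtX))          # approximately 0, up to floating-point noise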
CAUSES OF MULTICOLLINEARITY

• Constraints on the model or in the population being sampled.
• The data collection method employed.
• Statistical model specification.
• An overdetermined model.
CONSEQUENCES OF MULTICOLLINEARITY

In cases of near or high multicollinearity, one is likely to
encounter the following consequences:
1. The OLS estimators have large variances and
covariances, making precise estimation difficult.
2. The confidence intervals tend to be much wider, leading
to the acceptance of the “zero null hypothesis” (i.e., that
the true population coefficient is zero) more readily.
2. WIDER CONFIDENCE INTERVALS
Because of the large standard errors, the
confidence intervals for the relevant
population parameters tend to be larger.
Therefore, in cases of high multicollinearity,
the sample data may be compatible with a
diverse set of hypotheses. Hence, the
probability of accepting a false hypothesis
(i.e., type II error) increases.
3) The t ratio of one or more coefficients tends to be
statistically insignificant.

• In cases of high collinearity, as the estimated standard
errors increase, the t values decrease. Therefore, in such
cases, one will increasingly accept the null hypothesis that
the relevant true population value is zero.
4) Even when the t ratios are insignificant, R² can be very high.

Since R² is very high, we reject the joint null hypothesis
H0: β1 = β2 = · · · = βk = 0, which indicates a significant
relationship between the variables. But since the individual t
ratios are small, we fail to reject the individual null hypotheses.

Therefore, contradictory conclusions arise in the presence of
multicollinearity.
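A small simulation (a sketch using statsmodels; all names are
hypothetical) illustrates consequences 1 through 4 at once: with nearly
collinear regressors, the standard errors inflate and the individual t
ratios become insignificant even though R² and the overall F statistic
remain high:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
    y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

    res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(res.rsquared)   # high R²
    print(res.fvalue)     # large overall F statistic: jointly significant
    print(res.bse)        # inflated standard errors on x1 and x2
    print(res.tvalues)    # small individual t ratios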
DETECTION OF MULTICOLLINEARITY
VARIANCE INFLATION FACTORS

• VARIANCE INFLATION FACTORS ARE VERY USEFUL IN
DETERMINING IF MULTICOLLINEARITY IS PRESENT.

VIFj = Cjj = 1 / (1 − Rj²)

where Rj² is the coefficient of determination from regressing Xj on
the remaining regressors, and Cjj is the jth diagonal element of the
inverse of the regressors' correlation matrix.

• VIFs GREATER THAN 5 TO 10 ARE CONSIDERED SIGNIFICANT.
THE REGRESSORS THAT HAVE HIGH VIFs PROBABLY HAVE
POORLY ESTIMATED REGRESSION COEFFICIENTS.
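As an illustration, statsmodels ships a variance_inflation_factor
helper that performs the auxiliary regressions behind this formula;
the data below are simulated and the variable names hypothetical:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(2)
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)   # correlated with x1
    x3 = rng.normal(size=n)                          # unrelated regressor

    X = sm.add_constant(np.column_stack([x1, x2, x3]))
    for j in range(1, X.shape[1]):                   # skip the constant column
        print(f"VIF for x{j}: {variance_inflation_factor(X, j):.2f}")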
DETECTION OF MULTICOLLINEARITY

• Multicollinearity cannot be tested; only its degree can be
detected.

• Multicollinearity is a question of degree and not of kind. The
meaningful distinction is not between the presence and the
absence of multicollinearity, but between its various degrees.

• Multicollinearity is a feature of the sample and not of the
population. Therefore, we do not “test for multicollinearity”;
rather, we measure its degree in any particular sample.
A. THE FARRAR–GLAUBER TEST

• Computation of an F-ratio to test the location of
multicollinearity.

• Computation of a t-ratio to test the pattern of
multicollinearity.

• Computation of a chi-square statistic to test the presence of
multicollinearity in a function with several explanatory
variables.
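A hedged sketch of the chi-square part of the test, assuming the usual
Farrar–Glauber statistic chi² = −[n − 1 − (2k + 5)/6]·ln|R| with
k(k − 1)/2 degrees of freedom, where R is the correlation matrix of the
k explanatory variables (function and variable names are hypothetical):

    import numpy as np
    from scipy.stats import chi2

    def farrar_glauber_chi2(X):
        """X: (n, k) array of explanatory variables, without a constant column."""
        n, k = X.shape
        R = np.corrcoef(X, rowvar=False)
        stat = -(n - 1 - (2 * k + 5) / 6) * np.log(np.linalg.det(R))
        df = k * (k - 1) / 2
        return stat, chi2.sf(stat, df)   # test statistic and p-value

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=100)
    X = np.column_stack([x1, x1 + rng.normal(scale=0.2, size=100)])
    print(farrar_glauber_chi2(X))        # large statistic, tiny p-value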
B. BUNCH ANALYSIS

a. The coefficient of determination, R², is high in the
presence of multicollinearity.
b. In the presence of multicollinearity in the data, the partial
correlation coefficients, such as r12, are also high.
c. High standard errors of the parameters indicate the
existence of multicollinearity.
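For example, a quick pandas check of symptom (b), a pairwise
correlation close to one (data and names hypothetical):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(4)
    df = pd.DataFrame({"x1": rng.normal(size=100)})
    df["x2"] = df["x1"] + rng.normal(scale=0.1, size=100)
    print(df.corr())   # off-diagonal r12 close to 1 signals multicollinearity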
REMEDIAL MEASURES

• Do nothing.
– If you are only interested in prediction, multicollinearity
is not an issue.
– t-stats may be deflated but still significant; in that case
the multicollinearity is not a serious problem.
– The cure is often worse than the disease.

• Drop one or more of the multicollinear variables.
– In an effort to avoid specification bias, a researcher may
have introduced multicollinearity; in such cases it can be
appropriate to drop a variable.
• Transform the multicollinear variables.
– Form a linear combination of the multicollinear variables.
– Transform the equation into first differences or logs (see
the sketch after this list).

• Increase the sample size.
– This raises the issue of micronumerosity.
– Micronumerosity is the problem of the sample size (n) not
sufficiently exceeding the number of parameters (k). Its
symptoms are similar to those of multicollinearity (lack of
variation in the independent variables).

• A solution to each problem is to increase the sample size.
This will solve the problem of micronumerosity but not
necessarily the problem of multicollinearity.
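The sketch promised above illustrates the first-difference
transformation (pandas assumed, series names hypothetical): two
trending series are almost perfectly correlated in levels but far less
so after differencing:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(5)
    t = np.arange(100)
    df = pd.DataFrame({
        "x1": t + rng.normal(size=100),     # trending series
        "x2": t + rng.normal(size=100),     # shares the same trend
    })
    print(df.corr().loc["x1", "x2"])        # near 1 in levels
    print(df.diff().corr().loc["x1", "x2"]) # near 0 after first differences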
CONCLUSION
• Multicollinearity is a statistical phenomenon in which there
exists a perfect, or near-perfect, linear relationship among
the predictor variables.
• When such a relationship exists between the predictor
variables, it is difficult to come up with reliable estimates of
their individual coefficients.
• The presence of multicollinearity can cause serious
problems with the estimation of β and with its interpretation.
• When multicollinearity is present in the data, the ordinary
least squares estimators are imprecisely estimated.
• If the goal is to understand how the various X variables
impact Y, then multicollinearity is a big problem. Thus, it is
essential to detect and address multicollinearity before
estimating the parameters of the fitted regression model.
• Detection of multicollinearity can be done by
examining the correlation matrix or by using
VIF.
• Remedial measures help to solve the problem
of multicollinearity.
