Quantile Regression
Recall: Ordinary Least Squares Model
Y_i = β_0 + β_1 x_i1 + ⋯ + β_k x_ik + ε_i, where ε_i ~ N(0, σ²)

• attempts to describe how the location of the conditional distribution behaves by using the mean of the distribution to represent its central tendency
• invokes a homoskedasticity assumption; that is, the conditional variance, Var(Y|X), is assumed to be a constant σ² for all values of the covariates
• assumes normality of the error terms
• estimates may be affected by outliers
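As a baseline, the OLS model above can be fit in a few lines. This is a sketch on simulated data; the coefficient values are invented, chosen only to loosely echo the schooling/race illustration that follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data; the true coefficients are made up for illustration
n = 200
school = rng.integers(8, 18, size=n).astype(float)
white = rng.integers(0, 2, size=n).astype(float)
income = -40000 + 6000 * school + 11000 * white + rng.normal(0, 5000, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), school, white])

# OLS: beta_hat minimizes the sum of squared residuals ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, income, rcond=None)
residuals = income - X @ beta_hat
print(beta_hat)
```

With an intercept in the model, the residuals average to zero by construction, and the estimates recover the simulated coefficients up to sampling noise.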
Illustration
Suppose that the number of years of schooling and race (white vs non-white) are used to
explain income.

INCOMÊ = −42655.95 + 6313.654 SCHOOL + 11451.75 WHITE

Also, using the Breusch-Pagan Test results in rejecting the null hypothesis of
homoskedasticity.
Comparison: OLS vs Quantile Regression

                              Ordinary Least Squares              Quantile Regression Model
objective function            sum of squared residuals            asymmetrically weighted absolute
                                                                  residuals
estimates                     conditional mean functions          conditional quantile functions, such
                                                                  as conditional median functions
allows heteroskedasticity?    no                                  yes
distributional assumptions    normality and homoskedasticity      none
                              of error terms
comprehensiveness             only yields information about the   yields information about the whole
                              conditional mean E(Y|X)             conditional distribution of Y
Quantile Regression Model
Proposed by Koenker and Bassett (1978), quantile regression models conditional quantiles as
functions of predictors. It estimates the effect of a covariate on various quantiles in the
conditional distribution.

Y_i = β_0^(p) + β_1^(p) x_i1 + ⋯ + β_k^(p) x_ik + ε_i^(p)

The distance of points from a line is measured using a weighted sum of vertical distances
(without squaring):
• points below the fitted line are given a weight 1-p;
• points above the fitted line are given a weight p.

Each choice for this proportion p gives rise to a different fitted conditional-quantile function.
The task is to find an estimator with the desired property for each possible p.
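The weighting scheme above is the Koenker-Bassett "check function" ρ_p(u), and minimizing its sum over a constant recovers the p-th sample quantile. A minimal sketch:

```python
import numpy as np

def check_loss(u, p):
    """Check function rho_p(u): residuals above the line (u > 0) get
    weight p; residuals below the line (u < 0) get weight 1 - p."""
    return u * (p - (u < 0))

u = np.array([-2.0, -1.0, 1.0, 2.0])
print(check_loss(u, 0.5))   # symmetric weights: 1.0, 0.5, 0.5, 1.0
print(check_loss(u, 0.9))   # under-predictions penalized more: 0.2, 0.1, 0.9, 1.8

# Minimizing the summed check loss over a constant recovers the p-th quantile
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
grid = np.linspace(0, 100, 100001)
losses = [check_loss(y - c, 0.5).sum() for c in grid]
print(grid[np.argmin(losses)])  # ≈ 3.0, the sample median
```

Note that the extreme value 100.0 does not drag the minimizer away from the middle observation, in contrast to the mean.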
Obtaining Estimates
Several algorithms exist:
• Simplex Method - for moderate data size
• Interior Point Method - for large data size
• Interior Point Method with Preprocessing - for very large data sets (n > 10⁵)
• Smoothing Method
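These algorithms all exploit the fact that the minimization can be written as a linear program. As a sketch (not any production implementation), the residual is split into nonnegative parts u⁺ and u⁻ and the coefficients into β⁺ − β⁻, then the problem is handed to a generic LP solver:

```python
import numpy as np
from scipy.optimize import linprog

def quantile_reg_lp(X, y, p):
    """Quantile regression as a linear program:
    minimize p * sum(u+) + (1 - p) * sum(u-)
    subject to X (beta+ - beta-) + u+ - u- = y, all variables >= 0."""
    n, k = X.shape
    c = np.concatenate([np.zeros(2 * k), p * np.ones(n), (1 - p) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, method="highs")
    return res.x[:k] - res.x[k:2 * k]

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=100)
X = np.column_stack([np.ones_like(x), x])
beta = quantile_reg_lp(X, y, 0.5)
print(beta)  # roughly [1, 2]
```

This direct formulation scales poorly (the constraint matrix is n × (2k + 2n)), which is exactly why the specialized simplex and interior point variants listed above exist.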
Inference
Methods of Constructing Confidence Intervals
• Sparsity (for very large datasets)
o based on the asymptotic distribution of the β̂^(p)'s: the asymptotic dispersion matrix involves the reciprocal of the density function of the error terms (the sparsity function)
• Inversion of Rank Tests (for small datasets)
o based on the relationship between order statistics and rank scores
o involves linear programming (simplex method)
• Bootstrap (Resampling) (for moderate datasets)
o does not make use of any distributional assumption
o the number of resamples, M, is usually between 50 and 200
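A percentile bootstrap interval for a quantile-regression coefficient can be sketched with statsmodels (the data are simulated; M = 100 resamples, within the 50-200 range mentioned above):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Simulated data with a true median-regression slope of 6000
n = 200
school = rng.uniform(8, 18, size=n)
income = -40000 + 6000 * school + rng.normal(0, 8000, size=n)
df = pd.DataFrame({"income": income, "school": school})

# Resample rows with replacement M times and refit the median regression
M = 100
slopes = []
for _ in range(M):
    boot = df.sample(n=n, replace=True, random_state=int(rng.integers(1 << 31)))
    fit = smf.quantreg("income ~ school", boot).fit(q=0.5)
    slopes.append(fit.params["school"])

# Percentile confidence interval: no distributional assumption needed
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"95% CI for the median-regression slope: [{lo:.0f}, {hi:.0f}]")
```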
Comparison
Assessing Goodness of Fit
• In OLS: measured by the coefficient of determination (R²)
• An analog of the R2 is also developed for quantile regression models.
• Since quantile-regression models are based on minimizing a sum of weighted distances, with different weights depending on whether y_i > ŷ_i or y_i < ŷ_i, goodness of fit is measured with a criterion that is consistent with this objective.
• The resulting measure is called the pseudo-R².
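One standard formulation of the pseudo-R² (the sketch below is one common version, comparing the fitted model's summed check loss to that of an intercept-only model, whose best fit is the p-th sample quantile of y):

```python
import numpy as np

def check_loss(u, p):
    # rho_p(u): weight p for positive residuals, 1 - p for negative ones
    return u * (p - (u < 0))

def pseudo_r2(y, y_hat, p):
    """1 minus the ratio of the model's summed check loss to the
    intercept-only model's summed check loss at the p-th quantile."""
    fitted = check_loss(y - y_hat, p).sum()
    null = check_loss(y - np.quantile(y, p), p).sum()
    return 1 - fitted / null

y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
print(pseudo_r2(y, y, 0.5))                              # 1.0: perfect fit
print(pseudo_r2(y, np.full_like(y, np.median(y)), 0.5))  # 0.0: no improvement
```

Like R², it lies between 0 and 1, but it is specific to the chosen quantile p, so each fitted quantile function gets its own value.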
Interpretation of Coefficients
• The QRM coefficient estimate is interpreted as the estimated change in the pth quantile
of the response variable corresponding to a unit change in the regressor.
Median-Regression Model (MRM)
• The simplest QRM is the median-regression model (MRM), expresses the conditional
median of a response variable given predictor variables and alternative to OLS that fits
the conditional mean. MRM and OLS both attempt to model the central location of a
response variable.

• The median-regression model is more suitable for modeling the behavior of a collection of skewed conditional distributions. For instance, if these conditional distributions are skewed to the right, their means reflect what is happening in the upper tail rather than in the middle.
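The mean-versus-median gap in a right-skewed distribution is easy to see numerically (lognormal draws here are just a stand-in for a skewed conditional income distribution):

```python
import numpy as np

rng = np.random.default_rng(4)

# Right-skewed draws: the mean is pulled toward the upper tail,
# while the median stays in the middle of the distribution
y = rng.lognormal(mean=10, sigma=1.0, size=10_000)

print(f"mean   = {y.mean():.0f}")    # near exp(10.5), inflated by the tail
print(f"median = {np.median(y):.0f}")  # near exp(10), the middle observation
```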

• Interpretation: In the case of a continuous X, the coefficient estimate is interpreted as the change in the median of the response variable corresponding to a unit change in the predictor.
Using QRM to Detect Shape Shifts
• Arrays of QRM coefficients for a range of quantiles can be used to determine how a one-
unit increase in the covariate affects the shape of the response distribution.
• This shape shift is highlighted using a graphical method: for a particular covariate, we plot the coefficients and their confidence envelope, with the X effect on the y-axis and the quantile p on the x-axis.
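The coefficient array behind such a plot is obtained by refitting the model over a grid of quantiles. A sketch on simulated heteroskedastic data (plotting omitted; the variable names are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)

# Heteroskedastic data: the effect of x grows across the distribution,
# so the coefficient curve should slope upward when plotted against p
n = 1000
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(0, 1, size=n)
df = pd.DataFrame({"x": x, "y": y})

quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
coefs = [smf.quantreg("y ~ x", df).fit(q=q).params["x"] for q in quantiles]
for q, b in zip(quantiles, coefs):
    print(f"p = {q:.2f}: slope = {b:.2f}")
```

Plotting `coefs` against `quantiles` reproduces the upward-sloping curve described below: a scale increase driven by a unit increase in x.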
Interpreting Shape Shifts
Using the graphical method,
• A horizontal line indicates a pure location shift by a one-unit increase in the covariate.
• An upward-sloping curve indicates an increase in scale: the effect of a one-unit increase in the regressor is positive for all values of p and steadily increases with p.
• A downward-sloping curve indicates a decrease in the scale.
Example
Suppose that the number of years of schooling and race (white vs non-white: dummy
variable) are used to explain income (in USD).

Estimates:
p        .05    .10    .20    .25    .30    .40    .50    .60    .70    .75    .80    .90    .95
ED      1130   1782   2757   3172   3571   4266   4794   5571   6224   6598   6954   8279   9575
WHITE   3197   4689   6557   6724   7541   8744   9792  11091  11739  12142  12972  14049  17484
[Figure: QRM coefficient estimates for ed (left panel) and white (right panel) plotted against the quantile p.]
Using QRM to Detect Skewness Shifts
• Using QRM coefficients (parameter estimates), the quantity skewness shift (SKS) can be
computed.
• When SKS=0, it indicates either no scale shift or a proportional scale shift.
• SKS<0 indicates a reduction of right-skewness due to the effect of the explanatory
variable (unit increase); SKS>0 indicates an exacerbation of right-skewness.
