Department of Economics
sungwonlee@utexas.edu
Abstract
It is common for empirical studies to specify models with many covariates to eliminate
omitted variable bias, even if some of the covariates are potentially irrelevant. When models
are nonparametrically specified, such a practice results in the curse of dimensionality.
Therefore, it is important in nonparametric models to drop irrelevant variables and consider a
small number of covariates to improve the precision of estimation. This paper develops
nonparametric significance tests for censored quantile regression models with duration outcomes.
The null hypothesis is characterized by a conditional moment restriction. I adopt the integrated
conditional moment (ICM) approach, developed by Bierens (1982, 1990), to construct test
statistics. The testing procedure does not require estimating the alternative models. Two test
statistics are considered: a Kolmogorov-Smirnov type statistic and a Cramér-von Mises type
statistic. These test statistics are functionals of a stochastic process that converges weakly
to a centered Gaussian process. The test has non-trivial power against local alternatives at the
parametric rate. A subsampling procedure is proposed to obtain critical values.
I am indebted to my advisor Stephen Donald for his guidance and support. I am also grateful to Jason Abrevaya,
Sukjin Han, Brendan Kline, Tong Li, and Haiqing Xu for helpful comments and suggestions. In particular, I sincerely
thank Pedro Sant'Anna for his invaluable help. I also thank Jessie Coe, Xinchen Gu, and Peter Toth. This paper
has also benefited from participants in the 27th Annual Meeting of the Midwest Econometrics Group. All errors are
mine.
1 Introduction
Since the seminal work of Koenker and Bassett (1978), quantile regression (QR) models have
received much attention in both theoretical and applied econometrics. Owing to the many appealing
properties of QR, conditional quantile models have become a good alternative to conditional mean
models and have gained increasing attention. One desirable feature of QR is that it provides
information on the distribution of the dependent variable conditional on covariates, which allows
one to capture heterogeneous effects of the covariates on the dependent variable across quantiles.
In the treatment effects literature, although the average treatment effect is the most common
measure of treatment effects, information on the heterogeneity of treatment effects, if it exists,
is likely to be missed because the mean effect integrates out the heterogeneous factors.¹
In addition, it is well known that QR is less sensitive to outliers than mean regression,
performs better than mean regression, and places fewer assumptions on the distribution of
estimation and inference procedures and provides a natural way to interpret the model. However,
since economic theories seldom imply parametric specifications, parametric models are vulnerable to
misspecification, which results in misleading implications of the models. To alleviate the sensitivity
Even though a nonparametric model is attractive for its flexibility, it typically requires a larger
sample than a parametric model to obtain estimators of reasonable precision. Moreover, the rate of
convergence is very slow when the number of covariates is large, a problem known as the curse of
dimensionality. There have been many attempts in the literature to circumvent the curse of
dimensionality. The main idea of these attempts is to impose structure on the model to improve the
rate of convergence; examples include additive separability of regression models and partially
linear models (e.g. Robinson (1988); Andrews and Whang (1990); Lee (2003); Horowitz and Lee (2005)).
These structures are shown to improve the rate of convergence and consequently bypass the
efficiency issue of fully nonparametric models. However, those models with additional structures are also not free from
This paper considers nonparametric tests of a null hypothesis stating that only a subset
of the covariates is significant. Once the model is nonparametrically estimated, rejection
of the null hypothesis is not an indication of misspecification but a suggestion that there are
omitted variables. The results of these tests provide information on variable selection in QR and
can mitigate problems caused by a large number of covariates. To formulate test statistics, I
characterize the null hypothesis as a conditional moment restriction and then employ the integrated
conditional moment (ICM) approach proposed by Bierens (1982, 1990). Bierens
unconditional moments with an appropriately chosen weighting function in his series of papers.
1 Buchinsky (1994) uses QR to describe changes in the returns to education and experience across the distribution
of wages, and part of his results indicates that the returns to education and experience exhibit heterogeneity across
the quantiles. Bitler et al. (2006) examine the effects of policy reforms on welfare, including earnings, taking the
heterogeneity of the effect across the distribution into account. They find substantial heterogeneity
across the distribution from the estimation of the quantile treatment effect, and that the average treatment effect
may result in a misleading prediction.
This result is used to construct a parametric specification test.² Bierens and Ploberger (1997)
establish the asymptotic theory for the ICM test statistic and obtain upper bounds on the critical
values that guarantee that the actual size of the test is bounded by the nominal size specified by
researchers. The tests of this paper differ from the original tests of Bierens (1982, 1990) and
Bierens and Ploberger (1997) in that this paper considers a nonparametric null hypothesis, which
involves infinite-dimensional parameters.
One desirable feature of the ICM approach is that it does not require estimating
alternative models, which reduces the computational burden of obtaining the test statistics. In
nonparametric tests, if a test statistic contains nonparametric objects and directly compares the
null model with alternatives, it is hard to achieve power against local alternatives at the
$\sqrt{n}$-rate (e.g. Härdle and Mammen (1993); Hong and White (1995); Fan and Li (1996)). The ICM
approach, however, makes it possible to have non-trivial power against local alternatives at the
parametric rate even if the test statistic contains infinite-dimensional parameters. Subsequent
studies that combine the ICM approach with nonparametric null hypotheses include Chen and Fan (1999),
Delgado and Manteiga (2001), Li et al. (2003), and Huang et al. (2016), to name just a few.
Unlike model specification tests for conditional mean regression, the test statistics of this
paper contain an indicator function, which is non-smooth, and this non-smoothness makes it
difficult to apply the approach used for testing conditional mean models. I employ a stochastic
equicontinuity argument to obtain a stochastic expansion of the test statistics with the non-smooth
function and to derive the asymptotic distribution of the test statistics. Stochastic equicontinuity
has been applied to both parametric and semi-/non-parametric models in, for example, Andrews
(1994a,b), Newey (1994), and Chen et al. (2003). Those papers use stochastic equicontinuity
to derive the asymptotic distribution of an estimator in the presence of non-smooth functions and
infinite-dimensional parameters. It is hard to show directly that some processes are
stochastically equicontinuous, and thus I make use of empirical process theory to prove stochastic
equicontinuity.
This paper also incorporates censoring of the dependent variable. In some empirical applications,
such as duration or survival analysis, the dependent variable is not completely observed due to
censoring. Consider, for example, the case where one is interested in estimating the effect
of unemployment insurance benefits on unemployment duration, and suppose that individuals'
duration spells are observed only while they are receiving unemployment benefits. In this case,
the duration spell is not completely observed and is thus subject to censoring.
Computing the test statistics requires estimating the conditional quantile function under the null
hypothesis. Even if a censoring variable is (conditionally) independent of the outcome of interest,
ignoring the censoring results in inconsistent estimators. Therefore, I estimate the conditional
distribution function by using a variant of the Kaplan-Meier (KM) estimator (Kaplan and Meier (1958))
and then invert the estimated distribution function to obtain the conditional quantile function. To
accommodate the regressors, a conditional KM estimator is required (Beran (1981);
Dabrowska (1989, 1992); Gonzalez-Manteiga and Cadarso-Suarez (1994); Wang and Wang (2009)).
Specifically, I use the local conditional KM estimator proposed by Kong and Xia (2017), which is
2 Similar questions are addressed in Stute (1997) and Koul and Stute (1999), but they use the indicator function
as the weighting function to transform conditional moment restrictions into unconditional ones.
The test statistics in this paper are asymptotically smooth functionals of a Gaussian process, and
their asymptotic distributions depend on the data generating process. Therefore, it is difficult to
tabulate critical values for the test statistics. To resolve this problem, I use a subsampling method
to approximate the asymptotic distributions of the test statistics. It is shown that, under a set of
conditions, the subsampling method yields critical values that guarantee the asymptotically correct
size of the test.
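The subsampling idea can be sketched in a few lines. This is a minimal illustration rather than the procedure developed in Section 4: the function name `subsampling_critical_value`, the use of random subsets drawn without replacement, and the convention that the user-supplied `statistic` already carries its own scaling for a sample of size b are all assumptions made here.

```python
import numpy as np

def subsampling_critical_value(data, statistic, b, alpha=0.05, n_draws=500, seed=0):
    """Approximate the (1 - alpha) critical value of `statistic` by subsampling.

    data      : array with one observation per row
    statistic : hypothetical callable mapping a subsample to a scalar test statistic
    b         : subsample size, 1 <= b <= n
    """
    data = np.asarray(data)
    n = data.shape[0]
    if not 1 <= b <= n:
        raise ValueError("subsample size b must satisfy 1 <= b <= n")
    rng = np.random.default_rng(seed)
    stats = np.empty(n_draws)
    for r in range(n_draws):
        # Draw b distinct observations (without replacement).
        idx = rng.choice(n, size=b, replace=False)
        stats[r] = statistic(data[idx])
    # Empirical (1 - alpha) quantile of the subsample statistics.
    return float(np.quantile(stats, 1.0 - alpha))
```

One would reject the null when the full-sample statistic exceeds the returned value; how b should grow with n is a separate question that this sketch does not address.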
Among the most closely related works, Volgushev et al. (2013) propose a nonparametric test
for the significance of covariates in conditional quantile models, addressing the same question as
this paper. The test proposed in this paper is similar to theirs in terms of the form of
the test statistic. The main difference from Volgushev et al. (2013) is that this paper covers the
case where the dependent variable is subject to censoring. A minor difference is that I use
different weighting functions to construct the test statistics. Specifically, Volgushev et al.
(2013) use the indicator function as a weighting function in the test statistic, whereas I consider
another class of weighting functions, called the generically comprehensively revealing (GCR,
hereafter) class, a term coined by Stinchcombe and White (1998). Huang et al. (2016) discuss the
choice of weighting function between the indicator function and the GCR class and point out that
the GCR class has several advantages over the indicator function for testing conditional quantile
independence.
Sant'Anna (2016) is another closely related work in that he considers tests
for nonparametric models with duration outcomes. He proposes nonparametric specification tests
in the treatment effect context. His tests also rely on Bierens's ICM approach, and he suggests
using a bootstrap procedure to obtain the critical values. The tests of Sant'Anna (2016) can also
be used to test hypotheses similar to the one considered in this paper, such as homogeneity of the
conditional average treatment effect, but this paper focuses on QR models with duration
outcomes.
The rest of this paper is organized as follows. I briefly review related studies in the literature in
the following subsection. Section 2 formalizes the model and constructs the test statistics. Section
3 establishes the asymptotic theory of the tests. A subsampling procedure is provided in Section 4.
Section 5 concludes.
There have been many studies of both parametric and nonparametric QR models since Koenker and
Bassett (1978), who pioneered the theory of QR using the check function approach and established the
asymptotic distribution of the QR estimator. Specifically, this paper proposes a specification testing
estimated by the local polynomial quantile regression. Chaudhuri (1991) develops nonparametric
estimation of the conditional quantile function, proposes a method similar to local polynomial
regression, and derives a Bahadur representation. Some refinements of the representation are
examined in several studies such as Kong et al. (2010); Guerre and Sabbah (2012); Lee et al. (2015);
Qu and Yoon (2015). For censored QR models, different methods to estimate conditional quantile
functions are available in the literature (e.g. Buchinsky and Hahn (1998); Chernozhukov
and Hong (2002); Honore et al. (2002); Portnoy (2003); Wang and Wang (2009)). In contrast to
standard QR models, the literature on nonparametric QR with (random) censoring is relatively
small; it includes Beran (1981), Dabrowska (1989), Dabrowska (1992), Kong et al. (2013), and
In terms of testing the significance of covariates, this paper is closely related to Fan and Li (1996),
who consider nonparametric specification testing for conditional mean regression models. They
construct the test statistic using only the restricted model estimated by the kernel method and
derive the asymptotic distribution of the test statistic under the null. Their test statistic is based
on an equivalent conditional moment restriction, and they show that the test is consistent and has
power against local alternatives at a nonparametric rate slower than $\sqrt{n}$. Chen and Fan (1999)
propose a nonparametric test of more general hypotheses for conditional mean models. Their test
procedure makes use of the ICM approach, and they derive the asymptotic distribution of a
stochastic process by using the central limit theorem for Hilbert-valued random arrays that could
be serially correlated. Delgado and Manteiga (2001) address the same question as Fan
and Li (1996) and take the ICM approach with kernel methods. Chen and Fan (1999)
and Delgado and Manteiga (2001) share some common features with the test of this paper: (i)
the null hypothesis is nonparametrically or semiparametrically specified and (ii) the tests rely
on the ICM approach. Whereas Chen and Fan (1999) and Delgado and Manteiga (2001) consider
conditional mean models, the test in this paper focuses on conditional QR models. One can also
refer to Lavergne and Vuong (2000); Lavergne and Patilea (2008); Lavergne et al. (2015) for testing
There is also a large number of studies on specification testing for QR models, but most of
them focus on testing parametric specifications. Koenker and Bassett (1982) investigate three
tests for linear QR models (the Wald, the likelihood ratio (LR), and the Lagrange multiplier (LM)
tests) with a focus on the significance of covariates. Zheng (1998) proposes a testing procedure for
parametric specifications under the conditional quantile restriction, which is similar to the approach
of Fan and Li (1996). The test of Bierens and Ginther (2001) relies on the ICM approach to test
parametric specifications of conditional quantile models. Horowitz and Spokoiny (2002) develop a
test statistic and provide a resampling method for obtaining its critical values,
but their specification test also applies to parametric specifications. He and Zhu
(2003) and Whang (2006a) use an idea similar to the ICM approach, but the weighting function they
use is different from the ones considered in the original work of Bierens.³ Both suggest using
resampling methods to simulate the distributions of the test statistics. Whang (2006b) develops a
specification test for the parametric conditional quantile model and focuses on the case where the
parameters in the model are estimated by the empirical likelihood method. As mentioned above, this
paper differs from those studies in that the test statistic accommodates infinite-dimensional
parameters because the null hypothesis is nonparametrically formulated, and in that none of these studies consider
This paper is also related to variable selection in QR models. Belloni and Chernozhukov (2011)
investigate variable selection in high-dimensional sparse models. They propose
$\ell_1$-penalized QR to deal with high dimensionality under sparsity and establish asymptotic results
for the penalized QR estimators. The main difference from Belloni and Chernozhukov (2011) is that
neither high-dimensional data nor the sparsity assumption is considered in this paper.
3 They consider the indicator function as an alternative to the weighting function of the form in Bierens and
Ginther (2001) and Bierens and Ploberger (1997).
2 Model and Test Statistics
Let $T$ be the dependent variable of interest and $X$ be a vector of covariates of dimension $d_x \ge 2$.
In cases where the outcome variable is censored by a variable denoted by $C$, researchers usually
observe $W \equiv (Y, D, X')' \in \mathbb{R}^{2+d_x}$, where
$$Y_i = \min(T_i, C_i), \qquad D_i = 1(T_i \le C_i).$$
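To make the observation scheme concrete, the following sketch simulates censored durations. It is only an illustration: the exponential distributions, the two uniform covariates, and the function name `simulate_censored_durations` are assumptions chosen for the example, not part of the model in the text. The latent duration depends on the first covariate only, and the censoring time is drawn independently of the duration and the covariates.

```python
import numpy as np

def simulate_censored_durations(n, seed=0):
    """Simulate (Y, D, X) with Y = min(T, C) and D = 1(T <= C)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 2))        # two covariates, so d_x = 2
    T = rng.exponential(scale=1.0 + X[:, 0])      # latent duration; depends on X_1 only
    C = rng.exponential(scale=3.0, size=n)        # censoring time, independent of (T, X)
    Y = np.minimum(T, C)                          # observed duration
    D = (T <= C).astype(int)                      # 1 if the spell is uncensored
    return Y, D, X, T, C
```

In real data only `Y`, `D`, and `X` are available; `T` and `C` are returned here solely so that the construction can be inspected.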
Let $\tau \in (0, 1)$ be given and $F_{T|X}(t|x)$ be the conditional distribution function of $T$ given $X = x$.
Then the $\tau$-th conditional quantile function of $T$ given $X = x$ is defined as
Suppose that $X$ can be divided into two parts $X_1$ and $X_2$, where $X_1 \in \mathbb{R}^{d_1}$, $X_2 \in \mathbb{R}^{d_2}$, and
$d_1 + d_2 = d_x$, and that it is believed that the variable $X_2$ is not significant to the $\tau$-th conditional
quantile of $Y$, conditional on $X_1$ (i.e. the $\tau$-th conditional quantile of $Y$ given $X$ only depends on
a specific form of $Q_{T|X}$. However, since it is well known that nonparametric estimators suffer from the
curse of dimensionality, it would be desirable to omit such insignificant variables from regression to
Considering the duality between conditional quantile processes and conditional distributions,
the notion of insignificance of covariates in QR is equivalent to the notion called conditional quantile
independence. Before formalizing the null hypothesis, recall the formal definition of conditional
quantile independence.
Definition 2.1. Let $Y$, $X$, and $Z$ be random variables. For a given $\tau \in (0, 1)$, the variable $Z$ is
said to be conditionally $\tau$-quantile independent of $Y$ on $X$ if
In the QR context with a single quantile level $\tau$, conditional $\tau$-quantile independence means that $Z$
is not significant for the $\tau$-th conditional quantile function, and thus the variable $Z$ can be dropped
from the regression. The concept of conditional $\tau$-quantile independence is also related to important
Example 2.1 (The effect of unemployment insurance benefits on unemployment duration). Many
studies have examined factors affecting individual unemployment duration spells (e.g. Heckman and
Singer (1984); Han and Hausman (1990); Katz and Meyer (1990); Meyer (1990)), and unemployment
insurance has been considered one of the determinants of unemployment duration. Let $UI$
be the level of unemployment insurance benefits and $X$ be other covariates. Letting $T$ denote the
unemployment duration spell, one can formulate the null hypothesis for testing the effect of $UI$ on the
$\tau$-th quantile of $T$ as $Q_{T|X,UI}(\tau|X, UI) = Q_{T|X}(\tau|X)$ for some $\tau \in (0, 1)$. Meyer (1990) analyzes the
effect of unemployment insurance benefits on unemployment duration, but he focuses on estimating
the effect of $UI$ on the hazard rather than on the quantiles of unemployment duration.
Example 2.2 (Intergenerational association in timing of the first marriage). Berrington and
Diamond (2000) investigate the effect of individual characteristics on the timing of first partnership
formation. The individual characteristics can be divided into two groups: the first group includes
current social characteristics and the other group consists of family background variables. They
show that education is a key factor that affects the timing, but one may be interested in examining
the intergenerational association in the timing of cohabitation formation at some quantile. Let $X^s$ be
the vector of the social characteristics of an individual and $T^f$ be the timing of the formation of his
parents' partnership. Define $T(\tau|X^s, T^f)$ to be the $\tau$-th conditional quantile function of the timing of
first partnership formation of the next generation. Then one can test the absence of intergenerational
association at the $\tau$-th quantile by considering $T(\tau|X^s, T^f) = T(\tau|X^s)$.
Suppose that a researcher is interested in estimating $Q_{T|X}(\tau|x)$. Since the dependent variable $T$
is subject to censoring and thus incompletely observed, the conditional quantile function may not be
identified without additional structure on the model. In this paper, I assume that $T$ and $C$ are
conditionally independent given $X$ to identify $Q_{T|X}$. Let $\Lambda(t|X)$ be the cumulative hazard function;
then it can be written as
$$\Lambda(t|X) \equiv \int_0^t \frac{dF_{T|X}(s|X)}{1 - F_{T|X}(s|X)} = -\ln(1 - F_{T|X}(t|X)). \qquad (2)$$
Equation (2) indicates that if $\Lambda(t|X)$ is identified, then $F_{T|X}(t|X)$ is identified and vice versa. Let
$F_{C|X}(\cdot|X = x)$ be the conditional distribution of $C$ given $X = x$. Then the following lemma shows
that conditional independence of $T$ and $C$ given $X$ is sufficient for identification of $F_{T|X}$ and $F_{C|X}$.
All mathematical proofs are presented in the Appendix.
Lemma 2.1. Suppose that $T \perp C \mid X$. Then the conditional distribution functions $F_{T|X}$ and $F_{C|X}$
are identified for almost all $X \in \mathcal{X}$.
the conditional $\tau$-th quantile of $Y$ given $X$ depends only on $X_1$. Thus, the null hypothesis is
It is natural to measure a distance between $Q_{T|X}(\tau|X)$ and $Q_{T|X_1}(\tau|X_1)$ to test the null hypothesis
in (3). If both $Q_{Y|X}(\tau|X)$ and $Q_{Y|X_1}(\tau|X_1)$ are estimated, then one can compare these two
estimators to measure a distance between them. When test statistics contain infinite-dimensional
parameters, however, comparing a null model with an alternative model generally fails
to achieve power against alternatives at the parametric rate $n^{-1/2}$. Many nonparametric tests
that compare null models with alternative models suffer such a loss of power (e.g. Härdle and
Mammen (1993); Hong and White (1995); Su and White (2008)), and some nonparametric tests
involving nonparametric objects may fail to detect local alternatives at the rate $n^{-1/2}$ even if they
only require estimating null models (e.g. Fan and Li (1996); Zheng (1998)⁴). This is because
nonparametric estimators have slower rates of convergence, and thus a direct comparison involving
an infinite-dimensional parameter would be costly. In this paper, I adopt Bierens's approach to
specification testing (Bierens (1990); Bierens and Ploberger (1997)), which is known as the ICM
test. The main idea of the ICM approach is to transform a conditional moment restriction into an
4 Zheng (1998) considers specification tests for parametric conditional quantile models, and the models are
parametrically estimated. However, his test statistic needs to be nonparametrically estimated and he uses the kernel
method to construct the test statistic. Consequently, the test detects local alternatives at a slower rate than $n^{-1/2}$.
To utilize the ICM approach, the null hypothesis in (3) needs to be reformulated as a
conditional moment restriction. By definition of the conditional quantile function, one has that
$\Pr(Y \le Q_{Y|X}(\tau|X) \mid X) = \tau$ for all $\tau \in \mathcal{T}$, and thus it is true that
$\Pr(Y \le Q_{Y|X_1}(\tau|X_1) \mid X) = \tau$ for all $\tau \in \mathcal{T}$ under the null hypothesis. This characterization
turns out to be valid under an additional condition, so that one can formulate a null hypothesis
equivalent to (3). The following lemma demonstrates that (3) can be rewritten as above and that
the two formulations are equivalent under a uniqueness assumption.
Lemma 2.2. Suppose that $F_{T|X}$ is identified. Let $\tau \in (0, 1)$ be given. Suppose that the $\tau$-th
conditional quantile function $Q_{T|X}(\tau|X)$ is unique almost surely in $X$, and that $\Pr(D = 1|X = x) \in (0, 1]$
uniformly in $x \in \mathcal{X}$. Then equation (3) holds if and only if the moment condition
The conditional moment characterization of the null hypothesis in (4) leads us to adopting
to infinitely many unconditional moment restrictions. However, Bierens (1982, 1990)
develops a tractable way to handle the infinitely many unconditional moments by appropriately
for all $t \in \mathcal{I}$. Therefore, one can consider testing (5) to see if the conditional moment restriction in
$$J_n(t; \tau) \equiv \frac{1}{\sqrt{n}} \sum_{i=1}^{n} D_i \{1(Y_i \le Q_{T|X_1}(\tau|X_{1i})) - \tau\}\, \varphi(X_i, t) \qquad (6)$$
which is a sample analogue of (5). Since the moment restrictions in (5) are indexed by $t$ and they
need to hold for all $t \in \mathcal{I}$, either the Kolmogorov-Smirnov (KS) or the Cramér-von Mises (CM)
where $\mu(\cdot)$ is a (probability) measure on $\mathcal{I}$. Note that these test statistics are continuous functionals
then replaced by its estimator. To do so, I estimate the conditional distribution function $F_{T|X_1}$ and
then invert it to obtain the conditional quantile function. Since $T$ is not completely observed in the
data, I propose to estimate the conditional distribution by using a local KM estimator. Local
KM estimators are proposed in, for example, Dabrowska (1989); Gonzalez-Manteiga and
Cadarso-Suarez (1994); Wang and Wang (2009). Specifically, Gonzalez-Manteiga and Cadarso-Suarez (1994)
$$\hat{F}_{T|X}(y|X = x) = 1 - \prod_{j=1}^{n} \left\{ 1 - \frac{B_{nj}(x)}{\sum_{k=1}^{n} 1(Y_k \ge Y_j) B_{nk}(x)} \right\}^{b_j(y)},$$
where $b_j(y) = 1(Y_j \le y, D_j = 1)$ and $\{B_{nk}(x) : k = 1, 2, ..., n\}$ is a sequence of nonnegative weights
adding up to 1. While several types of weights are considered in the literature, most of them
focus on the situation where the dimension of $X$ is small. I use the weight in Kong and Xia (2017)
weight.
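The product-limit formula above, together with the inversion step that turns the estimated distribution function into a quantile, can be sketched as follows. This is an illustration under stated assumptions: the Gaussian-kernel Nadaraya-Watson weights stand in for the weights of Kong and Xia (2017), which are not reproduced here, the at-risk indicator is taken to be $1(Y_k \ge Y_j)$, ties among uncensored observations are ignored, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_weights(X1, x1, h):
    """Nonnegative weights B_nk(x1) adding up to 1 (illustrative kernel choice)."""
    X1 = np.asarray(X1, dtype=float).reshape(len(X1), -1)
    u = (X1 - np.asarray(x1, dtype=float)) / h
    k = np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return k / k.sum()

def local_km_cdf(y, Y, D, B):
    """1 - prod_j {1 - B_j / sum_k 1(Y_k >= Y_j) B_k}^{1(Y_j <= y, D_j = 1)}."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D)
    B = np.asarray(B, dtype=float)
    # at_risk[j] = sum_k 1(Y_k >= Y_j) * B_k
    at_risk = (Y[None, :] >= Y[:, None]) @ B
    # Only uncensored observations with Y_j <= y and positive weight contribute.
    use = (Y <= y) & (D == 1) & (B > 0)
    return 1.0 - float(np.prod(1.0 - B[use] / at_risk[use]))

def local_km_quantile(tau, Y, D, B):
    """Smallest uncensored Y_j with estimated CDF >= tau; inf if never reached."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D)
    for y in np.sort(Y[D == 1]):
        if local_km_cdf(y, Y, D, B) >= tau:
            return float(y)
    return float("inf")
```

With equal weights and no censoring the estimator reduces to the empirical distribution function, which gives a quick sanity check of the implementation.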
Under the conditional independence between $T$ and $C$ given $X$, Lemma 2.1 shows that the
conditional distribution of $T$ given $X$, $F_{T|X}$, is identified and thus one can estimate the distribution
function. To construct the test statistics, however, it is only required to estimate the conditional
quantile function of $T$ given $X_1$. Assuming further that $C$ is independent of $X$, one can show that
$T \perp C \mid X_1$, and thus the $\tau$-th conditional quantile function given $X_1$ is estimated by
where $\hat{F}_{1n}(y|X_1 = x_1)$ is a local KM estimator of $F_{T|X_1}$. Finally, a feasible version of $J_n(t; \tau)$ is
given by
$$\hat{J}_n(t; \tau) \equiv \frac{1}{\sqrt{n}} \sum_{i=1}^{n} D_i \{1(Y_i \le \hat{Q}_{T|X_1}(\tau|X_{1i})) - \tau\}\, \varphi(X_i, t) \qquad (9)$$
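A sketch of how the feasible process and the two statistics could be computed on a finite grid is given below. Several ingredients are assumptions made for illustration only: the process is scaled by $1/\sqrt{n}$, the weighting function is $w(X_i't)$ with $w = \exp$ (one of the choices the text mentions for $w$), the KS-type statistic is taken as the maximum of $|J|$ over the grid, and the CM-type statistic as the $\mu$-weighted sum of $J^2$ over the grid. The function names and the grid `t_grid` are hypothetical, and the fitted quantiles `q_hat` are supplied by the user.

```python
import numpy as np

def icm_process(Y, D, X, q_hat, tau, t_grid, w=np.exp):
    """Evaluate n^{-1/2} sum_i D_i {1(Y_i <= q_hat_i) - tau} w(X_i' t) on a grid of t."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)            # shape (n, d_x)
    t_grid = np.asarray(t_grid, dtype=float)  # shape (m, d_x), one t per row
    q_hat = np.asarray(q_hat, dtype=float)
    resid = D * ((Y <= q_hat).astype(float) - tau)
    phi = w(X @ t_grid.T)                     # (n, m) matrix of weights w(X_i' t)
    return resid @ phi / np.sqrt(len(Y))

def ks_and_cm(J, mu=None):
    """Grid versions of sup_t |J(t)| and the integral of J(t)^2 with respect to mu."""
    J = np.asarray(J, dtype=float)
    # Default: equal mass on every grid point.
    mu = np.full(J.shape, 1.0 / J.size) if mu is None else np.asarray(mu, dtype=float)
    return float(np.max(np.abs(J))), float(np.sum(mu * J ** 2))
```

A typical use would pass quantiles obtained from a local KM fit as `q_hat`, evaluate the process on a grid covering the index set, and compare the resulting statistics with subsampling critical values.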
Notation. Before proceeding, I introduce notation that will be used throughout this
paper. Let $(\Omega, \mathcal{A}, P)$ be a probability space. For $x \in \mathbb{R}^d$, $\|x\|_E$ denotes the Euclidean norm of $x$
in $\mathbb{R}^d$. Let $l^2(\mathcal{W})$ be the space of functions that are square-integrable on a set $\mathcal{W}$. Similarly,
define $l^\infty(\mathcal{W})$ as the space of functions that are uniformly bounded on a set $\mathcal{W}$. For a generic
function $g$ on a set $\mathcal{W}$, $\|g\|_2 \equiv (\int_{\mathcal{W}} g^2 \, dP)^{1/2}$ and $\|g\|_\infty \equiv \sup_{w \in \mathcal{W}} |g(w)|$ are the $L^2$- and
sup-norms, respectively. The expectation of $g$ is denoted by $Eg \equiv \int g(w) \, dF_W(w)$, where $F_W(\cdot)$ is the
distribution function of $W$. For a sequence of random maps $X_n : \Omega \to \mathbb{R}$ and a random variable $X$,
$X_n \Rightarrow X$ ($X_n \xrightarrow{d} X$, resp.) indicates that $X_n$ converges weakly (in distribution, resp.) to $X$ in the
sense of Definition 1.3.3 in van der Vaart and Wellner (1996). For any real numbers $a$ and $b$, let
$a \wedge b \equiv \min(a, b)$ and $a \vee b \equiv \max(a, b)$. For any real sequences $(a_n)$ and $(b_n)$, $a_n \lesssim b_n$ means that
there is a constant $C > 0$ such that $|a_n| \le C |b_n|$ for all $n \in \mathbb{N}$. For a set $A$, denote the interior of
$A$ by int($A$).
3 Asymptotic Theory
In this section, I develop the asymptotic theory for the KS and CM test statistics. Since the test
statistics are continuous functionals of the process $\hat{J}_n(\cdot; \tau)$, I first establish the weak convergence
of $\hat{J}_n(\cdot; \tau)$. Then the asymptotic distribution of the test statistics can be obtained by using the
3.1 Assumptions
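Let $p_0(X) \equiv \Pr(D = 1|X)$ and $U_i \equiv T_i - Q_{T|X}(\tau|X_i)$. For a real number $p$, denote the largest
integer smaller than $p$ by $\lfloor p \rfloor$. Let $C^p(\mathcal{X})$ be the space of $\lfloor p \rfloor$-times continuously differentiable
real-valued functions on $\mathcal{X}$. Denote the differential operator by $D$ and let
$D^\alpha \equiv \frac{\partial^{[\alpha]}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}$. Define the
Hölder norm for $h \in C^p(\mathcal{X})$ as follows:
$$\|h\|_{\Lambda^p} \equiv \sup_{[\alpha] \le \lfloor p \rfloor,\, x \in \mathcal{X}} |D^\alpha h(x)| + \sup_{[\alpha] = \lfloor p \rfloor} \; \sup_{x, y \in \mathcal{X},\, x \ne y} \frac{|D^\alpha h(x) - D^\alpha h(y)|}{\|x - y\|_E^{\,p - \lfloor p \rfloor}} < \infty,$$
$\|h\|_{\Lambda^p} < \infty\}$ is called a Hölder class. A Hölder ball with radius $R > 0$ is defined by
$\Lambda^p_R(\mathcal{X}) \equiv \{h \in \Lambda^p(\mathcal{X}) : \|h\|_{\Lambda^p} \le R\}$ for some $R \in (0, \infty)$. Let $\mathcal{G}$ be a class of functions and $\|\cdot\|$ be a norm on $\mathcal{G}$.
Let $\mathcal{G}_\delta(g_0) \equiv \{g \in \mathcal{G} : \|g - g_0\| < \delta, g_0 \in \mathcal{G}\}$. A function $m$ on $\mathcal{G}$ is called pathwise differentiable at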
Assumption 1. (i) The data $\{W_i \equiv (Y_i, D_i, X_i')'\}_{i=1}^{n}$ are i.i.d.; (ii) $(T, X') \perp C$; (iii) $\mathcal{X}$ and $\mathcal{X}_1$
are compact and convex subsets of $\mathbb{R}^{d_x}$ and $\mathbb{R}^{d_1}$, respectively; (iv) $p_0(x) \in (0, 1]$ for all $x \in \mathcal{X}$.
Assumption 2. There exists $R > 0$ such that $Q_{T|X}(\tau|X = \cdot) \in \Lambda^{p_1}_R(\mathcal{X}_1)$, where $p_1 > d_1/2$.
Assumption 3. The conditional distribution function $F_{U|X}$ admits a density function $f_{U|X}(t|X = x)$
satisfying the following conditions: (i) $f_{U|X}(t|X = \cdot) \in \Lambda^{p_2}(\mathcal{X})$ for some $p_2 > 0$, uniformly in $t$
in a neighborhood of $t = 0$; (ii) $f_{U|X}(0|X)$ is bounded away from zero uniformly in $X \in \mathcal{X}$; (iii)
the first-order derivative with respect to $t$ is bounded and continuous, uniformly on a neighborhood
of $t = 0$ and uniformly in $X \in \mathcal{X}$.
Assumption 4. The marginal distribution functions of $X$ and $X_1$ have density functions
$f_X$ and $f_{X_1}$ with the following properties: (i) $f_X, f_{X_1} \in \Lambda^{p_3}(\mathcal{X})$ for some $p_3 > 0$; (ii) $f_X(\cdot)$ and
$f_{X_1}(\cdot)$ are positive on $\mathcal{X}$ and $\mathcal{X}_1$, respectively.
Assumption 5. (i) $K_F(\cdot)$ is a symmetric probability density function on $\mathbb{R}^{d_1}$ with finite second
moments and bounded first-order derivative; (ii) the bandwidth associated with $K_F$, $h_{Fn}$, satisfies
the following conditions: $h_{Fn} \to 0$, $\frac{\log n}{n h_{Fn}^{d_1}} \to 0$, and $n h_{Fn}^{d_1 + \frac{4}{3}(p_2 + 1)} \to 0$.
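Assumption 6. The class $\{\varphi(X_i, t) : t \in \mathcal{I}\}$ satisfies the following conditions: (i) the
weighting function $\varphi(\cdot, \cdot) : \mathcal{X} \times \mathcal{I} \to \mathbb{R}$ is uniformly bounded and $\mathcal{I}$ is a compact subset of $\mathbb{R}^{d_x}$;
(ii) $\varphi(X_i, t) = w(X_i' t)$ for some $w(\cdot)$ that is real analytic and non-polynomial, and there exists a function
$G(\cdot)$ such that for any $t_1, t_2 \in \mathcal{X}$, $|\varphi(X, t_1) - \varphi(X, t_2)| \le G(X) \|t_1 - t_2\|_E$ with $E[G(X_i)^2] < \infty$.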
Assumption 1 contains conditions on the data generating process. Condition (i) imposes that
the data are i.i.d. Condition (ii) assumes independence between $(T, X')$ and
$C$. This condition is satisfied when $C$ is completely random, and it guarantees the identification of
the conditional quantile functions of $T$ given $X$.⁵ This restriction is considered in, for example, Bang
and Tsiatis (2000) and Honore et al. (2002). It also implies that $T \perp C \mid X_1$, and thus one can use a
local KM estimator of $F_{T|X_1}$ to estimate the conditional quantile function $Q_{T|X_1}$. Condition (iii)
is a support condition, and condition (iv) implies that not all observations are
censored. This condition is crucial for the equivalence between (3) and (4).
Assumptions 2 and 3 impose smoothness of the conditional quantile functions and the conditional
density functions, respectively. Note that Assumption 3 implies that $F_{T|X_1}(\cdot|x_1)$ and $F_{T|X}(\cdot|x)$ are
Lipschitz continuous in $y$ for all $x_1 \in \mathcal{X}_1$ and $x \in \mathcal{X}$, respectively. Moreover, since $F_{T|X_1}(\cdot|x_1)$
is pathwise differentiable for all $x_1 \in \mathcal{X}_1$ under Assumption 3, one can show that for any $q \in \Lambda^p_R(\mathcal{X}_1)$
and $\delta_n \to 0$,
$$\sup_{x \in \mathcal{X}_1,\; \tilde{q} \in \Lambda^p_R(\mathcal{X}_1) :\, \|\tilde{q} - q\| \le \delta_n} |F_{T|X_1}(\tilde{q}|X_1) - F_{T|X_1}(q|X_1) - (\tilde{q} - q) f_{T|X_1}(q|X_1)| = O(\|\tilde{q} - q\|^2). \qquad (12)$$
Assumption 4 imposes smoothness on the marginal density functions of $X_1$ and $X$ and guarantees
that the sparsity function, $1/f_X(\cdot)$, is well-defined on $\mathcal{X}$.
Assumption 5 restricts the kernel function used to estimate the
can be considered and they call such functions GCR functions. Assumption 6 comes from Corollary
3.9 in Stinchcombe and White (1998), and one can choose the logistic distribution function, the
normal distribution or density function, or the exponential function for $w(\cdot)$. Note that, since $w(\cdot)$
is assumed to be analytic and the support of $X$ and the index set $\mathcal{I}$ are assumed to be compact,
the first- and second-order derivatives of $\varphi(x, t)$ with respect to $x$ are uniformly bounded on
$\mathcal{X} \times \mathcal{I}$.
As an alternative class of weighting functions, one may choose $\Phi_I \equiv \{1(X_i \le t) : t \in \mathcal{I}\}$, and this
class of functions is used in Volgushev et al. (2013). Even though the indicator function is not GCR, the
class of functions $\Phi_I$ has been used to construct test statistics (e.g. Delgado and Manteiga (2001);
Escanciano and Goh (2014); Volgushev et al. (2013)). However, there are several advantages of
using the class of weighting functions in Assumption 6 over using $\Phi_I$ (see, for example, Huang et al.
enough when using $\Phi_I$. Lastly, it is worth noting that one can replace the condition that $w(\cdot)$
is analytic with the condition that $w(\cdot)$ is sufficiently smooth, in the sense that it is finitely many times
continuously differentiable, at no cost, but analyticity will be imposed throughout this
5 The identification of $F_{T|X}$ is based on Lemma 2.1. To be specific, for any bounded and continuous functions $g$
and $h$, one can show that $E[g(T)h(C)|X] = E[E[g(T)h(C)|X, T]|X] = E[g(T)E[h(C)|X, T]|X] = E[h(C)]E[g(T)|X] =
E[h(C)|X]E[g(T)|X]$. Therefore, one obtains that $T \perp C \mid X$, and identification is achieved by Lemma 2.1.
paper. Lastly, I take the distribution function of $X$ and the support of $X$ for the measure $\mu(\cdot)$ in
I first establish the weak convergence of the infeasible process $J_n(\cdot; \tau)$. The next theorem establishes
that the empirical process $J_n(\cdot; \tau)$ converges weakly to a Gaussian limit under Assumptions 1 and 6:
$$J_n(\cdot; \tau) \Rightarrow G(\cdot) \quad \text{in } \ell^\infty(\mathcal{I}),$$
where $G(\cdot)$ is a Gaussian process with zero mean and covariance kernel
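As mentioned before, the conditional quantile function needs to be estimated, and estimation of
the function introduces a sampling error that must be dealt with. To investigate the asymptotic behavior of
the feasible process $\hat{J}_n(t; \tau)$, let $\bar{\varphi}(X_{1i}, t) \equiv E[\varphi(X_i, t)|X_{1i}]$ and consider a decomposition of $\hat{J}_n(t; \tau)$
as follows:
$$\hat{J}_n(t; \tau) = \frac{1}{\sqrt{n}} \sum_i D_i \{1(Y_i \le \hat{Q}_{T|X_1}(\tau|X_{1i})) - \tau\} \varphi(X_i, t)$$
$$= J_n(t; \tau) + \frac{1}{\sqrt{n}} \sum_i D_i \{1(Y_i \le \hat{q}_{1i}) - 1(Y_i \le q_{1i})\} \varphi(X_i, t)$$
where
$$\nu_n^p(t, q; \tau) \equiv \frac{1}{\sqrt{n}} \sum_i D_i \{1(Y_i \le q_i) - F_{T|X_1}(q_i|X_{1i})\} \bar{\varphi}(X_{1i}, t),$$
$$J_{sn}(t; \tau) \equiv \frac{1}{\sqrt{n}} \sum_i D_i \{F_{T|X_1}(\hat{q}_{1i}|X_{1i}) - F_{T|X_1}(q_{1i}|X_{1i})\} \bar{\varphi}(X_{1i}, t).$$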
Let
$$\xi(Y_j, D_j, y, x) \equiv \left[ \frac{1(Y_j \le y) D_j}{(1 - F_{T|X_1}(Y_j|x))(1 - F_{C|X_1}(Y_j|x))} - \int_0^{\min(Y_j, y)} \frac{f_{T|X_1}(s|x)\,ds}{(1 - F_{T|X_1}(s|x))^2 (1 - F_{C|X_1}(s|x))} \right],$$
then it can be shown that $E[\xi(Y_j, D_j, y, x)] = 0$ and $Var(\xi(Y_j, D_j, y, x)) < \infty$ for any $y$ and $x$ (cf.
Gonzalez-Manteiga and Cadarso-Suarez (1994)). To establish the weak convergence of $\hat{J}_n(\cdot; \tau)$, I
employ a stochastic equicontinuity argument and the theory of U-processes. To be more specific,
the stochastic equicontinuity argument can be used in this way: if one can show that the process $\nu_n^p(\cdot, \cdot; \tau)$ is
stochastically equicontinuous, it can be derived that $\nu_n^p(t, \hat{q}_1; \tau) - \nu_n^p(t, q_1; \tau) = o_p(1)$ uniformly in
$t \in \mathcal{I}$, since the estimated conditional quantile function converges
to $Q_{T|X_1}(\tau|X_1 = x_1)$ uniformly in $x_1$ under several conditions. Moreover, the smoothed term
$J_{sn}(\cdot; \tau)$ can be handled as follows: one can approximate $F_{T|X_1}(\hat{q}_{1i}|X_{1i})$ up to the second-order
approximation with respect to $q_{1i}$. Since it is possible to make the second-order term $o_p(n^{-1/2})$
under some conditions on the bandwidth, the process is asymptotically equivalent to the first-order
approximation of $F_{T|X_1}(\hat{q}_{1i}|X_{1i}) - F_{T|X_1}(q_{1i}|X_{1i})$. Then I use the theory of U-processes to deal
with this first-order term, which contains $\hat{q}_{1i} - q_{1i}$. The next theorem shows that the feasible process
Then,
$$\hat{J}_n(\cdot; \tau) \Rightarrow G(\cdot) \quad \text{in } \ell^\infty(\mathcal{I}),$$
where $G(\cdot)$ is a Gaussian process with zero mean and covariance kernel
Finally, one can derive the asymptotic distributions of the test statistics under the null hypothesis
by using Theorem 3.2 and the continuous mapping theorem. The asymptotic distributions
Corollary 3.3. Suppose that the conditions in Theorem 3.2 are satisfied. Then
$$KS_n \xrightarrow{d} \sup_{t \in \mathcal{I}} |G(t)|,$$
$$CM_n \xrightarrow{d} \int G(t)^2 \, d\mu(t).$$
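To make the two statistics concrete, the following is a minimal sketch of how they might be computed on a finite grid of index points. It rests on several assumptions that are not taken from the paper: the logistic choice of $w(\cdot)$, a finite grid `t_grid` standing in for the index set, the uniform measure on that grid standing in for the integrating measure, and fitted conditional quantiles `q_hat` supplied by the user. All function names are hypothetical.

```python
import math

def logistic_weight(x, t):
    """w(x't) with w the logistic distribution function (one admissible choice)."""
    s = sum(xj * tj for xj, tj in zip(x, t))
    return 1.0 / (1.0 + math.exp(-s))

def icm_process(y, d, q_hat, x, tau, t_grid):
    """Evaluate n^{-1/2} sum_i D_i {1(Y_i <= q_hat_i) - tau} w(X_i't) on a grid of t."""
    n = len(y)
    resid = [d[i] * ((1.0 if y[i] <= q_hat[i] else 0.0) - tau) for i in range(n)]
    return [
        sum(resid[i] * logistic_weight(x[i], t) for i in range(n)) / math.sqrt(n)
        for t in t_grid
    ]

def ks_cm_statistics(j_values):
    """KS: maximum absolute value over the grid; CM: grid average of squares."""
    ks = max(abs(v) for v in j_values)
    cm = sum(v * v for v in j_values) / len(j_values)
    return ks, cm
```

A finer grid approximates the supremum and the integral more closely, at a cost that grows linearly in the number of grid points.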
Now I examine the power properties and show that the tests have non-trivial power against local
alternatives at the parametric rate. To investigate the power properties, consider local alternatives
as follows:
$$Q_{T|X}(\tau|X) = Q_{T|X_1}(\tau|X_1) + \frac{1}{\sqrt{n}} Q(\tau|X) \tag{14}$$
Assumption 7. The conditional quantile function $Q_{T|X}$ under the local alternative in (14) belongs
to $\Lambda^{p_A}_{R_A}(\mathcal{X})$ for some $R_A > 0$ and $p_A > d_x/2$, for all $n$.
The following theorem demonstrates that the tests can detect local alternatives at the $\sqrt{n}$-rate.
Theorem 3.4. Suppose that Assumptions 1 and 3 through 7 hold. Under the local alternative in
(14),
$$\hat{J}_n(\cdot; \tau) \Rightarrow G(\cdot) - R_a(\cdot) \quad \text{in } \ell^\infty(\mathcal{I}),$$
where $R_a(t) \equiv E[p_0(X_i) \varphi(X_i, t) f(Q_{T|X_1}(\tau|X_{1i})|X_i) Q(\tau|X_i)]$.
Corollary 3.5. Suppose that the conditions in Theorem 3.4 are satisfied. Then, under the local
alternative in (14),
$$KS_n \xrightarrow{d} \sup_{t \in \mathcal{I}} |G(t) - R_a(t)|,$$
$$CM_n \xrightarrow{d} \int_{\mathcal{I}} |G(t) - R_a(t)|^2 \, d\mu(t).$$
4 Subsampling Approximation
Since the asymptotic distribution of the process $\hat{J}_n(\cdot)$ depends on the data generating processes of
$X_{1i}$ and $X_i$, it is hard to calculate critical values for the test statistics $KS_n$ and $CM_n$ (see footnote 6), and this
difficulty has drawn attention to methods of obtaining asymptotically valid critical values. Bierens
and Ginther (2001) and Bierens and Ploberger (1997) propose a method to obtain critical values.
Their approach is to use upper bounds on the critical values, but these bounds do not deliver
asymptotically correct size. Another way to obtain critical values is to use a variant of resampling
methods, and some bootstrap methods have been developed in several studies (e.g. Delgado and
I employ a subsampling method to approximate the asymptotic distributions of the test statistics.
Subsampling is widely used to overcome the difficulty of obtaining critical values of statistics. In
Velasco (2010) and Whang (2006a) as a way to mimic asymptotic distributions of statistics. One
of the main reasons for using subsampling instead of the bootstrap is that subsampling is
applicable in a wider range of settings. Specifically, Politis and Romano (1994) show
of given statistic. Another reason is that the bootstrap may be problematic if a statistic is non-smooth,
and in that case the bootstrap needs to be applied carefully (see footnote 8). Lastly, the subsampling method proposed
in this paper does not require estimating the influence function of $\hat{J}_n(t; \tau)$ and hence is easy to
conduct. One can refer to Sant'Anna (2016) for the validity of the multiplier bootstrap in a similar
situation (see footnote 9).
The subsampling procedure employed in this paper is the same as the one in Whang (2006a).
following:
$$F_n^{KS}(z) \equiv \Pr(KS_n \le z); \quad F_n^{CM}(z) \equiv \Pr(CM_n \le z).$$
Let $\{W_i, W_{i+1}, ..., W_{i+b-1}\}$ be a subsample of size $b$ from the original sample $\{W_j : j = 1, 2, ..., n\}$,
where $i = 1, 2, ..., n - b + 1$. Let $\hat{J}_{n,b,i}(t, \tau)$ be the process $\hat{J}_n(t, \tau)$ computed using only
the subsample $\{W_i, W_{i+1}, ..., W_{i+b-1}\}$ for $i = 1, 2, ..., n - b + 1$. Then $KS_{n,b,i}$ and $CM_{n,b,i}$
are defined in the same way as in (11) but with $\hat{J}_{n,b,i}(t, \tau)$. To approximate the distributions $F_n^{KS}(\cdot)$
6 Bierens and Ploberger (1997) and Chen and Fan (1999) derive the asymptotic distribution of the Cramer-von-Mises
type statistic in parametric and nonparametric specification testing, respectively, and they show that the test
statistic is asymptotically equivalent to an infinite sum of weighted $\chi^2(1)$ random variables.
7 These conditions include the weak convergence of the statistic and the conditions on the size of subsamples.
8 The process $\hat{J}_n(\cdot; \tau)$ contains an indicator function, and the bootstrap may be challenging due to the non-smoothness
of the indicator function.
9 For the tests in the standard QR, one can refer to Volgushev et al. (2013) for bootstrap validity. They suggest
using a bootstrap to approximate the asymptotic distribution of their test statistic.
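and $F_n^{CM}(\cdot)$, consider the following objects:
$$\hat{F}_{n,b}^{KS}(z) \equiv \frac{1}{n - b + 1} \sum_{i=1}^{n-b+1} 1(KS_{n,b,i} \le z); \quad \hat{F}_{n,b}^{CM}(z) \equiv \frac{1}{n - b + 1} \sum_{i=1}^{n-b+1} 1(CM_{n,b,i} \le z).$$
Let $c_n^{KS}(\alpha)$ and $c_n^{CM}(\alpha)$ be the $(1 - \alpha)$-th quantiles of $F_n^{KS}(\cdot)$ and $F_n^{CM}(\cdot)$ under the null hypothesis.
In the same way, let $c_{n,b}^{KS}(\alpha)$ and $c_{n,b}^{CM}(\alpha)$ be the $(1 - \alpha)$-th quantiles of $\hat{F}_{n,b}^{KS}(\cdot)$ and $\hat{F}_{n,b}^{CM}(\cdot)$,
respectively. The following theorem demonstrates that the subsampling provides the asymptotically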
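$$F^{KS}(z) \equiv \Pr\left(\sup_{t \in \mathcal{I}} |G(t)| \le z\right); \quad F^{CM}(z) \equiv \Pr\left(\int_{\mathcal{I}} |G(t)|^2 \, d\mu(t) \le z\right);$$
$$c_{n,b}^{KS}(\alpha) \xrightarrow{p} c^{KS}(\alpha); \quad c_{n,b}^{CM}(\alpha) \xrightarrow{p} c^{CM}(\alpha),$$
and
$$\Pr(KS_n > c_{n,b}^{KS}(\alpha)) \to \alpha,$$
$$\Pr(CM_n > c_{n,b}^{CM}(\alpha)) \to \alpha.$$
(ii) If the conditions in Theorem 3.4 hold, then under the local alternative in (14),
$$\Pr(KS_n > c_{n,b}^{KS}(\alpha)) \to \Pr\left(\sup_{t \in \mathcal{I}} |G(t) - R_a(t)| > c^{KS}(\alpha)\right),$$
$$\Pr(CM_n > c_{n,b}^{CM}(\alpha)) \to \Pr\left(\int_{\mathcal{I}} |G(t) - R_a(t)|^2 \, d\mu(t) > c^{CM}(\alpha)\right).$$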
Remark 4.2. It is possible to calculate the test statistics over all $\binom{n}{b}$ subsamples to obtain $\hat{F}_{n,b}^{KS}(\cdot)$
and $\hat{F}_{n,b}^{CM}(\cdot)$, as shown in Chernozhukov and Fernández-Val (2005), but this approach is computa-
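The subsampling recipe of this section can be sketched in a few lines. This is a hedged illustration rather than the paper's code: `statistic` is a hypothetical user-supplied function that maps a list of observations to a scalar test statistic (and is assumed to apply the appropriate root-sample-size scaling internally), and the subsamples are the $n - b + 1$ consecutive blocks of size $b$ described above.

```python
import math

def subsample_critical_value(data, statistic, b, alpha):
    """(1 - alpha)-th quantile of the statistic over the n - b + 1 consecutive
    subsamples {W_i, ..., W_{i+b-1}}."""
    n = len(data)
    stats = sorted(statistic(data[i:i + b]) for i in range(n - b + 1))
    # Smallest order statistic whose empirical CDF value is at least 1 - alpha.
    k = max(math.ceil((1.0 - alpha) * len(stats)) - 1, 0)
    return stats[k]

def subsampling_test(data, statistic, b, alpha):
    """Reject when the full-sample statistic exceeds the subsample critical value."""
    crit = subsample_critical_value(data, statistic, b, alpha)
    full = statistic(data)
    return full > crit, full, crit
```

Using consecutive blocks keeps the number of evaluations at $n - b + 1$, which is far smaller than the number of all size-$b$ subsets.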
5 Conclusions
In this paper, I propose nonparametric tests for conditional quantile independence for a class of
models with duration outcomes. Duration outcomes are usually subject to censoring; therefore, I
use a local KM estimator to estimate the conditional quantile function. Since conditional quantile
independence can be formalized as a conditional moment restriction, I adopt Bierens's ICM approach
to construct test statistics. I show that the test statistics are continuous functionals of a
Gaussian process under suitable conditions and that the tests have non-trivial power against local
alternatives at the parametric rate even though the tests are nonparametric. Since the asymptotic
distributions of the test statistics depend on the data generating process, I provide a subsampling method
to obtain the critical values and establish the validity of the subsampling method.
There are several directions in which this paper can be extended. First, one can consider tests that are
uniform in the quantile index $\tau$. The tests proposed in this paper are pointwise in the sense that they
focus on conditional quantile independence at a specific $\tau \in (0, 1)$. One advantage of uniform
tests is that they can partly address the arbitrariness of choosing a specific
quantile. Uniformity in the quantile index is an important issue, and many studies have considered
uniform inference in QR (Koenker and Bassett (1982), Koenker and Machado (1999), Koenker and
Xiao (2002), Su and White (2012), and so on). In a spirit similar to Manski (1988, p.733), one may
wonder why a variable is not significant at a specific quantile but is at others. In addition, the effects
of covariates tend to be continuous in the quantile index and similar within a range of quantile levels
(e.g. Koenker and Hallock (2001, p.150)) in many situations. Therefore, uniform tests would be
preferable to pointwise tests in the sense that they can handle the arbitrariness of the choice of $\tau$ and
possess empirical content. Moreover, uniform tests are closely related to tests for conditional
independence. If one is interested in the uniform test of conditional quantile independence over
the quantile $\tau \in [0, 1]$, this test becomes a test for conditional independence, and several studies
have proposed tests for conditional independence (e.g. Su and White (2008); Song (2009); Huang
et al. (2016)). While pointwise conditional quantile independence and conditional independence
are rather extreme hypotheses, uniform quantile independence can be regarded as an intermediate
Secondly, it is also expected that the tests in this paper can be extended to the case where one
is interested in conducting semiparametric or other nonparametric specification tests (e.g. tests for
additive separability and partially linear structure). Related to this extension, another potential
direction would be to consider testing with different estimation strategies for the conditional quantile
function. Recently, Belloni et al. (2016) propose a series-based estimation method for the
conditional quantile processes and establish the asymptotic theory, and Chao et al. (2016) also study
quantile processes with series estimation and provide conditions under which the series estimator
of a conditional quantile process converges weakly to a Gaussian process. Series estimation
is useful for imposing structures on the model such as those mentioned above. Based on the
results in those recent studies, one could develop some test procedures with series estimation
(see footnote 11).
Lastly, one can consider specification tests for conditional quantile functions with endogenous
censoring. This paper assumes that censoring is independent of the covariates and the latent
dependent variable, but this assumption is unlikely to be satisfied in many empirical examples,
and several studies consider models with endogenous censoring (e.g. Khan and Tamer (2009), Khan
et al. (2011), and Fan and Liu (2013)). Therefore, it would be worth extending the tests to the case
of endogenous censoring.
References
Andrews, D. W. (1994a). Asymptotics for semiparametric econometric models via stochastic equi-
Andrews, D. W. (1994b). Empirical process methods in econometrics. Handbook of Econometrics 4,
2247-2294.
Andrews, D. W. and Y.-J. Whang (1990). Additive interactive regression models: circumvention
Bang, H. and A. A. Tsiatis (2000). Estimating medical costs with censored data. Biometrika 87 (2),
329-343.
Beran, R. (1981). Nonparametric regression with randomly censored survival data. Technical report,
first-partnership formation among the 1958 British birth cohort. Journal of the Royal Statistical
Society: Series A (Statistics in Society) 163 (2), 127-151.
Bierens, H. J. (1982). Consistent model specification tests. Journal of Econometrics 20 (1), 105-134.
Bierens, H. J. (1990). A consistent conditional moment test of functional form. Econometrica 58 (6),
1443-1458.
Bierens, H. J. and D. K. Ginther (2001). Integrated conditional moment testing of quantile regres-
Bierens, H. J. and W. Ploberger (1997). Asymptotic theory of integrated conditional moment tests.
Bitler, M. P., J. B. Gelbach, and H. W. Hoynes (2006). What mean impacts miss: Distributional
Buchinsky, M. (1994). Changes in the US wage structure 1963-1987: Application of quantile regres-
Buchinsky, M. and J. Hahn (1998). An alternative estimator for the censored quantile regression
Chao, S.-K., S. Volgushev, and G. Cheng (2016). Quantile processes for semi and nonparametric
Chaudhuri, P. (1991). Nonparametric estimates of regression quantiles and their local Bahadur
Chen, X. and Y. Fan (1999). Consistent hypothesis testing in semiparametric and nonparametric
Chen, X., O. Linton, and I. Van Keilegom (2003). Estimation of semiparametric models when the
cesses. Sankhyā: The Indian Journal of Statistics 67 (2), 253-276.
Chernozhukov, V. and H. Hong (2002). Three-step censored quantile regression and extramarital
Escanciano, J. C. and S.-C. Goh (2014). Specification analysis of linear quantile models. Journal
of Econometrics 178 (3), 495-507.
sums of non and semiparametric residuals for estimation and testing. Journal of Econometrics 178 (3),
426-443.
Fan, J. and I. Gijbels (1996). Local polynomial modelling and its applications, Volume 66 of Monographs
on statistics and applied probability. CRC Press.
Fan, Y. and Q. Li (1996). Consistent model specification tests: omitted variables and semiparame-
Fan, Y. and R. Liu (2013). Partial identification and inference in censored quantile regression: A
Kaplan-Meier estimator with some applications. Journal of Nonparametric Statistics 4 (1), 65-78.
Guerre, E. and C. Sabbah (2012). Uniform bias study and Bahadur representation for local polynomial
estimators of the conditional quantile function. Econometric Theory 28 (1), 87-129.
Han, A. and J. A. Hausman (1990). Flexible parametric estimation of duration and competing risk
Härdle, W. and E. Mammen (1993). Comparing nonparametric versus parametric regression fits.
He, X. and L.-X. Zhu (2003). A lack-of-fit test for quantile regression. Journal of the American
Statistical Association 98 (464), 1013-1022.
Heckman, J. and B. Singer (1984). A method for minimizing the impact of distributional assump-
Hong, Y. and H. White (1995). Consistent specification testing via nonparametric series regression.
Honore, B., S. Khan, and J. L. Powell (2002). Quantile regression under random censoring. Journal
of Econometrics 109 (1), 67-105.
Horowitz, J. L. and V. G. Spokoiny (2002). An adaptive, rate-optimal test of linearity for median
Huang, M., Y. Sun, and H. White (2016). A flexible nonparametric test for conditional indepen-
Kaplan, E. L. and P. Meier (1958). Nonparametric estimation from incomplete observations. Journal
of the American Statistical Association 53 (282), 457-481.
Katz, L. F. and B. D. Meyer (1990). Unemployment insurance, recall expectations, and unemploy-
Khan, S., M. Ponomareva, and E. Tamer (2011). Sharpness in randomly censored linear models.
Khan, S. and E. Tamer (2009). Inference on endogenously censored regression models using condi-
Koenker, R. and G. Bassett (1982). Tests of linear hypotheses and $l_1$ estimation. Econometrica 50 (6),
1577-1583.
Koenker, R. and K. Hallock (2001). Quantile regression. Journal of Economic Perspectives 15 (4),
143-156.
Koenker, R. and J. A. Machado (1999). Goodness of fit and related inference processes for quantile
Koenker, R. and Z. Xiao (2002). Inference on the quantile regression process. Econometrica 70 (4),
1583-1612.
Kong, E., O. Linton, and Y. Xia (2010). Uniform Bahadur representation for local polynomial
estimates of m-regression and its application to the additive model. Econometric Theory 26 (5),
1529-1564.
Kong, E., O. Linton, and Y. Xia (2013). Global Bahadur representation for nonparametric censored
Kong, E. and Y. Xia (2017). Uniform Bahadur representation for nonparametric censored quantile
New York.
Koul, H. L. and W. Stute (1999). Nonparametric model checks for time series. The Annals of
Statistics 27 (1), 204-236.
Lavergne, P., S. Maistre, V. Patilea, et al. (2015). A significance test for covariates in nonparametric
Lavergne, P. and V. Patilea (2008). Breaking the curse of dimensionality in nonparametric testing.
Lavergne, P. and Q. Vuong (2000). Nonparametric significance testing. Econometric Theory 16 (4),
576-601.
Lee, S. (2003). Efficient semiparametric estimation of a partially linear quantile regression model.
Lee, S., K. Song, and Y.-J. Whang (2015). Uniform asymptotics for nonparametric quantile regres-
Li, Q., C. Hsiao, and J. Zinn (2003). Consistent specification tests for semiparametric/nonparametric
models based on series estimation methods. Journal of Econometrics 112 (2),
295-325.
Manski, C. F. (1988). Identification of binary response models. Journal of the American Statistical
Association 83 (403), 729-738.
Ossiander, M. (1987). A central limit theorem under metric entropy with L2 bracketing. The Annals
of Probability 15 (3), 897-919.
Politis, D. N. and J. P. Romano (1994). Large sample confidence regions based on subsamples
Portnoy, S. (2003). Censored regression quantiles. Journal of the American Statistical Association 98 (464),
1001-1012.
Powell, J. L., J. H. Stock, and T. M. Stoker (1989). Semiparametric estimation of index coefficients.
Qu, Z. and J. Yoon (2015). Nonparametric estimation and inference on conditional quantile pro-
954.
Sant'Anna, P. H. (2016). Nonparametric tests for treatment effect heterogeneity with duration
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. John Wiley & Sons.
Song, K. (2009). Testing conditional independence via Rosenblatt transforms. The Annals of
Statistics 37 (6B), 4011-4045.
Stinchcombe, M. B. and H. White (1998). Consistent specification testing with nuisance parameters
Stute, W. (1997). Nonparametric model checks for regression. The Annals of Statistics 25 (2),
613-641.
Su, L. and H. White (2008). A nonparametric Hellinger metric test for conditional independence.
Su, L. and H. L. White (2012). Conditional independence specification testing for dependent
processes with local polynomial quantile regression. In Essays in Honor of Jerry Hausman, pp.
van der Vaart, A. and J. Wellner (1996). Weak convergence and empirical processes: with applications
to statistics. Springer, New York.
Van der Vaart, A. W. (1998). Asymptotic statistics, Volume 3. Cambridge University Press.
Volgushev, S., M. Birke, H. Dette, N. Neumeyer, et al. (2013). Significance testing in quantile
Wang, H. J. and L. Wang (2009). Locally weighted censored quantile regression. Journal of the
American Statistical Association 104 (487), 1117-1128.
Whang, Y.-J. (2006a). Consistent specification testing for quantile regression models. Econometric
Theory and Practice: Frontiers of Analysis and Applied Research, 288-308.
Whang, Y.-J. (2006b). Smoothed empirical likelihood methods for quantile regression models.
Zheng, J. X. (1998). A consistent nonparametric test of parametric regression models under condi-
6 Proofs
6.1 Proof of Lemma 2.1
Proof. Define
and hence $dH_1(t|X) = (1 - G(t|X))\,dF_{T|X}(t|X)$. Observing that the event $\{Y \ge t\}$ is equivalent to
$$\int_0^t \frac{dH_1(s|X)}{H(s|X)} = \int_0^t \frac{(1 - F_{C|X}(s|X))\,dF_{T|X}(s|X)}{(1 - F_{T|X}(s|X))(1 - F_{C|X}(s|X))} = \int_0^t \frac{dF_{T|X}(s|X)}{1 - F_{T|X}(s|X)} = -\ln(1 - F_{T|X}(t|X)). \tag{16}$$
Since $H_1$ and $H$ are identified from data, Equation (16) implies that $F_{T|X}(t|X)$ is identified. For
identification of $F_{C|X}$, one can show that $H_0(t|X) = \int_0^t (1 - F_{T|X}(s|X))\,dF_{C|X}(s|X)$ by a similar
$$\int_0^t \frac{dH_0(s|X)}{H(s|X)} = -\ln(1 - F_{C|X}(t|X))$$
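The identification argument in Equation (16) has a direct sample analogue, sketched below without covariates for simplicity. This is an illustrative assumption-laden example, not the paper's estimator: $H_1$ is replaced by the empirical sub-distribution of uncensored durations, $H$ by the empirical share of observations still at risk, and the simulated design (exponential latent durations and censoring times) is purely hypothetical.

```python
import math
import random

def cumulative_hazard_cdf(y, d, t):
    """Sample analogue of F_T(t) = 1 - exp(-int_0^t dH1(s) / H(s))."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    cum_hazard = 0.0
    for rank, i in enumerate(order):
        if y[i] > t:
            break
        if d[i] == 1:
            # dH1 puts mass 1/n at y[i]; H(y[i]) = (n - rank)/n is the share with Y >= y[i].
            cum_hazard += 1.0 / (n - rank)
    return 1.0 - math.exp(-cum_hazard)

def simulate_censored_sample(n, seed=0):
    """Independent latent duration T ~ Exp(1) and censoring time C ~ Exp(0.5)."""
    rng = random.Random(seed)
    t_latent = [rng.expovariate(1.0) for _ in range(n)]
    c = [rng.expovariate(0.5) for _ in range(n)]
    y = [min(a, b) for a, b in zip(t_latent, c)]
    d = [1 if a <= b else 0 for a, b in zip(t_latent, c)]
    return y, d
```

In this design, `cumulative_hazard_cdf(y, d, 1.0)` should be close to $1 - e^{-1}$ in large samples even though only the censored pair $(Y, D)$ is used.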
6.2 Proof of Lemma 2.2
Proof. Suppose that $Q_{T|X}(\tau|X_i) = Q_{T|X_1}(\tau|X_{1i})$ almost surely. Note that
$$E[D_i \{1(Y_i \le Q_{T|X_1}(\tau|X_{1i})) - \tau\}|X_i] = E[1(Y_i \le Q_{T|X_1}(\tau|X_{1i})) - \tau \,|\, D_i = 1, X_i] \Pr(D_i = 1|X_i)$$
$$= E[1(T_i \le Q_{T|X_1}(\tau|X_{1i})) - \tau \,|\, X_i] \Pr(D_i = 1|X_i)$$
$$= 0 \cdot p_0(X_i) = 0$$
under the null hypothesis. Conversely, suppose that the null hypothesis in (4) holds. Then one can
show that
Since the conditional quantile is assumed to be unique, $Q_{T|X_1}(\tau|X_{1i}) - Q_{T|X}(\tau|X_i) + Q_{T|X}(\tau|X_i) =
Q_{T|X}(\tau|X_i)$ and this leads to Equation (3).
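Proof. Define $g_i(t; \tau) \equiv D_i \{1(Y_i \le Q_{T|X_1}(\tau|X_{1i})) - \tau\} \varphi(X_i, t)$ and $\mathcal{G} \equiv \{g_i(t; \tau) : t \in \mathcal{I}\}$. For any
$$|g_i(t_1) - g_i(t_2)| \le |D_i \{1(Y_i \le Q_{T|X_1}(\tau|X_{1i})) - \tau\}| \, G_\varphi(X_i) \|t_1 - t_2\|_E.$$
Since $E[\{1(Y_i \le Q_{T|X_1}(\tau|X_{1i})) - \tau\} D_i G_\varphi(X_i)]^2 < \infty$, it follows that $\mathcal{G}$ is a type IV class defined in
Andrews (1994b) and thus $\mathcal{G}$ satisfies Ossiander's $L^2$ entropy condition by Theorem 5 in Andrews
(1994b). Applying Theorem 3.1 in Ossiander (1987) yields that $\mathcal{G}$ is Donsker and thus $J_n(\cdot) \Rightarrow G(\cdot)$
in $\ell^\infty(\mathcal{I})$.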
To handle the term $\nu_n^p(t, \hat{q}_1; \tau) - \nu_n^p(t, q_1; \tau)$ in (13), I prove that $\nu_n^p(\cdot, \cdot; \tau)$ is stochastically
equicontinuous. The following lemma is used to verify one of the conditions of Theorem 2.11.9 in van der
any > 0. Let C E[G (Xi )2 ], where G () > 0.
is dened in Assumption 6, and
C
dx
Since I is assumed to be compact in R , it is compact and thus totally bounded. That is, there
N ()
exists a set {ti X : i = 1, 2, ..., N ()} such that X i=1 B (ti ). It suces to show that
N ()
i=1 B ((X, ti )). Let h(X) be an arbitrary element of (i.e. h(X) = (X, t) for some
t I ). Since I is totally bounded, there exists ti0 {ti X : i = 1, 2, ..., N ()} such that
N ()
and thus that i=1 B ((X, ti )). Since N () < and was arbitrary, (, || ||2 ) is totally
bounded. Since it is assumed that (F, || ||F ) is totally bounded, the product of and F is totally
pR1 (X1 ).
Proof. Dene
1
Zni (, q) Di (X1i , t){1(Yi qi ) FT |X1 (qi |X1i )}
n
and let F {Zni (, q) : (, q) pR1 (X1 )}. Noting that q is a function of only X1 , it is
clear that EZni (, q) = 0 for any (, q) . I verify the conditions of Theorem 2.11.9 in van der
C
E[||Zni ||F 1(||Zni ||F > )] Pr(||Zni ||F > )
n
C E||Di {1(Yi qi ) FT |X1 (qi |X1i )}(X1i , t)||F
n n 2
1
. = o(n). (17)
n n
P
This implies that i E[||Zni ||F 1(||Zni ||F > )] = o(1) and thus that the rst condition of the
theorem is met.
Take any (, q) pR1 (X1 ) and > 0. Note that for any = (, q) such that (, ) ,
Using the argument of Chen et al. (2003, p.1600), it can be shown that
1 1
sup E[|1(Ti qi ) 1(Ti qi )|] E[1(Ti qi + ) 1(Ti qi )]
||qq|| n n
1
E[FT |X1 (qi + |X1i ) FT |X1 (qi |X1i )]
n
1
sup |fT |X1 (t|x1 )| 2
n tR,x1 X1
1
. (19)
n
1 1
sup E[(FT |X1 (qi |X1i ) FT |X1 (qi |X1i ))2 ] . 2 . (20)
||qq|| n n
for any positive sequence n such that n 0, and thus the second condition of the theorem is also
satised.
Lastly, I calculate the bracketing L2 -entropy of = pR1 (X1 ). Take any > 0. Since
2
N (, , ||||2 ) N[] (, , ||||2 ), it suces to calculate the L bracketing number. By the denition
pR1 (X1 ), || ||2 ) N[] ( , , || ||2 ) N[] ( , pR1 (X1 ), || ||2 ).
N[] (, (23)
2 2
Since the weighting function (x, t) is Lipschitz in t as shown in (21), it follows that N[] ( 2 , , ||||2 )
N ( 4||G ||2 , I, || ||E ) by Theorem 2.7.11 in van der Vaart and Wellner (1996). Since I is assumed
dx dx
to be compact in R , the covering number is bounded by C for some constant C > 0. By
2
Corollary 2.7.2 in van der Vaart and Wellner (1996) with Assumption 2, the L -bracketing number,
d1
log N[] (, , || ||2 ) log N[] ( , , || ||2 ) + log N[] ( , pR1 (X1 ), || || ) . p1 .
2 2
Therefore, for any positive sequence {n } such that n 0,
Z n d
1 2p1
q
log N[] (, , || ||2 )d . n 1
= o(1) (24)
0
by Assumption 2.
Note that a Hölder ball is compact under the sup-norm by the embedding theorem (e.g. Theorems
1 and 2 in Freyberger and Masten (2015)), and thus $(\Lambda^{p_1}_{R_1}(\mathcal{X}_1), \|\cdot\|_\infty)$ is totally bounded.
Thus, (, ) is a totally bounded pseudo-metric space by Lemma 6.1 and equations (17), (22), and
(24) together establish that all conditions of Theorem 2.11.9 in van der Vaart and Wellner (1996)
are satised. In all, np (, ; ) is asymptotically tight. Since E[n Zni ()2 ] < for any , all
by the multivariate central limit theorem. Since np (, ; ) is asymptotically tight and all of its
(1994b, p.2251)).
Let q1i QT |X1 ( |X1i ) and q1i QT |X1 ( |X1i ) and recall the smoothed term
1 X
Jsn (t; G) = Di {FT |X1 (q1i |X1i ) FT |X1 (q1i |X1i )}(X1i , t)
n i
1 X
= Di fT |X1 (q1i |X1i )(q1i q1i )(X1i , t) + Op (||q1 q1 ||2 ).
n i
By the construction of the conditional quantile function, the leading term can be rewritten as
1 X 1 X 1
Di fT |X1 (q1i |X1i )(q1i q1i )(X1i , t) = Di fT |X1 (q1i |X1i ){F1n ( |X1i ) F11 ( |X1i )}(X1i , t).
n i n i
by, for example, Kong and Xia (2017); Van der Vaart (1998). Since ||F1n FT |X1 || = op (1) and
||F (|) FT |X1 (|)||2 = O( logdn1 ) = o(n1/2 ) by Lemma 1 in Kong and Xia (2017), it can be shown
nhF n
that for large enough n,
1 X 1
Di fT |X1 (q1i |X1i ){F1n ( |X1i ) FT1
|X1 ( |X1i )}(X1i , t)
n i
1 X log n
= Di (X1i , t){F1n (q1i |X1i ) FT |X1 (q1i |X1i )} + nOp ( d1 ), (25)
n i nhF n
where the equality comes from Lemma 1 in Kong and Xia (2017). Note that under the conditions
on the bandwidth hF n , the remainder term is op (1). Therefore, one needs to investigate the leading
12 One can refer to, for example, Kosorok (2008); Van der Vaart (1998); van der Vaart and Wellner (1996) for the
definition of Hadamard differentiability.
term in (25) and the following lemma establishes that the leading term admits an asymptotic linear
representation.
1 X 1 X
Di (X1i , t){F1n (q1i |X1i )FT |X1 (q1i |X1i )} = (1 )(X1i , t)(Yi , Di , q1i , X1i )+op (1),
n i n i
where
min(Yj ,y)
1(Yj y) Dj
Z
f1 (s|x)ds
(Yj , Dj , y, x) [ ],
(1 FT |X1 (Yj |x))(1 G(Yj |x)) 0 (1 FT |X1 (s|x))2 (1 G(s|x))
(X1i , t) E[(X1i , t)p0 (Xi )|X1i ],
uniformly in t I .
Proof. Using the asymptotic representation given in Lemma 1 in Kong and Xia (2017), it is straig-
1 X log n
F1n (y|x)FT |X1 (y|x) = d1
(1F T |X1
(y|x)) Bhn (X1j ; x)(Yj , Dj , y, x)+O(( d1 )3/4 ), (26)
nhF n j
nhF n
where
0 X1j x X1j x
BhF n (X1j ; x) = e1 1
1 ( )KF ( )/fX1 (x).
hF n hF n
1 X
Di (X1i , t){F1n (y|x) FT |X1 (y|x)}
n i
(1 FT |X1 (y|x)) X X 1 log n
= Di (X1i , t) d1 BhF n (X1j ; x)(Yj , Dj , y, x) + nOp (( d1 )3/4 ),
n n i j
hF n nhF n
where the remainder term is op (1) under the conditions on the bandwidth hF n . To analyze the
leading term in the above equation, I utilize the theory of U-processes. Let
1
pF n (Wi , Wj ; y, t) Di (X1i , t) BhF n (X1j ; X1i )(Yj , Dj , y, X1i ).
hdF1n
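To make the U-statistic kernel $p_{Fn}(\cdot, \cdot; t)$ symmetric, define $\tilde{p}_{Fn}(W_i, W_j; y, t) \equiv \frac{1}{2}\{p_{Fn}(W_i, W_j; y, t) +
p_{Fn}(W_j, W_i; y, t)\}$. Then one obtains that
$$\frac{1}{n\sqrt{n}} \sum_i \sum_j D_i \bar{\varphi}(X_{1i}, t) \frac{1}{h_{Fn}^{d_1}} B_{h_{Fn}}(X_{1j}; X_{1i}) \xi(Y_j, D_j, y, X_{1i})$$
$$= \frac{1}{n\sqrt{n}} \sum_i \sum_{j \ne i} p_{Fn}(W_i, W_j; y, t) + \frac{1}{n\sqrt{n}} \sum_i p_{Fn}(W_i, W_i; y, t)$$
$$= \sqrt{n}\, \frac{n - 1}{n} \binom{n}{2}^{-1} \sum_i \sum_{j > i} \tilde{p}_{Fn}(W_i, W_j; y, t) + \frac{1}{n\sqrt{n}} \sum_i p_{Fn}(W_i, W_i; y, t).$$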
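The symmetrization step can be checked numerically: for any kernel, the off-diagonal double sum equals $n(n-1)$ times the U-statistic built from the symmetrized kernel. The sketch below uses a hypothetical toy kernel and toy data purely for illustration; none of the names come from the paper.

```python
from itertools import combinations

def symmetrize(kernel):
    """Return the symmetric kernel 0.5 * {p(a, b) + p(b, a)}."""
    return lambda a, b: 0.5 * (kernel(a, b) + kernel(b, a))

def off_diagonal_sum(kernel, w):
    """Sum of kernel(W_i, W_j) over all ordered pairs with i != j."""
    n = len(w)
    return sum(kernel(w[i], w[j]) for i in range(n) for j in range(n) if i != j)

def u_statistic(sym_kernel, w):
    """Average of the symmetric kernel over all unordered pairs i < j."""
    pairs = list(combinations(w, 2))
    return sum(sym_kernel(a, b) for a, b in pairs) / len(pairs)
```

For example, with `w = [1.0, 2.0, 4.0]` and the asymmetric kernel `lambda a, b: a * a * b`, `off_diagonal_sum` and `3 * 2 * u_statistic(symmetrize(kernel), w)` coincide.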
1
pF n (Wi , Wi ; y, t) = op (1).
P
By a similar reasoning of Kong et al. (2013, p.966), On the other
n n i
hand, the leading term can be analyzed by using the theory of U-processes. To keep the simplicity
of notations, let pF n (Wi , Wj ; t) pF n (Wi , Wj ; q1i , t) and pF n (Wi , Wj ; t) pF n (Wi , Wj ; q1i , t).
1 P P
Let Un (y, t) n2 i j>i pF n (Wi , Wj ; y, t) and consequently denote Un (q1i , t) by Un (t). I
and rF n (Wi ; t), F n (t), and pF n (Wi , Wj ; t) are dened by the same way above. Then, applying
1 X X
2X n
Un (y, t) = F n (y, t) + rF n (Wi ; y, t) + pF n (Wi , Wj ; y, t)
n i 2 i j>i
2X
F n (y, t) + rF n (Wi ; y, t) + RF n (y, t)
n i
First, I show that RF n (t) = op (1) where RF n (t) RF n (q1i , t). It is obvious that ERF n (t) = 0 for
any t I. To calculate the variance of RF n (t) for given t I, I refer to Powell et al. (1989). Note
that
2 X X 2
n 2 n
V ar(RF n (t)) EpF n (Wi , Wj ; t) = O(n2 )EpF n (Wi , Wj ; t)2 .
2 i j>i
2
where the inequality on the third line holds by a property of U-statistics given by Serfling (1980,
p.182). Thus,
2
n
V ar( nRf n (t)) n O(n2 ) EpF n (Wi , Wj ; t)2 = O(n1 )EpF n (Wi , Wj ; t)2
2
and it will suce to show that EpF n (Wi , Wj ; t)2 = o(n) to prove that RF n (t) = op (n1/2 ) for given
t I. Note that
1
EpF n (Wi , Wj ; t)2 = E[pF n (Wi , Wj ; t) + pF n (Wj , Wi ; t)]2 EpF n (Wi , Wj ; t)2
4
and that
1
EpF n (Wi , Wj ; t)2 = E[Di (X1i , t) BhF n (X1j ; X1i )(Yj , Dj , q1i , X1i )]2
hdF1n
1 1 0 X1j X1i 1
. E[ {e1 1
1 (X1j X1i ; hF n )KF ( ) (Yj , Dj , q1i , X1i )}2 ]
hdF1n hdF1n hF n fX1 (X1i )
1 1 X1j X1i 1
E[ ||e1 ||2 ||1 2 2
1 (X1j X1i ; hF n )|| KF ( ){ (Yj , Dj , q1i , X1i )}2 ]
hdF1n hdF1n hF n fX1 (X1i )
1 1 X1j X1i
. E[ KF ( )(Yj , Dj , q1i , X1i )2 ]
hdF1n hdF1n hF n
1 1 X1j X1i
= d1 E[ d1 KF ( )E[(Yj , Dj , q1i , X1i )2 |X1i , X1j ]]
hF n hF n h Fn
O(hd
Fn )
1
since E[(Yj , Dj , q1i , X1i )2 ] < . Hence, EpF n (Wi , Wj ; t)2 = o(n) as nhdF1n and this implies
that V ar( nRF n (t)) = o(1).
To obtain the uniform convergence, I adopt the approach used in the proof of Lemma 3 in Huang
et al. (2016). Recall that the weighting function is of the form $\varphi(X_i, t) = w(X_i' t)$, where $w(\cdot)$
is analytic, and hence $\Phi$ is Vapnik-Červonenkis (VC)-type. Let $\mathcal{P} \equiv \{p(W_i, W_j; t) : t \in \mathcal{I}\}$, then
1
P = (t) {K(Wi , Wj ) + K(Wj , Wi )}, where K(Wi , Wj ) d Di BhF n (X1j ; X1i )(Yj , Dj , q1i , X1i ).
hF1n
Since $K$ is a bounded function, $\mathcal{P}$ is again VC-type with a square-integrable envelope by Lemma
2.6.18 in van der Vaart and Wellner (1996). Note that, by standard arguments in local
polynomial regression, $E K^2(W_i, W_j) = O(h_{Fn}^{-d_1})$ under the conditions imposed in this lemma and
thus one obtains
n(n 1)
E sup | Rn (t)|2 . EK2 (Wi , Wj ) = O(hd Fn )
1
tI n
2 d1 1
by Proposition 4 in Delgado and Manteiga (2001). Thus, E suptI | nRn (t)| = O((nhF n ) )=
o(1) and this leads to that suptI | nRn (t)| = op (1).
2
P
Now, I consider the projected term F n (t) + i rF n (Wi ; t). Recall that
n
where (X1j , t) = E[p0 (Xj )(X1j , t)|X1j ]. Since (Yi , Di , y, X1j ) is continuously dierentiable with
respect to X1j , applying change of variables and the standard arguments of the local polynomial
1
2rF n (Wi ; y, t) = E[(X1j , t) Bhn (X1j ; X1i )(Yi , Di , y, X1j )|Wi ]
hdF1n
1 0 1 x1 X1i x1 X1i
Z
= (x1 , t)(Yi , Di , y, x1 ) e (
d1 1 1
)KF ( )dx1
hF n hF n hF n
= (X1i , t)(Yi , Di , y, X1i ) + op (n1/2 ), (27)
where op (n1/2 ) holds uniformly in y and t. This also implies that F n (t) = op (n1/2 ) uniformly
in t. In all, we have
1 X S 1 X
Di (Xi , t){F1n (q1i |X1i ) F1 (q1i |X1i )} = (1 )(X1i , t)(Yi , Di , q1i , X1i ) + op (1)
n i n i
By Lemma 6.2 and the fact that ||QT |X1 ( |) QT |X1 ( |)|| = op (1), one has that np (t, q1 ; )
np (t; q1 ; ) = op (1). On the other hand, (25) and Lemma 6.3 together imply that
1 X
Jsn (t; ) = (1 )(X1i , t)(Yi , Di , q1i , X1i ) + op (1).
n i
Therefore,
1 X
Jn (t; ) = [(Xi , t)Di {1(Yi QT |X1 ( |X1i )) }(1 )(X1i , t)(Yi , Di , q1i , X1i )]+op (1).
n i
To nalize the proof, one needs to show that the class of functions
is Donsker. Let m(Wi , t) be a generic element of M. Since is square-integrable and the weighting
function, one can easily show that under Assumption 6 and for any t1 , t2 I ,
for some square-integrable function $B(\cdot)$. Thus, the class of functions $\mathcal{M}$ is a type-IV class and
thus it satisfies Ossiander's $L^2$-entropy condition. In all, $\mathcal{M}$ is Donsker by Theorem 3.1 in Ossiander
(1987), and thus the process $\hat{J}_n(\cdot; \tau)$ converges weakly to a tight Gaussian process $G(\cdot)$
in $\ell^\infty(\mathcal{I})$, where $G(\cdot)$ is a Gaussian process with zero mean and covariance kernel $(t_1, t_2) \mapsto
E[m(W_i, t_1) m(W_i, t_2)]$.
6.5 Proof of Corollary 3.3
Proof. Since the test statistics are continuous functionals of the process $\hat{J}_n(\cdot; \tau)$, the result follows from the continuous mapping theorem (e.g. Theorem 1.11.1 in van der Vaart and Wellner (1996)).
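As a rough illustration of the two functionals involved, the sketch below evaluates a Kolmogorov-Smirnov type and a Cramer-von-Mises type statistic for a process observed on a finite grid of $t$. The grid values, the equal weights, and the function names are assumptions made for the example only; they are not the paper's weighting measure or implementation.

```python
def ks_statistic(path):
    """Kolmogorov-Smirnov type functional: largest |J(t)| over the grid."""
    return max(abs(value) for value in path)


def cvm_statistic(path, weights):
    """Cramer-von-Mises type functional: weighted sum of |J(t)|^2,
    a discrete stand-in for the integral against a weighting measure."""
    return sum(w * value * value for value, w in zip(path, weights))


# Hypothetical values of the process on a five-point grid of t.
path = [0.5, -2.0, 1.0, 0.0, -1.0]
# Assumption: equal probability weights on the grid points.
weights = [0.2] * len(path)

ks = ks_statistic(path)
cvm = cvm_statistic(path, weights)

# Continuity check: shifting the whole path by eps moves the
# KS functional by at most eps (it is Lipschitz in the sup norm).
eps = 0.01
shifted = [value + eps for value in path]
ks_shift = abs(ks_statistic(shifted) - ks)
```

Both maps are continuous with respect to the sup norm on the grid, which is the property the continuous mapping argument relies on.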
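To prove Theorem 3.4, I first examine the asymptotic behavior of the feasible process $\hat{J}_n(\cdot; \tau)$ under the local alternative specification. Under the local alternative (14), observe that the infeasible process satisfies
\[
\begin{aligned}
J_n(t; \tau) &= \frac{1}{\sqrt{n}} \sum_i \varphi(X_i, t) D_i \{ 1(Y_i \le Q_{T|X_1}(\tau|X_{1i})) - \tau \} \\
&= \frac{1}{\sqrt{n}} \sum_i \varphi(X_i, t) D_i \{ 1(Y_i \le Q_{T|X_1}(\tau|X_{1i})) - 1(Y_i \le Q_{T|X}(\tau|X_i)) + 1(Y_i \le Q_{T|X}(\tau|X_i)) - \tau \} \\
&= \frac{1}{\sqrt{n}} \sum_i \varphi(X_i, t) D_i \{ 1(Y_i \le Q_{T|X}(\tau|X_i)) - \tau \} + \nu_n^A(t; Q_{T|X_1}, \tau) - \nu_n^A(t; Q_{T|X}, \tau) \\
&\quad + \frac{1}{\sqrt{n}} \sum_i \varphi(X_i, t) D_i \{ F(Q_{T|X_1}(\tau|X_{1i})|X_i) - F(Q_{T|X}(\tau|X_i)|X_i) \} \\
&= \frac{1}{\sqrt{n}} \sum_i \varphi(X_i, t) D_i \{ 1(Y_i \le Q_{T|X}(\tau|X_i)) - \tau \} - \frac{1}{n} \sum_i \varphi(X_i, t) D_i f(Q_{T|X_1}(\tau|X_{1i})|X_i) Q(\tau|X_i),
\end{aligned}
\]
where $\nu_n^A(t; q, \tau) \equiv \frac{1}{\sqrt{n}} \sum_i \varphi(X_i, t) D_i \{ 1(Y_i \le q_i) - F_{T|X}(q_i|X_i) \}$ is a stochastic process indexed by $t$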
and $q$. If one can show that $\nu_n^A(\cdot; \cdot, \tau)$ is stochastically equicontinuous with respect to an appropriately chosen norm, then it follows that $\nu_n^A(t; Q_{T|X_1}, \tau) - \nu_n^A(t; Q_{T|X}, \tau) = o_p(1)$ uniformly in $t \in I$. One might expect that the bracketing central limit theorem (Theorem 2.11.9 in van der Vaart and Wellner (1996)) can be applied to prove the stochastic equicontinuity of the process $\nu_n^A(\cdot; \cdot, \tau)$ in the same way as in Lemma 6.2. A different argument is needed, however, because the conditional quantile function under the local alternative depends on $n$. Therefore, I use Theorem 2.11.23 in van der Vaart and Wellner (1996), which generalizes Theorem 2.11.9 to classes of functions that change with $n$.
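Lemma 6.4. Suppose that Assumptions 1, 3, 6, and 7 hold. Then under the local alternative given in (14),
\[
J_n(\cdot; \tau) \Rightarrow G(\cdot) - R_a(\cdot) \quad \text{in } \ell^\infty(I),
\]
where
\[
R_a(t) \equiv E[p_0(X_i) \varphi(X_i, t) f(Q_{T|X_1}(\tau|X_{1i})|X_i) Q(\tau|X_i)].
\]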
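Proof. Define $\Theta \equiv I \times \Lambda^{p_A}_R(\mathcal{X})$ and let $\theta_n \equiv (t, q_n) \in \Theta_n$, where $\Theta_n = \Theta$ for all $n$. I first show that $\nu_n^A(\cdot; \cdot, \tau)$ is stochastically equicontinuous with respect to the norm $\rho(\theta_1, \theta_2) \equiv \|t_1 - t_2\|_E + \|q_1 - q_2\|_\infty$ by verifying the conditions of Theorem 2.11.23 in van der Vaart and Wellner (1996). Note that the space $\Theta$ is totally bounded with respect to the semi-norm $\rho$ as before. Consider the class of functions
\[
\mathcal{G}_n^A \equiv \{ \varphi(X, t) D \{ 1(Y \le q_n) - F_{T|X}(q_n|X) \} : t \in I, \ q_n \in \Lambda^{p_A}_R(\mathcal{X}) \text{ for all } n \},
\]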
which is indexed by $t$ and $q_n$. Since one can choose the sequence of envelope functions for each $n$ as a constant function, say $C_g$, one can show that $E C_g^2 = O(1)$ and that $E[C_g^2 1(C_g > \eta \sqrt{n})] = o(1)$ for any $\eta > 0$. Thus the first two conditions of (2.11.21) in van der Vaart and Wellner (1996) hold. For the last condition of (2.11.21) in van der Vaart and Wellner (1996), take any $\delta_n \downarrow 0$; then
\[
\begin{aligned}
&\sup_{\rho(\theta_{1n}, \theta_{2n}) < \delta_n} E[ D_i \{ 1(Y_i \le q_{1ni}) - F(q_{1ni}|X_i) \} \varphi(X_i, t_1) - D_i \{ 1(Y_i \le q_{2ni}) - F(q_{2ni}|X_i) \} \varphi(X_i, t_2) ]^2 \\
&\le \sup_{\rho(\theta_1, \theta_2) < \delta_n} E[ \{ 1(T_i \le q_{1ni}) - F(q_{1ni}|X_i) \} \varphi(X_i, t_1) - \{ 1(T_i \le q_{2ni}) - F(q_{2ni}|X_i) \} \varphi(X_i, t_2) ]^2 \\
&\lesssim \sup_{\rho(\theta_1, \theta_2) < \delta_n} E[\varphi(X_i, t_1) - \varphi(X_i, t_2)]^2 + \sup_{\rho(\theta_1, \theta_2) < \delta_n} E[1(T_i \le q_{1ni}) - 1(T_i \le q_{2ni})]^2 + \sup_{\rho(\theta_1, \theta_2) < \delta_n} \|q_1 - q_2\|_\infty^2 \\
&= o(1)
\end{aligned}
\]
by the same logic as in the proof of Lemma 6.2. Thus, all conditions of (2.11.21) in van der Vaart and Wellner (1996) are met. Lastly, it is required to calculate the $L_2$-bracketing number of $\mathcal{G}_n^A$.
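Observe that $\mathcal{G}_n^A = \Phi \times \tilde{\mathcal{G}}_n^A$, where $\tilde{\mathcal{G}}_n^A \equiv \{ D \{ 1(Y \le q_n) - F_{T|X}(q_n|X) \} : q_n \in \Lambda^{p_A}_R(\mathcal{X}) \}$. Since both spaces $\Phi$ and $\tilde{\mathcal{G}}_n^A$ are uniformly bounded, Lemma A.1 in Escanciano et al. (2014) implies that
\[
N_{[\,]}(C_g \varepsilon, \mathcal{G}_n^A, \|\cdot\|_2) \le N_{[\,]}(C \varepsilon, \Phi, \|\cdot\|_2) \times N_{[\,]}(C \varepsilon, \tilde{\mathcal{G}}_n^A, \|\cdot\|_2).
\]
Since
\[
\begin{aligned}
E[D_i \{ 1(Y_i \le q_{1n}) - 1(Y_i \le q_{2n}) \}]^2 &= E[p_0(X_i) |1(T_i \le q_{1n}) - 1(T_i \le q_{2n})|^2] \\
&\le E[1(|T_i - q_{1n}| \le 2 \|q_{1n} - q_{2n}\|_\infty)] \\
&\lesssim \|q_{1n} - q_{2n}\|_\infty,
\end{aligned}
\]
Theorems 2.7.11 and 2.7.1 in van der Vaart and Wellner (1996) together yield that
\[
\log N_{[\,]}(C \varepsilon, \mathcal{G}_{1n}^A, \|\cdot\|_2) \le \log N(C \varepsilon, \Lambda^{p_A}_R(\mathcal{X}), \|\cdot\|_\infty) \lesssim \varepsilon^{-d_x / p_A}.
\]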
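On the other hand, the class of functions $\mathcal{G}_{2n}^A$ is also Lipschitz in the index $q_n$ under Assumption
\[
\log N_{[\,]}(C \varepsilon, \mathcal{G}_{2n}^A, \|\cdot\|_2) \le \log N(C \varepsilon, \Lambda^{p_A}_R(\mathcal{X}), \|\cdot\|_\infty) \lesssim \varepsilon^{-d_x / p_A}.
\]
Therefore, it follows that
\[
\log N_{[\,]}(C_g \varepsilon, \mathcal{G}_n^A, \|\cdot\|_2) \lesssim \varepsilon^{-d_x / p_A}.
\]
Since $\| Q_{T|X}(\tau|\cdot) - Q_{T|X_1}(\tau|\cdot) \|_\infty = \frac{1}{\sqrt{n}} \sup_{x \in \mathcal{X}} |Q(\tau|x)| = o(1)$ and $\nu_n^A(\cdot; \cdot, \tau)$ is stochastically equicontinuous, it follows that $\nu_n^A(t; Q_{T|X_1}, \tau) - \nu_n^A(t; Q_{T|X}, \tau) = o_p(1)$ uniformly in $t \in I$.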
Next, one can show that the leading term $\frac{1}{\sqrt{n}} \sum_i \varphi(X_i, t) D_i \{ 1(Y_i \le Q_{T|X}(\tau|X_i)) - \tau \}$ converges weakly to the Gaussian process $G(\cdot)$ defined in Theorem 3.1 by verifying the conditions of Theorem 2.11.23 in van der Vaart and Wellner (1996); the verification proceeds in the same way as above. Note that in this case the only indexing variable is $t$ and thus the condition on the
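\[
E\left[ \frac{1}{n} \sum_i \varphi(X_i, t) D_i f(Q_{T|X_1}(\tau|X_{1i})|X_i) Q(\tau|X_i) \right] = E[p_0(X_i) \varphi(X_i, t) f(Q_{T|X_1}(\tau|X_{1i})|X_i) Q(\tau|X_i)] = R_a(t)
\]
and that the class of functions $\{ \varphi(X_i, t) D_i f(Q_{T|X_1}(\tau|X_{1i})|X_i) Q(\tau|X_i) : t \in I \}$ is Donsker, so it is Glivenko-Cantelli. In all,
\[
\sup_{t \in I} \left| \frac{1}{n} \sum_i \varphi(X_i, t) D_i f(Q_{T|X_1}(\tau|X_{1i})|X_i) Q(\tau|X_i) - R_a(t) \right| = o_p(1).
\]
\[
J_n(\cdot; \tau) \Rightarrow G(\cdot) - R_a(\cdot) \quad \text{in } \ell^\infty(I)
\]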
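which is the same as (13). By Lemma 6.2, $\nu_n^p(t, \hat{q}_1; \tau) - \nu_n^p(t, q_1; \tau) = o_p(1)$ uniformly in $t$. Likewise, the asymptotic behavior of the smoothed term $J_{sn}(t; \tau)$ under the local alternative remains the same as under the null hypothesis. From the proof of Lemma 6.4, one obtains that
\[
\begin{aligned}
\hat{J}_n(t; \tau) &= \frac{1}{\sqrt{n}} \sum_i \left[ \varphi(X_i, t) D_i \{ 1(Y_i \le Q_{T|X}(\tau|X_i)) - \tau \} - (1-\tau) \bar{\varphi}(X_{1i}, t) \psi(Y_i, D_i, q_{1i}, X_{1i}) \right] \\
&\quad - \frac{1}{n} \sum_i \varphi(X_i, t) D_i f(Q_{T|X_1}(\tau|X_{1i})|X_i) Q(\tau|X_i) + o_p(1).
\end{aligned}
\]
Since the leading term converges weakly to the Gaussian process $G(\cdot)$ in $\ell^\infty(I)$ and the latter sum converges in probability to $R_a(t)$ uniformly in $t$, it follows that
\[
\hat{J}_n(\cdot; \tau) \Rightarrow G(\cdot) - R_a(\cdot).
\]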
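Proof. I follow the proof of Theorem 2 in Whang (2006a). To prove (i), recall that
\[
KS_{n,b,i} = \sup_{t \in I} |\hat{J}_{n,b,i}(t; \tau)|; \qquad CM_{n,b,i} = \int_I |\hat{J}_{n,b,i}(t; \tau)|^2 \, d\mu(t).
\]
Define
\[
F_b^{KS}(z) \equiv \Pr(KS_{n,b,i} \le z); \qquad F_b^{CM}(z) \equiv \Pr(CM_{n,b,i} \le z).
\]
Since $KS_n$ and $CM_n$ are functionals of a Gaussian process with a nonsingular covariance kernel, it suffices to show that
\[
\hat{F}_{n,b}^{KS}(z) - F_b^{KS}(z) \xrightarrow{p} 0; \qquad \hat{F}_{n,b}^{CM}(z) - F_b^{CM}(z) \xrightarrow{p} 0,
\]
as in the proof of Theorem 2 in Whang (2006a). From now on, I only consider the case of the CM statistic. Note that
\[
E \hat{F}_{n,b}^{CM}(z) = F_b^{CM}(z). \qquad (29)
\]
Let $I_i \equiv 1(CM_{n,b,i} \le z)$ for $i = 1, 2, ..., n-b+1$. Then one can show that
\[
\begin{aligned}
Var(\hat{F}_{n,b}^{CM}(z)) &= Var\left( \frac{1}{n-b+1} \sum_{i=1}^{n-b+1} I_i \right) \\
&= \left( \frac{1}{n-b+1} \right)^2 E\left[ \left\{ \sum_{i=1}^{n-b+1} (I_i - F_b^{CM}(z)) \right\} \left\{ \sum_{j=1}^{n-b+1} (I_j - F_b^{CM}(z)) \right\} \right] \\
&= \left( \frac{1}{n-b+1} \right)^2 \left[ \sum_{i=1}^{n-b+1} Var(I_i) + \sum_{i=1}^{n-b+1} \sum_{j \ne i, |i-j| \le b} Cov(I_i, I_j) \right] \\
&\le O\left( \frac{1}{n} \right) + \left( \frac{1}{n-b+1} \right)^2 \sum_{i=1}^{n-b+1} \sum_{j \ne i, |i-j| \le b} \sqrt{Var(I_i)} \sqrt{Var(I_j)} \\
&= O\left( \frac{1}{n} \right) + O\left( \frac{b}{n-b+1} \right) = O\left( \frac{1}{n} \right) + O\left( \frac{b}{n} \right) = o(1). \qquad (30)
\end{aligned}
\]
Combining (29) and (30) yields that $\hat{F}_{n,b}^{CM}(z) - F_b^{CM}(z) \xrightarrow{p} 0$ for all $z \in \mathbb{R}$.
To prove (ii), one can refer to the proof of Corollary 5 in Whang (2006a) and the proof of (i) above.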
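The subsample distribution used in this proof can be sketched as follows. The code forms the $n-b+1$ consecutive blocks of size $b$, evaluates a statistic on each, and averages the indicators $1(\text{statistic} \le z)$. The function names and the toy statistic `abs(sum(block))` are placeholders chosen for illustration; in the paper the statistic would be the KS or CM functional computed on each subsample.

```python
def subsample_statistics(data, b, stat_fn):
    """Evaluate stat_fn on each of the n - b + 1 consecutive blocks of size b."""
    n = len(data)
    if not 1 <= b <= n:
        raise ValueError("block size b must satisfy 1 <= b <= n")
    return [stat_fn(data[i:i + b]) for i in range(n - b + 1)]


def subsample_cdf(stats, z):
    """Empirical CDF of the subsample statistics at z: mean of 1(stat <= z)."""
    return sum(1 for s in stats if s <= z) / len(stats)


def subsample_critical_value(stats, alpha):
    """Smallest subsample statistic whose empirical CDF is at least 1 - alpha."""
    ordered = sorted(stats)
    for candidate in ordered:
        if subsample_cdf(ordered, candidate) >= 1 - alpha:
            return candidate
    return ordered[-1]


# Toy data and a placeholder statistic (not the paper's test statistic).
data = [1.0, -1.0, 2.0, 0.0, -2.0, 1.0]
b = 3
stats = subsample_statistics(data, b, lambda block: abs(sum(block)))

cdf_at_one = subsample_cdf(stats, 1.0)
critical_value = subsample_critical_value(stats, 0.25)
```

Blocks whose starting indices differ by more than $b$ share no observations, which is why only the covariance terms with $|i-j| \le b$ appear in the variance bound (30).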