Journal of Applied Econometrics, Volume 1, Issue 2 (1986), doi:10.1002/jae.3950010203. John Geweke, Exact Inference in the Inequality Constrained Normal Linear Regression Model

JOURNAL OF APPLIED ECONOMETRICS, VOL. 1, 127-141 (1986)
EXACT INFERENCE IN THE INEQUALITY CONSTRAINED NORMAL LINEAR REGRESSION MODEL

JOHN GEWEKE
Department of Economics, Duke University, Durham, NC 27706, U.S.A.
SUMMARY
Inference in the inequality constrained normal linear regression model is approached as a problem in Bayesian inference, using a prior that is the product of a conventional uninformative distribution and an indicator function representing the inequality constraints. The posterior distribution is calculated using Monte Carlo numerical integration, which leads directly to the evaluation of expected values of functions of interest. This approach is compared with others that have been proposed. Three empirical examples illustrate the utility of the proposed methods using an inexpensive 32-bit microcomputer.
1. INTRODUCTION
Inference in the normal linear regression model subject to inequality constraints is one of the most common tasks in applied econometrics. It is usually undertaken because the economic model restricts the signs of marginal effects of explanatory variables but not their absolute values, and because the normal linear regression model is a very convenient statistical tool. Procedures differ from one investigator to another but are invariably informal. In reporting results it is common to compare the signs of coefficient estimates with those implied by the substantive model, with allowance for the size of standard errors and the plausibility of the point estimates. The reported statistics themselves are typically the termination of a process of experimentation with different variables. This experimentation may in fact be an analysis of sensitivity, and the outcome interpreted as the construction of a posterior given a prior that all coefficients are small, as suggested by Chamberlain and Leamer (1976) and Leamer and Chamberlain (1976). More often, we suspect, it is an effort to achieve consistency between the actual signs of estimated coefficients and the anticipated signs. This is suggested by the tendency to report only the best regression, the sensitivity of signs of coefficients to reasonable elaborations of reported results, and observation of common practice at the terminal or printer. Although much maligned, the search for estimated regression coefficients with the right signs has a reasonable interpretation which may help to explain the pervasiveness of the practice. Of a list of candidate regressors, that subset which produces the highest multiple correlation coefficient ($R^2$) without violating inequality constraints also produces maximum likelihood estimates of the coefficients subject to those constraints in the normal linear regression model.
Determining that subset is a straightforward quadratic programming problem, whose solution has been known to applied statisticians at least since Judge and Takayama (1966). However, inequality constrained linear regression is not an option in any of the more popular econometrics software packages. Many practitioners may lack the gall (if not the resources) to determine the solution using standard ordinary least squares regression packages by estimating $2^r$ equations given $r$ linear inequality constraints. A more important problem is the absence of any distribution theory for the inequality constrained linear regression model. Liew (1976) has approached the problem directly, and his solution is conditional on knowing which constraints are binding and which are not: this leads to the conventional ordinary least squares variance matrix excluding those regressors for which the constraints are binding. Given that the investigator does not, in fact, know ahead of time which constraints will be binding, this variance matrix is incorrect. Tests based on this matrix can be seriously misleading (Lovell and Prescott, 1970). The approaches in the literature most similar to the one taken here are those of O'Hagan (1973) and Davis (1978). O'Hagan assumes a prior which is conjugate except that a single coefficient is constrained to be non-negative, and obtains the posterior distribution of the constrained coefficient. The complete posterior distribution is not obtained, and the paper recognizes that the approach taken does not generalize readily to the case of several non-negativity constraints. Davis restricts consideration to linear constraints, whereas that is not the case here. On the other hand the approach here assumes that the prior probability of binding constraints is zero, whereas Davis allows this probability to be positive. The focus of Davis's paper is on deriving expressions for the posterior probability of binding constraints, whereas this work concentrates on the computation of the posterior distribution of arbitrary functions of interest to the investigator. The literature provides a detailed treatment of the very specific problem of testing the null hypothesis that a subset of regression coefficients is zero against the alternative that they are all positive.

Received October 1985; final revision January 1986
0883-7252/86/020127-15\$07.50 © 1986 by John Wiley & Sons, Ltd.
(The more general problem of testing the null hypothesis that a subset of coefficients is a specified vertex of the boundary of a set defined by linear constraints against the alternative that it is in the interior of that set can be cast in this form.) This problem is discussed by Gourieroux, Holly and Monfort (1982), Farebrother (1984a,b), Hillier (1985) and Kodde and Palm (1985). This work has little direct bearing on the problem taken up here. In this paper, we take an approach to the problem of inequality constrained linear regression that departs from this earlier work conceptually and leads to practical procedures for those doing applied work. The approach is Bayesian, but the priors are all either uninformative or are the product of an uninformative distribution and an indicator function representing inequality constraints. It is well known that in the linear regression model with only equality constraints, the maximum likelihood estimator $\hat\beta$ and the sampling variance $s^2 (X^\top X)^{-1}$ correspond to the mean and variance parameters of the posterior multivariate $t$-distribution for $\beta$ given the uninformative prior $p(\beta, \sigma) \propto \sigma^{-1}$. It is also well known, and evident from published discussions, that estimates and standard errors are often given an informal Bayesian interpretation, with means and confidence intervals treated as fixed, and parameters treated as random. This computational coincidence of sampling and posterior distributions does not extend to inequality constrained linear regression. For many investigators the inequality constrained maximum likelihood estimator is not likely to be especially interesting, since it may well lie at one end-point of its distribution (whether that distribution is the posterior, or the sampling theoretic distribution). Instead of attempting to extend the sampling theory of the conventional model to the inequality-constrained model, we extend the computation of the posterior distribution to that case.
This task turns out to be straightforward, and leads to practical methods of exact inference for problems that are impossible to treat using a sampling-theoretic approach. In the next section, we set out some notation and discuss the nature of the inference problem in the inequality constrained regression model. Computational procedures using Monte Carlo integration are set forth in Section 3. These are rather straightforward adaptations of well-established techniques to our problem; it is the recent development of inexpensive 32-bit microcomputers that makes them convenient tools for problems like this one. Examples in the following three sections illustrate the utility of the proposed methods.
2. THE NATURE OF THE PROBLEM AND ITS SOLUTION

It is useful to begin with the simplest inequality constrained normal linear regression model: a normal population with unknown non-negative mean $\mu$ and known variance $\sigma^2$. Denote the arithmetic mean from a sample of size $n$ from this population by $\bar y$, and for ease of exposition assume $\sigma^2/n = 1$. The maximum likelihood estimator $\tilde\mu$ of $\mu$ is $\bar y$ if $\bar y \ge 0$, and 0 if $\bar y < 0$. The variance for $\tilde\mu$ suggested by Liew (1976) is $\operatorname{var}(\tilde\mu) = 1$ if $\bar y \ge 0$, and $\operatorname{var}(\tilde\mu) = 0$ if $\bar y < 0$. Unless $\mu = 0$ these values are correct asymptotically. In the finite sample with $\sigma^2/n = 1$, this estimate of $\operatorname{var}(\tilde\mu)$ has only two possible values, and its expected value is $\Phi(\mu)$, the probability that $\bar y \ge 0$. Consequently inference based on $\tilde\mu$ and $\operatorname{var}(\tilde\mu)$ can produce badly biased results; e.g. in the test of $H_0\colon \mu \ge \mu^*$ against $H_A\colon \mu < \mu^*$ at the nominal five per cent level, the actual test size is five per cent if $\mu^* > 1.645$ and $\Phi(-\mu^*)$ if $0 < \mu^* < 1.645$. The maximum likelihood estimator is not well suited for inference about $\mu$. It tends to be concentrated below values of $\mu$ that are plausible given the sample, and the probability that it lies on the boundary of the parameter space is never zero. The last characteristic of $\tilde\mu$ renders extension of classical inference procedures to this problem very difficult, and is responsible for the fact that there are no formal, practical methods for inference in the inequality constrained regression model.

The formal Bayesian approach to this inequality constrained mean estimation problem leads to a very simple result. With the improper prior $p(\mu) = 1$ for $\mu \ge 0$ and $p(\mu) = 0$ for $\mu < 0$, the posterior distribution for $\mu$ is $\mu \sim N(\bar y, 1)$ truncated below at $\mu = 0$. In Table I we have tabulated the mean and standard deviation of the posterior for some values of $\bar y$ in order to contrast them with $\tilde\mu$ and its standard error. (In application one would work with the exact posterior since its evaluation is trivial.)
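Both closed-form quantities in this example are easy to evaluate. The following is a minimal Python sketch, assuming $\sigma^2/n = 1$ as in the text: `truncated_posterior_moments` uses the standard truncated-normal results (mean $\bar y + \lambda$, variance $1 - \lambda(\lambda + \bar y)$ with $\lambda = \phi(\bar y)/\Phi(\bar y)$) to give the posterior moments tabulated in Table I, and `liew_test_size` gives the actual size, at $\mu = \mu^*$, of the nominal five per cent test that uses Liew's variance. The function names are illustrative and not from the paper.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    # erfc keeps precision in the far left tail (e.g. x = -4).
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_posterior_moments(ybar):
    """Mean and s.d. of N(ybar, 1) truncated below at 0 (posterior under p(mu) = 1{mu >= 0})."""
    lam = norm_pdf(ybar) / norm_cdf(ybar)  # inverse Mills ratio
    mean = ybar + lam
    var = 1.0 - lam * (lam + ybar)
    return mean, math.sqrt(var)


def liew_test_size(mu_star, z=1.645):
    """Actual size at mu = mu_star of the nominal 5% test of H0: mu >= mu_star using Liew's variance."""
    # ybar < 0: estimate 0 < mu_star with variance 0, so the test always rejects.
    p_boundary = norm_cdf(-mu_star)
    # 0 <= ybar < mu_star - z: interior estimate more than z standard errors below mu_star.
    p_interior = max(0.0, norm_cdf(-z) - p_boundary)
    return p_boundary + p_interior
```

For instance, `truncated_posterior_moments(0.0)` is approximately `(0.79788, 0.60281)`, and `liew_test_size(1.0)` equals $\Phi(-1) \approx 0.159$ rather than the nominal 0.05.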
As $\bar y$ increases above zero the posterior essentially becomes $N(\bar y, 1)$; for $\bar y$ exceeding 3 or so the difference can be ignored. As $\bar y$ decreases below zero the posterior mean begins to behave like $-\bar y^{-1}$, as does the standard deviation; this is evident in Table I and follows from properties of the truncated normal distribution, as shown in Appendix I. The posterior distribution for $\mu$ provides a better indication of the values of $\mu$ that are plausible given the data and the constraints than do the maximum likelihood estimator of $\mu$ and approximations to its unknown distribution. This is unsurprising, since the problem is one of Bayesian inference. The distinction between formally classical and formally Bayesian approaches is obscured in the unconstrained case because of the duality between $\bar y \sim N(\mu, 1)$ and $\mu \sim N(\bar y, 1)$. In the constrained case the distinction is essential. In this paper we pursue the consequences of following the Bayesian as opposed to the classical extension of the standard model to the case of inequality constraints. In contrast to the classical extension we obtain a set of tractable results that can be very powerful in application, providing solutions to problems otherwise almost incomprehensibly difficult.

Table I. Properties of the posterior distribution $\mu \sim N(\bar y, 1)$, truncated so that $\mu \ge 0$

$\bar y$    Posterior mean    Posterior standard deviation
-4          0.22561           0.21604
-3          0.28310           0.26563
-2          0.37322           0.33805
-1          0.52514           0.44620
 0          0.79788           0.60281
 1          1.28760           0.79353
 2          2.05525           0.94152
 3          3.00444           0.99331
 4          4.00013           0.99973

We generalize from the case of unknown scalar mean and known variance to the normal linear regression model with unknown variance in the usual way. Assume the standard notation $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$, $y\colon n \times 1$, $X\colon n \times k$, $\operatorname{rank}(X) = k$ for the normal linear regression model, and let $b = (X^\top X)^{-1} X^\top y$, $\nu = n - k$, $s^2 = \nu^{-1}(y - Xb)^\top (y - Xb)$. For a possibly improper, diffuse prior $p(\beta, \sigma) \propto \sigma^{-1} q(\beta)$ the posterior distribution is proportional to
$$p(\beta, \sigma \mid y) \propto \sigma^{-(n+1)} \exp\left\{-\frac{\nu s^2 + (\beta - b)^\top X^\top X (\beta - b)}{2\sigma^2}\right\} q(\beta).$$
Integrating over $\sigma$, the posterior becomes (up to a factor of proportionality)
$$p(\beta \mid y) \propto \left[\nu s^2 + (\beta - b)^\top X^\top X (\beta - b)\right]^{-n/2} q(\beta), \qquad (1)$$
the product of a multivariate $t$ density (with $\nu$ degrees of freedom, location $b$ and scale matrix $s^2 (X^\top X)^{-1}$) and $q(\beta)$. For the case of inequality constraints $q(\beta)$ is an indicator function, taking on the value 1 when the constraints are satisfied and 0 when they are not. Evaluation of (1), in principle, provides the exact posterior distribution of $\beta$ and of functions of interest of $\beta$; we shall return to the problem of turning principle into practice in the next section. The maximum likelihood estimator $\tilde\beta$ of $\beta$ is a point in the support of (1), with $\tilde\beta$ in the interior if and only if $\tilde\beta = b$. For the same reasons noted in the discussion of mean estimation subject to an inequality constraint, Liew's expressions for $\operatorname{var}(\tilde\beta)$ are only asymptotically correct (and then only assuming that $\beta$ is not on the boundary of the support of $q(\beta)$). In a finite sample the difficulties noted already carry over, further complicated by problems of multidimensionality and multicollinearity. The distribution of $\tilde\beta$ depends on the true value of $\beta$, and as an estimator of $\beta$, $\tilde\beta$ is inadmissible (Judge et al., 1984). Even for a single, fixed value of $\beta$ the distribution does not readily lead to tractable procedures (Gourieroux, Holly and Monfort, 1982). Although the constrained maximum likelihood estimator $\tilde\beta$ is in general uninformative, it may be useful in extreme cases. The distinguishing feature of these cases is that there are certain coefficients whose probability of satisfying the constraints is very low under the posterior constructed from the unconstrained, diffuse prior. The intuitive idea is that under the constrained, diffuse prior the distribution of these coefficients will nearly be a point mass on the boundary of the support of $q(\beta)$; if we impose the point distribution as an equality constraint then the distribution of the remaining coefficients will be essentially correct. This result is formalized in Appendix II. These considerations provide an interpretation for a common practice in applied work.
In its extreme form, the investigator determines that subset of a group of potential regressors that produces the highest multiple correlation coefficient ($R^2$) with all coefficients having the correct sign. He or she then reports least squares coefficient estimates and the conventional associated statistics for that subset. The list of excluded variables combined with the coefficients on the included ones constitutes the constrained maximum likelihood estimate of the entire coefficient vector. The associated statistics are reliable only if all the coefficients are several standard deviations from zero. This condition is rarely met in practice. It is therefore useful to compute the posterior distribution explicitly.
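The equivalence between the highest-$R^2$ correct-sign subset and the constrained maximum likelihood estimate can be checked by brute force. The sketch below assumes sign constraints only ($\beta_j \ge 0$ or $\beta_j \le 0$) and a full-column-rank design. It fits least squares on every subset obtained by dropping some constrained regressors (the $2^r$ regressions mentioned in the introduction), discards fits that violate a sign, and keeps the fit with the smallest residual sum of squares, which for fixed $y$ is the same as the highest $R^2$. The function names are hypothetical, and a dedicated quadratic programming routine would be preferable for any realistic $r$.

```python
import itertools


def ols(X, y):
    """Least squares coefficients from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def sign_constrained_ls(X, y, signs, tol=1e-12):
    """signs[j] is +1 (beta_j >= 0), -1 (beta_j <= 0) or 0 (unconstrained, never dropped)."""
    k = len(signs)
    constrained = [j for j in range(k) if signs[j] != 0]
    best, best_ssr = None, float("inf")
    for r in range(len(constrained) + 1):
        for dropped in itertools.combinations(constrained, r):
            keep = [j for j in range(k) if j not in dropped]
            b_sub = ols([[row[j] for j in keep] for row in X], y) if keep else []
            beta = [0.0] * k  # dropped coefficients are set to zero
            for j, bj in zip(keep, b_sub):
                beta[j] = bj
            if any(signs[j] * beta[j] < -tol for j in range(k)):
                continue  # this subset violates a sign constraint
            ssr = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
                      for row, yi in zip(X, y))
            if ssr < best_ssr:
                best, best_ssr = beta, ssr
    return best
```

The search is exact because the constrained optimum is itself the least squares fit on the regressors whose constraints are not binding, so it appears among the feasible subsets, and every other feasible subset fit has a residual sum of squares at least as large.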
3. COMPUTATION
Because $q(\beta)$ is an indicator function, analytical integration of (1) is impossible, and numerical integration is widely regarded as challenging for dimensions greater than 3 or 4. In the examples presented in the following sections we employ the Monte Carlo integration procedures of Kloek and Van Dijk (1978) with minor modifications. To describe their procedure for the general case, let $L(\theta; Y)$ denote the likelihood function and $p(\theta)$ the (possibly improper) prior. Let $g(\theta)$ be any function of the parameters whose expected value under the posterior is of interest. (This formulation includes the evaluation of the c.d.f. of any analytically expressible function of the parameter vector $\theta$ at any point, and any moment of such a function that might exist.) We have
$$E[g(\theta)] = \frac{\int g(\theta) L(\theta; Y) p(\theta)\, d\theta}{\int L(\theta; Y) p(\theta)\, d\theta}$$
so long as the integrals exist; this will be the case for bounded $p(\theta)$ (as in the case of an indicator function), bounded $g(\theta)$ (as in the case of c.d.f. evaluation), and an integrable likelihood function (as in the case of the multivariate $t$). Suppose that $\theta_i$, $i = 1, \ldots, N$, are generated by random sampling from a distribution with p.d.f. $I(\theta)$, called an importance function. The support of $I(\theta)$ must include the support of $L(\theta; Y) p(\theta)$. (In practice, the values of $\theta_i$ are quasi-random synthetic variables generated in the usual way.) Then almost surely,
$$\frac{\sum_{i=1}^{N} g(\theta_i) L(\theta_i; Y) p(\theta_i) / I(\theta_i)}{\sum_{i=1}^{N} L(\theta_i; Y) p(\theta_i) / I(\theta_i)} \to E[g(\theta)] \quad \text{as } N \to \infty. \qquad (2)$$
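The left-hand side of (2) is a self-normalized importance sampling estimator. A minimal Python sketch, assuming a scalar parameter for readability: the target is the Section 2 posterior, $N(\bar y, 1)$ truncated below at zero with $\bar y = 0$, and the importance function $N(0.5, 1.5^2)$ is an arbitrary illustrative choice, not one used in the paper. Constants common to every draw cancel in the ratio, so both log densities may be left unnormalized. The names are hypothetical.

```python
import math
import random


def importance_ratio(g, log_kernel, draw, log_importance, n, rng):
    """Left-hand side of (2): sum g*w / sum w with w = L(theta;Y) p(theta) / I(theta)."""
    num = den = 0.0
    for _ in range(n):
        theta = draw(rng)
        lk = log_kernel(theta)  # log of L(theta;Y) p(theta), up to a constant
        if lk == -math.inf:
            continue  # p(theta) = 0, so this draw adds nothing to either sum
        w = math.exp(lk - log_importance(theta))
        num += g(theta) * w
        den += w
    return num / den


ybar = 0.0


def log_kernel(mu):
    # N(ybar, 1) likelihood times the indicator prior 1{mu >= 0}.
    return -0.5 * (mu - ybar) ** 2 if mu >= 0 else -math.inf


def log_importance(mu):
    # Illustrative importance function N(0.5, 1.5^2), up to a constant.
    return -0.5 * ((mu - 0.5) / 1.5) ** 2


estimate = importance_ratio(lambda mu: mu, log_kernel,
                            lambda rng: rng.gauss(0.5, 1.5),
                            log_importance, 100_000, random.Random(1))
```

With $\bar y = 0$ the exact posterior mean is about 0.798, and `estimate` should agree with it to roughly two decimal places; a poorly matched importance function would show up as a few very large weights and a slowly converging ratio.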
The practicability of this approach may be regarded as depending on two factors. The first is the expense of evaluating $g(\theta)$, $L(\theta; Y)$, $p(\theta)$ and $I(\theta)$ given $\theta$. Prospects for success are enhanced by using efficient numerical procedures and designs that bring about cancellations in the quotient $L(\theta; Y) p(\theta) / I(\theta)$. The second factor is the number of replications $N$ required to achieve a good approximation of $E[g(\theta)]$. Ideally $I(\theta)$ is the posterior distribution, which is of course unknown. A sufficiently poor choice of $I(\theta)$ will render convergence arbitrarily slow. Asymptotic (in $N$) standard errors for the left hand side of (2) as an approximation of $E[g(\theta)]$ can be computed by standard methods, as indicated by Kloek and Van Dijk. We shall refer to these as 'asymptotic standard errors for numerical accuracy'. Whenever the prior is diffuse, $I(\theta) \propto L(\theta; Y)$ may be a suitable importance function, and in this case computation simplifies to the evaluation of
$$\frac{\sum_{i=1}^{N} g(\theta_i) p(\theta_i)}{\sum_{i=1}^{N} p(\theta_i)}. \qquad (3)$$
In the present problem $p(\theta)$ is an indicator function, and the number of replications required to achieve a good approximation will depend on the behaviour of the function of interest $g(\theta)$ and the reasonableness of the inequality constraints as judged by the likelihood function,
$$p^* = \frac{\int L(\theta; Y) p(\theta)\, d\theta}{\int L(\theta; Y)\, d\theta}.$$
Equivalently, $p^*$ is the posterior probability that the constraints are true, given an uninformative, flat prior on $\theta$. The more unreasonable the constraints, from this perspective, the slower the convergence of the left hand side of (2). Since our posterior is the product of a multivariate $t$ and the indicator function $q(\beta)$, the computations for each step are simple. The random vector $w_i$ is drawn from a $N(0, s^2 (X^\top X)^{-1})$ distribution, the random variable $z_i$ is drawn from a $\chi^2(\nu)$ distribution, and the replication $\beta_i = b + w_i (z_i/\nu)^{-1/2}$ is then computed, as is its antithetic replication $\beta_i^* = b - w_i (z_i/\nu)^{-1/2}$. In the limiting but uninteresting case of linear $g(\theta)$ and $p(\theta) = 1$ this use of antithetic variates guarantees that the left side of (2) always equals its limiting value. For the more interesting examples discussed in the next section, and in similar problems, this use of antithetic variates was found to improve convergence substantially.
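The sampling scheme just described can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's code: the caller supplies $b$ and $V = s^2 (X^\top X)^{-1}$, $V$ is factored with a hand-written Cholesky decomposition so that $w_i \sim N(0, V)$, the $\chi^2(\nu)$ draw uses `random.gammavariate(nu / 2, 2)`, and the function returns the fraction of replications satisfying the constraints (an estimate of $p^*$) together with the ratio (3) for a user-supplied $g$. All names are hypothetical.

```python
import math
import random


def cholesky(V):
    """Lower-triangular C with C C^T = V, for symmetric positive definite V."""
    k = len(V)
    C = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = V[i][j] - sum(C[i][m] * C[j][m] for m in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C


def constrained_posterior(b, V, nu, satisfies, g, n_pairs, rng):
    """Antithetic multivariate-t draws; returns (estimate of p*, ratio (3) for g)."""
    k = len(b)
    C = cholesky(V)
    accepted, total_g = 0, 0.0
    for _ in range(n_pairs):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        w = [sum(C[i][m] * e[m] for m in range(i + 1)) for i in range(k)]  # w ~ N(0, V)
        z = rng.gammavariate(nu / 2.0, 2.0)  # z ~ chi-square(nu)
        scale = (z / nu) ** -0.5
        for sign in (1.0, -1.0):  # the replication and its antithetic partner
            beta = [bi + sign * wi * scale for bi, wi in zip(b, w)]
            if satisfies(beta):  # q(beta) = 1
                accepted += 1
                total_g += g(beta)
    p_star = accepted / (2 * n_pairs)
    return p_star, (total_g / accepted if accepted else float("nan"))
```

For a five-coefficient equation with zero-based indexing one might pass `satisfies = lambda bt: bt[1] >= 0 and bt[2] >= 0 and bt[3] <= 0 and bt[4] <= 0` and `g = lambda bt: float(bt[1] > bt[2])` to estimate $P(\beta_2 > \beta_3)$. Because the accepted draws are simply averaged, the same loop yields posterior means, second moments or probabilities by changing `g`.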
4. AN EXAMPLE: RENTAL DATA (PINDYCK AND RUBINFELD)
Pindyck and Rubinfeld (1981, 44) provide 32 observations on rent paid, number of rooms rented, number of occupants, sex, and distance from campus in blocks for undergraduates at the University of Michigan. These data are used by the authors in developing the linear regression model at several points in their text. Denote rent paid per person by $y_t$, rooms per person by $r_t$, and distance from campus in blocks by $d_t$, and let $s_t$ be a sex dummy, one for male and zero for female. The equation estimated is
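$$y_t = \beta_1 + \beta_2 s_t r_t + \beta_3 (1 - s_t) r_t + \beta_4 s_t d_t + \beta_5 (1 - s_t) d_t + \varepsilon_t.$$
We consider estimation subject to the inequality constraints $\beta_2 \ge 0$, $\beta_3 \ge 0$, $\beta_4 \le 0$, $\beta_5 \le 0$;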
subject to these constraints interest focuses on whether there are systematic differences in rent paid by sex ($\beta_2 - \beta_3$, $\beta_4 - \beta_5$). Three sets of coefficient estimates and standard errors are provided in Table II. The constrained maximum likelihood (ML) solution was found by solving the Kuhn-Tucker conditions using the algorithms provided by Lawson and Hanson (1974, Chapter 23), and the corresponding reported standard errors are computed as if the binding constraints were known to hold a priori; the estimates and standard errors are therefore those proposed by Liew (1976). The Bayes coefficient estimates are computed as described in the previous two sections. The reported Bayes coefficient estimates and standard errors are the means and standard deviations, respectively, of the coefficients under the posterior. These are presented mainly for purposes of comparison; if the investigator's loss function is not quadratic, other point estimates will be appropriate. The numbers reported are found by substituting $\beta_j$ for $g(\theta)$ in (3) for the estimates, and by computing the standard errors in the obvious way after substituting $\beta_j^2$ for $g(\theta)$ in (3). Results are based on 20,000 replications consisting of 10,000 antithetic pairs. One of the unconstrained estimates ($\beta_4$) violates the inequality constraints, being about 1.7 standard deviations on the wrong side of zero. As one would expect in the absence of substantial multicollinearity, in the constrained ML estimates only this coefficient is set to zero. The posterior mean for this coefficient is of about the same absolute value as its standard error, and the difference between the constrained ML estimates and the posterior means for the other coefficients is negligible.

Table II. Comparison of rent equation estimates (standard errors in parentheses)

             Classical OLS      Constrained ML     Bayes
$\beta_1$    38.56 (32.22)      37.63 (33.27)      36.02 (35.10)
$\beta_2$    103.5 (38.37)      130.0 (36.29)      138.9 (38.85)
$\beta_3$    122.0 (37.36)      123.0 (38.57)      126.2 (40.55)
$\beta_4$    3.315 (1.961)      0.0                -0.8807 (0.8432)
$\beta_5$    -1.154 (0.5714)    -1.153 (0.5901)    -1.204 (0.5824)

Estimates of some functions of interest are provided in Table III. Along with the estimates we report standard errors for numerical accuracy, and to judge the adequacy of the latter we report results for both $10^4$ and $2 \times 10^5$ pairs of antithetic replications. When restrictions are imposed the asymptotic standard errors are virtually identical to $[p(1-p)/n]^{1/2}$, with $p$ equal to the reported probability, $n = 2 \times 10^4 \times 0.04845$ for $10^4$ pairs of replications, and $n = 2 \times 2 \times 10^5 \times 0.04915$ for $2 \times 10^5$ pairs of replications; that is, $n$ is the expected number of replications satisfying the restrictions. When restrictions are not imposed, estimated standard errors range from 2 to 13 per cent lower than the values suggested by $[p(1-p)/n]^{1/2}$. Since the unconstrained ordinary least squares (OLS) estimate does not satisfy the inequality constraints, the two members of a pair of antithetic variates never both figure in the estimation of any function of interest when restrictions are imposed: the OLS estimate is the midpoint of each pair, so if both members satisfied the linear constraints, so would it. The estimates themselves [$P(\beta_2 > \beta_3)$ and $P(\beta_4 > \beta_5)$] show that the assessment of these probabilities is very sensitive to whether or not restrictions are imposed, which is not surprising given the estimate $P(\beta_2 \ge 0, \beta_3 \ge 0, \beta_4 \le 0, \beta_5 \le 0) = 0.048$ under a uniform uninformative prior. Comparison of the estimates with two approximations of the probabilities is revealing. Following Liew (1976) one could estimate using the conventional $t$ from the ordinary least squares regression excluding the fourth regressor; this produces $P(\beta_2 > \beta_3) = 0.653$, a mild underestimate, and $P(\beta_4 > \beta_5) = 0.974$, a wild overestimate. The bias in the latter is plainly a consequence of the assumption that $\beta_4 = 0$ is known with certainty.
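The binomial approximation to the numerical standard errors is simple arithmetic: when restrictions are imposed only a fraction $p^*$ of the replications enters (3), so the effective sample is $n = 2 \times (\text{pairs}) \times p^*$. A short Python check against two Table III entries, using the $p^*$ estimates reported in the same table (the function name is illustrative):

```python
import math


def numerical_se(p, n_pairs, p_star):
    """[p(1-p)/n]^{1/2} with n = 2 * n_pairs * p_star replications satisfying the restrictions."""
    n = 2 * n_pairs * p_star
    return math.sqrt(p * (1.0 - p) / n)


print(round(numerical_se(0.7152, 10**4, 0.04845), 4))      # 0.0145 (P(beta2 > beta3), 10^4 pairs)
print(round(numerical_se(0.7455, 2 * 10**5, 0.04915), 4))  # 0.0031 (P(beta2 > beta3), 2 x 10^5 pairs)
```

Both values match the reported asymptotic standard errors, which is what the text means by "virtually identical": with only about 5 per cent of replications accepted, the restricted estimates behave like binomial proportions from roughly 970 and 19,660 draws.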
A normal approximation to the posterior based on the estimates reported in Table II and the associated variance matrix (not reported here) yields $P(\beta_2 > \beta_3) = 0.740$, which is correct, and $P(\beta_4 > \beta_5) = 0.605$, a mild underestimate. This example shows that conventional use of the maximum likelihood estimator can be substantially misleading, a phenomenon demonstrated in constructed examples by Lovell and Prescott (1970). Further elaboration on the last finding is provided in Figures 1 and 2, which show the posterior distributions of $\beta_2 - \beta_3$ and $\beta_4 - \beta_5$ given the prior $\beta_2 \ge 0$, $\beta_3 \ge 0$, $\beta_4 \le 0$, $\beta_5 \le 0$. (The figures are based on $4 \times 10^5$ replications, of which 19,661 satisfied the restrictions. The figures were generated by accumulating frequencies in 500 cells and applying a uniform smoother whose width is about 10 units in Figure 1 and 0.3 units in Figure 2.) The distribution of $\beta_2 - \beta_3$ is approximately normal, in spite of the drastic difference in the point estimates in the restricted and unrestricted cases. By contrast the distribution of $\beta_4 - \beta_5$ is skewed to the left, due primarily to the truncation of the distribution of $\beta_4$ to the right of $\beta_4 = 0$. The normal approximation to the distribution in Figure 2 shifts the mode to the left, leading to the underestimate of $P(\beta_4 > \beta_5)$ in the normal approximation to the true distribution. Computation time was 1 h 9 min for 20,000 replications and 22 h 7 min for 400,000 replications on the MicroVAX I using IMSL software for random number generation.

Table III. Estimates of functions of interest, rent equation. Columns: (a) restrictions not imposed, $10^4$ pairs of replications; (b) restrictions not imposed, $2 \times 10^5$ pairs; (c) restrictions $\beta_2 \ge 0$, $\beta_3 \ge 0$, $\beta_4 \le 0$, $\beta_5 \le 0$ imposed, $10^4$ pairs; (d) restrictions imposed, $2 \times 10^5$ pairs. Standard errors for numerical accuracy in parentheses.

Function of interest                                               (a)                  (b)                  (c)               (d)
$P(\beta_2 \ge 0, \beta_3 \ge 0, \beta_4 \le 0, \beta_5 \le 0)$    0.04845 (0.00148)    0.04915 (0.00033)    1.0000 (0.0000)   1.0000 (0.0000)
$P(\beta_2 > \beta_3)$                                             0.2148 (0.0025)      0.2209 (0.0006)      0.7152 (0.0145)   0.7455 (0.0031)
$P(\beta_4 > \beta_5)$                                             0.9829 (0.0009)      0.9812 (0.0002)      0.6832 (0.0149)   0.6652 (0.0036)

[Figure 1. Posterior distribution of $\beta_2 - \beta_3$ (horizontal axis from -60 to 80).]
[Figure 2. Posterior distribution of $\beta_4 - \beta_5$.]
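The figure construction (frequencies in 500 cells followed by a uniform smoother) can be sketched as below. Reading "uniform smoother" as a centred moving average whose span is about `width` units is an assumption, as is truncating the window at the ends of the range; the demonstration draws standard normal values in place of posterior draws. The function name is hypothetical.

```python
import random


def smoothed_histogram(draws, lo, hi, cells=500, width=0.3):
    """Density estimate from counts in equal cells on [lo, hi), smoothed by a
    centred moving average (assumed reading of a 'uniform smoother') about `width` wide."""
    h = (hi - lo) / cells
    counts = [0] * cells
    for x in draws:
        if lo <= x < hi:
            counts[min(cells - 1, int((x - lo) / h))] += 1
    half = int(round(width / h / 2))  # half-window measured in cells
    n = len(draws)
    density = []
    for i in range(cells):
        window = counts[max(0, i - half): i + half + 1]  # window truncated at the edges
        density.append(sum(window) / len(window) / (n * h))  # mean count per cell -> density
    mids = [lo + (i + 0.5) * h for i in range(cells)]
    return mids, density


rng = random.Random(3)
draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
mids, dens = smoothed_histogram(draws, -4.0, 4.0, cells=500, width=0.3)
```

With 500 cells on $[-4, 4)$ each cell is 0.016 wide, so a width of 0.3 averages 19 neighbouring cells; the smoothed curve should peak near 0.40 at the origin, the standard normal density there.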
5. AN EXAMPLE: AUTO SALES DATA (BAILS AND PEPPERS)
Bails and Peppers (1982, Appendix G) provide 60 quarterly observations on unit sales of automobiles in the U.S., and 10 explanatory variables. They consider the normal linear regression model $y_t = \sum_{j=1}^{11} \beta_j x_{jt} + \varepsilon_t$, with $y_t$ denoting unit sales of automobiles at time $t$; $x_{1t}$, an intercept term; $x_{2t}$, personal income less transfer payments; $x_{3t}$, index of consumer sentiment; $x_{4t}$, unemployment rate; $x_{5t}$, index of cost of car ownership; $x_{6t}$, average miles per gallon of current model-year cars; $x_{7t}$, dummy variable for automobile strikes; $x_{8t}$, depreciation rate of the stock of cars; $x_{9t}$, average price of a new car; $x_{10,t}$, stock of automobiles; and $x_{11,t}$, interest rate on automobile loans. (A discussion of these variables is provided by Bails and Peppers on pp. 246-247.) We use the data exactly as presented, except that x,, is scaled by $10^3$. The coefficients $\beta_2$, $\beta_3$, $\beta_6$ and $\beta_8$ are anticipated to be non-negative; all the rest except the intercept are anticipated to be non-positive. In the Bails and Peppers text the model and data are used as an instructive example of how algorithmic addition and deletion of variables can be used in conjunction with the informal imposition of sign constraints to yield a satisfactory final equation. The numbers of observations and regressors here seem to be typical of the fairly common situation in which sign constraints are imposed in an informal, descriptive regression equation.
Table IV. Comparison of auto sales equation estimates (standard errors in parentheses)

               Sign    Classical OLS          Constrained ML         Bayes
$\beta_1$              -7.704 (5.380)         -8.033 (1.641)         -7.535 (2.129)
$\beta_2$      +       0.02666 (0.00743)      0.02133 (0.00198)      0.02477 (0.00231)
$\beta_3$      +       0.03962 (0.01694)      0.04792 (0.01254)      0.04211 (0.01230)
$\beta_4$      -       0.2471 (0.1561)        0.0                    -0.02057 (0.01707)
$\beta_5$      -       -5.114 (1.885)         -4.538 (0.9173)        -3.274 (0.8406)
$\beta_6$      +       0.08402 (0.2276)       0.05676 (0.1392)       0.1080 (0.0861)
$\beta_7$      -       -0.1347 (0.0314)       -0.1281 (0.0284)       -0.1329 (0.0198)
$\beta_8$      +       65.39 (62.09)          33.02 (43.95)          34.16 (25.56)
$\beta_9$      -       0.1356 (0.8960)        0.0                    -0.3986 (0.2477)
$\beta_{10}$   -       -0.07232 (0.09618)     0.0                    -0.01899 (0.01425)
$\beta_{11}$   -       -0.008097 (0.1660)     0.0                    -0.1474 (0.09023)
Table IV exhibits the sign constraints (in the second column) and three sets of estimates, with standard errors in parentheses. By widely applied ad hoc standards for the evaluation of OLS estimates the model is neither a great success nor an embarrassing failure: four coefficients are of the right sign and significant (more than two standard deviations from zero); four are of the right sign and insignificant; two are of the wrong sign and insignificant; and none is significant and of the wrong sign. In the constrained ML estimates both coefficients whose least squares estimates were of the wrong sign are set to zero, as well as two others. That two additional coefficients were set to zero is evidence of multicollinearity, as is the fact that the standard errors for the constrained ML estimates are as much as 73 per cent lower than those for the OLS estimates. None of the equations encountered in the approach to this problem discussed by Bails and Peppers (1982, 246-257) is the solution of the constrained ML problem; their terminal equation omits variables 6 and 8 as well as 4, 9, 10 and 11. The inequality constrained Bayes estimates are shown in the last column of Table IV. As one might expect in a regression model of this size with correlated regressors, there is no simple relation between the constrained Bayes solution and the constrained and unconstrained ML estimates. The constrained ML estimates are between the OLS and Bayes estimates for three coefficients, and two of these are $\beta_4$ and $\beta_9$, whose OLS estimates are of the wrong sign. For five of the eleven coefficients the Bayes estimates are between the classical OLS and constrained ML estimates. In the metric of the standard error of the constrained ML estimates the difference between the constrained ML and Bayes estimates is more than one standard error for six of the eleven coefficients.
A by-product of the computation of the inequality-constrained Bayes estimator is an assessment of the probability that the restrictions are true under the unconstrained improper prior $p(\beta, \sigma) \propto \sigma^{-1}$. Based on 500,000 replications (250,000 antithetic pairs) this probability is estimated to be $6.40 \times 10^{-5}$, with a standard error for numerical accuracy of $1.13 \times 10^{-5}$. This is a result that could not be anticipated from the usual informal classification of OLS coefficient estimates by correctness of sign and ratio of estimate to standard error. Nor does it emerge by calculating the probabilities as if each coefficient were distributed independently of every other, with its distribution the same as its marginal distribution in the unconstrained likelihood function: this produces a probability of $3.26 \times 10^{-3}$, less than one per cent, but some 50 times the true value. There is no simple explanation for the difference in probabilities, but the single most important factor appears to be that $\operatorname{corr}(\beta_4, \beta_{10}) = -0.841$ in the posterior from the unconstrained prior (read from the conventional variance matrix for the ordinary least squares estimates, not reported here), coupled with the fact that the mean of $\beta_4$ was positive and that of $\beta_{10}$ was negative, whereas the constrained prior stipulates that both coefficients are negative. Computation time for 500,000 replications was 78 h 34 min on the MicroVAX I, employing IMSL software for random number generation.
6. AN EXAMPLE: REAL GNP AUTOREGRESSION (FRIEDMAN AND SCHWARTZ)
We conclude by taking up a case in which the inequality constraints are non-linear. The example is the univariate autoregression in which it is desired to estimate $y_t = \sum_{j=1}^{3} \beta_j y_{t-j} + \varepsilon_t$ subject to the constraint that the roots of the autoregressive operator $1 - \sum_{j=1}^{3} \beta_j L^j$ lie outside the unit circle, or alternatively to the restriction that two of the roots form a complex conjugate pair (implying an oscillatory response in $y_t$ to a single, deterministic shock in $\varepsilon_t$). The data are taken from Friedman and Schwartz (1982, Table 4.8, pp. 127-129). The variable $y_t$ is the logarithm of real per capita GNP, the ratio of column (3) to column (5) in the cited table. We present results for the period 1870-1973 (the longest that can be considered, given three lags) and for the recent subperiod 1954-1970; in each case the years indicate the range of the dependent variable. The development in Section 2 was undertaken assuming fixed regressors. However, if we treat the presample values $y_0$, $y_{-1}$ and $y_{-2}$ as either fixed or as unknown with a diffuse prior, then the form of the posterior distribution derived in Section 2 remains unchanged (Zellner and Tiao, 1964; Zellner, 1971, Section 4.1). For each point $(\beta_1, \beta_2, \beta_3)$ at which the posterior is evaluated the roots of $1 - \sum_{j=1}^{3} \beta_j L^j$ are computed, and two indicator functions (one for any root less than one in modulus, one for complex roots) are constructed. The indicator functions correspond to the $g(\theta_i)$ in Section 3, and $E[g(\theta)]$ is the probability of the event corresponding to the indicator function being unity. Estimated probabilities for some events of this type are provided in Table V. The asymptotic standard errors for these estimates, when compared with the standard error for estimates of the parameter of the Bernoulli distribution from a sample of the same size, indicate gains from antithetic sampling of the likelihood function ranging from negligible to variance reduction of over 50 per cent.
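The two indicator functions only require the roots of a cubic at each replication. The paper does not say how the roots were computed; the sketch below substitutes the Durand-Kerner iteration (a standard simultaneous root finder) so that it needs only the Python standard library. It works with the inverse roots $\lambda$, which solve $\lambda^3 - \beta_1\lambda^2 - \beta_2\lambda - \beta_3 = 0$, so a root of the operator inside the unit circle corresponds to $|\lambda| > 1$, and $\max|\lambda|$ is the inverse modulus of the smallest root. The periodicity helper uses $2\pi / |\arg\lambda|$, which agrees with $2\pi / \tan^{-1}[\operatorname{Im}(z)/\operatorname{Re}(z)]$ when $\operatorname{Re}(z) > 0$. Function names and tolerances are illustrative.

```python
import cmath
import math


def inverse_roots(beta, iters=500):
    """Roots of z^3 - b1 z^2 - b2 z - b3, i.e. reciprocals of the roots of 1 - b1 L - b2 L^2 - b3 L^3."""
    coeffs = [1.0] + [-b for b in beta]  # monic, highest power first
    deg = len(coeffs) - 1

    def poly(z):
        v = 0j
        for c in coeffs:  # Horner's rule
            v = v * z + c
        return v

    roots = [(0.4 + 0.9j) ** i for i in range(deg)]  # usual Durand-Kerner starting values
    for _ in range(iters):
        new = []
        for i, r in enumerate(roots):
            denom = 1.0 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            new.append(r - poly(r) / denom)
        roots = new
    return roots


def decay_rate(beta):
    """Inverse of the modulus of the smallest root of the AR operator."""
    return max(abs(l) for l in inverse_roots(beta))


def ar_indicators(beta, tol=1e-9):
    """(some root of the operator has modulus < 1, some roots are complex)."""
    lam = inverse_roots(beta)
    nonstationary = max(abs(l) for l in lam) > 1.0
    complex_pair = any(abs(l.imag) > tol for l in lam)
    return nonstationary, complex_pair


def periodicity(beta, tol=1e-9):
    """2*pi / |arg| of the complex inverse root, or None if all roots are real."""
    for l in inverse_roots(beta):
        if l.imag > tol:
            return 2.0 * math.pi / abs(cmath.phase(l))
    return None
```

For example, $\beta = (1.3, -0.8, 0.15)$ has inverse roots $0.5 \pm 0.5i$ and $0.3$, so it is stationary with a complex pair and periodicity $2\pi/(\pi/4) = 8$. Inside the sampler, these functions would serve as the $g(\theta)$ evaluated at each replication.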
The postwar sample shows a substantially greater probability of non-stationarity than does the longer sample, and a somewhat smaller probability of complex roots. The posterior distribution of the inverse of the modulus of the smallest root is shown in Figures 3 and 4; the horizontal axis may be interpreted as the bounding asymptotic geometric rate of decay (less than 1.0) or rate of growth (greater than 1.0) of the shock in $y_t$ propagated by a single impulse in $\varepsilon_t$. The distribution for the 1873-1970 period is very close to normal, centered at 0.995, but with a detectable skew to the left. The distribution for the 1954-1970 period is strongly skewed to the left, has a mode at 1.04, and is considerably more diffuse. Each complex conjugate pair of roots $(z, \bar{z})$ implies an associated periodicity of the shock in $y_t$ propagated by a single impulse in $\varepsilon_t$, namely $2\pi / \tan^{-1}[\operatorname{Im}(z)/\operatorname{Re}(z)]$. The posterior distribution of $(\beta_1, \beta_2, \beta_3)$ under a prior that the roots must include a complex conjugate pair induces a posterior distribution on this periodicity, which is shown in Figures 5 and 6. The distribution from the 1873-1970 sample is not surprising if one identifies periodicity with the typical business cycle, whose average length in this period is reckoned to be 4.08 years by reference cycle methods (Friedman and Schwartz, 1982, Table 5.7, pp. 186-188). For 1954-1970 the posterior probability that the periodicity exceeds 7 years is 0.3247, which may be related to the fact that the posterior probability that the oscillatory root has smaller modulus than the real one is about the same (0.2871). Conditional on the periodicity being less than 7 years, the distribution is shifted leftward relative to that for 1873-1970; the average length of a Friedman-Schwartz reference cycle for 1953-1970 is 4.00 years.

The findings reported in this section are based on 20,000 replications (10,000 antithetic pairs) undertaken as part of a much larger study. The imputed computation time on the MicroVAX I for the two samples discussed here is about 1 h 57 min. IMSL routines were used for random number generation. Roots of the cubic polynomial were computed using the explicit closed-form solutions given, for instance, in CRC (1972, pp. 103-104).

Table V. Estimated posterior probabilities, univariate autoregressions

Event*              E                 O                 E|O               O|Ē
1873-1970    0.3598 (0.0022)   0.6859 (0.0024)   0.3798 (0.0035)   0.6644 (0.0038)
1954-1970    0.6289 (0.0022)   0.5321 (0.0035)   0.6400 (0.0040)   0.5162 (0.0058)

Joint probability distributions implied by point estimates†

                 1873-1970           1954-1970
               O        Ō          O        Ō
E           0.2605   0.0993     0.3405   0.2884
Ē           0.4253   0.2149     0.1916   0.1795

* E = Explosive: at least one root of $(1 - \sum_{j=1}^{3} \beta_j L^j)$ less than 1 in modulus. O = Oscillatory: $(1 - \sum_{j=1}^{3} \beta_j L^j)$ has complex roots. A bar denotes the complementary event.
† Constructed in the obvious way from the foregoing point estimates; standard errors not calculated.

Figure 3. Per capita real GNP, 1873-1970 [plot: posterior distribution of the dominant root in the AR (> 1 explosive); horizontal axis from about 0.94 to 1.06]

Figure 4. Per capita real GNP, 1953-1970 [plot: posterior distribution of the dominant root in the AR (> 1 explosive)]

Figure 5. Distribution of periodicity of oscillatory root pairs: per capita real GNP, 1873-1970 [bar chart over periodicity in years: 2-3, 3-4, 4-5, 5-6, 6-7, >7]

Figure 6. Distribution of periodicity of oscillatory root pairs: per capita real GNP, 1953-1970 [bar chart over periodicity in years: 2-3, 3-4, 4-5, 5-6, 6-7, >7]
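The arithmetic behind Table V can be reproduced in a few lines of Python. One construction that matches the printed joint probabilities to four decimals, assumed here since the paper only says they are constructed "in the obvious way", uses $P(E \cap O) = P(E \mid O)P(O)$ and $P(\bar{E} \cap O) = P(O \mid \bar{E})[1 - P(E)]$, where $\bar{E}$ is the complement of $E$, with the remaining cells obtained by subtraction. The sketch also divides the squared reported standard errors of $P(E)$ and $P(O)$ by the variance $p(1-p)/n$ of a Bernoulli proportion from $n = 20{,}000$ independent draws. The names `TABLE_V`, `joint_table` and `variance_ratio` are illustrative, and because the table entries are rounded the ratios are only approximate.

```python
# Point estimates from Table V: P(E), P(O), P(E|O), P(O|not E), plus the
# reported standard errors of P(E) and P(O).
TABLE_V = {
    "1873-1970": dict(E=0.3598, O=0.6859, E_given_O=0.3798,
                      O_given_notE=0.6644, se_E=0.0022, se_O=0.0024),
    "1954-1970": dict(E=0.6289, O=0.5321, E_given_O=0.6400,
                      O_given_notE=0.5162, se_E=0.0022, se_O=0.0035),
}
N_REPLICATIONS = 20_000  # 10,000 antithetic pairs


def joint_table(row):
    """Joint probabilities of (E, O) built from the conditional estimates."""
    e_and_o = row["E_given_O"] * row["O"]
    not_e = 1.0 - row["E"]
    not_e_and_o = row["O_given_notE"] * not_e
    return {
        ("E", "O"): e_and_o,
        ("E", "not O"): row["E"] - e_and_o,
        ("not E", "O"): not_e_and_o,
        ("not E", "not O"): not_e - not_e_and_o,
    }


def variance_ratio(p, se, n=N_REPLICATIONS):
    """Reported variance divided by the variance of an i.i.d. Bernoulli mean."""
    return se ** 2 / (p * (1.0 - p) / n)


for period, row in TABLE_V.items():
    joint = {k: round(v, 4) for k, v in joint_table(row).items()}
    ratios = {ev: round(variance_ratio(row[ev], row["se_" + ev]), 2)
              for ev in ("E", "O")}
    print(period, joint, ratios)
```

With the rounded inputs the ratios come out between about 0.4 and 0.55 for three of the four unconditional estimates and close to 1 for $P(O)$ in 1954-1970, in line with the range described in the text.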
ACKNOWLEDGEMENTS
Financial support from the Sloan Foundation, U.S. National Science Foundation grant SES-8318778 and the Pew Foundation is gratefully acknowledged, as are comments from A. Ronald Gallant, George Judge, James MacKinnon, Dale Poirier, Edward Prescott, Arnold Zellner and two anonymous referees. Opinions and errors are solely the author's.
APPENDIX I: LIMITING BEHAVIOUR OF THE TRUNCATED NORMAL DISTRIBUTION

Let the random variable X have the standard normal distribution truncated below at c. Denote the p.d.f. and c.d.f. of the standard normal distribution by $Z(x)$ and $\Phi(x)$, respectively, and denote Mills' ratio $R(x) = [1 - \Phi(x)]/Z(x)$. Then X has mean $[R(c)]^{-1}$ and variance $1 + c[R(c)]^{-1} - [R(c)]^{-2}$ (Johnson and Kotz, 1970, p. 278).
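These moment formulas can be checked numerically. The sketch below, in plain Python, evaluates Mills' ratio through `math.erfc` and compares the closed forms with Simpson's-rule integrals of the truncated density; the function names, the integration width of 12 and the grid of 20,000 intervals are illustrative choices rather than anything used in the paper.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mills_ratio(x):
    """R(x) = [1 - Phi(x)] / Z(x), with 1 - Phi(x) computed via erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0)) / normal_pdf(x)


def truncated_moments(c):
    """Closed-form mean and variance of N(0, 1) truncated below at c."""
    inv_r = 1.0 / mills_ratio(c)
    return inv_r, 1.0 + c * inv_r - inv_r ** 2


def truncated_moments_numeric(c, width=12.0, n=20_000):
    """Same moments by Simpson's rule on [c, c + width] (n must be even)."""
    h = width / n

    def simpson(f):
        total = f(c) + f(c + width)
        for i in range(1, n):
            total += (4 if i % 2 else 2) * f(c + i * h)
        return total * h / 3.0

    mass = simpson(normal_pdf)
    m1 = simpson(lambda x: x * normal_pdf(x)) / mass
    m2 = simpson(lambda x: x * x * normal_pdf(x)) / mass
    return m1, m2 - m1 ** 2


for c in (-1.0, 0.0, 1.0, 3.0):
    print(c, truncated_moments(c), truncated_moments_numeric(c))
```

At $c = 0$ the mean is $\sqrt{2/\pi} \approx 0.79788$, the half-normal value.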
Result 1
Let $X_n$ be a sequence of standard normal random variables truncated below at $c_n$, and $\lim_{n\to\infty} c_n = \infty$. Then $\operatorname{plim}(X_n - c_n) = 0$.
Proof
Gordon (1941) demonstrated that $c/(c^2 + 1) < R(c) < c^{-1}$, and hence $0 < R(c)^{-1} - c < c^{-1}$. Since $X_n - c_n > 0$ for all n, for any $\varepsilon > 0$, $P[|X_n - c_n| > \varepsilon] = P[X_n - c_n > \varepsilon] < E(X_n - c_n)/\varepsilon$ by Markov's inequality. Because $E(X_n - c_n) = R(c_n)^{-1} - c_n < c_n^{-1}$, the last expression converges to 0 as $n \to \infty$, so $\operatorname{plim}(X_n - c_n) = 0$.

The ratio of the squared mean of $X - c$ to its variance is
$$[1 - cR(c)]^2 / [R(c)^2 + cR(c) - 1].$$
The limiting behaviour of the numerator and denominator of this expression can be characterized.
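The two inequalities used in the proof of Result 1 can be illustrated for a single truncation point c. The hypothetical helper below checks Gordon's bounds, computes $E(X - c) = R(c)^{-1} - c$, and compares the exact tail probability $P[X - c > \varepsilon]$ of the truncated variable with the Markov bound $E(X - c)/\varepsilon$; the choice $\varepsilon = 0.1$ and the grid of c values are arbitrary.

```python
import math


def mills_ratio(x):
    """R(x) = [1 - Phi(x)] / Z(x) for the standard normal."""
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return 0.5 * math.erfc(x / math.sqrt(2.0)) / pdf


def result1_check(c, eps=0.1):
    """Gordon's bounds, E(X - c), exact P[X - c > eps] and the Markov bound."""
    r = mills_ratio(c)
    gordon_ok = c / (c * c + 1.0) < r < 1.0 / c
    excess_mean = 1.0 / r - c  # E(X - c) for X ~ N(0, 1) truncated at c
    # Exact tail of the truncated variable: [1 - Phi(c + eps)] / [1 - Phi(c)].
    tail = math.erfc((c + eps) / math.sqrt(2.0)) / math.erfc(c / math.sqrt(2.0))
    markov_bound = excess_mean / eps
    return gordon_ok, excess_mean, tail, markov_bound


for c in (1.0, 2.0, 5.0, 10.0, 20.0):
    print(c, result1_check(c))
```

The Markov bound is loose for small c but, like $E(X - c)$ itself, shrinks toward zero as c grows, which is all the proof requires.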
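Lemma 1
$\lim_{c\to\infty} [1 - cR(c)]c^2 = 1$.
Proof
Writing $R(c) = e^{c^2/2}\int_c^\infty e^{-z^2/2}\,dz$, the expression whose limit is sought is
$$\frac{c^{-1}e^{-c^2/2} - \int_c^\infty e^{-z^2/2}\,dz}{c^{-3}e^{-c^2/2}}.$$
The limit of this ratio is indeterminate, but a single application of L'Hospital's rule yields $1/(3c^{-2} + 1)$, whose limit is 1.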
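Lemma 2
$\lim_{c\to\infty} [R(c)^2 + cR(c) - 1]c^4 = 1$.
Proof
The expression whose limit is sought is
$$\frac{\left[\int_c^\infty e^{-z^2/2}\,dz\right]^2 + c e^{-c^2/2}\int_c^\infty e^{-z^2/2}\,dz - e^{-c^2}}{c^{-4}e^{-c^2}}.$$
One application of L'Hospital's rule produces
$$\frac{(1 + c^2)e^{-c^2/2}\int_c^\infty e^{-z^2/2}\,dz - c e^{-c^2}}{(4c^{-5} + 2c^{-3})e^{-c^2}},$$
which is still indeterminate. Scaling to remove all complicating factors from the term involving the integral (as in the proof of Lemma 1), this expression becomes
$$\frac{\int_c^\infty e^{-z^2/2}\,dz - c(1 + c^2)^{-1}e^{-c^2/2}}{(4c^{-5} + 2c^{-3})(1 + c^2)^{-1}e^{-c^2/2}}.$$
A final application of L'Hospital's rule yields
$$\frac{-2}{-20c^{-6} - 38c^{-4} - 16c^{-2} - 2},$$
whose limit is 1.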
Result 2
$\lim_{c\to\infty} [E(X) - c]^2/\operatorname{var}(X) = 1$.
Proof
Immediate from Lemmas 1 and 2.
Result 3
$\lim_{c\to\infty} c[E(X) - c] = 1$.
Proof
Immediate from Lemma 1 and the fact (from Gordon's bounds) that $\lim_{c\to\infty} c^{-1}R(c)^{-1} = 1$.
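Lemmas 1 and 2 and Results 2 and 3 can be checked by evaluating the four expressions at increasing values of c. The sketch uses the same `math.erfc` route to Mills' ratio; it stops at $c = 30$ because $1 - \Phi(c)$ and $Z(c)$ underflow in double precision near $c \approx 38$, and the loss of digits in $R(c)^2 + cR(c) - 1$ grows with c. Function names are illustrative.

```python
import math


def mills_ratio(x):
    """R(x) = [1 - Phi(x)] / Z(x) for the standard normal."""
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return 0.5 * math.erfc(x / math.sqrt(2.0)) / pdf


def appendix_limits(c):
    """Values whose limits are 1 in Lemma 1, Lemma 2, Result 2 and Result 3."""
    r = mills_ratio(c)
    lemma1 = (1.0 - c * r) * c ** 2
    lemma2 = (r ** 2 + c * r - 1.0) * c ** 4
    result2 = (1.0 - c * r) ** 2 / (r ** 2 + c * r - 1.0)  # [E(X)-c]^2/var(X)
    result3 = c * (1.0 / r - c)                              # c[E(X) - c]
    return lemma1, lemma2, result2, result3


for c in (2.0, 5.0, 10.0, 20.0, 30.0):
    print(c, [round(v, 4) for v in appendix_limits(c)])
```

Consistent with the asymptotic expansion of $R(c)$, the deviations from 1 shrink roughly like $c^{-2}$.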
APPENDIX II: A LIMITING POSTERIOR DISTRIBUTION IN THE NORMAL LINEAR REGRESSION MODEL

Result 1 of Appendix I can be used to derive an approximation to the posterior distribution of the coefficients in the normal linear regression model when certain coefficients have a very low probability of satisfying the constraints. Adopting the notation of Section 2, partition $\beta^T = (\beta_1^T, \beta_2^T, \beta_3^T)$ and let $\beta_{12}^T = (\beta_1^T, \beta_2^T)$. Let $P(\beta)$ be the posterior under the unconstrained, diffuse prior, $Q(\beta)$ the posterior under the prior $\beta_{12} \geq 0$, and $R(\beta)$ the posterior under the prior $\beta_1 \geq 0$, $\beta_2 = 0$. Partition $b^T = (b_1^T, b_2^T, b_3^T)$ conformably with $\beta^T$, and let $b_2$ depend on the arbitrary index m, $b_2 = b_2^m$, with
$$\lim_{m\to\infty} b_2^m = -\infty.$$
Let $P_m(\beta)$ be the corresponding posterior under the unconstrained, diffuse prior; let $Q_m(\beta)$ be the corresponding posterior under the prior $\beta_{12} \geq 0$; and let $Q_m(\beta_2)$ be the marginal distribution of $\beta_2$ in the latter case. We have

where $V_{22}$ denotes the standard variance of $b_2$ under sampling theoretic assumptions and the inequality is interpreted element-by-element. From Result 1, the distribution $Q_m(\beta_2)$ is that of a random vector converging in probability to 0. Hence the limiting distribution of $\beta$ and that of $\beta$ conditional on $\beta_2 = 0$ are the same: $\lim_{m\to\infty} Q_m(\beta) = R(\beta)$.
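A scalar illustration of this argument, under simplifying assumptions that are not the paper's (whose posterior is multivariate): a single constrained coefficient whose posterior under the diffuse prior is taken to be normal with mean $b_2^m$ and fixed standard deviation s. Imposing $\beta_2 \geq 0$ truncates this normal at zero, so by Appendix I the truncated marginal has mean $b_2^m + s/R(c)$ and variance $s^2\{1 + c[R(c)]^{-1} - [R(c)]^{-2}\}$ with $c = -b_2^m/s$; both shrink to zero as $b_2^m \to -\infty$, which is the collapse onto $\beta_2 = 0$ used above. The helper name `truncated_marginal` is hypothetical.

```python
import math


def mills_ratio(x):
    """R(x) = [1 - Phi(x)] / Z(x) for the standard normal."""
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return 0.5 * math.erfc(x / math.sqrt(2.0)) / pdf


def truncated_marginal(b2, s):
    """Mean and sd of N(b2, s^2) truncated to beta2 >= 0 (scalar illustration)."""
    c = -b2 / s                       # standardized truncation point
    inv_r = 1.0 / mills_ratio(c)
    mean = s * (inv_r - c)            # equals b2 + s / R(c)
    var = s * s * (1.0 + c * inv_r - inv_r ** 2)
    return mean, math.sqrt(var)


# Let the unconstrained estimate b2 drift toward -infinity with s fixed.
for b2 in (-1.0, -3.0, -10.0, -30.0):
    print(b2, truncated_marginal(b2, s=1.0))
```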
REFERENCES

Bails, D. G. and L. C. Peppers (1982), Business Fluctuations, Prentice-Hall, Englewood Cliffs.
Chamberlain, G. and E. Leamer (1976), Matrix weighted averages and posterior bounds, Journal of the Royal Statistical Society, Series B, 38, 73-84.
CRC (1972), Standard Mathematical Tables (20th edn), Chemical Rubber Press, Cleveland.
Davis, W. W. (1978), Bayesian analysis of the linear model subject to linear inequality constraints, Journal of the American Statistical Association, 73, 573-579.
Farebrother, R. W. (1984a), Testing linear inequality constraints in the standard linear model, University of Manchester working paper.
Farebrother, R. W. (1984b), Computing the Gourieroux, Holly, and Monfort likelihood ratio test, University of Manchester working paper.
Friedman, M. and A. J. Schwartz (1982), Monetary Trends in the United States and the United Kingdom, The University of Chicago Press, Chicago.
Gordon, R. D. (1941), Values of Mills' ratio of area to bounding ordinate and of the normal probability integral for large values of the argument, Annals of Mathematical Statistics, 12, 364-366.
Gourieroux, C., A. Holly and A. Monfort (1982), Likelihood ratio test, Wald test, and Kuhn-Tucker test in linear models with inequality constraints on the regression parameters, Econometrica, 50, 63-80.
Hillier, G. M. (1985), Joint tests for zone restrictions on nonnegative regression coefficients, Monash University working paper.
Johnson, N. L. and S. Kotz (1970), Distributions in Statistics: Continuous Univariate Distributions - 2, Wiley, New York.
Judge, G. G. and T. Takayama (1966), Inequality restrictions in regression analysis, Journal of the American Statistical Association, 61, 166-181.
Judge, G. G., T. A. Yancey, M. E. Bock and R. Bohrer (1984), The non-optimality of the inequality restricted estimator under squared error loss, Journal of Econometrics, 25, 165-177.
Kodde, D. A. and F. C. Palm (1985), Wald criteria for jointly testing equality and inequality restrictions, paper presented at the Fifth World Congress of the Econometric Society.
Kloek, T. and H. K. Van Dijk (1978), Bayesian estimates of equation system parameters: an application of integration by Monte Carlo, Econometrica, 46, 1-19.
Lawson, C. L. and R. J. Hanson (1974), Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs.
Leamer, E. and G. Chamberlain (1976), A Bayesian interpretation of pretesting, Journal of the Royal Statistical Society, Series B, 38, 85-94.
Liew, C. K. (1976), Inequality constrained least-squares estimation, Journal of the American Statistical Association, 71, 746-751.
Lovell, M. C. and E. Prescott (1970), Multiple regression with inequality constraints: pretesting bias, hypothesis testing, and efficiency, Journal of the American Statistical Association, 65, 913-925.
O'Hagan, A. (1973), Bayes estimation of a convex quadratic, Biometrika, 60, 565-571.
Pindyck, R. S. and D. L. Rubinfeld (1981), Econometric Models and Economic Forecasts (2nd edn), McGraw-Hill, New York.
Zellner, A. (1971), An Introduction to Bayesian Inference in Econometrics, Wiley, New York.
Zellner, A. and G. C. Tiao (1964), Bayesian analysis of the regression model with autocorrelated errors, Journal of the American Statistical Association, 59, 763-778.

0883-7252/86/020127-15$07.50
© 1986 by John Wiley & Sons, Ltd.

Received October 1985
Final revision January 1986
