
Economics of Education Review 20 (2001) 417–429
www.elsevier.com/locate/econedurev

Estimating school efficiency: A comparison of methods using simulated data


Robert Bifulco a,*, Stuart Bretschneider b

a Center for Policy Research, Maxwell School of Citizenship and Public Affairs, Syracuse University, 426 Eggers Hall, Syracuse, NY 13244-1020, USA
b Center for Technology and Information Policy, Maxwell School of Citizenship and Public Affairs, Syracuse University, 400 Eggers Hall, Syracuse, NY 13244-1020, USA

Received 29 January 1999; accepted 29 February 2000

Abstract

Developing measures of school performance is crucial for performance-based school reform efforts. One approach to developing such measures is to apply econometric and linear programming techniques that have been developed to measure productive efficiency. This study uses simulated data to assess the adequacy of two such methods, Data Envelopment Analysis (DEA) and Corrected Ordinary Least Squares (COLS), for the purposes of performance-based school reform. Our results suggest that in complex data sets typical of education contexts, simple versions of DEA and COLS do not provide adequate measures of efficiency. In data sets simulated to contain both measurement error and endogeneity, rank correlations between efficiency estimates and true efficiency values range from 0.104 to 0.240. In none of these data sets was either DEA or COLS able to place more than 31% of schools in their true performance quintile. © 2001 Elsevier Science Ltd. All rights reserved.
JEL classification: I21

Keywords: Efficiency; Productivity; Input-output analysis; Educational economics; Educational finance

1. Introduction

Performance-based school reform has received much attention in recent years. Key elements of this reform movement include setting standards of student, teacher and school performance, granting autonomy to local actors in the educational process, and establishing rewards for high performance and remedies for low performance. These elements are prominently featured in

the 1994 reauthorization of the federal Title I program as well as several state level reform initiatives.1 These reforms have been advanced as a remedy for several perceived problems with existing public education systems. Prominent among these perceived problems are the lack of incentives and the lack of knowledge about how to improve student performance. Some have argued that given current systems for determining compensation, professional advancement and school funding, the incentives of school officials are insufficiently linked to student performance (Hanushek, 1994; Levin, 1997).

* Corresponding author. Tel.: +1-315-443-9056; fax: +1-315-443-1081. E-mail address: rbifulcu@maxwell.syr.edu (R. Bifulco).

1 For examples and analysis of state level efforts in South Carolina, Mississippi, Kentucky, Texas and Indiana, see Richards and Sheu (1992), Elmore, Abelmann and Fuhrman (1996) and King and Mathers (1997).



Performance-based school reform attempts to provide stronger incentives for improving student performance by developing measures of achievement and tying financial and other rewards to those measures. Some also believe that we know very little about how to manage classrooms, schools and districts in ways that consistently result in higher levels of student achievement. By granting local actors the autonomy to experiment with new approaches and providing the means to assess the impact of local experiments on student performance, performance-based school reform is seen as a way to learn how to meet the ever-increasing demands placed on our public education systems (Hanushek, 1994).

Developing valid and reliable measures of school performance is crucial both for efforts to establish incentives and to assess management practices. There is a growing consensus that measures of school performance should be based on the performance of students in the school. However, there is also recognition that any measure of school performance that is based on the performance of students needs to account for the differences in resources available to and service delivery environments faced by different schools.

One approach to developing measures of school performance is to apply the conceptions of productive efficiency, and techniques for measuring it, that have been developed in the fields of economics and operations research. Several such techniques or methods have been developed, and several have been applied to estimate the efficiency of educational organizations. These include econometric approaches that utilize ordinary least squares regression and stochastic frontier estimation as well as a group of linear programming approaches falling under the rubric of Data Envelopment Analysis (DEA). Bessent and Bessent (1980), Bessent, Bessent, Kennington and Reagan (1982) and Bessent, Bessent, Charnes, Cooper and Thorogood (1983) have applied the basic formulation of DEA developed by Charnes, Cooper and Rhodes (1978) to schools in Houston. Fare, Grosskopf and Weber (1989) have applied a version of DEA that allows for variable returns to scale to school districts in Missouri. More recently, Ray (1991), McCarty and Yaisawarng (1993), Ruggiero, Duncombe and Miner (1995) and Kirjavainen and Loikkanen (1998) have applied DEA-based approaches that attempt to control for the different environmental factors faced by educational organizations. Johnes and Johnes (1995) have used DEA to investigate the technical efficiency of university departments of economics. Barrow (1991), Deller and Rudnicki (1993) and Cooper and Cohn (1997) have applied the stochastic frontier estimation methods developed by Aigner, Lovell and Schmidt (1977) to estimate the efficiency of districts, schools and classes. Stiefel, Schwartz and Rubenstein (1999) review the various methods available for measuring efficiency and explain

how they can be implemented in programs designed to improve school performance measurement.

The availability of these methods for estimating school efficiency raises two questions. The first is whether or not the methods provide accurate estimates of efficiency. The second question is, if there are multiple methods of measuring efficiency that may perform differently, which method is best to use. Studies that have applied different methods to the same data have found that they provide different results (Banker, Conrad & Strauss, 1985; Nelson & Waldman, 1986). The problem is that without knowing the true efficiency of the organizations studied, there is no way to determine which measures provide better estimates. Studies that use simulated data with specified, and thus known, technological relationships and levels of efficiency can help to answer these questions. A limited number of such studies have been conducted. However, no attempt has been made to use the results of such simulation studies to assess how appropriate existing efficiency measures are for the purposes of performance-based school reform. This paper is intended to fill this gap in the literature.

Section 2 identifies the specific set of challenges that the educational production process poses for methods of estimating school efficiency. Section 3 reviews existing studies that have used simulated data to evaluate methods of estimating organizational efficiency, and determines what these studies imply for the estimation of school efficiency. Section 4 describes a simulation study that we conducted. Section 5 presents an analysis of how well two methods, the Charnes et al. (1978) version of DEA and Corrected Ordinary Least Squares, did in estimating the known efficiencies of the simulated schools. Section 6 offers concluding remarks concerning the current state-of-the-art in measuring school performance and the implications this has for performance-based school reform efforts.

2. Educational production

Analysis of educational production is notoriously difficult.2 The first difficulty is that education involves production of multiple outputs. Not only are schools charged with developing cognitive skills in several subject areas, but they are also charged with developing affective traits, promoting democratic values and furthering other social outcomes. Assumptions that these multiple outcomes are complementary or even mutually consistent are difficult to maintain, and attempts to develop a priori weights that reflect the relative value of various outcomes are problematic.

2 For discussions of these difficulties, see Bridge, Judd and Moock (1979) and Monk (1990).


The second problem in analyzing educational production concerns the difficulty of measuring educational outputs. Standardized tests of cognitive skills are typically used to measure educational output. However, standardized tests are not always aligned with curricular goals, subjects such as science, social studies and the arts are not often tested, and even in tested subjects, higher order thinking and problem solving skills are often not assessed (Darling-Hammond, 1991). Even for those skills tested, there is reason to believe that student performance measures aggregated at the school level typically have a large margin of error associated with them (Goldstein & Thomas, 1996). Measures of affective traits, democratic values and social outcomes may be even more problematic.

Further complicating analysis is the fact that our knowledge about which factors affect educational outputs is inadequate. In addition, measuring factors that are known to affect educational outputs, such as student motivation or teacher quality, can be difficult. Consequently, attempts to analyze educational production suffer from the presence of unobserved inputs. Because input levels are typically correlated with each other as well as with environmental factors, the problem of unobserved variables can cause the statistical estimation of model coefficients to be biased. How this fact affects different methods of estimating efficiency is unclear.

Uncertainty about the manner in which inputs affect outputs also complicates matters. There may be time lags between changes in levels of inputs and resulting effects on output. If input levels are roughly constant over time, this might not cause too great a difficulty in estimating efficiency. If, however, there are changes in input levels over time, failure to include prior year measures of inputs can create either measurement error or omitted variable problems.

Education production may also be characterized by two-way causal relationships between inputs and outputs. In the case of certain student inputs that affect the learning process, this is clear. Student motivation, for instance, both influences and is influenced by the level of educational output. Orme and Smith (1996) suggest that there may be feedback from outputs to institutional inputs as well. School districts in which test scores are low might come under pressure to promote improved performance, which might lead to increased resource provision and thus higher levels of inputs. To some extent this process is institutionalized in legislative programs. The federal Title I program, for instance, targets significant amounts of funds to schools with large numbers of students who show low levels of achievement. Such feedback is likely to bias the estimation of regression coefficients, and Orme and Smith argue that it can bias DEA estimates as well.

Environmental factors, such as the family background of the students served by the school, can substantially

influence the level of output that schools obtain. Environmental factors are conceptually different from production inputs because they are beyond the control of policy officials. If environmental factors can be represented as simple additive terms in a school's production function, then it may be acceptable to treat them as another set of inputs. In this case, environmental factors might not significantly complicate the estimation of efficiency. If, however, these factors interact with controllable inputs and technologies in non-additive ways, then incorporating environmental factors into efficiency analysis will be complicated.

Finally, organizational efficiency, as well as inputs and the service delivery environment, affects the output of a school. One of the key premises of performance-based school reform is that some schools could be doing better given the resources they have and the environments they face. This difference between how much output a school can be expected to produce and how much it actually produces can be attributed to poor performance or inefficiency. Conceptually distinguishing the effects of inefficiency from the effects of input quality or environmental factors is difficult. If the teachers in a school are putting forth less effort than teachers in other schools, does this reflect inefficiency, lower teacher quality or a more demoralizing service delivery environment? Below we demonstrate that even if the effects of efficiency are clearly distinguished from the effects of inputs conceptually, it can be difficult to distinguish those effects empirically.

3. Existing simulation studies

Aigner et al. (1977) and Olsen, Schmidt and Waldman (1980) have used simulated data to compare different econometric methods for estimating stochastic production frontiers. Ruggiero (1996a) has compared stochastic frontier models with deterministic parametric frontier models. Orme and Smith (1996) have published simulation studies that examine particular properties of a single DEA method, and Ruggiero (1996b) and Ruggiero and Bretschneider (1998) have used simulated data to compare different linear programming models.

Simulation studies comparing linear programming approaches to econometric approaches have also been conducted. Gong and Sickles (1992) compared three methods of estimating the Aigner et al. (1977) stochastic frontier model and the original Charnes et al. (1978) formulation of DEA. The researchers simulated observations of three inputs and one output, assuming various underlying technologies, different relative sizes of technical efficiency and random noise, different distributions of inefficiency and the presence of correlation between input levels and inefficiency. Banker, Gadh and Gorr (1993) compared the Banker, Charnes and Cooper


(1984) formulation of DEA to a stochastic frontier approach that utilizes Corrected Ordinary Least Squares (COLS). These researchers examined cases of one output and two inputs with two different underlying technologies. They examined cases with high and low measurement error, various distributions of inefficiency, and sample sizes of 25, 50, 100 and 200.

These studies have provided important insights. Gong and Sickles (1992) found that when input levels and technical inefficiency are correlated, DEA outperforms the stochastic frontier methods by a considerable margin. Both studies suggest that for cases where measurement error variance is low relative to inefficiencies and the actual inefficiency distribution does not match that assumed by the stochastic frontier model, DEA does approximately as well or better than the stochastic frontier methods. For cases where random measurement error variances are relatively large, on the other hand, the stochastic frontier method does better.3

Cases in which researchers try to estimate the efficiency of educational organizations differ in important ways from the cases simulated in existing studies. Most important is the presence of multiple production outputs in education. Consequently, it remains difficult to judge whether or not any of these methods are adequate for estimating school efficiency. In addition, while the existing studies provide information about which methods perform better under different circumstances, they do not specifically ask whether or not any of the methods perform well enough to serve the administrative and research purposes of performance-based school reform. Are the estimates of efficiency accurate enough to serve as a basis for awarding financial incentives or targeting remedial efforts? Can these methods help us determine what managerial and resource allocation practices help to foster improved performance? The rest of this paper attempts to shed further light on this issue by applying alternative methods to data created to simulate conditions characteristic of educational production.

4. The experimental design

We examined the performance of two methods of estimating performance in simulated data sets designed to emulate important aspects of educational production contexts. These are the Charnes et al. (1978) version of DEA and Corrected Ordinary Least Squares.4 These methods provide conceptually identical measures of technical efficiency, defined as a feasible combination of inputs and outputs such that it is impossible to increase any output (and/or reduce any input) without simultaneously reducing another output (and/or increasing any other input).5 Both methods enable estimation of technical efficiency by locating a school relative to an efficient production frontier. The two methods differ in the way in which they construct the frontier. The primary advantages of DEA are that it handles multiple outputs without requiring a priori specification of weights, and that it is non-parametric with respect to the functional form of the frontier and the distribution of inefficiency. The primary advantage typically advanced for regression-based approaches like COLS is their potential for separating inefficiency from measurement error.6
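To make the DEA calculation concrete, the sketch below (our own illustrative implementation in Python with NumPy and SciPy, not code from the paper; function and variable names are hypothetical) computes Farrell-type, input-oriented efficiency scores under the constant-returns Charnes et al. (1978) envelopment program for a small set of units with multiple inputs and outputs.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR efficiency scores.
    X: (n, m) array of inputs; Y: (n, s) array of outputs.
    Returns one score in (0, 1] per unit."""
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]; minimize theta.
        c = np.r_[1.0, np.zeros(n)]
        # Input constraints: sum_j lambda_j * x_ij <= theta * x_io
        A_in = np.hstack([-X[o].reshape(m, 1), X.T])
        b_in = np.zeros(m)
        # Output constraints: sum_j lambda_j * y_rj >= y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        b_out = -Y[o]
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[b_in, b_out],
                      bounds=[(0, None)] * (1 + n))
        scores[o] = res.x[0]
    return scores

# Tiny example: 5 schools, 3 inputs, 2 outputs
rng = np.random.default_rng(0)
print(dea_ccr_input(rng.uniform(5, 15, (5, 3)), rng.uniform(5, 10, (5, 2))))
```

Under constant returns to scale the input- and output-oriented scores coincide, so the same routine can be read in either orientation.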
In order to examine the performance of these methods, we generated a total of 12 different data sets. Each of the data sets was generated from the following log-linear relationship between three inputs and two outputs:

y_1^{\alpha} y_2^{\beta} = x_1^{\gamma} x_2^{\delta} x_3^{\eta}        (1)

Here y1 and y2 represent outputs and x1, x2 and x3 are inputs.


4 For an explanation of Corrected Ordinary Least Squares as used in this study, see the appendix to Banker et al. (1993). 5 More precisely, these methods provide estimates of a conceptual measure of efficiency developed by Farrell (1957), defined as the maximum radial reduction in all inputs consistent with equivalent production of observed output. For an explication of the Farrell measure of efficiency, see Lovell and Schmidt (1988). 6 This is accomplished by treating the efficient frontier as a stochastic phenomenon. That is, these methods attempt to decompose the deviation of actual production from the estimated frontier into a component that is due to inefficiency and a component that is due to random error. As applied here, however, COLS is not fully stochastic. Assumptions about the composition of the error term are used in determining how much to adjust the intercept of the regression equations. Once the intercept is adjusted, however, deviations from the frontier are assumed to be due either entirely to inefficiency or entirely to random error. If an observation is on the efficient side of the frontier, all of the deviation from the frontier is assumed to be due to random measurement error. If an observation is on the inefficient side of the frontier, all of the deviation is attributed to inefficiency. Jondrow, Lovell, Materov and Schmidt (1982) have developed a method for measuring efficiency that divides the COLS residual into an efficiency and a random component based on a priori assumptions about the distribution of these two error terms. Ondrich and Ruggiero (1997) have shown that this stochastic frontier method is essentially deterministic and that the estimates of efficiency provided by stochastic frontier methods are ordinally equivalent to the COLS method used here.


3 Banker et al. (1993) also found that in 638 out of 640 cases the stochastic frontier method attributed all the variance of observed production around the estimated frontier either to measurement error or to inefficiency. In only two cases was variance around the estimated regression plane attributed partially to measurement error and partially to inefficiency. This casts doubt on the ability of COLS to accurately separate measurement error from inefficiency.


In each scenario simulated, we assume α = β = 0.5 and that γ = 0.4, δ = 0.4 and η = 0.2. The fact that the input coefficients sum to 1 implies constant returns to scale. The weights on the outputs represent the relative value placed on each output. In this case, the outputs are assigned equal relative values. In real situations these values are likely to differ from school to school, and in any case are difficult to specify. Here, specifying the true weights allows us to generate observations with known efficiency levels.

Three factors (measurement error, correlation between inputs and inefficiency, and sample size) were manipulated to generate the 12 different sets of data. Variation on the first two factors was limited to presence and absence, and sample sizes of 20, 100, and 500 were generated. Combined with the application of two different methods for measuring efficiency, this results in a full factorial design consisting of 24 cells.

4.1. Base case simulations

We began by constructing a data set with no measurement error and no endogeneity. To do this, we randomly generated observations for each of the inputs from a uniform distribution on the interval (5, 15). We also randomly generated observations for the output y2 from a uniform distribution with the range (5, 10). These values for x1, x2, x3 and y2 were then substituted into Eq. (1) to obtain the efficient amount of y1 for each observation. Next we randomly generated inefficiency terms, u_i, based on a half-normal distribution |N(0, 0.04)|. The reciprocal of this inefficiency value, 1/u_i, provides a true efficiency value for each observation. The average efficiency values in the 12 data sets ranged from 0.829 to 0.865, with standard deviations around those averages ranging from 0.071 to 0.102. Empirically estimated efficiency distributions vary substantially, depending particularly on whether or not efficiency estimates account for differences in the environments faced by schools. In three studies that attempt to control for environmental factors (Ray, 1991; Deller & Rudnicki, 1993; Kirjavainen & Loikkanen, 1998), mean efficiency levels ranged from 0.874 to 0.913 with standard deviations ranging from 0.042 to 0.200. It is difficult to know whether these estimated distributions reflect the true underlying distributions, but there is no reason to believe that the efficiency distributions we generated are unrealistic. Finally, the observed value of y1 for each observation was calculated as:

y_{1i}(obs) = x_{1i}^{\gamma/\alpha} x_{2i}^{\delta/\alpha} x_{3i}^{\eta/\alpha} (1/y_{2i}^{\beta/\alpha}) (1/u_i)^{1/\alpha}        (2)

This provided observations for each input and each output that could be used to implement the various methods of estimating efficiency.
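The base-case data generation can be sketched as follows (Python with NumPy; this is our reading of the description above rather than the authors' code, and we interpret the 0.04 in |N(0, 0.04)| as a variance and take the inefficiency term to be u_i = exp(|z_i|), so that 1/u_i lies in (0, 1] and averages near the 0.83 to 0.87 range reported).

```python
import numpy as np

rng = np.random.default_rng(0)

# Technology parameters from Eq. (1)
alpha, beta = 0.5, 0.5
gamma, delta, eta = 0.4, 0.4, 0.2

n = 100  # the paper uses sample sizes of 20, 100 and 500

# Inputs drawn from U(5, 15); second output drawn from U(5, 10)
x1, x2, x3 = (rng.uniform(5, 15, n) for _ in range(3))
y2 = rng.uniform(5, 10, n)

# Inefficiency term and true efficiency (see the hedged interpretation above)
z = np.abs(rng.normal(0.0, np.sqrt(0.04), n))   # half-normal draw
u = np.exp(z)                                    # inefficiency term u_i
efficiency = 1.0 / u                             # true efficiency, in (0, 1]

# Observed y1 from Eq. (2): the efficient level implied by Eq. (1),
# scaled down by the inefficiency term
y1_obs = (x1**(gamma/alpha) * x2**(delta/alpha) * x3**(eta/alpha)
          / y2**(beta/alpha)) * (1.0/u)**(1.0/alpha)
```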

4.2. Simulations with measurement error

Next we added measurement error. To accomplish this, we used the following equation to calculate the efficient level of output:
y_1^{\alpha} y_2^{\beta} = x_1^{\gamma} x_2^{\delta} x_3^{\eta} v        (3)

where each parameter and variable has the same value as in the base case simulations, and v is an error term that is normally distributed, N(0, 0.04), at the logged level. This makes the efficient production frontier stochastic, differing from school to school due to the effects of randomly distributed measurement errors. The observed value of y_{1i} was calculated as:
y_{1i}(obs) = x_{1i}^{\gamma/\alpha} x_{2i}^{\delta/\alpha} x_{3i}^{\eta/\alpha} (1/y_{2i}^{\beta/\alpha}) (v_i/u_i)^{1/\alpha}        (4)

The variance of the measurement errors used in this study is similar to that of the high measurement errors used in the study conducted by Banker et al. (1993). In these cases, the measurement error term has substantially higher correlations with the level of output than does the efficiency term. Whether this is characteristic of real educational data sets is difficult to determine. Research by Bifulco and Duncombe (1998) on schools in New York City indicates that school level measures of student performance vary considerably from year to year. This, as well as the arguments presented in Section 2, suggests that measurement error in educational data sets is substantial.

4.3. Simulations with endogeneity

We also generated data sets that incorporate negative correlation between input x1 and the efficiency term. This was accomplished by replacing the observations for x1 in the base case simulations and in the simulations with measurement error only with observations linked to the efficiency values. Specifically, we used the equation x_{1i} = 45 − (40/u_i) + e_i to generate observations of x1. Here e is a normally distributed variable with a mean of 0 and variance of 4. This resulted in a distribution for x1 similar to those in the other cells and correlations between x1 and the efficiency value, 1/u_i, ranging from −0.783 to −0.925.

In a regression framework, efficiency, together with measurement error, constitutes a composed error term. Correlation between this composed error term and the observed inputs can arise for at least three different reasons: the presence of unobserved variables; two-way causal relationships between inputs and outputs; or a causal connection between the level of efficiency and the level of inputs. For instance, a negative correlation between inputs and efficiency values can be one of the by-products of the type of feedback from outputs to inputs discussed by Orme and Smith (1996). Each of



these potential explanations of correlation between the error term and an observed input represents one or another form of endogeneity. Incorporating this correlation into the simulated data allows us to explore the impact of these potential types of endogeneity on efficiency measurement.

The study used three base case simulations with sample sizes of 20, 100 and 500, three simulations with endogeneity, three simulations with measurement error and three simulations with both endogeneity and measurement error. Examining the performance of DEA and COLS using different sample sizes will help determine the appropriateness of these methods for different sized school systems. Incorporating multiple outputs, measurement error and endogeneity into our simulations will allow us to examine the effect of important aspects of educational production on the performance of these methods. Including measurement error as an additional term on the right-hand side of the production function does not allow us to distinguish the effects of measurement error in the independent variables from the effect of measurement error on the dependent variables. It does, however, allow us to examine the effect of measurement error generally. Either the presence of unobserved variables, simultaneity between inputs and outputs, or causal links between inputs and inefficiency can result in correlation between the error term and one or more independent variables. By incorporating such correlation we can begin to determine the effect that these aspects of educational production have on efficiency measurement.

The importance of environmental factors in educational production is not explicitly incorporated into our simulations. If the effect of environmental factors on educational outputs is simply additive (in the logs), then one or more of the input variables in the underlying technology might be interpreted as an environmental factor. However, if the interaction of the environment and production technology in actual educational contexts is more complex, then our findings may not provide an accurate view of how well DEA and COLS capture efficiency.
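Continuing the Section 4.1 sketch (again our own hypothetical reading, not the authors' code), measurement error and the endogenous first input can be layered onto the base case as follows; the minus sign in the rule for x1 is inferred from the stated negative correlation with efficiency.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.5, 0.5
gamma, delta, eta = 0.4, 0.4, 0.2
n = 100

x2, x3 = rng.uniform(5, 15, n), rng.uniform(5, 15, n)
y2 = rng.uniform(5, 10, n)
u = np.exp(np.abs(rng.normal(0.0, np.sqrt(0.04), n)))   # inefficiency term
efficiency = 1.0 / u                                      # true efficiency

# Measurement error v, normal N(0, 0.04) at the logged level (Eqs. (3)-(4))
v = np.exp(rng.normal(0.0, np.sqrt(0.04), n))

# Endogeneity: x1 negatively correlated with efficiency,
# x1 = 45 - 40*(1/u) + e with e ~ N(0, 4)
e = rng.normal(0.0, 2.0, n)                               # variance 4 -> std 2
x1 = 45.0 - 40.0 * efficiency + e

# Observed y1 with both measurement error and the endogenous x1, Eq. (4)
y1_obs = (x1**(gamma/alpha) * x2**(delta/alpha) * x3**(eta/alpha)
          / y2**(beta/alpha)) * (v/u)**(1.0/alpha)
```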

5. Results

5.1. Accuracy

Table 1 presents the mean absolute differences between the efficiency estimates provided by each method and the true efficiency scores. This measure indicates the proximity of each estimate on average to the true efficiency. Lower mean differences are preferred. The P-values reported in the third column indicate the probability that the means reported in the two preceding columns are not different from each other. Thus, the small P-value in the third column for the case of no measurement error, no endogeneity and a sample size of

100 indicates that the mean deviation of 0.063 for DEA is significantly higher than the mean deviation of 0.032 for COLS. Also included in Table 1, in parentheses, are the standard deviations of the absolute differences. This provides a measure of the variability in proximity for each sample. Lower standard deviations are also preferred.

Table 2 presents Kendall-Tau rank correlation coefficients between estimated efficiencies and true efficiency values. This measure captures the ability of each method to correctly rank observations. An important component of performance-based school reform is identification of the highest and the lowest performing schools in a jurisdiction. The highest performing schools can then be rewarded and corrective actions can be targeted to the lowest performing schools. Identifying groups of high and low performing schools can also be useful for determining whether certain management or resource allocation practices consistently lead to either higher or lower levels of performance. Thus, the ability of a method to correctly rank schools is an important criterion for assessing the usefulness of the methods for the purposes of performance-based school reform. A high rank correlation suggests that the measure performs well in identifying differential efficiency.

In data sets for which there is no measurement error and no endogeneity, both methods perform well, especially COLS. In each case, the mean absolute deviations and standard deviations around those means are lower and the rank correlations are higher for COLS than for DEA. It is important to note that the apparent superiority of COLS on these criteria might depend on the fact that the functional form and output weightings assumed in applying COLS match those used to simulate the data. In real settings, such a match might not exist.

When correlation between inefficiency and one of the inputs is added to the data, the performance of COLS diminishes substantially. Such correlation, which can result from the presence of unobserved variables or simultaneous relationships between inputs and outputs, biases the coefficient estimates provided by COLS, causing a misplacement of the production frontier. The result is substantially larger mean differences between the efficiency estimates and the true efficiency values, larger standard deviations around the means and lower rank correlations. The performance of DEA, on the other hand, does not change as substantially when correlation between inefficiency and one of the inputs is present. Consequently, DEA outperforms COLS in cases with endogeneity. This result is consistent with the findings of Gong and Sickles (1992).

In cases with measurement error but no endogeneity, COLS outperforms DEA. Measurement error diminishes the performance of both methods, but the effects are greater for DEA.
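The criteria reported in Tables 1 and 2 can be reproduced with a short helper like the one below (Python with SciPy; our own sketch with hypothetical names): the mean and standard deviation of the absolute deviations, the Wilcoxon signed-rank comparison of the paired deviations, and the Kendall-Tau-b rank correlations with the true scores.

```python
import numpy as np
from scipy import stats

def accuracy_criteria(eff_true, eff_dea, eff_cols):
    dev_dea = np.abs(eff_dea - eff_true)
    dev_cols = np.abs(eff_cols - eff_true)

    # Table 1: mean (and standard deviation) of absolute deviations
    print("DEA :", dev_dea.mean(), dev_dea.std(ddof=1))
    print("COLS:", dev_cols.mean(), dev_cols.std(ddof=1))

    # P-value that the paired absolute deviations have the same location
    w_stat, p_val = stats.wilcoxon(dev_dea, dev_cols)
    print("Wilcoxon signed-rank p-value:", p_val)

    # Table 2: Kendall-Tau-b rank correlations with true efficiency
    tau_dea, _ = stats.kendalltau(eff_true, eff_dea)
    tau_cols, _ = stats.kendalltau(eff_true, eff_cols)
    print("tau-b DEA :", tau_dea)
    print("tau-b COLS:", tau_cols)
```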


Table 1
Mean (and standard deviations) of absolute deviations of efficiency estimates from true efficiency values

                                  Sample size   DEA             COLS            P(a)
No measurement error
  No endogeneity                  20            0.096 (0.080)   0.089 (0.052)   0.677
                                  100           0.063 (0.072)   0.032 (0.021)   0.012
                                  500           0.059 (0.047)   0.028 (0.023)   0.000
  Endogeneity                     20            0.140 (0.085)   0.157 (0.069)   0.126
                                  100           0.066 (0.074)   0.121 (0.073)   0.000
                                  500           0.070 (0.053)   0.120 (0.077)   0.000
Measurement error
  No endogeneity                  20            0.106 (0.091)   0.111 (0.076)   0.548
                                  100           0.142 (0.093)   0.085 (0.056)   0.000
                                  500           0.191 (0.121)   0.091 (0.071)   0.000
  Endogeneity                     20            0.100 (0.077)   0.118 (0.085)   0.330
                                  100           0.129 (0.087)   0.113 (0.072)   0.234
                                  500           0.193 (0.117)   0.113 (0.076)   0.000

a The figures in this column represent the probability that the mean difference between the absolute deviations is zero. These figures were computed based on the Wilcoxon signed-rank statistic.

Table 2
Rank correlations between estimated efficiencies and true efficiency values(a)

                                  Sample size   DEA     COLS
No measurement error
  No endogeneity                  20            0.283   0.645
                                  100           0.478   0.767
                                  500           0.547   0.779
  Endogeneity                     20            0.247   0.307
                                  100           0.384   0.330
                                  500           0.370   0.282
Measurement error
  No endogeneity                  20            0.352   0.257
                                  100           0.229   0.390
                                  500           0.151   0.347
  Endogeneity                     20            0.240   0.140
                                  100           0.211   0.104
                                  500           0.165   0.119

a Correlations are Kendall-Tau-b statistics.

Measurement error is taken into account in COLS in two places. First, assumptions about the distributions of measurement error and inefficiency are used to determine the intercept adjustment. Second, for observations on the efficient side of the frontier, deviations from the frontier are assumed to consist entirely of measurement error. Thus, although COLS as applied here is not fully stochastic, it adjusts for the presence of measurement error better than DEA. This result is consistent with the findings of both Gong and Sickles (1992) and Banker et al. (1993). It is unclear, however, whether or not COLS would perform better in these cases if the assumptions made about the distribution of inefficiency and random error were not close to the actual distributions.

The cases where both endogeneity and measurement error are present are most similar to those that are likely to be encountered in attempts to measure the efficiency of educational organizations. In these cases, the presence of measurement error substantially diminishes the performance of DEA, and the combination of measurement error and endogeneity significantly diminishes the performance of COLS. The result is that both measures perform poorly. With a sample size of 500, the mean absolute percentage deviation from true efficiency is lower for COLS than for DEA, but is nonetheless high compared to the deviations achieved in the base case simulations. DEA shows higher rank correlations. It is, however, doubtful that the correlations of approximately 0.20 achieved by DEA are adequate for the purposes of awarding performance bonuses or targeting remedial resources.

5.2. Bias

Table 3 presents the mean differences of true efficiency values minus the efficiency estimates. Whereas Table 1 provides measures of the mean accuracy


Table 3
Mean of the true efficiency values minus efficiency estimates(a)

                                  Sample size   DEA             COLS
No measurement error
  No endogeneity                  20            0.094 (0.000)   0.089 (0.000)
                                  100           0.058 (0.000)   0.013 (0.000)
                                  500           0.012 (0.000)   0.001 (0.000)
  Endogeneity                     20            0.140 (0.000)   0.149 (0.000)
                                  100           0.046 (0.000)   0.113 (0.000)
                                  500           0.008 (0.739)   0.113 (0.000)
Measurement error
  No endogeneity                  20            0.014 (0.622)   0.079 (0.007)
                                  100           0.089 (0.000)   0.039 (0.000)
                                  500           0.157 (0.000)   0.045 (0.000)
  Endogeneity                     20            0.026 (0.312)   0.023 (0.674)
                                  100           0.061 (0.000)   0.059 (0.000)
                                  500           0.163 (0.000)   0.057 (0.000)

a Figures in parentheses represent the probability that the bias is zero. These values were computed based on the Wilcoxon signed-rank statistic.

of each estimation method, Table 3 provides measures of bias. In this table, the figures in parentheses do not represent standard deviations; rather, they indicate the probability that the bias is zero. Examining the bias of the two methods under different conditions sheds further light on the results in Tables 1 and 2.

Banker et al. (1993) argue that for cases with small samples and no measurement error DEA is likely to construct the production frontier with mostly inefficient observations. For these cells, we would expect DEA to underestimate inefficiency and overestimate efficiency. Thus, we expect negative mean differences in Table 3. In cases with large measurement errors, on the other hand, DEA will construct the efficient frontier with observations that have actually been pushed beyond the frontier by random factors. In these cells, we would expect inefficiency to be overestimated, and efficiency estimates to be on the low side. In these cases, we would expect positive mean differences. Both of these expectations are confirmed by the results reported in Table 3. For the cell with no measurement error and a sample of 20, mean differences are negative and significantly different from zero. For cells with measurement error and larger samples, mean differences are positive. Further, the bias is not significantly different from zero in cells with measurement error and a sample size of 20, suggesting that the positive bias created by smaller samples partially counteracts the negative bias created by measurement error. This explains why, in cases with measurement error, the performance of DEA diminishes as sample size increases.

The direction of the bias in COLS will depend primarily on the correction that is made to the intercept term in estimating the efficient frontier. COLS constructs a frontier by estimating an average production function, and then sliding the intercept up to facilitate a frontier interpretation of the estimated function. If the intercept is under-adjusted, inefficient firms will appear efficient, and inefficiency will be underestimated. In this case, we would expect mean differences between true and estimated efficiency to be negative. Over-adjustment of the intercept term will produce the opposite result. The method for adjusting the intercept term used in this study depends on assumptions about the distributions of inefficiency and random error. For all but two cells, the assumed proportion of deviation from the efficient frontier attributed to random error is greater than the actual proportion of the deviation attributable to randomness. As a result, the intercept term was under-adjusted. The result is negative mean differences in Table 3.7
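The mechanics just described can be illustrated with a short sketch (ours, not the authors' implementation; it assumes, as the paper does for its simulations, that the output weights α and β are known). It regresses the logged output aggregate on logged inputs by OLS and then shifts the intercept; for simplicity the shift shown is the textbook correction to the largest residual, whereas the COLS variant used in the paper adjusts the intercept using distributional assumptions (see the appendix to Banker et al., 1993), so the scores differ in level though not in ranking.

```python
import numpy as np

def cols_efficiency(x1, x2, x3, y1, y2, alpha=0.5, beta=0.5):
    """COLS-style efficiency scores (sketch).
    Regress the logged output aggregate on logged inputs, shift the intercept
    up to the largest residual so the fitted surface bounds the data, and read
    efficiency off the deviations below that frontier."""
    ly = alpha * np.log(y1) + beta * np.log(y2)       # logged output aggregate
    X = np.column_stack([np.ones_like(x1),
                         np.log(x1), np.log(x2), np.log(x3)])
    coef, *_ = np.linalg.lstsq(X, ly, rcond=None)     # OLS in logs
    resid = ly - X @ coef
    frontier_resid = resid - resid.max()              # shift intercept to frontier
    # With a smaller intercept shift (as in the paper's variant) some residuals
    # would remain positive; those observations would simply be scored as fully
    # efficient, hence the cap at zero below.
    return np.exp(np.minimum(frontier_resid, 0.0))
```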

7 In 9 out of the 12 cases, all of the deviation is attributed to random error. In these cases, the frontier estimated by COLS is the same as would be estimated by OLS. This results from what is known in the literature as Type I failure. For an explanation of Type I failure, see Banker et al. (1993).


Table 4
Performance of estimates derived by averaging DEA and COLS estimates on various criteria(a)

                                  Sample size   Mean absolute   Rank          Mean
                                                difference      correlation   difference
No measurement error
  No endogeneity                  20            0.092           0.536         0.092
                                  100           0.035           0.707         0.022
                                  500           0.038           0.699         0.006
  Endogeneity                     20            0.148           0.313         0.144
                                  100           0.086           0.391         0.079
                                  500           0.078           0.349         0.061
Measurement error
  No endogeneity                  20            0.080           0.379         0.047
                                  100           0.078           0.361         0.025
                                  500           0.105           0.269         0.056
  Endogeneity                     20            0.102           0.193         0.001
                                  100           0.096           0.176         0.001
                                  500           0.109           0.168         0.053

a Mean absolute difference = the mean absolute deviation of the efficiency estimate from the true efficiency value; Rank correlation = Kendall-Tau-b statistic for the efficiency estimates and the true efficiency values; Mean difference = mean of the true efficiency values minus efficiency estimates.

5.3. An averaging approach

Notice that in the cells with which we are most concerned, those with measurement error and endogeneity, the biases of the DEA and COLS methods have opposite signs. Banker et al. (1993) suggest that in such cases a strategy of averaging DEA and COLS estimates may produce less biased measures. We constructed such average estimates and report their performance on several criteria in Table 4.

Averaging the two estimates has the effect of averaging the bias of the two methods. In cases where the biases of the DEA and COLS estimates have opposite signs, this can result in less biased estimates. For the cases with measurement error and sample sizes of 100, the biases of the averaged estimates are indeed significantly lower than those of either the DEA or COLS estimates taken alone.8 In the cells with endogeneity and measurement error, the averaged estimates perform better on the measure of proximity. In no case does either the DEA or COLS estimate taken separately show a significantly lower mean absolute deviation from the true efficiency value than do the averaged estimates. For sample sizes of 100 and 500, the averaged estimates are significantly more accurate than the DEA estimates, and for a sample size of 100 the averaged estimates are also significantly more

accurate than COLS.9 With a sample size of 100, the efficiency estimate provided by the averaging approach is 25% more accurate than the DEA estimate and 15% more accurate than the COLS estimate.

5.4. Adequacy of efficiency estimates for purposes of performance-based reform

Despite the accuracy gains achieved, averaging estimates does little to improve performance on the important rank correlation criterion. Thus, doubt remains about whether the averaging approach can provide estimates of efficiency that are adequate for the purposes of performance-based school reform. To investigate this issue further, we divided the observations in the data sets with endogeneity and measurement error into quintiles based on their true efficiency scores. We then examined the ability of each method, including the averaging method, to place observations in the appropriate quintiles. The results of this analysis are presented in Table 5.

None of the methods did well. For a sample size of 20, no more than 25% of the observations were assigned to the correct quintile, and between 35 and 45% were assigned to quintiles two or more away from the true quintile. For sample sizes of 100 and 500, the results

8 Wilcoxon signed rank tests on matched pairs of the averaged estimates and DEA estimates, and on matched pairs of the averaged estimates and COLS estimates, were used. In both cases, differences were significant at the 0.01 level.

9 Again, Wilcoxon signed rank tests with a 0.01 significance level were used.


Table 5
Measures of how well various methods of measuring efficiency assign observations to quintiles(a)

Column key: (1) percentage assigned to correct quintile; (2) percentage assigned two or more quintiles from actual; (3) percentage assigned to bottom quintile actually in bottom quintile; (4) percentage assigned to bottom quintile actually ranked above median; (5) percentage assigned efficiency of 1 actually in top quintile; (6) percentage assigned efficiency of 1 actually ranked below median. All figures are percentages.

                             (1)    (2)    (3)    (4)    (5)    (6)
Sample size=20
  DEA                        15.0   35.0   25.0   50.0   14.3   42.9
  COLS                       25.0   45.0   50.0   25.0    0.0   50.0
  AVERAGING                  25.0   45.0   25.0   50.0    0.0   50.0
Sample size=100
  DEA                        31.0   42.0   45.0   20.0   28.6   35.7
  COLS                       27.0   47.0   35.0   35.0   20.9   48.8
  AVERAGING                  26.0   45.0   40.0   25.0   30.8   38.5
Sample size=500
  DEA                        24.8   41.6   30.0   36.0   28.6   33.3
  COLS                       26.0   42.8   37.0   36.0   20.7   48.0
  AVERAGING                  25.8   39.4   37.0   30.0   21.1   36.8

a For samples with endogeneity and measurement error.

were not much better. Only 24.8 to 31% of the observations were assigned to the correct quintile, and 41.6% or more were assigned to quintiles two or more away. The averaging approach did not consistently improve performance.

The third and fourth columns of Table 5 depict the ability of each method to identify the most inefficient schools. For a sample size of 20, only one of the four schools assigned by DEA to the bottom quintile was actually in the bottom quintile. Two of the four identified were actually more efficient than half of the schools in the sample. DEA performed better with the sample of 100, but poorly again with a sample of 500. Of the observations assigned by COLS to the bottom quintile, roughly the same number were more efficient than the median observation as were actually in the lowest quintile. Again, the averaging approach did not consistently produce better results.

The last two columns of Table 5 present evidence on the ability of the methods to identify the most efficient schools. Both DEA and COLS provide relative measures of efficiency. Thus, both necessarily assign some of the schools in any sample a perfect efficiency score of 1. COLS, as applied here, assigns a substantially larger portion of the sample an efficiency score of 1. In fact, COLS assigns nearly 45% of the observations an efficiency score of 1.10 This may account for the facts that among
10 In fact, the percentages assigned an efficiency value of 1 by COLS were 45, 43 and 44.4% for the sample sizes of 20, 100 and 500 respectively. The corresponding figures for DEA were 35, 14 and 4.2%.

the observations assigned an efficiency score of 1 by COLS, a smaller percentage are actually in the top quintile and a larger percentage have true efficiency scores that are below the median. Thus, one should be careful about drawing any conclusions from the fact that DEA appears to do better than COLS in identifying high performers. For our purposes, the important point to draw from Columns 5 and 6 is that no matter which of the three methods is used, at least one-third of the observations assigned efficiency scores of 1 actually have efficiency values that rank them below the median.

Whether such performance is adequate for the purposes of school-based reform is a matter of judgment. However, in the best cases where we use the averaging approach on large samples, we can expect nearly 200 schools in a sample of 500 to be assigned to a quintile two or more away from their true group. If such a method were relied on to determine financial awards or target corrective action, a large number of schools that lose out on additional resources or face burdensome requirements would have legitimate complaints. It also seems unlikely that analyzing the practices of groups identified as high or low performing by these methods would be very informative. If less than half of the schools that are identified as low performing are actually inefficient and more than 20 percent are actually achieving above average levels of efficiency, then it is difficult to say that the managerial practices or patterns of resource allocation found in those schools are ineffective.
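The quintile checks reported in Table 5 can be computed along the following lines (our own sketch in Python; function names are hypothetical).

```python
import numpy as np

def quintiles(x):
    """Quintile labels 0 (lowest fifth) to 4, based on within-sample rank."""
    ranks = np.argsort(np.argsort(x))
    return ranks * 5 // len(x)

def placement_diagnostics(eff_true, eff_est):
    eff_true, eff_est = np.asarray(eff_true), np.asarray(eff_est)
    q_true, q_est = quintiles(eff_true), quintiles(eff_est)
    correct = np.mean(q_true == q_est)               # share in correct quintile
    far_off = np.mean(np.abs(q_true - q_est) >= 2)   # two or more quintiles away
    bottom = q_est == 0                              # flagged as lowest performers
    truly_bottom = np.mean(q_true[bottom] == 0)      # truly in bottom quintile
    true_ranks = np.argsort(np.argsort(eff_true))
    above_median = np.mean(true_ranks[bottom] >= len(eff_true) / 2)
    return correct, far_off, truly_bottom, above_median
```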


6. Conclusion

The results of our simulations confirm the primary findings of Gong and Sickles (1992) and Banker et al. (1993), and suggest that the findings of these researchers can be generalized to cases with multiple outputs. First, the presence of correlation between inputs and inefficiency diminishes the performance of COLS estimates of production frontiers. Second, the performance of both DEA and COLS is negatively affected by the presence of measurement error. However, when the assumptions about the distribution of measurement error and inefficiency made by COLS are close to the actual distributions, the performance of COLS is less affected by measurement error than the performance of DEA. We also found that for data sets characterized by the presence of endogeneity and measurement error, the bias of DEA efficiency estimates tends to be positive while the bias of COLS efficiency estimates tends to be negative. We further found that in such cases, averaging DEA and COLS estimates can provide less biased and more accurate measures of efficiency. However, averaged estimates do not appear to provide more useful school rankings.

Most importantly, our results suggest that in complex data sets typically used in educational research, i.e. data sets characterized by substantial measurement error and endogeneity, simple versions of DEA and COLS do not provide adequate measures of efficiency. It would be difficult to defend implementing performance-based financing or management programs with estimates of school performance whose rank correlation with true performance is no higher than 0.24, and where no more than 31% of schools are placed in the correct performance quintile.

However, our results need not be interpreted with unequivocal gloom. Not only must our findings be properly qualified, but they also suggest strategies for developing more adequate measures of efficiency. Both DEA and COLS performed poorly in our simulations because of an inability to separate inefficiency from measurement error. The random errors used in our study were in fact quite large. In all cases, the random error term had a higher correlation with the level of output than the efficiency term and each of the three inputs. Whether educational data is actually characterized by this much measurement error is unknown. If actual amounts of measurement error are smaller, DEA and COLS might perform better.

Efforts are being made to reduce the amount of measurement error characteristic of current educational production analyses. The Title I reauthorization provided substantial amounts of funding to state educational agencies to develop testing programs that are aligned with explicit curricular goals, that test higher level thinking skills and that can be used for purposes of evaluating school performance.

States, such as Kentucky, are leading the way in the development of such assessment systems. In addition, several city school districts, including Chicago and New York City, have developed school-based budgeting systems. These systems provide more reliable school level resource data than has ever before been available.

In addition to reducing measurement error, it might be possible to modify existing methods of estimating efficiency so as to minimize the effect of measurement error and/or endogeneity. For instance, the fact that the performance of COLS is diminished by correlation between inputs and inefficiency is not surprising. This type of correlation violates the assumptions that are required if ordinary least squares is to provide unbiased coefficient estimates. Bias in these coefficient estimates is the source of the poor performance of COLS in estimating efficiency. There are, however, well known simultaneous equation methods, such as two-stage least squares, that provide unbiased coefficient estimates in cases where the assumptions of ordinary least squares are violated. If such methods could be used to estimate production frontiers, then efficiency estimates that perform better than those we have examined might be developed.

Of course, one might doubt whether such methods could provide improved measures of efficiency. The results we found for COLS depend on the fact that the assumptions made about the mathematical form of the production function, the relative importance of different outputs and the distributions of inefficiency and random error were well matched with the actual forms and distributions. In practical circumstances, we can never know if these assumptions match reality. Thus, it might be more fruitful to search for ways to reduce the impact of measurement error on the efficiency estimates provided by DEA.

Finally, it may be possible to augment quantitative measures of efficiency with qualitative forms of evaluation to develop more reliable measures of school performance. Such qualitative forms of evaluation might involve site visits and audits by professional peers. It is not, however, immediately obvious how information from these different types of evaluation methods can be usefully combined. Efficiency estimates might be used to identify schools with potential problems, and therefore worthy of on-site investigation. Goldstein (1997), who cautions against relying solely on student test data to evaluate schools, suggests that this might be an appropriate use of student performance analyses. However, given the large errors in rankings found in this study, efficiency measures might not be adequate for even this limited purpose.11

11 A blind reviewer of this paper points out that "School inspectors would be somewhat perplexed if more than half the poor schools they were sent to investigate actually turned out to be better than average."


Perhaps a more fruitful use of qualitative investigations would be to develop more accurate measures of important inputs and outputs in the production processes. Of course, conducting such analyses at each school might be prohibitively expensive, thereby limiting the usefulness of any improved measures for making system-wide comparisons. Research is needed to determine exactly how information acquired through site visits, peer reviews and other evaluative methods can be combined with existing data and methods to develop more reliable and valid measures of school performance.

Given the data that are currently available, however, our results suggest that the methods for measuring the efficiency of educational organizations that have been used most frequently are inadequate for use in implementing performance-based management systems. This is a discouraging result, and it raises questions about the feasibility of some performance-based school reforms.

References
Aigner, D., Lovell, C. A. K., & Schmidt, P. (1977). Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6, 21–37.
Banker, R. D., Charnes, A., & Cooper, W. W. (1984). Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science, 30, 1078–1092.
Banker, R. D., Conrad, R. F., & Strauss, R. P. (1985). A comparative application of DEA and translog methods: An illustrative study of hospital production. Management Science, 32, 30–44.
Banker, R. D., Gadh, V. M., & Gorr, W. L. (1993). A monte-carlo comparison of two production frontier estimation methods: corrected ordinary least squares and data envelopment analysis. European Journal of Operational Research, 67, 332–343.
Barrow, M. M. (1991). Measuring local education authority performance: a frontier approach. Economics of Education Review, 10, 19–27.
Bessent, A., & Bessent, E. (1980). Determining the comparative efficiency of schools through data envelopment analysis. Educational Administration Quarterly, 16, 57–75.
Bessent, A., Bessent, E., Charnes, A., Cooper, W., & Thorogood, N. C. (1983). Evaluation of educational program proposals by means of DEA. Education Administration Quarterly, 19, 82–107.
Bessent, A., Bessent, W., Kennington, J., & Reagan, B. (1982). An application of mathematical programming to assess productivity in the Houston Independent School District. Management Science, 28, 1335–1366.
Bifulco, R., & Duncombe, W. (1998). The identification and evaluation of low-performance schools: the case of New York City. Presented at the 1998 American Education Finance Association Annual Conference, Mobile, Alabama.


Bridge, G., Judd, C., & Moock, P. (1979). The determinants of educational outcomes: the impact of families, peers, teachers and schools. Cambridge, MA: Ballinger Publishing Company.
Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2, 429–444.
Cooper, S. T., & Cohn, E. (1997). Estimation of a frontier production function for the South Carolina educational process. Economics of Education Review, 16, 313–327.
Darling-Hammond, L. (1991). Accountability mechanisms in big city school systems. ERIC/CUE Digest, 17.
Deller, S., & Rudnicki, E. (1993). Production efficiency in elementary education: the case of Maine public schools. Economics of Education Review, 12, 45–57.
Elmore, R., Abelmann, C., & Fuhrman, S. (1996). The new accountability in state education reform: from process to performance. In H. F. Ladd, Holding schools accountable: performance-based reform in education (pp. 65–98). Washington, DC: The Brookings Institution.
Fare, R., Grosskopf, S., & Weber, W. (1989). Measuring school district performance. Public Finance Quarterly, 17, 409–428.
Farrell, M. J. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society Series A, 120, 253–281.
Goldstein, H. (1997). Value added tables: the less-than-Holy Grail. Managing Schools Today, 6, 18–19.
Goldstein, H., & Thomas, S. (1996). Using examination results as indicators of school and college performance. Journal of the Royal Statistical Society A, 159, 149–163.
Gong, B., & Sickles, R. C. (1992). Finite sample evidence on the performance of stochastic frontiers and data envelopment analysis using panel data. Journal of Econometrics, 51, 259–284.
Hanushek, E. (1994). Making schools work: improving performance and controlling costs. Washington, DC: The Brookings Institution.
Johnes, J., & Johnes, G. (1995). Research funding and performance in U.K. university departments of economics: a frontier analysis. Economics of Education Review, 14, 301–314.
Jondrow, J., Lovell, C. A. K., Materov, I. S., & Schmidt, P. (1982). On the estimation of technical inefficiency in the stochastic frontier production function model. Journal of Econometrics, 19, 233–238.
King, R., & Mathers, J. (1997). Improving schools through performance-based accountability and financial rewards. Journal of Education Finance, 23, 147–176.
Kirjavainen, T., & Loikkanen, H. A. (1998). Efficiency differences of Finnish senior secondary schools: an application of DEA and Tobit analysis. Economics of Education Review, 17, 377–394.
Levin, H. M. (1997). Raising school productivity: an X-efficiency approach. Economics of Education Review, 16, 303–311.
Lovell, C. A., & Schmidt, P. (1988). A comparison of alternative approaches to the measurement of productive efficiency. In A. Dogramaci, & R. Fare, Applications of modern production theory: efficiency and productivity (pp. 3–31). Boston: Kluwer.


McCarty, T., & Yaisawarng, S. (1993). Technical efficiency in New Jersey school districts. In H. Fried, K. Lovell, & S. Schmidt, The measurement of productive efficiency (pp. 271–287). Oxford: Oxford University Press.
Monk, D. (1990). Educational finance: an economic approach. New York: McGraw-Hill Publishing Company.
Nelson, R. A., & Waldman, D. M. (1986). Measuring technical efficiency: index numbers vs. parametric production frontiers. Unpublished manuscript.
Olsen, J. A., Schmidt, P., & Waldman, D. A. (1980). A monte-carlo study of estimators of stochastic frontier production functions. Journal of Econometrics, 13, 67–82.
Ondrich, J., & Ruggiero, J. (1997). Efficiency measurement in the stochastic frontier model. Mimeo, Syracuse University.
Orme, C., & Smith, P. (1996). The potential for endogeneity bias in data envelopment analysis. Journal of the Operational Research Society, 47, 73–83.
Ray, S. C. (1991). Resource use in public schools: a study of Connecticut. Management Science, 37, 1520–1628.

Richards, C., & Sheu, T. (1992). The South Carolina School Incentive Reward Program: a policy analysis. Economics of Education Review, 11, 71–86.
Ruggiero, J. (1996a). Efficiency estimation and error decomposition in the stochastic frontier model: a monte-carlo analysis. Working paper, Center for Economics and Business Research.
Ruggiero, J. (1996b). On the measurement of technical efficiency in the public sector. European Journal of Operational Research, 90, 553–565.
Ruggiero, J., & Bretschneider, S. (1998). The weighted Russell measure of technical efficiency. European Journal of Operational Research, 108, 438–451.
Ruggiero, J., Duncombe, W., & Miner, J. (1995). On the measurement and causes of technical inefficiency in local public services: with an application to public education. Journal of Public Administration Research and Theory, 5, 403–428.
Stiefel, L., Schwartz, A., & Rubenstein, R. (1999). Measuring school-level efficiency. In A. Odden, & M. Goertz, School-based financing. Thousand Oaks, CA: Corwin Press.
