
Economics of Education Review 47 (2015) 180–195


Review

Value-added modeling: A review


Cory Koedel a,∗, Kata Mihaly b, Jonah E. Rockoff c

a Department of Economics and Truman School of Public Affairs at the University of Missouri-Columbia, United States
b RAND Corporation, United States
c Graduate School of Business at Columbia University and National Bureau of Economic Research, United States

Article history:
Received 13 October 2014
Revised 26 January 2015
Accepted 26 January 2015
Available online 2 February 2015

JEL classification: I20; J45; M52

Keywords:
Value-added models
VAM
Teacher evaluation
Education production

Abstract

This article reviews the literature on teacher value-added. Although value-added models have been used to measure the contributions of numerous inputs to educational production, their application toward identifying the contributions of individual teachers has been particularly contentious. Our review covers articles on topics ranging from technical aspects of model design to the role that value-added can play in informing teacher evaluations in practice, highlighting areas of consensus and disagreement in the literature. Although a broad spectrum of views is reflected in available research, along a number of important dimensions the literature is converging on a widely-accepted set of facts.

© 2015 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +1 573 882 3871. E-mail address: koedelc@missouri.edu (C. Koedel).
http://dx.doi.org/10.1016/j.econedurev.2015.01.006
0272-7757/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Value-added modeling has become a key tool for applied researchers interested in understanding educational production. The "value-added" terminology is borrowed from the long-standing production literature in economics; in that literature, it refers to the amount by which the value of an article is increased at each stage of the production process. In education-based applications, the idea is that we can identify each student's human capital accumulation up to some point, say by the conclusion of period t−1, and then estimate the value-added to human capital of inputs applied during period t.

Value-added models (VAMs) have been used to estimate value-added to student achievement for a variety of educational inputs. The most controversial application of VAMs has been to estimate the effects of individual teachers. Accordingly, this review focuses on the literature surrounding teacher-level VAMs.¹ The attention on teachers is motivated by the consistent finding in research that teachers vary dramatically in their effectiveness as measured by value-added (Hanushek & Rivkin, 2010). In addition to influencing students' short-term academic success, access to high-value-added teachers has also been shown to positively affect later-life outcomes for students including wages, college attendance, and teenage childbearing (Chetty, Friedman, & Rockoff, 2014b). The importance of access to effective teaching for students in K-12 schools implies high stakes for personnel policies in public education. Chetty, Friedman, and Rockoff (2014b) and Hanushek (2011) monetize the gains

¹ Other applications of value-added include evaluations of teacher professional development and coaching programs (Biancarosa, Bryk, & Dexter, 2010; Harris & Sass, 2011), teacher training programs (Goldhaber, Liddle, & Theobald, 2013; Koedel, Parsons, Podgursky, & Ehlert, in press; Mihaly, McCaffrey, Sass, & Lockwood, 2013), reading reform programs (Betts, Zau, & King, 2005) and school choice (Betts & Tang, 2008), among others.

that would come from improving the teaching workforce using value-added-based evidence and conclude that the gains would be substantial.

The controversy surrounding teacher value-added stems largely from its application in public policy, and in particular the use of value-added to help inform teacher evaluations. Critics of using value-added in this capacity raise a number of concerns, of which the most prominent are (1) value-added estimates may be biased (Baker et al., 2010; Paufler & Amrein-Beardsley, 2014; Rothstein, 2009, 2010), and (2) value-added estimates seem too unstable to be used for high-stakes personnel decisions (Baker et al., 2010; Newton, Darling-Hammond, Haertel, & Thomas, 2010). Rothstein (2015) also raises the general point that the labor-supply response to more rigorous teacher evaluations merits careful attention in the design of evaluation policies. We discuss these and other issues over the course of the review.

The remainder of the paper is organized as follows. Section 2 provides background information on value-added and covers the literature on model-specification issues. Section 3 reviews research on the central questions of bias and stability in estimates of teacher value-added. Section 4 combines the information from Sections 2 and 3 in order to highlight areas of emerging consensus with regard to model design. Section 5 documents key empirical facts about value-added that have been established by the literature. Section 6 discusses research on the uses of teacher value-added in education policy. Section 7 concludes.

2. Model background and specification

2.1. Background

Student achievement depends on input from teachers and other factors. Value-added modeling is a tool that researchers have used in their efforts to separate out teachers' individual contributions. In practice, most studies specify linear value-added models in an ad hoc fashion, but under some conditions these models can be formally derived from the following cumulative achievement function, taken from Todd and Wolpin (2003) and rooted in the larger education production literature (Ben-Porath, 1967; Hanushek, 1979):

A_it = A_t[X_i(t), F_i(t), S_i(t), θ_i0, ε_it]    (1)

Eq. (1) describes the achievement level for student i at time t (A_it) as the end product of a cumulative set of inputs, where X_i(t), F_i(t) and S_i(t) represent the history of individual, family and school inputs for student i through year t, θ_i0 represents student i's initial ability endowment and ε_it is an idiosyncratic error. The intuitive idea behind the value-added approach is that, to a rough approximation, prior achievement can be used as a sufficient statistic for the history of prior inputs and, in some models, the ability endowment. This facilitates estimation of the marginal contribution of contemporaneous inputs, including teachers, using prior achievement as a key conditioning variable.

In deriving the conditions that formally link typically-estimated VAMs to the cumulative achievement function, Todd and Wolpin (2003) express skepticism that they will be met. Their skepticism is warranted for a number of reasons. As one example, in the structural model parental inputs can respond to teacher assignments, allowing for increased (decreased) parental inputs that are complements (substitutes) for higher teacher quality. VAM researchers cannot measure, and thus cannot control for, parental inputs, which means that unlike in the structural model, value-added estimates of teacher quality are inclusive of any parental-input adjustments. More generally, the model shown in Eq. (1) is flexible along a number of dimensions in ways that are difficult to emulate in practical modeling applications (for further discussion see Sass, Semykina, & Harris, 2014).

Sass et al. (2014) directly test the conditions linking VAMs to the cumulative achievement function and confirm the skepticism of Todd and Wolpin (2003), showing that they are not met for a number of common VAM specifications. The tests performed by Sass et al. (2014) give us some indication of what value-added is not. In particular, they show that the parameters estimated from a range of commonly estimated value-added models do not have a structural interpretation. But this says little about the informational value contained by value-added measures. Indeed, Sass et al. (2014) note that "failure of the underlying [structural] assumptions does not necessarily mean that value-added models fail to accurately classify teacher performance" (p. 10).² The extent to which measures of teacher value-added provide useful information about teacher performance is ultimately an empirical question, and it is this question that is at the heart of the value-added research literature.

2.2. Specification and estimation issues

A wide variety of value-added models have been estimated in the literature to date. In this section we discuss key specification and estimation issues. To lay the groundwork for our discussion consider the following linear VAM:

Y_isjt = β0 + Y_isjt−1 β1 + X_isjt β2 + S_isjt β3 + T_isjt λ + ε_isjt    (2)

In Eq. (2), Y_isjt is a test score for student i at school s with teacher j in year t, X_isjt is a vector of student characteristics, S_isjt is a vector of school and/or classroom characteristics, T_isjt is a vector of teacher indicator variables and ε_isjt is the idiosyncratic error term. The precise set of conditioning variables in the X-vector varies across studies. The controls that are typically available in district and state administrative datasets include student race, gender, free/reduced-price lunch status, language status, special-education status, mobility status (e.g., school changer), and parental education, or some subset therein (examples of studies from different locales that use control variables from this list include Aaronson, Barrow, & Sander, 2007; Chetty, Friedman, & Rockoff, 2014a; Goldhaber & Hansen, 2013; Kane, McCaffrey, Miller, & Staiger, 2013; Koedel & Betts, 2011; Sass, Hannaway, Xu, Figlio, & Feng, 2012). School and classroom characteristics in the S-vector are often constructed as aggregates of the student-level variables (including prior achievement). The parameters that are meant to capture teacher value added are contained in the

² Guarino, Reckase, and Wooldridge (2015) perform simulations that support this point. Their findings indicate that VAM estimators tailored toward structural modeling considerations can perform poorly because they focus attention away from more important issues.

vector λ. In most studies, teacher effects are specified as fixed rather than random effects.³

Eq. (2) is written as a "lagged-score" VAM. An alternative, restricted version of the model, where the coefficient on the lagged test score is set to unity (β1 = 1), is referred to as a "gain-score" VAM. The gain-score terminology comes from the fact that the lagged-score term with the restricted coefficient can be moved to the left-hand side of the equation, and the model becomes a model of test score gains.⁴

Following Aaronson, Barrow, and Sander (2007), the error term in Eq. (2) can be expanded as ε_isjt = α_i + γ_s + e_isjt. This formulation allows for roles of fixed individual ability (α_i) and school quality (γ_s) in determining student achievement growth.⁵ Based on this formulation, and assuming for simplicity that all students are assigned to a single teacher at a single school, the error in the estimated effect for teacher j at school s with class size N_j, in the absence of direct controls for school and student fixed effects, can be written as:

E_sj = γ_s + (1/N_j) Σ_{i=1}^{N_j} α_i + (1/N_j) Σ_{i=1}^{N_j} e_ist    (3)

The first two terms in Eq. (3) represent the otherwise unaccounted-for roles of fixed school and student attributes. The third term, although zero in expectation, can be problematic in cases where N_j is small, which is common for teacher-level analyses (Aaronson, Barrow, & Sander, 2007; also see Kane & Staiger, 2002). A number of value-added studies have incorporated student and/or school fixed effects into variants of Eq. (2) due to concerns about bias from non-random student–teacher sorting (e.g., see Aaronson, Barrow, & Sander, 2007; Koedel & Betts, 2011; McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004; McCaffrey, Sass, Lockwood, & Mihaly, 2009; Rothstein, 2010; Sass et al., 2012). These concerns are based in part on empirical evidence showing that students are clearly not randomly assigned to teachers, even within schools (Aaronson, Barrow, & Sander, 2007; Clotfelter, Ladd, & Vigdor, 2006; Jackson, 2014; Koedel & Betts, 2011; Paufler & Amrein-Beardsley, 2014; Rothstein, 2009, 2010).

An alternative structure to the standard, one-step VAM shown in Eq. (2) is the two-step VAM, or "average residuals" VAM. The two-step VAM uses the same information as Eq. (2) but performs the estimation in two steps:

Y_isjt = β0 + Y_isjt−1 β1 + X_isjt β2 + S_isjt β3 + ε_isjt    (4)

ε̂_isjt = T_isjt λ + e_isjt    (5)

The vector λ in Eq. (5) contains the value-added estimates for teachers. Although most VAMs that have been estimated in the research literature to date use the one-step modeling structure, several recent studies use the two-step approach (Ehlert, Koedel, Parsons, & Podgursky, 2013; Kane et al., 2013; Kane, Rockoff, & Staiger, 2008; Kane & Staiger, 2008).

One practical benefit to the two-step approach is that it is less computationally demanding. Putting computational issues aside, Ehlert et al. (2014, in press) discuss several other factors that influence the choice of modeling structure. One set of factors is related to identification, and in particular identification of the coefficients associated with the student and school/classroom characteristics. The identification tradeoff between the one-step and two-step VAMs can be summarized as follows: the one-step VAM has the potential to under-correct for "context" because the parameters associated with the control variables may be attenuated (β1, β2 and β3 in Eq. (2)), while the two-step VAM has the potential to over-correct for "context" by attributing correlations between teacher quality and the control variables to the control-variable coefficients (β1, β2 and β3 in Eq. (4)). A second set of factors is related to policy objectives. Ehlert et al. (2014, in press) argue that the two-step model is better suited for achieving key policy objectives in teacher evaluation systems, including the establishment of an incentive structure that maximizes teacher effort.

It is common in research and policy applications to "shrink" estimates of teacher value-added toward a common Bayesian prior (Herrmann, Walsh, Isenberg, & Resch, 2013). In practice, the prior is specified as average teacher value-added. The weight applied to the prior for an individual teacher is an increasing function of the imprecision with which that teacher's value-added is estimated. The following formula describes empirical shrinkage:⁶

λ̂_j^EB = a_j λ̂_j + (1 − a_j) λ̄    (6)

In (6), λ̂_j^EB is the shrunken value-added estimate for teacher j, λ̂_j is the unshrunken estimate, λ̄ is average teacher value-added, and a_j = σ²_λ / (σ²_λ + σ²_λ̂_j), where σ²_λ is the estimated variance of teacher value-added (after correcting for estimation error) and σ²_λ̂_j is the estimated error variance of λ̂_j (e.g., the squared standard error).⁷

The benefit of the shrinkage procedure is that it produces estimates of teacher value-added for which the estimation-error variance is reduced through the dependence on the stable prior. This benefit is particularly valuable in applications where researchers use teacher value-added as an explanatory variable (e.g., Chetty, Friedman, & Rockoff, 2014a, 2014b; Harris & Sass, 2014; Jacob & Lefgren, 2008; Kane et al., 2013; Kane & Staiger, 2008), in which case it reduces attenuation bias. The cost of shrinkage is that the weight on the prior introduces a bias in estimates of teacher value-added.

³ Examples of studies that specify teacher effects as random include Corcoran, Jennings, and Beveridge (2011), Konstantopoulos and Chung (2011), Nye, Konstantopoulos, and Hedges (2004), and Papay (2011). Because standard software estimates teacher fixed effects relative to an arbitrary holdout teacher, and the standard errors for the other teachers will be sensitive to which holdout teacher is selected, some studies have estimated VAMs using a sum-to-zero constraint on the teacher effects (Goldhaber, Cowan, & Walch, 2013; Jacob & Lefgren, 2008; Mihaly, McCaffrey, Sass, & Lockwood, 2010).

⁴ In lagged-score VAMs the coefficient on the lagged score, unadjusted for measurement error, is typically in the range of 0.6–0.8 (e.g., see Andrabi, Das, Khwaja, & Zajonc, 2011; Rothstein, 2009).

⁵ Also see Ishii and Rivkin (2009). The depiction of the error term could be further expanded to allow for separate classroom effects, perhaps driven by unobserved peer dynamics, which would be identifiable in teacher-level VAMs if teachers are observed in more than one classroom.

⁶ A detailed discussion of shrinkage estimators can be found in Morris (1983).

⁷ There are several ways to estimate σ²_λ; e.g., see Aaronson, Barrow, and Sander (2007), Chetty, Friedman, and Rockoff (2014a), Herrmann et al. (2013), Kane, Rockoff, and Staiger (2008), and Koedel (2009).
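To make the mechanics above concrete, the following sketch simulates the one-step VAM in Eq. (2), the two-step VAM in Eqs. (4) and (5), and empirical-Bayes shrinkage per Eq. (6). All sample sizes, parameter values, and variable names are hypothetical choices made only so the sketch runs end to end; they are not taken from any study discussed in the review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process: balanced classrooms, one lagged
# score, one binary student control, and normally distributed teacher effects.
n_teachers, class_size = 40, 20
n = n_teachers * class_size
teacher = np.repeat(np.arange(n_teachers), class_size)    # student -> teacher j
true_effect = rng.normal(0.0, 0.2, n_teachers)            # simulated lambda_j

y_lag = rng.normal(0.0, 1.0, n)                           # Y_isj(t-1)
x = rng.binomial(1, 0.4, n).astype(float)                 # one X control
y = 0.7 * y_lag - 0.1 * x + true_effect[teacher] + rng.normal(0.0, 0.5, n)

# One-step VAM (Eq. 2): OLS with a full set of teacher indicators. Dropping
# the common intercept lets each indicator coefficient play the role of that
# teacher's effect (up to the normalization caveats noted in footnote 3).
T = np.zeros((n, n_teachers))
T[np.arange(n), teacher] = 1.0
X1 = np.column_stack([y_lag, x, T])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
lam_one_step = b1[2:]

# Two-step VAM (Eqs. 4-5): regress scores on the controls only, then average
# the residuals within each teacher's classroom.
X2 = np.column_stack([np.ones(n), y_lag, x])
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
resid = y - X2 @ b2
lam_two_step = np.array([resid[teacher == j].mean() for j in range(n_teachers)])

# Empirical-Bayes shrinkage (Eq. 6): a_j weights the estimate against the
# prior mean, with a_j = sigma2_lambda / (sigma2_lambda + error variance).
se2 = np.array([resid[teacher == j].var(ddof=1) / class_size
                for j in range(n_teachers)])
sigma2_lam = max(lam_two_step.var(ddof=1) - se2.mean(), 1e-6)
a = sigma2_lam / (sigma2_lam + se2)
lam_eb = a * lam_two_step + (1 - a) * lam_two_step.mean()

# Both estimators should track the simulated effects, and shrinkage should
# compress the estimates toward the prior mean.
print(round(np.corrcoef(lam_one_step, true_effect)[0, 1], 2))
print(round(np.corrcoef(lam_two_step, true_effect)[0, 1], 2))
print(lam_eb.std() < lam_two_step.std())
```

Real applications use far richer control sets, unbalanced classrooms, and the estimation refinements cited above; the sketch only illustrates how the one-step and two-step estimates and the shrinkage weights fit together.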

The trend in the literature is clearly toward the increased use of shrunken value-added estimates.⁸

Additional issues related to model specification that have been covered in the literature include the selection of an appropriate set of covariates and accounting for test measurement error. Areas of consensus in the literature regarding the specification issues we have discussed thus far, and these additional issues, have been driven largely by what we know about bias and stability of the estimates that come from different models. It is to these issues that we now turn. In Section 4 we circle back to the model-specification question.

3. Bias and stability of estimated teacher value-added

3.1. Bias

A persistent area of inquiry among value-added researchers has been the extent to which estimates from standard models are biased. One consequence of biased value-added estimates is that they would likely lead to an overstatement of the importance of variation in teacher quality in determining student outcomes.⁹ Bias would also lead to errors in supplementary analyses that aim to identify the factors that contribute to and align with effective teaching as measured by value-added. In policy applications, a key concern is that if value-added estimates are biased, individual teachers would be held accountable in their evaluations for factors that are outside of their control.

Teacher value-added is estimated using observational data. Thus, causal inference hinges on the assumption that student assignments to teachers (treatments) are based on observables. As with any selection-on-observables model, the extent to which estimates of teacher value-added are biased depends fundamentally on the degree to which students are sorted to teachers along dimensions that are not observed by the econometrician. Bias will be reduced and/or mitigated when (a) non-random student–teacher sorting is limited and/or (b) sufficiently rich information is available such that the model can condition on the factors that drive the non-random sorting (or factors that are highly correlated with the factors that drive the non-random sorting).

The evidence on bias presented in this section is based on examinations of math and reading teachers in grades 4–8. This focus is driven in large part by data availability: the structure of most standardized testing regimes is such that annual testing occurs in grades 3–8 in math and reading. One direction in which extending the lessons from the current evidence base may be challenging is into high school. High school students are likely to be more strictly sorted across classrooms, which will make it more difficult to construct VAMs, although not impossible.¹⁰

Paufler and Amrein-Beardsley (2014) raise concerns about the potential for non-random student–teacher assignments to bias estimates of teacher value-added. They survey principals in Arizona elementary schools and report that a variety of factors influence students' classroom assignments. Many of the factors that they identify are not accounted for, at least directly, in typically-estimated VAMs (e.g., students' interactions with teachers and other students, parental preferences, etc.). The survey results are consistent with previous empirical evidence showing that student–teacher sorting is not random (Aaronson, Barrow, & Sander, 2007; Clotfelter, Ladd, & Vigdor, 2006; Jackson, 2014; Koedel & Betts, 2011; Rothstein, 2009, 2010), although the survey results are more illuminating given the authors' focus on mechanisms. Paufler and Amrein-Beardsley (2014) interpret their findings to indicate that "the purposeful (nonrandom) assignment of students into classrooms biases value-added estimates" (p. 356). However, no direct tests for bias were performed in their study, nor were any value-added models actually estimated. Although Paufler and Amrein-Beardsley (2014) carefully document the dimensions of student sorting in elementary schools, what is not clear from their study is how well commonly-available control variables in VAMs provide sufficient proxy information.

Rothstein (2010) offers what has perhaps been the highest-profile criticism of the value-added approach (also see Rothstein, 2009). He examines the extent to which future teacher assignments predict students' previous test-score growth conditional on standard controls using several commonly estimated value-added models (a gain-score model, a lagged-score model, and a gain-score model with student fixed effects). Rothstein (2010) finds that future teachers appear to have large "effects" on previous achievement growth. Because future teachers cannot possibly cause student achievement in prior years, he interprets his results as evidence that the identifying assumptions in the VAMs he considers are violated. He writes that his finding of non-zero effects for future teachers implies that estimates of teachers' effects based on these models [VAMs] "cannot be interpreted as causal" (p. 210). However, several subsequent studies raise concerns about the construction of Rothstein's tests and how he interprets his results. Perhaps most notably, Goldhaber and Chaplin (2015) show that his tests reject VAMs even when there is no bias in estimated teacher effects (also see Guarino, Reckase, Stacey, & Wooldridge, 2015). In addition, Kinsler (2012) shows that Rothstein's tests perform poorly with small samples, Koedel and Betts (2011) show that his results are driven in part by non-persistent student–teacher sorting, and Chetty, Friedman, and Rockoff (2014a) show that the issues he raises become less significant if "leave-year-out" measures of value-added are estimated.¹¹

Studies by Kane and Staiger (2008) and Kane et al. (2013) approach the bias question from a different direction, using experiments where students are randomly assigned to teachers. Both studies take the same general approach.

⁸ That said, Guarino et al. (2014) report that shrinkage procedures do not substantially boost the accuracy of estimated teacher value-added. Similarly, the results from Herrmann (2013) imply that the benefits from shrinkage are limited.

⁹ The effect of the bias would depend on its direction. The text here presumes a scenario with positive selection bias, i.e., where more effective teachers are assigned to students with higher expected growth.

¹⁰ Studies that estimate value added for high school teachers include Aaronson, Barrow, and Sander (2007), Jackson (2014), Koedel (2009) and Mansfield (in press).

¹¹ Chetty, Friedman, and Rockoff (2014a) report that in personal correspondence, Rothstein indicated that his findings "are neither necessary nor sufficient for there to be bias in a VA estimate." See Goldhaber and Chaplin (2015) and Guarino et al. (2015) for detailed analyses on this point.
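The falsification logic behind Rothstein's test, discussed above, can be illustrated with a small simulation: regress current test-score gains on next year's teacher assignments. Under random assignment, future teachers should explain essentially nothing; when students are tracked into future classrooms based on their current gains, the same regression produces large spurious "effects." This is a hypothetical sketch of the idea, not Rothstein's (2010) exact procedure, and all names and values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

n_teachers, class_size = 20, 25
n = n_teachers * class_size
gain = rng.normal(0.0, 1.0, n)          # current-year test-score gain

def explained_share(assignment):
    # R^2 from regressing gains on future-teacher indicators alone: the
    # fitted value for each student is their future teacher's group mean.
    group_means = np.array([gain[assignment == j].mean()
                            for j in range(n_teachers)])
    fitted = group_means[assignment]
    return 1.0 - ((gain - fitted) ** 2).sum() / ((gain - gain.mean()) ** 2).sum()

# Case 1: future teachers assigned at random, so any explained variation
# reflects only sampling noise.
random_next = rng.permutation(np.repeat(np.arange(n_teachers), class_size))

# Case 2: students sorted into future classrooms by their current gains.
sorted_next = np.empty(n, dtype=int)
sorted_next[np.argsort(gain)] = np.repeat(np.arange(n_teachers), class_size)

print(explained_share(random_next) < explained_share(sorted_next))  # True
```

The contrast mirrors the interpretation debate in the text: a large "future teacher effect" signals sorting on the dimension tested, but as Goldhaber and Chaplin (2015) and others show, mapping such test results to bias in estimated teacher effects is not automatic.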

First, the authors estimate teacher effectiveness in a pre-period using standard VAMs in a non-experimental setting (their VAMs are structured like the two-step model shown in Eqs. (4) and (5)). Next, the non-experimental value-added estimates are used to predict the test scores of students who are randomly assigned to teachers in the subsequent year.¹² The 2008 study uses teacher value-added as the sole pre-period performance measure. The 2013 study uses a composite effectiveness measure that includes value-added and other non-test-based measures of teacher performance, although the weights on the components of the composite measure are determined based on how well they predict prior value-added and thus, value-added is heavily weighted.¹³

Under the null hypothesis of zero bias, the coefficient on the non-experimental value-added estimate in the predictive regression of student test scores under random assignment should be equal to one. Kane and Staiger (2008) estimate coefficients on the non-experimental value-added estimates in math and reading of 0.85 and 0.99, respectively, for their models that are closest to what we show in Eqs. (4) and (5). Neither coefficient can be statistically distinguished from one. When the model is specified as a gain-score model it performs marginally worse; when it includes school fixed effects it performs marginally better in math but worse in reading; and when it is specified in test score levels with student fixed effects it performs poorly. Kane et al. (2013) obtain analogous estimates in math and reading of 1.04 and 0.70, respectively, and again, neither estimate can be statistically distinguished from one. The 2013 study does not consider a gain-score model or models that include school or student fixed effects.

While the findings from both experimental studies are consistent with the scope for bias in standard value-added estimates being small, several caveats are in order. First, sample-size and compliance issues are such that the results are noisy and the authors cannot rule out fairly large biases at standard confidence levels.¹⁴ Second, due to feasibility issues, the random-assignment experiments were performed within schools and among pairs of teachers for whom their principal was agreeable to randomly assigning students between them. This raises concerns about externalizing the findings to a more inclusive model to evaluate teachers (Paufler & Amrein-Beardsley, 2014; Rothstein, 2010; Rothstein & Mathis, 2013), and in particular, about the extent to which the experiments are informative about sorting across schools. Third, although these studies indicate that teacher value-added is an accurate predictor of student achievement in an experimental setting on average, this does not preclude individual prediction errors for some teachers.¹⁵

Chetty, Friedman, and Rockoff (2014a) also examine the extent to which estimates of teacher value-added are biased. Their primary VAM is a variant of the model shown in Eq. (2), but they also consider a two-step VAM.¹⁶ They do not examine models with school or student fixed effects.

Chetty, Friedman, and Rockoff (2014a) take two different approaches in their investigation of the bias question. First, they merge a 21-year administrative data panel for students and teachers in a large urban school district, one that contains information similar to datasets used to estimate value-added in other locales, with tax data from the Internal Revenue Service (IRS). The tax data contain information about students and their families at a level of detail never before seen in the value-added literature. They use these data to determine the scope for omitted-variables bias in standard models that do not include such detailed information. The key variables that Chetty, Friedman, and Rockoff (2014a) incorporate into their VAMs from the tax data are mother's age at the student's birth, indicators for parental 401(k) contributions and home ownership, and an indicator for parental marital status interacted with a quartic in household income. They estimate that the magnitude of the bias in value-added estimates in typical circumstances, when these parental characteristics are unobserved, is 0.2%, with an upper bound at the edge of the 95% confidence interval of 0.25%. The authors offer two reasons for why their estimates of the bias are so small. First, the standard controls that they use in their district-data-only VAM (e.g., lagged test scores, poverty status, etc.) capture much of the variation in the additional tax-data variables. Put differently, the marginal value of these additional variables is small, which suggests that the standard controls in typically-estimated VAMs are quite good. Despite this fact, however, the parental characteristics are still important independent predictors of student achievement. The second reason that the bias is small is that the variation in parental characteristics from the tax data that is otherwise unaccounted for in the model is essentially uncorrelated with teacher value-added. Chetty, Friedman, and Rockoff (2014a) conclude that student–teacher sorting based on parental/family characteristics that are not otherwise accounted for in typically-estimated VAMs is limited in practice.

Chetty et al.'s (2014a) second approach to examining the scope for bias in teacher value-added has similarities to the above-described experimental studies by Kane and Staiger (2008) and Kane et al. (2013), but is based on a quasi-experimental design. In instances where a staffing change occurs in a school-by-grade-by-year cell, the authors calculate the expected change in average value-added in the cell corresponding to the change. They estimate average value-added for teachers in each cell using data in all years except the years in-between which the staffing changes occur. Next, they use

¹² Rosters of students were randomly assigned to teachers within pairs or blocks of teachers.

¹³ A case where prior value-added is the only pre-period performance measure is also considered in Kane et al. (2013).

¹⁴ Rothstein (2010) raises this concern about Kane and Staiger (2008). Kane et al. (2013) perform a much larger experiment. While this improves statistical precision, it is still the case that large biases cannot be ruled out with 95% confidence in the larger study (Rothstein & Mathis, 2013).

¹⁵ Building on this last point, in addition to general imprecision, one potential source of individual prediction errors in these studies is overcorrection bias, which the experimental tests are not designed to detect owing to their use of two-step VAMs to produce the pre-period value-added measures. The non-experimental predictions can still be correct on average in the presence of overcorrection bias, but inaccurate for some teachers (for more information, see footnote 35 on page 33 of Kane et al., 2013). This is less of an issue for the Chetty, Friedman, and Rockoff (2014a) study that we discuss next.

¹⁶ There are some procedural differences between how Chetty, Friedman, and Rockoff (2014a) estimate their models and how the models are shown in Eqs. (2), (4) and (5). The two most interesting differences are: (1) the use of a "leave-year-out" procedure, in which a teacher's value-added in any given classroom is estimated from students in other classrooms taught by the same teacher, and (2) their allowance for teacher value-added to shift over time, rather than stay fixed in all years.

their out-of-sample estimate of the change in average teacher value-added at the school-by-grade level to predict changes in student achievement across cohorts of students. Put differently, they compare achievement for different cohorts of students who pass through the same school-by-grade, but differ in the average value-added of the teachers to which they are exposed due to staffing changes.17

Their quasi-experimental approach requires stronger identifying assumptions than the experimental studies, as detailed in their paper. However, the authors can examine the scope for bias in value-added using a much larger dataset, which allows for substantial gains in precision. Furthermore, because the staffing changes they consider include cross-school moves, their results are informative about sorting more broadly. Using their second method, Chetty, Friedman, and Rockoff (2014a) estimate that the forecasting bias from using their observational value-added measures to predict student achievement is 2.6% and not statistically significant (the upper bound of the 95% confidence interval on the bias is less than 10%).

Results similar to those from Chetty, Friedman, and Rockoff (2014a) are reported in a recent study by Bacher-Hicks, Kane, and Staiger (2014) using data from Los Angeles. Bacher-Hicks, Kane, and Staiger (2014) use the same quasi-experimental switching strategy and also find no evidence of bias in estimates of teacher value-added. At the time of our writing this review, there is an ongoing debate between Bacher-Hicks, Kane, and Staiger (2014), Chetty, Friedman, and Rockoff (2014c) and Rothstein (2014) regarding the usefulness of the quasi-experimental teacher-switching approach for identifying teacher value-added. Rothstein (2014) argues that a correlation between teacher switching and students' prior-grade test scores invalidates the approach. Chetty, Friedman, and Rockoff (2014c) and Bacher-Hicks, Kane, and Staiger (2014) acknowledge the correlation documented by Rothstein (2014), but find that it is mechanical and driven by the fact that prior data were used to estimate value-added. They argue that Rothstein's failure to account for the mechanical correlation makes his proposed placebo test based on prior test scores uninformative about the validity of the teacher-switching approach, and present additional specifications which show how the correlation of value-added with changes in prior scores can be eliminated without substantively changing the estimated effect of value-added on changes in current scores.18,19

Future research will undoubtedly continue to inform our understanding of the bias issue. To date, the studies that have used the strongest research designs provide compelling evidence that estimates of teacher value-added from standard models are not meaningfully biased by student–teacher sorting along observed or unobserved dimensions.20 It is notable that there is not any direct counterevidence indicating that value-added estimates are substantially biased.

As the use of value-added spreads to more research and policy settings, we stress that given the observational nature of the value-added approach, the absence of bias in current research settings does not preclude bias elsewhere or in the future. Bias could emerge in VAMs estimated in other settings if unobserved sorting occurs in a fundamentally different way, and the application of VAMs to inform teacher evaluations could alter sorting and other behavior (e.g., see Barlevy & Neal, 2012; Campbell, 1976).21 Although the quasi-experimental method developed by Chetty, Friedman, and Rockoff (2014a) is subject to ongoing scholarly debate as noted above, it is a promising tool for measuring bias in VAMs without the need for experimental manipulation of classroom assignments.

3.2. Stability

Although some of the highest-profile studies on teacher value-added have focused on the issue of bias, the extent to which value-added measures will be useful in research and policy applications also depends critically on their stability. Without some meaningful degree of persistence over time, even unbiased estimates offer little value. If we assert that value-added models can be developed and tested in which there is a minimal role for bias (based on the evidence discussed in the previous section), estimated teacher value-added can be described as consisting of three components: (1) real, persistent teacher quality; (2) real, non-persistent teacher quality, and (3) estimation error (which is, of course, not persistent). Of primary interest in many research studies is the real, persistent component of teacher quality, with the second and third factors contributing to the instability

17 This approach is similar in spirit to Koedel (2008), where identifying variation in the quality of high school math teachers is caused by changes in school staffing and course assignments across cohorts.
18 For example, a teacher who moves forward from grade-4 to grade-5 within a school will have an impact on changes in current and prior scores of 5th graders; when Chetty, Friedman, and Rockoff (2014c) drop the small number of teachers with these movements, this greatly reduces placebo effects on lagged scores but leaves effects on current scores largely unchanged. Chetty, Friedman, and Rockoff (2014c) conclude that after accounting for these types of mechanical correlations, Rothstein's findings are consistent with their interpretation, pointing to results in Rothstein's appendix to support this claim.
19 Rothstein also raises a concern about how Chetty, Friedman, and Rockoff (2014a) deal with missing data. He presents an argument for why their quasi-experimental test should be conducted with value-added of zero imputed for teachers with missing estimates, rather than dropping these teachers and their students, with the former approach resulting in a larger estimate of bias. Chetty, Friedman, and Rockoff (2014c) address this point by demonstrating that, because value-added is positively correlated within school-grade cells, imputing the population mean (i.e., zero) results in attenuation of the coefficient on value-added in the quasi-experimental regression. They argue that the best way to assess the importance of missing data is to examine the significant subset of school-grade cells where there is little or no missing data. This estimate of bias is quite close to zero, both in their data and in Rothstein's replication study. Bacher-Hicks, Kane, and Staiger (2014) also discuss the merits of the alternative approaches to dealing with missing data in this context. In their study, "reasonably imputing the expected impacts of [teachers] with missing value-added does not change the finding regarding the predictive validity of value-added" (p. 4).
20 In addition to the studies covered in detail in this section, which in our view are the most narrowly-targeted on the question of bias in estimates of teacher value-added, other notable studies that provide evidence consistent with value-added measures containing useful information about teacher productivity include Glazerman et al. (2013), Harris and Sass (2014), Jackson (2014), Jacob and Lefgren (2008), and Kane and Staiger (2012). Relatedly, Deming (2014) tests for bias in estimates of school value-added using random school choice lotteries and fails to reject the hypothesis that school effects are unbiased.
21 This line of criticism is not specific to value-added. The use of other high-stakes evaluation metrics could also alter behavior.
with which this primary parameter is estimated.22 A number of studies have provided evidence on the stability of estimated teacher value-added over time and across schools and classrooms (e.g., see Aaronson, Barrow, & Sander, 2007; Chetty, Friedman, & Rockoff, 2014a; Glazerman et al., 2013; Goldhaber & Hansen, 2013; Jackson, 2014; Koedel, Leatherman, & Parsons, 2012; McCaffrey et al., 2009; Schochet & Chiang, 2013).

Studies that document the year-to-year correlation in estimated teacher value-added have produced estimates that range from 0.18 to 0.64. Differences in the correlations across studies are driven by several factors. One factor that influences the precision with which teacher value-added can be estimated, and thus the correlation of value-added over time, is the student–teacher ratio. Goldhaber and Hansen (2013) and McCaffrey et al. (2009) document the improved precision in value-added estimates that comes with increasing teacher-level sample sizes. Goldhaber and Hansen (2013) show that the predictive value of past value-added over future value-added improves with additional years of data. The improvement is non-linear and the authors find that adding more years of data beyond three years is of limited practical value for predicting future teacher performance. Beyond the diminishing marginal returns of additional information, another explanation for this result is that as the time horizon gets longer, "drift" in real teacher performance (Chetty, Friedman, & Rockoff, 2014a) puts downward pressure on the predictive power of older value-added measures. This drift will offset the benefits associated with using additional data unless the older data are properly down-weighted. McCaffrey et al. (2009) examine the benefits of additional data purely in terms of teacher-level sample sizes, which are positively but imperfectly correlated with years of data. Unsurprisingly, they document reductions in the average standard error for teacher value-added estimates as sample sizes increase.

A second factor that affects the stability of value-added estimates is whether the model includes fixed effects for students and/or schools. Adding these layers of fixed effects narrows the identifying variation used to estimate teacher value-added, which can increase imprecision in estimation. In models with student fixed effects, estimates of teacher value-added are identified by comparing teachers who share students; in models with school fixed effects, the identifying variation is restricted to occur only within schools.

Despite relying entirely on within-unit variation (i.e., school or student) for identification, school and student fixed effects models can facilitate comparisons across all teachers as long as teachers are linked across units. For example, with multiple years of data, a model with school fixed effects can be used to rank all teachers based on value-added as long as enough teachers switch schools (Mansfield, in press). Teachers who share the same school are directly linked within schools, and school switchers link groups of teachers across schools. Non-switchers can be compared via indirect linkages facilitated by the switchers. However, the reliance on school switching to link the data comes with its own challenges and costs in terms of precision. In the context of a model designed to estimate value-added for teacher preparation programs, Mihaly et al. (2013) note "indirect linkages can make estimates imprecise, with the potential for significant variance inflation" (p. 462).23

Goldhaber, Walch, and Gabele (2013), Koedel, Leatherman, and Parsons (2012) and McCaffrey et al. (2009) estimate the year-to-year correlation in teacher value-added from lagged-score models similar in structure to the one shown in Eq. (2), and from additional specifications that include school or student fixed effects. Correlations of adjacent-year value-added measures estimated from models without school or student fixed effects range between 0.47 and 0.64 across these studies. McCaffrey et al. (2009) report an analogous correlation for a model that includes student fixed effects of 0.29 (using a gain-score specification).24 Goldhaber, Walch, and Gabele (2013) and Koedel, Leatherman, and Parsons (2012) report correlations from models that include school fixed effects ranging from 0.18 to 0.33. Thus, while there is a wide range of stability estimates throughout the literature, much of the variance in the estimated correlations is driven by modeling decisions. Put differently, among similarly-specified models, there is fairly consistent evidence with regard to the stability of value-added estimates.

It is also important to recognize that for many policy decisions, the year-to-year stability in value-added is not the appropriate measure by which to judge its usefulness (Staiger & Kane, 2014). As an example, consider a retention decision to be informed by value-added. The relevant question to ask about the usefulness of value-added is how well a currently-available measure predicts career performance. How well a currently-available measure predicts next year's performance is not as important. Staiger and Kane (2014) show that year-to-year correlations in value-added are significantly lower than year-to-career correlations, and it is these latter correlations that are most relevant for judging the usefulness of value-added in many policy applications.

The question of how much instability in VAMs is acceptable is difficult to answer in the abstract. On the one hand, the level of stability in teacher value-added is similar to the level of stability in performance metrics widely used in other professions that have been studied by researchers, including salespeople, securities analysts, sewing-machine operators and baseball players (Glazerman et al., 2010; McCaffrey et al., 2009). However, on the other hand, any decisions based on VAM estimates will be less than perfectly correlated with the decisions one would make if value-added was known with certainty, and it is theoretically unclear

22 Standard data and methods are not sufficient to distinguish between factors (2) and (3). For example, suppose that a non-persistent positive shock to estimated teacher value-added is observed in a given year. This shock could be driven by a real positive shock in teacher performance (e.g., perhaps the teacher bonded with her students more so than in other years) or measurement error (e.g., the lagged scores of students in the teacher's class were affected by a random negative shock in the prior year, which manifests itself in the form of larger than usual test-score gains in the current year). We would need better data than are typically available to sort out these alternative explanations. For example, data from multiple tests of the same material given on different days could be used to separate out test measurement error from teacher performance that is not persistent over classrooms/time.
23 Also note that the use of student fixed effects strains the data in that it requires estimating an additional n parameters using a large-N, small-T dataset, which is costly in terms of degrees of freedom (Ishii & Rivkin, 2009).
24 For the McCaffrey et al. (2009) study we report simple averages of the correlations shown in Table 2 across districts and grade spans.
whether using imperfect data in teaching is less beneficial (or more costly) than in other professions. The most compelling evidence on this question comes from studies that, taking the instability into account, evaluate the merits of acting on estimates of teacher value-added to improve workforce quality (Boyd, Lankford, Loeb, & Wyckoff, 2011; Chetty, Friedman, & Rockoff, 2014b; Goldhaber & Theobald, 2013; Rothstein, 2015; Winters & Cowen, 2013). These studies consistently show that using information about teacher value-added improves student achievement relative to the alternative of not using value-added information.

4. Evidence-based model selection and the estimation of teacher value-added

Section 2 raised a number of conceptual and econometric issues associated with the choice of a value-added model. Absent direct empirical evidence, a number of different specifications are defensible. This section uses the evidence presented in Section 3, along with evidence from elsewhere in the literature, to highlight areas of consensus and disagreement on key VAM specification issues.

4.1. Gain-score versus lagged-score VAMs

Gain-score VAMs have been used in a number of studies and offer some conceptual appeal (Meghir & Rivkin, 2011). One benefit is that they avoid the complications that arise from including lagged achievement, which is measured with error, as an independent variable. However, available evidence suggests that the gain-score specification does not perform as well as the lagged-score specification under a broad range of estimation conditions (Andrabi et al., 2011; Guarino, Reckase, & Wooldridge, 2015; Kane & Staiger, 2008; McCaffrey et al., 2009). Put differently, the restriction on the lagged-score coefficient appears to do more harm than good. Thus, most research studies and policy applications of value-added use a lagged-score specification.25

It is noteworthy that the lagged-score VAM has outperformed the gain-score VAM despite the fact that most studies do not implement any procedures to directly account for measurement error in the lagged-score control. The measurement-error problem is complicated because test measurement error in standard test instruments is heteroskedastic (Andrabi et al., 2011; Boyd, Lankford, Loeb, & Wyckoff, 2013; Koedel, Leatherman, & Parsons, 2012; Lockwood & McCaffrey, 2014). Lockwood and McCaffrey (2014) provide the most comprehensive investigation of the measurement-error issue in the value-added context of which we are aware. Their study offers a number of suggestions for ways to improve model performance by directly accounting for measurement error that should be given careful consideration in future VAM applications. One straightforward finding in Lockwood and McCaffrey (2014) is that controlling for multiple prior test scores can mitigate the influence of test measurement error. A simple but important takeaway from Koedel, Leatherman, and Parsons (2012) is that one should not assume that measurement error in the lagged test score is constant across the test-score distribution. Although this assumption simplifies the measurement-error correction procedure, it is not supported by the data and can worsen model performance.

4.2. Student and school fixed effects

Chetty, Friedman, and Rockoff (2014a) show that a VAM along the lines of the one shown in Eq. (2) without school and student fixed effects produces estimates of teacher value-added with very little bias (forecasting bias of approximately 2.6%, which is not statistically significant). Kane et al. (2013) and Kane and Staiger (2008) also show that models without school and student fixed effects perform well. While the experimental studies are not designed to assess the value of using school fixed effects because the randomization occurs within schools, they provide evidence consistent with unbiased estimation within schools without the need to control for unobserved student heterogeneity using student fixed effects.

Thus, although school and student fixed effects are conceptually appealing because of their potential to reduce bias in estimates of teacher value-added, the best evidence to date suggests that their practical importance is limited. Without evidence showing that these layers of fixed effects reduce bias, it is difficult to make a compelling case for their inclusion in VAMs. Minus this benefit, it seems unnecessary to bear the cost of reduced stability in teacher value-added associated with including them, per the discussion in Section 3.2.

4.3. One-step versus two-step VAMs

Models akin to the standard, one-step VAM in Eq. (2) and the two-step VAM in Eqs. (4) and (5) perform well in the validity studies by Chetty, Friedman, and Rockoff (2014a), Kane and Staiger (2008) and Kane et al. (2013). Chetty, Friedman, and Rockoff (2014a) is the only study of the three to examine the two modeling structures side-by-side; they show that value-added estimates from both models exhibit very little bias. Specifically, their estimates of forecasting bias are 2.6 and 2.2% for the one-step and two-step VAMs, respectively, and neither estimate is statistically significant. If the primary objective is bias reduction, available evidence does not point to a clearly-preferred modeling structure.

Although we are not aware of any studies that directly compare the stability of value-added estimates from one-step and two-step models, based on the discussion in Section 3.2 we do not expect significant differences to emerge along this dimension either. The reason is that the two fundamental determinants of stability that have been established by the literature (teacher-level sample sizes and the use of school and/or student fixed effects) will not vary between the one-step and two-step VAM structures.

Thus, the literature does not indicate a clearly preferred modeling choice along this dimension. This is driven in large part by the fact that the two modeling structures produce similar results; Chetty, Friedman, and Rockoff (2014a) report

25 In addition to the fact that most research studies cited in this review use a lagged-score VAM, lagged-score VAMs are also the norm in major policy applications where VAMs are incorporated into teacher evaluations; e.g., see the VAMs that are being estimated in Washington DC (Isenberg & Walsh, 2014), New York City (Value Added Research Center, 2010) and Pittsburgh (Johnson et al., 2012).
a correlation in estimated teacher value-added across models of 0.98 in their data.26 For most research applications, the high correlation offers sufficient grounds to support either approach. However, in policy applications this may not be the case, as even high correlations allow for some (small) fraction of teachers to have ratings that are substantively affected by the choice of modeling structure (Castellano & Ho, 2015).

4.4. Covariates

We briefly introduced the conditioning variables that have been used by researchers in VAM specifications with Eq. (2). Of those variables, available evidence suggests that the most important, by far, are the controls for prior achievement (Chetty, Friedman, & Rockoff, 2014a; Ehlert et al., 2013; Lockwood & McCaffrey, 2014; Rothstein, 2009).

Rothstein (2009) and Ehlert et al. (2013) show that multiple years of lagged achievement in the same subject are significant predictors of current achievement conditional on single-lagged achievement in the same subject, which indicates that model performance is improved by including these variables. Ehlert et al. (2013) note that moving from a specification with three lagged scores to one with a single lagged score "does not systematically benefit or harm certain types of schools or teachers" (p. 19), but this result is sensitive to the specification that they use. Lockwood and McCaffrey (2014) consider the inclusion of a more comprehensive set of prior achievement measures from multiple years and multiple subjects. They find that including the additional prior-achievement measures attenuates the correlation between estimated teacher value-added and the average prior same-subject achievement of teachers' students.

Chetty, Friedman, and Rockoff (2014a) examine the sensitivity of value-added to the inclusion of different sets of lagged-achievement information but use data only from the prior year. Specifically, rather than extending the time horizon to obtain additional lagged-achievement measures, they use lagged achievement in another subject along with aggregates of lagged achievement at the school and classroom levels. Similarly to Ehlert et al. (2013), Lockwood and McCaffrey (2014) and Rothstein (2009), they show that the additional measures of prior achievement are substantively important in their models. In one set of results they compare VAMs that include each student's own lagged achievement in two subjects, with and without school- and classroom-average lagged achievement in these subjects, to a model that includes only the student's own lagged achievement in the same subject. The models do not include any other control variables. Their quasi-experimental estimate of bias rises from 3.82% (statistically insignificant) with all controls for prior achievement, to 4.83% (statistically insignificant) with only controls for a student's own prior scores in both subjects, to 10.25% (statistically significant) with only controls for a student's prior score in the same subject. Notably, their estimate of bias in the model that includes the detailed prior-achievement measures, but no other covariates, is not far from their estimate based on the full model (2.58%, statistically insignificant).

Compared to the lagged-achievement controls, whether to include the additional demographic and socioeconomic controls that are typically found in VAMs is a decision that has been shown to be of limited practical consequence, at least in terms of global inference (Aaronson, Barrow, & Sander, 2007; Ehlert et al., 2013; McCaffrey et al., 2004). However, two caveats to this general result are in order. First, the degree to which including demographic and socioeconomic controls in VAMs is important will depend on the richness of the prior-achievement controls in the model. Models with less information about prior achievement may benefit more from the inclusion of demographic and socioeconomic controls. Second, and re-iterating a point from above, in policy applications it may be desirable to include demographic and socioeconomic controls in VAMs, despite their limited impact on the whole, in order to guard against teachers in the most disparate circumstances being systematically identified as over- or under-performing. For personnel evaluations, it seems statistically prudent to use all available information about students and the schooling environment to mitigate as much bias in value-added estimates as possible. This is in contrast to federal guidelines for state departments of education issued in 2009, which discourage the use of some control variables in value-added models (United States Department of Education, 2009).27

5. What we know about teacher value-added

5.1. Teacher value-added is an educationally and economically meaningful measure

A consistent finding across research studies is that there is substantial variation in teacher performance as measured by value-added. Much of the variation occurs within schools (e.g., see Aaronson, Barrow, & Sander, 2007; Rivkin, Hanushek, & Kain, 2005). Hanushek and Rivkin (2010) report estimates of the standard deviation of teacher value-added in math and reading (when available) across ten studies and conclude that the literature "leaves little doubt that there are significant differences in teacher effectiveness" (p. 269). The effect of a one-standard deviation improvement in teacher value-added on student test scores is estimated to be larger than the effect of a ten-student reduction in class size (Jepsen & Rivkin, 2009; Rivkin et al., 2005).28

26 This result is based on a comparison of fully-specified versions of the two types of models. Larger differences may emerge when using sparser specifications.
27 The federal guidelines are likely politically motivated, at least in part; see Ehlert et al. (in press). Also note that some potential control variables reflect student designations that are partly at the discretion of education personnel. An example is special-education status. The decision of whether to include discretionary variables in high-stakes VAMs requires accounting for the incentives to label students that can be created by their inclusion.
28 Although the immediate effect on achievement of exposure to a high-value-added teacher is large, it fades out over time. A number of studies have documented rapid fade-out of teacher effects after one or two years (Jacob, Lefgren, & Sims, 2010; Kane & Staiger, 2008; Rothstein, 2010). Chetty, Friedman, and Rockoff (2014b) show that teachers' impacts on test scores stabilize at approximately 25% of the initial impact after 3–4 years. It is not uncommon for the effects of educational interventions to fade out over time (e.g., see Currie & Thomas, 2000; Deming, 2009; Krueger & Whitmore, 2001).
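The fade-out pattern described in footnote 28 is easy to illustrate with a back-of-the-envelope calculation. The sketch below is purely illustrative and uses hypothetical parameters (the decay rate is assumed, not estimated): it lets the transitory share of a teacher's initial impact decay geometrically toward a persistent floor of 25% of the initial effect, in the spirit of the stabilization pattern reported by Chetty, Friedman, and Rockoff (2014b).

```python
# Illustrative fade-out calculation (hypothetical parameters, not estimates):
# the effect decays geometrically toward a persistent floor equal to 25% of
# the initial impact, as in the pattern described in footnote 28.

def faded_impact(initial_effect, years_since, floor_share=0.25, decay=0.5):
    """Remaining test-score effect `years_since` years after exposure."""
    persistent = floor_share * initial_effect
    transitory = (1.0 - floor_share) * initial_effect
    return persistent + transitory * (decay ** years_since)

# Path of a 1.0 standard-deviation initial impact over five years:
path = [round(faded_impact(1.0, t), 3) for t in range(5)]
print(path)  # [1.0, 0.625, 0.438, 0.344, 0.297]
```

Under these assumed parameters the effect settles near its 0.25 floor within three to four years; a different decay rate changes the speed of convergence but not the long-run level.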
Research has demonstrated that standardized test scores are closely related to school attainment, earnings, and economic outcomes (e.g., Chetty et al., 2011; Hanushek & Woessmann, 2008; Lazear, 2003; Murnane, Willett, Duhaldeborde, & Tyler, 2000), which supports the test-based measurement of teacher effectiveness. Hanushek (2011) combines the evidence on the labor market returns to higher cognitive skills with evidence on the variation in teacher value-added to estimate the economic value of teacher quality. With a class size of 25 students, he estimates the economic value of a teacher who is one standard deviation above the mean to exceed $300,000 across a range of parameterizations that allow for sensitivity to different levels of true variation in teacher quality, the decay of teacher effects over time, and the labor-market returns to improved test-score performance.

Chetty, Friedman, and Rockoff (2014b) complement Hanushek's work by directly estimating the long-term effects of teacher value-added. Following on their initial validity study (Chetty, Friedman, & Rockoff, 2014a), they use tax records from the Internal Revenue Service (IRS) to examine the impact of exposure to teacher value-added in grades 4–8 on a host of longer-term outcomes. They find that a one-standard deviation improvement in teacher value-added in a single grade raises the probability of college attendance at age 20 by 2.2% and annual earnings at age 28 by 1.3%. For an entire classroom, their earnings estimate implies that replacing a teacher whose performance is in the bottom 5% in value-added with an average teacher would increase the present discounted value of students' lifetime earnings by $250,000.29 In addition to looking at earnings and college attendance, they also show that exposure to more-effective teachers reduces the probability of teenage childbearing, increases the quality of the neighborhood in which a student lives as an adult, and raises participation in 401(k) savings plans.

5.2. Teacher value-added is positively but imperfectly correlated across subjects and across different tests within the same subject

Goldhaber, Cowan, and Walch (2013) correlate estimates of teacher value-added across subjects for elementary teachers in North Carolina. They report that the within-year correlation between math and reading value-added is 0.60. After adjusting for estimation error, which attenuates the correlational estimate, it rises to 0.80. Corcoran, Jennings, and Beveridge (2011) and Lefgren and Sims (2012) also show that value-added in math and reading are highly correlated. The latter study leverages the cross-subject correlation to determine an optimal weighting structure for value-added in math and reading given the objective of predicting future value-added for teachers most accurately.

Teacher value-added is also positively but imperfectly correlated across different tests within the same subject. Lockwood et al. (2007) compare teacher value-added on the "procedures" and "problem solving" subcomponents of the same mathematics test.30 They report correlations of value-added estimates across a variety of different VAM specifications. Estimates from their standard one-step, lagged-score VAM indicate a correlation of teacher value-added across testing subcomponents of approximately 0.30.31 This correlation is likely to be attenuated by two factors. First, as noted by the authors, there is a lower correlation between student scores on the two subcomponents of the test in their analytic sample relative to the national norming sample as reported by the test publisher. Second, the correlational estimate is based on estimates of teacher value-added that are not adjusted for estimation error, an issue that is compounded by the fact that the subcomponents of the test have lower reliability than the full test.32

Papay (2011) replicates the Lockwood et al. (2007) analysis with a larger sample of teachers and shrunken value-added estimates. The cross-subcomponent correlation of teacher value-added in his study is 0.55. Papay (2011) reports that most of the differences in estimated teacher value-added across test subcomponents cannot be explained by differences in test content or scaling, and thus concludes that the primary factor driving down the correlation is test measurement error.33 Corcoran, Jennings, and Beveridge (2011) perform a similar analysis and obtain similar results using student achievement on two separate math and reading tests administered in the Houston Independent School District. They report cross-test, same-subject correlations of teacher value-added of 0.59 and 0.50 in math and reading, respectively. The tests they evaluate also differ in their stakes, and the authors show that the estimated variance of teacher value-added is higher on the high-stakes assessment in both subjects.

The practical implications of the correlations in teacher value-added across subjects and instruments are not obvious. The fact that the correlations across all alternative achievement metrics are clearly positive is consistent with there being an underlying generalizable teacher-quality component embodied in the measures. This is in line with what one would expect given the influence of teacher value-added on longer-term student outcomes (Chetty, Friedman, & Rockoff, 2014b). However, it is disconcerting that different testing instruments can lead to different sets of teachers being identified as high- and low-performers, as indicated by the Corcoran, Jennings, and Beveridge (2011), Lockwood et al. (2007) and Papay (2011) studies. Papay's (2011) conclusion

30 The authors note that procedures items cover "computation using symbolic notation, rounding, computation in context and thinking skills", whereas problem solving covers "a broad range of more complex skills and knowledge in the areas of measurement, estimation, problem solving strategies, number systems, patterns and functions, algebra, statistics, probability, and geometry". The two sets of items are administered in separately timed sections. (p. 49)
31 This is an average of their single-year estimates from the covariate ad-
justed model with demographics and lagged test scores from Table 2.
29
The Chetty, Friedman, and Rockoff (2014b) estimate is smaller than the 32
Lockwood et al. (2007) report that the subcomponent test reliabilities
comparable range of estimates from Hanushek (2011). Two key reasons are
are high, but not as high as the reliability for the full test.
(1) Hanushek uses a lower discount rate for future student earnings, and 33
Papay (2011) also compares value-added measures across tests that are
(2) Hanushek allows for more persistence in teacher impacts over time.
given at different points in the school year and shows that test timing is an
Nonetheless, both studies indicate that high-value-added teachers are of
additional factor that contributes to differences in teacher value-added as
substantial economic value.
estimated across testing instruments.
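The role of the discount rate flagged in footnote 29 can be made concrete with a small present-value calculation. The sketch below is illustrative only: the annual gain, ages, and discount rates are hypothetical numbers chosen for the example, not figures from Chetty, Friedman, and Rockoff (2014b) or Hanushek (2011).

```python
# Illustrative only: how the present discounted value (PDV) of a career-long
# earnings gain depends on the discount rate, the first of the two reasons in
# footnote 29 that the Chetty et al. and Hanushek figures differ. The annual
# gain, ages, and rates below are hypothetical, not taken from either study.

def pdv_of_gain(annual_gain, work_start_age, retire_age, eval_age, rate):
    """PDV at eval_age of a constant annual earnings gain over a career."""
    return sum(
        annual_gain / (1 + rate) ** (age - eval_age)
        for age in range(work_start_age, retire_age)
    )

# The same hypothetical $150-per-year gain per student, valued at age 12:
pdv_high_rate = pdv_of_gain(150, 22, 65, 12, rate=0.05)
pdv_low_rate = pdv_of_gain(150, 22, 65, 12, rate=0.03)
# A lower discount rate mechanically raises the PDV of the same earnings
# stream, so it produces a larger implied dollar value of teacher quality.
```

Allowing teacher impacts to persist longer, footnote 29's second reason, works the same way: it lengthens or fattens the stream of gains being discounted.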
that the lack of consistency in estimated value-added across assessments is largely the product of test measurement error seems reasonable, particularly when one considers the full scope for measurement error in student test scores (Boyd et al., 2013).

5.3. Teacher value-added is positively but imperfectly correlated with other evaluation metrics

Several studies have examined correlations between estimates of teacher value-added and alternative evaluation metrics. Harris and Sass (2014) and Jacob and Lefgren (2008) correlate teacher value-added with principal ratings from surveys. Jacob and Lefgren (2008) report a correlation of 0.32 between estimates of teacher value-added in math and principal ratings based on principals' beliefs about the ability of teachers to raise math achievement; the analogous correlation for reading is 0.29 (correlations are reported after adjusting for estimation error in the value-added measures). Harris and Sass (2014) report slightly larger correlations for the same principal assessment (0.41 for math and 0.44 for reading) and also correlate value-added with overall principal ratings, reporting correlations in math and reading of 0.34 and 0.38, respectively.34

Data from the Measures of Effective Teaching (MET) project funded by the Bill and Melinda Gates Foundation have been used to examine correlations between teacher value-added and alternative evaluation metrics including classroom observations and student surveys. Using data primarily from middle school teachers who teach multiple sections within the same year, Kane and Staiger (2012) report that the correlation between teacher scores from classroom observations and value-added in math, measured across different classrooms, ranges from 0.16 to 0.26 depending on the observation rubric. The correlation between value-added in math and student surveys of teacher practice, also taken from different classrooms, is somewhat larger at 0.37. Correlations between value-added in reading and these alternative metrics remain positive but are lower. They range from 0.10 to 0.25 for classroom observations across rubrics, and the correlation between student survey results and reading value-added is 0.25.35

Kane, Taylor, Tyler, and Wooten (2011) provide related evidence by estimating value-added models to identify the effects of teaching practice on student achievement. Teaching practice is measured in their study by the observational assessment used in the Cincinnati Public Schools Teacher Evaluation System (TES). Cincinnati's TES is one of the few examples of a long-standing high-stakes assessment with external peer evaluators, and thus it is quite relevant for public policy (also see Taylor & Tyler, 2012). Consistent with the evidence from the MET project, the authors find that observational measures of teacher practice are positively correlated with value-added. Their preferred models that use a composite index to measure classroom practice imply that a 2.3 standard-deviation increase in the index (one point) corresponds to a little more than a one-standard-deviation increase in teacher value-added in reading, and a little less than a one-standard-deviation increase in math.36

The positive correlations between value-added and the alternative, non-test-based metrics lend credence to the informational value of these metrics in light of the strong validity evidence emerging from the research literature on value-added. Indeed, out-of-sample estimates of teacher quality based on principal ratings, observational rubrics and student surveys positively predict teacher impacts on student achievement, although not as well as out-of-sample value-added (Harris & Sass, 2014; Jacob & Lefgren, 2008; Kane et al., 2011; Kane & Staiger, 2012). At present, no direct evidence on the longer-term effects of exposure to high-quality teachers along these alternative dimensions is available.37

6. Policy applications

6.1. Using value-added to inform personnel decisions in K-12 schools

The wide variation in teacher value-added that has been consistently documented in the literature has spurred research examining the potential benefits to students of using information about value-added to improve workforce quality. The above-referenced studies by Hanushek (2011) and Chetty, Friedman, and Rockoff (2014b) perform direct calculations of the benefits associated with removing ineffective teachers and replacing them with average teachers (also see Hanushek, 2009). Their analyses imply that using estimates of teacher value-added to inform personnel decisions would lead to substantial gains in the production of human capital in K-12 schools. Similarly, Winters and Cowen (2013) use simulations to show how incorporating information about teacher value-added into personnel policies can lead to improvements in workforce quality over time.

Monetizing the benefits of VAM-based removal policies and comparing them to costs suggests that they will easily pass a cost-benefit test (Chetty, Friedman, & Rockoff, 2014b).38 A caveat is that the long-term labor supply response is unknown. Rothstein (2015) develops a structural model to describe the labor market for teachers that

34. The question from the Harris and Sass (2014) study for the overall rating makes no mention of test scores: "Please rate each teacher on a scale from 1 to 9 with 1 being not effective to 9 being exceptional" (p. 188).
35. Polikoff (2014) further documents that the correlations between these alternative evaluation measures and value-added vary across states, implying that different state tests are differentially sensitive to instructional quality as measured by these metrics.
36. Rockoff and Speroni (2011) also provide evidence that teachers who receive better subjective evaluations produce greater gains in student achievement. A notable aspect of their study is that some of the evaluation metrics they consider come from prior to teachers being hired into the workforce.
37. Because many of these metrics are just emerging in rigorous form (e.g., see Weisberg et al., 2009), a study comparable to something along the lines of Chetty et al. (2014b) may not be possible for some time. In the interim, non-cognitive outcomes such as persistence in school and college-preparatory behaviors, in addition to cognitive outcomes, can be used to vet these alternative measures similarly to the recent literature on VAMs.
38. Studies by Boyd et al. (2011) and Goldhaber and Theobald (2013) consider the use of value-added measures to inform personnel policies within the more narrow circumstance of forced teacher layoffs. Both studies show that layoff policies based on value-added result in significantly higher student achievement relative to layoff policies based on seniority. Seniority-driven layoff policies were a central point of contention in the recent Vergara v. California court case.
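Several correlations in this section, like the cross-subject correlations in Section 5.2, are reported after adjusting for estimation error. A minimal sketch of the standard disattenuation formula is below; the reliability values are illustrative assumptions, not quantities reported by the studies cited above.

```python
# A minimal sketch of the estimation-error ("disattenuation") adjustment
# behind several correlations reported in this section. The reliability
# values are illustrative assumptions, not quantities from the cited studies.
from math import sqrt

def disattenuated_corr(r_observed, reliability_x, reliability_y):
    """Adjust an observed correlation for estimation error in both measures."""
    return r_observed / sqrt(reliability_x * reliability_y)

# With an assumed reliability of 0.75 for each measure, an observed
# correlation of 0.60 adjusts to 0.80, the same magnitude of change that
# Goldhaber, Cowan, and Walch (2013) report for math and reading value-added.
r_adjusted = disattenuated_corr(0.60, 0.75, 0.75)
```

The adjustment illustrates why noisily estimated value-added can correlate only modestly with other metrics even when the underlying quality signals are strongly related.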
incorporates costs borne by workers from the uncertainty associated with teacher evaluations based on value-added. He estimates that teacher salaries would need to rise under a more-rigorous, VAM-based personnel policy. The salary increase would raise costs, but not by enough to offset the benefits of the improvements in workforce quality (Chetty, Friedman, & Rockoff, 2014b; Rothstein, 2015). However, as noted by Rothstein (2015), his findings are highly sensitive to how he parameterizes his model. This limits inference given the general lack of evidence in the literature with regard to several of his key parameters, in particular the elasticity of labor supply and the degree of foreknowledge that prospective teachers possess about their own quality. As more-rigorous teacher evaluation systems begin to come online, monitoring the labor-supply response will be an important area of research. Initial evidence from the most mature high-stakes system that incorporates teacher value-added into personnel decisions, the IMPACT evaluation system in Washington, DC, provides no indication of an adverse labor-supply response as of yet (Dee & Wyckoff, 2013).

In addition to developing rigorous teacher evaluation systems that formally incorporate teacher value-added, there are several other ways that information about value-added can be leveraged to improve workforce quality. For example, Rockoff, Staiger, Kane, and Taylor (2012) show that simply providing information to principals about value-added increases turnover for teachers with low performance estimates and decreases turnover for teachers with high estimates. Correspondingly, average teacher productivity increases. Student achievement increases in line with what one would expect given the workforce-quality improvement.

Condie, Lefgren, and Sims (2014) suggest an alternative use of value-added to improve student achievement. Although they perform simulations indicating that VAM-based removal policies will be effective (corroborating earlier studies), they also argue that value-added can be leveraged to improve workforce quality without removing any teachers. Specifically, they propose to use value-added to match teachers to students and/or subjects according to their comparative advantages (also see Goldhaber, Cowan, and Walch, 2013). Their analysis suggests that data-driven workforce re-shuffling has the potential to improve student achievement by more than a removal policy targeted at the bottom 10% of teachers.

Glazerman et al. (2013) consider the benefits of a different type of re-shuffling based on value-added. They study a program that provides pecuniary bonuses to high-value-added teachers who are willing to transfer to high-poverty schools from other schools. The objective of the program is purely equity-based. Glazerman et al. (2013) show that the teacher transfer policy raises student achievement in high-poverty classrooms that are randomly assigned to receive a high-value-added transfer teacher.39

6.2. Teacher value-added as a component of combined measures of teaching effectiveness

Emerging teacher evaluation systems in practice are using value-added as one component of a larger combined measure. Combined measures of teaching effectiveness also typically include non-test-based performance measures like classroom observations, student surveys and measures of professionalism.40 Many of the non-test-based measures are newly emerging, at least in rigorous form (see Weisberg, Sexton, Mulhern, & Keeling, 2009). The rationale for incorporating multiple measures into teacher evaluations is that teaching effectiveness is multi-dimensional and no single measure can capture all aspects of quality that are important.

Mihaly, McCaffrey, Staiger, and Lockwood (2013) consider the factors that go into constructing an optimal combined measure based on data from the MET project. The components of the combined measure that they consider are teacher value-added on several assessments, student surveys and observational ratings. Their analysis highlights the challenges associated with properly weighting the various components in a combined-measure evaluation system. Assigning more weight to any individual component makes the combined measure a stronger predictor of that component in the future, but a weaker predictor of the other components. Thus, the optimal approach to weighting depends in large part on value judgments about the different components by policymakers.41

The most mature combined-measure evaluation system in the United States, and the one that has received the most attention nationally, is the IMPACT evaluation system in Washington, DC. IMPACT evaluates teachers based on value-added, teacher-assessed student achievement, classroom observations and a measure of commitment to the school community.42 Teachers evaluated as ineffective under IMPACT are dismissed immediately, teachers who are evaluated as minimally effective face pressure to improve under threat of dismissal (dismissal occurs with two consecutive minimally-effective ratings), and teachers rated as highly effective receive a bonus the first time and an escalating bonus for two consecutive highly-effective ratings (Dee & Wyckoff, 2013). Using a regression discontinuity design, Dee and Wyckoff (2013) show that teachers who are identified as minimally effective a single time are much more likely to voluntarily exit, and for those who do stay, they are more likely to perform

39. There is some heterogeneity underlying this result. Specifically, Glazerman et al. (2013) find large, positive effects for elementary classrooms that receive a transfer teacher but no effects for middle-school classrooms.
40. School districts that are using or in the process of developing combined measures of teaching effectiveness include Los Angeles (Strunk, Weinstein, & Makkonnen, 2014), Pittsburgh (Scott & Correnti, 2013) and Washington DC (Dee & Wyckoff, 2013); state education agencies include Delaware, New Mexico, Ohio, and Tennessee (White, 2014).
41. Polikoff (2014) further considers the extent to which differences in the sensitivity of state tests to the quality and/or content of teachers' instruction should influence how weights are determined in different states. He recommends that states explore their own data to determine the sensitivity of their tests to high-quality instruction.
42. The weight on value-added in the combined measure was 35% for teachers with value-added scores during the 2013–2014 school year, down from 50% in previous years (as reported by Dee & Wyckoff, 2013). The 15% drop in the weight on value-added for eligible teachers was offset in 2013–2014 by a 15% weight on teacher-assessed student achievement, which was not used to evaluate value-added eligible teachers in prior years.
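The weighting tradeoff that Mihaly et al. (2013) highlight can be illustrated with a stylized calculation. Everything below is hypothetical: the covariances among the three measures, and between each measure and next-year value-added, are invented for the example and are not estimates from the MET data.

```python
# A stylized illustration of the weighting problem in Mihaly et al. (2013):
# the best linear weights for a combined measure depend on which future
# outcome the composite is asked to predict. All covariance values below are
# invented for illustration; they are not estimates from the MET data.
import numpy as np

# Covariance matrix of three standardized current-year measures:
# [value-added, classroom observation score, student survey score].
S = np.array([
    [1.00, 0.25, 0.35],
    [0.25, 1.00, 0.30],
    [0.35, 0.30, 1.00],
])
# Covariance of each measure with the target: next-year value-added.
c = np.array([0.50, 0.20, 0.25])

w_opt = np.linalg.solve(S, c)  # best linear predictor weights for this target
w_eq = np.full(3, 1 / 3)       # naive equal weighting

def expected_mse(w, var_target=1.0):
    """Expected squared prediction error implied by the assumed covariances."""
    return var_target - 2 * w @ c + w @ S @ w

# By construction, w_opt predicts next-year value-added at least as well as
# any other weighting, and here it loads most heavily on current value-added.
```

Re-solving with a different target, say next-year observation scores, shifts the weights toward the observation component. This is the tradeoff the authors emphasize, and it is why the final choice of weights rests on value judgments rather than on statistics alone.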
better in the following year.43 Dee and Wyckoff (2013) also show that teachers who receive a first highly-effective rating improve their performance in the following year, a result that the authors attribute to the escalating bonus corresponding to two consecutive highly-effective ratings.

In Washington DC, at least during the initial years of IMPACT, there is no prima facie evidence of an adverse labor-supply effect. Dee and Wyckoff (2013) report that new teachers in Washington DC entering after the second year of IMPACT's implementation were rated as significantly more effective than teachers who left (about one-half of a standard deviation of IMPACT scores). The labor-supply response in Washington DC could be due to the simultaneous introduction of performance bonuses in IMPACT and/or idiosyncratic aspects of the local teacher labor market. Thus, while this is an interesting early result, labor-supply issues merit careful monitoring as similar teacher evaluation systems emerge in other locales across the United States.

7. Conclusion

This article has reviewed the literature on teacher value-added, covering issues ranging from technical aspects of model design to the use of value-added in public policy. A goal of the review has been to highlight areas of consensus and disagreement in research. Although a broad spectrum of views is reflected in available studies, along a number of important dimensions the literature appears to be converging on a widely-accepted set of facts.

Perhaps the most important result for which consistent evidence has emerged in research is that students in K-12 schools stand to gain substantially from policies that incorporate information about value-added into personnel decisions for teachers (Boyd et al., 2011; Chetty, Friedman, & Rockoff, 2014b; Condie, Lefgren, & Sims, 2014; Dee & Wyckoff, 2013; Glazerman et al., 2013; Goldhaber, Cowan, & Walch, 2013; Goldhaber & Theobald, 2013; Hanushek, 2009, 2011; Rothstein, 2015; Winters & Cowen, 2013). The consistency of this result across the variety of policy applications that have been considered in the literature is striking. In some applications the costs associated with incorporating value-added into personnel policies would likely be small; examples include situations where mandatory layoffs are required (Boyd et al., 2011; Goldhaber & Theobald, 2013), and the use of value-added to adjust teaching responsibilities rather than make what are likely to be more-costly and less-reversible retention/removal decisions (Condie, Lefgren, & Sims, 2014; Goldhaber, Cowan, & Walch, 2013). However, even in more costly applications, such as the use of value-added to inform decisions about retention and removal, available evidence suggests that the benefits to students of using value-added to inform decision-making will outweigh the costs (Chetty, Friedman, & Rockoff, 2014b; Rothstein, 2015).

In addition to the growing consensus among researchers on this critical issue, our review also uncovers a number of other areas where the literature is moving toward general agreement (many of which serve as the building blocks for the consensus described in the previous paragraph). For example, the research studies that have employed the strongest experimental and quasi-experimental designs to date indicate that the scope for bias in estimates of teacher value-added from standard models is quite small.44 Similarly, aggregating available evidence on the stability of value-added reveals consistency in the literature along this dimension as well. In addition, agreement among researchers on a variety of model-specification issues has emerged, ranging from which covariates are the most important in the models to whether gain-score or lagged-score VAMs produce more reliable estimates.

Still, our understanding of a number of issues would benefit from more evidence. An obvious area for expansion of research is into higher grades, as the overwhelming majority of studies focus on students and teachers in elementary and middle schools. The stronger sorting that occurs as students move into higher grades will make estimating value-added for teachers in these grades more challenging, but not impossible. There are also relatively few studies that examine the portability of teacher value-added across schools (e.g., see Jackson, 2013; Xu, Ozek, & Corritore, 2012). A richer evidence base on cross-school portability would be helpful for designing policies aimed at, among other things, improving equity in access to effective teachers (as in Glazerman et al., 2013). Another area worthy of additional exploration is the potential to improve instructional quality by using information about value-added to inform teacher assignments, ranging from which subjects to teach to how many students to teach (Condie, Lefgren, & Sims, 2014; Goldhaber, Cowan, & Walch, 2013; Hansen, 2014). More work is also needed on the relationships between value-added and alternative evaluation metrics, like those used in combined measures of teaching effectiveness (e.g., classroom observations, student surveys, etc.). Using value-added to validate these measures, and determining how to best combine other measures with value-added to evaluate teachers, can help to inform decision-making in states and school districts working to develop and implement more rigorous teacher evaluation systems.

As state and district evaluation systems come online and start to mature, it will be important to examine how students, teachers and schools are affected along a number of dimensions (with Dee & Wyckoff, 2013, serving as an early example). Heterogeneity in implementation across systems can be leveraged to learn about the relative merits of different types of personnel policies, ranging from retention/removal policies to merit pay to the data-driven re-shuffling of teacher responsibilities. Studying these systems will also shed light on some of the labor-market issues raised by Rothstein (2015), about which we currently know very little.

In summary, the literature on teacher value-added has provided a number of valuable insights for researchers and policymakers, but much more remains to be learned. Given what we already know, this much is clear: the implementation of personnel policies designed to improve teacher quality

43. Dee and Wyckoff's research design is not suited to speak to whether the improvement represents a causal effect on performance or is the result of selective retention.
44. As described above, the caveat to this result is that the absence of bias in current research settings does not preclude bias elsewhere or in the future. Nonetheless, available evidence to date offers optimism about the ability of value-added to capture performance differences across teachers.
is a high-stakes proposition for students in K-12 schools. It is the students who stand to gain the most from efficacious policies, and who will lose the most from policies that come up short.

Acknowledgment

The authors gratefully acknowledge useful comments from Dan McCaffrey. The usual disclaimers apply.

References

Aaronson, D., Barrow, L., & Sander, W. (2007). Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics, 25(1), 95–135.
Andrabi, T., Das, J., Khwaja, A. I., & Zajonc, T. (2011). Do value-added estimates add value? Accounting for learning dynamics. American Economic Journal: Applied Economics, 3(3), 29–54.
Bacher-Hicks, A., Kane, T. J., & Staiger, D. O. (2014). Validating teacher effect estimates using changes in teacher assignments in Los Angeles (NBER Working Paper No. 20657).
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., et al. (2010). Problems with the use of student test scores to evaluate teachers (Economic Policy Institute Briefing Paper No. 278).
Barlevy, G., & Neal, D. (2012). Pay for percentile. American Economic Review, 102(5), 1805–1831.
Ben-Porath, Y. (1967). The production of human capital and the life-cycle of earnings. Journal of Political Economy, 75(4), 352–365.
Betts, J. R., & Tang, Y. E. (2008). Value-added and experimental studies of the effect of charter schools on student achievement: A literature review. Bothell, WA: National Charter School Research Project, Center on Reinventing Public Education.
Betts, J. R., Zau, A. C., & King, K. (2005). From blueprint to reality: San Diego's education reforms. San Francisco, CA: Public Policy Institute of California.
Biancarosa, G., Bryk, A. S., & Dexter, E. (2010). Assessing the value-added effects of literacy collaborative professional development on student learning. Elementary School Journal, 111(1), 7–34.
Boyd, D., Lankford, H., Loeb, S., & Wyckoff, J. (2011). Teacher layoffs: An empirical illustration of seniority v. measures of effectiveness. Education Finance and Policy, 6(3), 439–454.
Boyd, D., Lankford, H., Loeb, S., & Wyckoff, J. (2013). Measuring test measurement error: A general approach. Journal of Educational and Behavioral Statistics, 38(6), 629–663.
Campbell, D. T. (1976). Assessing the impact of planned social change (Occasional Paper Series #8). Hanover, NH: The Public Affairs Center, Dartmouth College.
Castellano, K. E., & Ho, A. D. (2015). Practical differences among aggregate-level conditional status metrics: From median student growth percentiles to value-added models. Journal of Educational and Behavioral Statistics, 40(1), 35–68.
Chetty, R., Friedman, J., Hilger, N., Saez, E., Schanzenbach, D. W., & Yagan, D. (2011). How does your kindergarten classroom affect your earnings? Evidence from Project STAR. Quarterly Journal of Economics, 126(4), 1593–1660.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014a). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review, 104(9), 2593–2632.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014b). Measuring the impacts of teachers II: Teacher value-added and student outcomes in adulthood. American Economic Review, 104(9), 2633–2679.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014c). Response to Rothstein (2014) on "Revisiting the impacts of teachers". Unpublished manuscript, Harvard University.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher-student matching and the assessment of teacher effectiveness. Journal of Human Resources, 41(4), 778–820.
Condie, S., Lefgren, L., & Sims, D. (2014). Teacher heterogeneity, value-added and education policy. Economics of Education Review, 40(1), 76–92.
Corcoran, S., Jennings, J. L., & Beveridge, A. A. (2011). Teacher effectiveness on high- and low-stakes tests. Unpublished manuscript, New York University.
Currie, J., & Thomas, D. (2000). School quality and the longer-term effects of Head Start. Journal of Human Resources, 35(4), 755–774.
Dee, T., & Wyckoff, J. (2013). Incentives, selection and teacher performance: Evidence from IMPACT (NBER Working Paper No. 19529).
Deming, D. J. (2009). Early childhood intervention and life-cycle skill development: Evidence from Head Start. American Economic Journal: Applied Economics, 1(3), 111–134.
Deming, D. J. (2014). Using school choice lotteries to test measures of school effectiveness. American Economic Review, 104(5), 406–411.
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2013). The sensitivity of value-added estimates to specification adjustments: Evidence from school- and teacher-level models in Missouri. Statistics and Public Policy, 1(1), 19–27.
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2014). Choosing the right growth measure: Methods should compare similar schools and teachers. Education Next, 14(2), 66–71.
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (in press). Selecting growth measures for use in school evaluation systems: Should proportionality matter? Educational Policy.
Glazerman, S., Loeb, S., Goldhaber, D., Staiger, D., Raudenbush, S., & Whitehurst, G. (2010). Evaluating teachers: The important role of value-added. Policy Report, Brown Center on Education Policy at Brookings.
Glazerman, S., Protik, A., Teh, B.-r., Bruch, J., Max, J., & Warner, E. (2013). Transfer incentives for high-performing teachers: Final results from a multisite randomized experiment. United States Department of Education.
Goldhaber, D., & Chaplin, D. (2015). Assessing the "Rothstein falsification test": Does it really show teacher value-added models are biased? Journal of Research on Educational Effectiveness, 8(1), 8–34.
Goldhaber, D., Cowan, J., & Walch, J. (2013). Is a good elementary teacher always good? Assessing teacher performance estimates across subjects. Economics of Education Review, 36(1), 216–228.
Goldhaber, D., & Hansen, M. (2013). Is it just a bad class? Assessing the stability of measured teacher performance. Economica, 80(319), 589–612.
Goldhaber, D., Liddle, S., & Theobald, R. (2013). The gateway to the profession: Evaluating teacher preparation programs based on student achievement. Economics of Education Review, 34(1), 29–44.
Goldhaber, D., & Theobald, R. (2013). Managing the teacher workforce in austere times: The determinants and implications of teacher layoffs. Education Finance and Policy, 8(4), 494–527.
Goldhaber, D., Walch, J., & Gabele, B. (2013). Does the model matter? Exploring the relationship between different student achievement-based teacher assessments. Statistics and Public Policy, 1(1), 28–39.
Guarino, C. M., Reckase, M. D., Stacey, B. W., & Wooldridge, J. W. (2015). Evaluating specification tests in the context of value-added estimation. Journal of Research on Educational Effectiveness, 8(1), 35–59.
Guarino, C., Maxfield, M., Reckase, M. D., Thompson, P., & Wooldridge, J. M. (2014). An evaluation of empirical Bayes estimation of value-added teacher performance measures. Unpublished manuscript.
Guarino, C. M., Reckase, M. D., & Wooldridge, J. W. (2015). Can value-added measures of teacher performance be trusted? Education Finance and Policy, 10(1), 117–156.
Hansen, M. (2014). Right-sizing the classroom: Making the most of great teachers (CALDER Working Paper No. 110).
Hanushek, E. A. (1979). Conceptual and empirical issues in the estimation of educational production functions. The Journal of Human Resources, 14(3), 351–388.
Hanushek, E. A. (2009). Teacher deselection. In D. Goldhaber, & J. Hannaway (Eds.), Creating a new teaching profession. Washington, DC: Urban Institute.
Hanushek, E. A. (2011). The economic value of higher teacher quality. Economics of Education Review, 30(3), 466–479.
Hanushek, E. A., & Rivkin, S. G. (2010). Generalizations about using value-added measures of teacher quality. American Economic Review, 100(2), 267–271.
Hanushek, E. A., & Woessmann, L. (2008). The role of cognitive skills in economic development. Journal of Economic Literature, 46(3), 607–668.
Harris, D. N., & Sass, T. R. (2011). Teacher training, teacher quality, and student achievement. Journal of Public Economics, 95, 798–812.
Harris, D. N., & Sass, T. R. (2014). Skills, productivity and the evaluation of teacher performance. Economics of Education Review, 40, 183–204.
Herrmann, M., Walsh, E., Isenberg, E., & Resch, A. (2013). Shrinkage of value-added estimates and characteristics of students with hard-to-predict achievement levels. Policy Report, Mathematica Policy Research.
Isenberg, E., & Walsh, E. (2014). Measuring school and teacher value added in DC, 2012-2013 school year: Final report. Policy Report, Mathematica Policy Research (01.17.2014).
Ishii, J., & Rivkin, S. G. (2009). Impediments to the estimation of teacher value added. Education Finance and Policy, 4(4), 520–536.
Jackson, K. (2013). Match quality, worker productivity, and worker mobility: Direct evidence from teachers. Review of Economics and Statistics, 95(4), 1096–1116.
Jackson, K. (2014). Teacher quality at the high-school level: The importance of accounting for tracks. Journal of Labor Economics, 23(4), 645–684.
Jacob, B. A., & Lefgren, L. (2008). Can principals identify effective teachers? Evidence on subjective performance evaluations in education. Journal of Labor Economics, 26(1), 101–136.
Jacob, B. A., Lefgren, L., & Sims, D. P. (2010). The persistence of teacher-induced learning gains. Journal of Human Resources, 45(4), 915–943.
Jepsen, C., & Rivkin, S. G. (2009). Class size reduction and student achievement: The potential tradeoff between teacher quality and class size. Journal of Human Resources, 44(1), 223–250.
Johnson, M., Lipscomb, S., Gill, B., Booker, K., & Bruch, J. (2012). Value-added model for Pittsburgh public schools. Unpublished report, Mathematica Policy Research.
Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Seattle, WA: Bill and Melinda Gates Foundation.
Kane, T. J., Rockoff, J. E., & Staiger, D. O. (2008). What does certification tell us about teacher effectiveness? Evidence from New York City. Economics of Education Review, 27(6), 615–631.
Kane, T. J., & Staiger, D. O. (2002). The promise and pitfalls of using imprecise school accountability measures. Journal of Economic Perspectives, 16(4), 91–114.
Kane, T. J., & Staiger, D. O. (2008). Estimating teacher impacts on student achievement: An experimental evaluation (NBER Working Paper No. 14607).
Kane, T. J., & Staiger, D. O. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Seattle, WA: Bill and Melinda Gates Foundation.
Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2011). Identifying effective classroom practices using student achievement data. Journal of Human Resources, 46(3), 587–613.
Kinsler, J. (2012). Assessing Rothstein's critique of teacher value-added models. Quantitative Economics, 3(2), 333–362.
Koedel, C. (2008). Teacher quality and dropout outcomes in a large, urban school district. Journal of Urban Economics, 64(3), 560–572.
Koedel, C. (2009). An empirical analysis of teacher spillover effects in secondary school. Economics of Education Review, 28(6), 682–692.
Koedel, C., & Betts, J. R. (2011). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Education Finance and Policy, 6(1), 18–42.
Koedel, C., Leatherman, R., & Parsons, E. (2012). Test measurement error and inference from value-added models. B. E. Journal of Economic Analysis & Policy, 12(1) (Topics).
Koedel, C., Parsons, E., Podgursky, M., & Ehlert, M. (in press). Teacher preparation programs and teacher quality: Are there real differences across programs? Education Finance and Policy.
Konstantopoulos, S., & Chung, V. (2011). The persistence of teacher effects in elementary grades. American Educational Research Journal, 48, 361–386.
Krueger, A. B., & Whitmore, D. M. (2001). The effect of attending a small class in the early grades on college test taking and middle school test results: Evidence from Project STAR. Economic Journal, 111(468), 1–28.
Lazear, E. P. (2003). Teacher incentives. Swedish Economic Policy Review, 10(3), 179–214.
Lefgren, L., & Sims, D. P. (2012). Using subject test scores efficiently to predict teacher value-added. Educational Evaluation and Policy Analysis, 34(1), 109–121.
Lockwood, J. R., & McCaffrey, D. F. (2014). Correcting for test score measurement error in ANCOVA models for estimating treatment effects. Journal of Educational and Behavioral Statistics, 39(1), 22–52.
Lockwood, J. R., McCaffrey, D. F., Hamilton, L. S., Stecher, B., Le, V.-N., & Martinez, J. F. (2007). The sensitivity of value-added teacher effect estimates to different mathematics achievement measures. Journal of Educational Measurement, 44(1), 47–67.
Mansfield, R. K. (in press). Teacher quality and student inequality. Journal of Labor Economics.
McCaffrey, D. F., Lockwood, J. R., Koretz, D. M., Louis, T. A., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67–101.
McCaffrey, D. F., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal variability of teacher effect estimates. Education Finance and Policy, 4(4), 572–606.
Meghir, C., & Rivkin, S. G. (2011). Econometric methods for research in education. In E. A. Hanushek, S. Machin, & L. Woessmann (Eds.), Handbook of the economics of education: Vol. 3. Waltham, MA: Elsevier.
Mihaly, K., McCaffrey, D. F., Sass, T. R., & Lockwood, J. R. (2010). Centering and reference groups for estimates of fixed effects: Modifications to felsdvreg. The Stata Journal, 10(1), 82–103.
Mihaly, K., McCaffrey, D. F., Sass, T. R., & Lockwood, J. R. (2013). Where you come from or where you go? Distinguishing between school quality and the effectiveness of teacher preparation program graduates. Education Finance and Policy, 8(4), 459–493.
Mihaly, K., McCaffrey, D. F., Staiger, D. O., & Lockwood, J. R. (2013). A composite estimator of effective teaching. RAND External Publication EP-50155.
Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and applications. Journal of the American Statistical Association, 78(381), 47–55.
Murnane, R. J., Willett, J. B., Duhaldeborde, Y., & Tyler, J. H. (2000). How important are the cognitive skills of teenagers in predicting subsequent earnings? Journal of Policy Analysis and Management, 19(4), 547–568.
Newton, X. A., Darling-Hammond, L., Haertel, E., & Thomas, E. (2010). Value-added modeling of teacher effectiveness: An exploration of stability across models and contexts. Education Policy Analysis Archives, 18(23), 1–27.
Nye, B., Konstantopoulos, S., & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257.
Papay, J. P. (2011). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48(1), 163–193.
Paufler, N. A., & Amrein-Beardsley, A. (2014). The random assignment of students into elementary classrooms: Implications for value-added analyses and interpretations. American Educational Research Journal, 51(2), 328–362.
Polikoff, M. S. (2014). Does the test matter? Evaluating teachers when tests differ in their sensitivity to instruction. In T. J. Kane, K. A. Kerr, & R. C. Pianta (Eds.), Designing teacher evaluation systems: New guidance from the measures of effective teaching project (pp. 278–302). San Francisco, CA: Jossey-Bass.
Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools and academic achievement. Econometrica, 73(2), 417–458.
Rockoff, J. E., & Speroni, C. (2011). Subjective and objective evaluations of teacher effectiveness: Evidence from New York City. Labour Economics, 18(5), 687–696.
Rockoff, J. E., Staiger, D. O., Kane, T. J., & Taylor, E. S. (2012). Information and employee evaluation: Evidence from a randomized intervention in public schools. American Economic Review, 102(7), 3184–3213.
Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537–571.
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.
Rothstein, J. (2014). Revisiting the impacts of teachers. Unpublished manuscript, University of California-Berkeley.
Rothstein, J. (2015). Teacher quality policy when supply matters. American Economic Review, 105(1), 100–130.
Rothstein, J., & Mathis, W. J. (2013). Review of two culminating reports from the MET project. National Education Policy Center Report.
Sass, T. R., Hannaway, J., Xu, Z., Figlio, D. N., & Feng, L. (2012). Value added of teachers in high-poverty schools and lower poverty schools. Journal of Urban Economics, 72, 104–122.
Sass, T. R., Semykina, A., & Harris, D. N. (2014). Value-added models and the measurement of teacher productivity. Economics of Education Review, 38(1), 9–23.
Schochet, P. Z., & Chiang, H. S. (2013). What are error rates for classifying teacher and school performance measures using value-added models? Journal of Educational and Behavioral Statistics, 38(3), 142–171.
Scott, A., & Correnti, R. (2013). Pittsburgh's new teacher improvement system: Helping teachers help students learn. Policy Report, A+ Schools, Pittsburgh, PA.
Staiger, D. O., & Kane, T. J. (2014). Making decisions with imprecise performance measures: The relationship between annual student achievement gains and a teacher's career value-added. In T. J. Kane, K. A. Kerr, & R. C. Pianta (Eds.), Designing teacher evaluation systems: New guidance from the measures of effective teaching project (pp. 144–169). San Francisco, CA: Jossey-Bass.
Strunk, K. O., Weinstein, T. L., & Makkonnen, R. (2014). Sorting out the signal: Do multiple measures of teachers' effectiveness provide consistent information to teachers and principals? Education Policy Analysis Archives, 22(100), 1–41.
Taylor, E. S., & Tyler, J. H. (2012). The effect of evaluation on teacher performance. American Economic Review, 102(7), 3628–3651.
Todd, P. E., & Wolpin, K. I. (2003). On the specification and estimation of the production function for cognitive achievement. The Economic Journal, 113, F3–F33.
United States Department of Education. (2009). Growth models: Non-regulatory guidance. Unpublished.
Value Added Research Center. (2010). NYC teacher data initiative: Technical report on the NYC value-added model. Unpublished report, Wisconsin Center for Education Research, University of Wisconsin-Madison.
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. New York: The New Teacher Project.
White, T. (2014). Evaluating teachers more strategically: Using performance results to streamline evaluation systems. Policy Report, Carnegie Foundation for the Advancement of Teaching.
Winters, M. A., & Cowen, J. M. (2013). Would a value-added system of retention improve the distribution of teacher quality? A simulation of alternative policies. Journal of Policy Analysis and Management, 32(3), 634–654.
Xu, Z., Ozek, U., & Corritore, M. (2012). Portability of teacher effectiveness across school settings (CALDER Working Paper No. 77).