Article

Science Teacher Motivation and Evaluation Policy in a High-Stakes Testing State

Jessica A. Mintz1 and Angela M. Kelly1

Educational Policy, 1-38
© The Author(s) 2018
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0895904818810520
journals.sagepub.com/home/epx

Abstract
This qualitative case study explored teachers’ and administrators’
perceptions of a newly implemented teacher evaluation policy in a high-
stakes testing state, and how this policy impacted their motivation. Five
science teachers and their immediate supervisors were interviewed,
and their perceptions were analyzed through motivational theories of
incentivizing career behaviors. Findings suggest the overarching goal
of improving teacher practice through accountability was facilitated
by intrinsic motivation and challenged by weaknesses in policy design.
These tensions could be mediated by localized control that improves
stakeholder agency, peer learning communities, and the adoption of more
reliable evaluation metrics. Implications for teacher buy-in of evaluation
policy are discussed.

Keywords
educational policy, high-stakes accountability, policy implementation,
qualitative research, science education, secondary education, state policies,
supervision, teacher-administrator relations, teacher quality

1Stony Brook University, NY, USA

Corresponding Author:
Angela M. Kelly, Associate Professor, Institute for STEM Education, Stony Brook University,
092 Life Sciences, Stony Brook, NY 11794-5233, USA.
Email: angela.kelly@stonybrook.edu

Introduction
Teacher evaluation and accompanying educational reforms have received
considerable attention across the United States in recent years. Research in
educational policy has called for the need to evaluate the practices, percep-
tions, and motivation of teachers to understand the successes and challenges
of policy changes (Cuevas, Ntoumanis, Fernández-Bustos, & Bartholomew,
2018; Datnow & Castellano, 2000). The most promising evaluation reform
initiatives involved multiple sources of data and actively cultivated teacher
agency (Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein, 2012;
Louis, Febey, & Schroeder, 2005). Successful implementation of new policy
depends largely upon teacher acceptance and motivation; therefore, research-
ers and policy makers must be attentive to teachers’ responses to change (de
Jesus & Lens, 2005).
National- and state-level policies have impacted local teacher evaluation
systems by requiring measurement of teacher performance, pathways for pro-
fessional improvement, and mechanisms for identifying underperforming
teachers. In May 2011, the New York State Education Department (NYSED)
initiated a plan to use state standardized assessment (“Regents”) results as a
partial quantitative factor in measuring the effectiveness of science teachers
(New York State Legislature, 2010). The decision to use these assessment
scores was influenced by funds made available by Race to the Top, a program
that offered competitive grant money to states that linked student achieve-
ment data to individual teachers (American Recovery and Reinvestment Act
[ARRA], 2009).
New York’s Annual Professional Performance Review (APPR) system
placed emphasis on evaluating science teachers’ abilities with high stakes
attached. Teachers may have been placed on a teacher improvement plan or
terminated if found to be developing or ineffective, rather than using scores
to provide meaningful professional development (Pallas, 2012). Examining
teacher and administrator insights regarding teacher evaluation is essential as
they are the key players in the learning process (Goe, Bell, & Little, 2008).
Because teacher quality has been identified as the primary influence on student achievement (Stronge, Ward, Tucker, & Hindman, 2007), teachers’ motivation and perceptions of the evaluation process impact pedagogical practice
and, thus, student learning (Hopkins, 2016).
This study is one of the first to focus on the motivation and perceptions
of science teachers as well as their direct supervisors in performance review.
Research has indicated that New York science teachers have been impacted
by state standards in the evaluation process more so than teachers in other
subjects and science teachers in other states (Louis et al., 2005; NORC at
the University of Chicago, 2018), though there has been little or no recent
research targeting the evaluation of these teachers in light of recent legisla-
tion. This is consistent with research that suggested studies on reform
efforts lag behind policy implementation, which in turn shifts in response to
constituent push-back (Coburn, Hill, & Spillane, 2016). In New York State,
nearly all high school science educators teach curricula that culminate in
high-stakes Regents exams, which is not the case for most teachers in other
disciplines. These exams often determine graduation eligibility as students
must pass a biology and physical science exam to earn a Regents diploma
(NYSED, 2015a). Secondary science teachers are also unique in that their
disciplinary supervisors often do not share their areas of certification;
research has suggested that subject-specific feedback is necessary to pro-
vide useful knowledge leading to pedagogical improvement (Hill &
Grossman, 2013).
The purpose of this study is to provide insights on teacher motivation
related to the implementation of science teacher evaluation reform in a high-
stakes testing state, New York, with the ultimate goal of providing recom-
mendations to inform future efforts to promote professional motivation,
excellence in science teaching, and student learning. The results from this
study have broader implications for the use of student performance data in
teacher evaluations and reach beyond science teachers in New York, inform-
ing policy across disciplines and states. The findings contribute to the larger
discussion of the use of high-stakes achievement measures in evaluating
teachers, policy enactment and its impact on motivation, and mechanisms for
effective accountability implementation.
This qualitative study explored connections among teachers’ and
administrators’ motivation and perceptions, reform implementation fac-
tors, and consequences of a teacher evaluation system based upon class-
room observations and student outcome measures. This study included
five secondary science teachers with varied years of experience and the
administrators responsible for their evaluations. Both teachers and admin-
istrators were included to reflect the varying perspectives of two groups of
key stakeholders—those responsible for overseeing implementation of the
policy at the departmental level, and those directly bearing the conse-
quences of the policy in terms of their professional performance evalua-
tions. The research questions were as follows:

Research Question 1: How did the APPR system affect the motivation, perceptions, and professional satisfaction of science teachers and their administrators? What challenges have been identified?

Research Question 2: What are the recommendations of science teachers and administrators on how to evaluate science teachers effectively?

Background
This study employed a motivational lens to generate explanatory con-
structs for improving teacher evaluation policy implementation. Teacher
evaluation requires effective measurement, capacity building, and appro-
priate incentives to motivate teachers to improve professional practice
(Firestone, 2014). Research has shown that most teachers have exhibited a
sense of agency regarding their students’ academic performance, and
teachers’ individual motivation was dependent upon the alignment of pol-
icy incentives with their desire to see their students succeed (Finnigan &
Gross, 2007). Accountability policies that overemphasized student testing
performance often resulted in teachers experiencing pressure, stress, and
diminished motivation (Cuevas et al., 2018). School leaders have been a
critical factor in mitigating teachers’ responses to accountability policies,
and their efforts have impacted teacher motivation to comply with educa-
tional reforms (Leithwood, Steinbach, & Jantzi, 2002). In the current cli-
mate of teacher performance measures gaining importance with respect to
school-level measures, more research is needed on educators’ perceptions
and responses to accountability reforms, and how such reforms influence
motivation (Harris & Herrington, 2015).

Conceptual Framework
This qualitative study incorporated a priori theories of motivational incen-
tives in exploring science teacher and administrator perspectives of APPR.
Research has suggested that teachers must be motivated to implement reforms
that are based upon top-down policy initiatives (de Jesus & Lens, 2005).
Motivation has been defined as inspiration or reason for acting or behaving a
certain way (Ryan & Deci, 2000). Two overarching theories of motivation are
often evident in the design of teacher evaluation systems: intrinsic and extrin-
sic. Intrinsically motivated individuals engage in tasks because they experi-
ence inherent interest and joy in their work (Eccles & Wigfield, 2002).
Competence and autonomy are basic psychological needs that often lead to
intrinsic motivation. Extrinsic motivation involves performing a task because
of anticipated separate consequences. Although extrinsic motivation has
often been perceived as the weaker incentive for self-directed change, indi-
viduals may integrate external prompts if they are congruent with their values
and beliefs (Ryan & Deci, 2000). Consequently, intrinsic and extrinsic moti-
vations are not necessarily discrete entities.
Teachers with autonomy have often internalized the goals of administra-
tors, principals, or educational leaders if they found those goals reasonable
and within their power to achieve (de Jesus & Lens, 2005). When the goals
came from a valid authority figure, such as a respected principal or adminis-
trator, specific and challenging goals produced greater effort. Even with the rapid proliferation of accountability policies, teachers still reported that their work was influenced less by these policies and more by student and peer validation (Finnigan & Gross, 2007; Firestone, Nordin, Shcherbakov, Blitz,
& Kirova, 2014).

Challenges of motivational incentives.  Combining incentives into one teacher assessment policy has proven to be difficult. Researchers have reported that the interaction impacts of extrinsic and intrinsic incentives have been mixed
(Ryan & Deci, 2006; Springer, Ballou, & Peng, 2008). Firestone (2014)
suggested three main challenges when combining incentives: (a) intrinsic
incentives are challenged by extrinsic incentives, (b) high-stakes exams
may not provide teachers with productive feedback, and (c) the time
required for administrators to collect evidence of effectiveness competes
with the time to promote teacher self-efficacy. The autonomy required to
support intrinsic incentives has been undermined when the rewards were
predictable, as in pay for performance or merit pay (Firestone, 2014). Weiss
(2012) reported difficulty in designing assessments that could monitor the
distribution of extrinsic rewards while simultaneously creating intrinsic
ones. Teachers have used accountability assessment data productively when
the data were immediately available and did not have high stakes attached
(Jennings, 2012). Last, the time required for administrators to collect the
necessary information to distribute extrinsic rewards has competed with
time needed to create the working conditions that would increase teacher
efficacy (Firestone, 2014). Research has shown that administrators have
less time to provide support and productive feedback to teachers in need,
often distributing responsibilities to others to minimize the increased
demands of accountability policies (González & Firestone, 2013). Conse-
quently, teachers may not receive valid information to build pedagogical
capacity, which negatively impacts motivation.
A fundamental question in teacher evaluation reform is how to balance
the need for intrinsic motivation with the demands of external account-
ability, and how these evaluations are used to improve instruction. The two
parts to this question are informed by different theories of motivation, as
summarized in Figure 1.

Figure 1.  Conceptual framework.
Source. Adapted from de Jesus and Lens (2005), Eccles and Wigfield (2002), Firestone (2014), and Ryan and Deci (2000).

The first part relies on extrinsic incentives to motivate teachers to improve instruction. The second part focuses on intrinsic incentives, whereby teachers engage in professional development
of their own accord to improve instruction (Firestone, 2014). This study
draws upon the interaction of these two motivational theories to investi-
gate stakeholders’ perceptions of evaluation reform, identifying common
ground and factors that challenge their integration. By comparing the
views of both teachers and administrators, accountability goals may be
analyzed in terms of contextualized experiences. A balance between intrin-
sic and extrinsic considerations may be necessary to motivate teachers to
improve their pedagogical practices.

Evaluating Teacher Quality


The United States has attempted to establish a system that identifies strug-
gling schools and rewards those showing innovative and effective practices,
and strategies for enacting these reforms are important considerations when
examining impacts on teacher motivation. Since the passage of No Child Left
Behind (NCLB; U.S. Department of Education, 2001) and Race to the Top
(U.S. Department of Education, 2009), the connection between teacher effec-
tiveness and student achievement has prompted reform efforts in teacher
evaluation (Strong, Gargani, & Hacifazlioglu, 2011). States have developed
and implemented new evaluation systems to improve upon prior methods
that were unable to differentiate teacher quality (Weisberg, Sexton, Mulhern,
& Keeling, 2009). The current practice of teacher compensation based upon
years of experience and academic credentials has had little impact on student
learning (Hanushek, Kain, O’Brien, & Rivkin, 2005). Further research has
shown that teacher evaluations ranked most teachers as satisfactory or good,
but failed to recognize great teachers and offered little professional develop-
ment for poorly performing teachers (Forman & Markson, 2015; Kraft &
Gilmour, 2017). Most researchers and policy makers have agreed that the
evaluation process is more productive when it facilitates meaningful profes-
sional development, which promotes intrinsic motivation, rather than serving
as a summative judgment of teacher performance, which may negatively
influence motivation due to external pressure (Finnigan & Gross, 2007;
Gigante & Firestone, 2008).
The most common way to measure teacher effectiveness has been through
subjective observation. This type of formative evaluation, intended to develop
pedagogical skills, has been criticized because it often relies upon one evalu-
ator’s perception of teacher effectiveness without validating data to support
his or her interpretation (Strong et al., 2011). School leaders have tended to
be lenient in teacher evaluation scores (Kimball & Milanowski, 2009), which
have rarely been used in making consequential personnel decisions (Murphy,
Hallinger, & Heck, 2013). However, practice-based teacher evaluation is arguably the most direct evidence of a teacher’s ability to affect student learning, which suggests that observations should be heavily weighted when calculating a teacher’s summative score, although multiple measures are needed
(Darling-Hammond et al., 2012). Teacher observation is a complex process
that requires appropriate instruments and training to yield valid and reliable
results. The accuracy and fairness of evaluation metrics are essential for pro-
viding useful feedback and incentivizing excellent teaching (Firestone,
2014).
Teachers have also been evaluated based on value-added models (VAM)
of student achievement; these models consider student growth in the teach-
er’s evaluation in an attempt to control for outside factors that affect student
achievement, measuring teacher effectiveness through adjusted gains in
standardized test scores (Harris, 2009). Student performance data have
sometimes been associated with more accurate teacher evaluations
(Hopkins, 2016). However, other researchers questioned the usefulness of
VAMs because they also correlated to student background characteristics,
with more able students demonstrating higher normative gains (Hill,
Kapitula, & Umland, 2011; Steinberg & Garrett, 2016). Large per-teacher
variability in VAMs has been observed due to the characteristics of students
assigned in a given year (Darling-Hammond et al., 2012). Regardless, Race
to the Top has designated measurement of student achievement as an essen-
tial component of teacher evaluation (ARRA, 2009). The Every Student
Succeeds Act (ESSA), passed in 2015, made concessions regarding teacher
evaluation policy. Districts no longer had to prove their teachers were
“highly qualified” to receive Title I funds. ESSA (2015) gave states new
flexibility for rating systems and designated student achievement as an
optional component of teacher evaluations.
Recent teacher evaluation methods tend to utilize a composite score that
incorporates both criterion-referenced (observations) and norm-referenced
(student achievement) measures (Kraft & Gilmour, 2017). This process was
supported by the Measures of Effective Teaching project, which revealed pos-
itive correlations among five different observation rubrics and student
achievement. Reliability of scores was increased when multiple observations
were averaged and student surveys were added (Bill and Melinda Gates
Foundation, 2011). However, VAMs of teacher effectiveness used within the
Gates Foundation study have been found to be problematic. VAMs produced
an inconsistent pattern of results for individual teachers over time, the mea-
sures were affected by the students assigned to teachers, and VAMs were
unable to assess the many other influences that contributed to student prog-
ress (Darling-Hammond et al., 2012). Other studies have shown weak or
inconsistent correlations between observations and student performance
measures (Harris, Ingle, & Rutledge, 2014; Kimball & Milanowski, 2009;
Xu, Grant, & Ward, 2016). The weighting of multiple measures in a compos-
ite score risks inaccuracy, as the reliability of individual metrics may be prob-
lematic (Martínez, Schweig, & Goldschmidt, 2016). Researchers have
suggested using multiple years of data in designing evaluations (Conley &
Glasman, 2008); however, longitudinal studies have often experienced prob-
lems with missing data and content-related validity of state assessments
(Amrein-Beardsley, 2008). Regardless of the evaluation method, the process
must be reliable and meaningful to motivate teachers to improve their perfor-
mance (Cuevas et al., 2018; Finnigan & Gross, 2007).

Teacher Evaluations: The Views of Teachers and Administrators


Teacher evaluation has sparked debate over how to hold teachers accountable
for student outcomes and the best strategies to implement reforms effectively
(Baker et al., 2010; Papay, 2012). Although research has shown that teacher
evaluation policy offers promise for improving student learning (Louis et al.,
2005), few studies have explored teacher and administrator views of these
reforms and how the process impacts professional motivation. Teachers have
resisted educational policy change if they felt that reforms did not match the
views of their professional communities (Jiang, Sporte, & Luppescu, 2015).
Their perceptions of new policies have been influenced by structural and
social conditions of schools and relationships with school administration
(Louis et al., 2005; Malen, 2003).
Educators generally believe that evaluation is a worthwhile practice
(Clipa, 2015). Teachers have expressed that evaluations should be used to
measure and develop their pedagogical skills, but development should be
prioritized (Marzano, 2012). A recent study on teacher perceptions of evalu-
ation reform in Chicago showed that teachers were concerned about the addi-
tion of student growth as a part of their evaluations but found the observation
process provided useful feedback (Jiang et al., 2015). Principals voiced con-
cerns about the perceived inequities of the teacher evaluation system and its
impact on teachers, particularly when evaluations were not used for instruc-
tional improvement (Kimball & Milanowski, 2009). These concerns were
intensified when their resource and time management issues were not
addressed during transitions to new evaluation policy (Derrington &
Campbell, 2015). The concerns of teachers and administrators are critical
considerations when examining the potential of evaluation initiatives to moti-
vate teachers and build capacity for student achievement.

Method
Research Design
The focus of this phenomenological case study was to explore and describe
the shared meaning of lived experiences for a group of individuals
(Creswell, 2013), and analyze the impact of evaluation policy upon teacher
motivation in a localized context. Case study is an appropriate approach
for examining policy implementation with key stakeholders as units of
analysis (Yin, 1994). The shared experience in this case study was the
external implementation of a state-mandated teacher evaluation system
that was primarily based upon student performance scores and teacher
observations. Qualitative research has been recommended for evaluating
educational policy because understanding district subsystems provides
nuanced insights into the connection between policy and practice (Sadler,
1985). The researchers explored teacher and administrator perceptions of
APPR, motivations to improve teaching practice and student learning, and
recommendations regarding teacher evaluation policy. This iterative pro-
cess utilized a priori theoretical constructs and elements of grounded the-
ory, where unexpected key statements from interviews were identified and
coded. Inductive and deductive interpretations were formulated to build a
novel explanatory framework for how the enacted policy may have dif-
fered from what was intended in terms of teacher motivation.

Context
APPR in New York State.  On May 28, 2011, New York’s Senate and Assembly
voted to structure teacher evaluations with 40% of the composite score based
upon student achievement and 60% based upon observations. A primary feature of the law required every school district in the state to prepare and implement an APPR plan beginning in the 2012-2013 academic year. Teachers were
annually reviewed for performance based upon a composite score of student
growth, student achievement, and other measures such as teacher observa-
tions. These categories are described in more detail below.

Composite score.  In New York, every classroom teacher in the state received an annual Composite Effectiveness Score (CES), a number between 0 and 100. The CES was calculated from three components: 20% from student growth on state assessments, 20% from locally selected measures of student achievement, and the remaining 60% from other measures (usually observations). Based on the CES, the teacher’s APPR was classified into one
vations). Based on the CES, the teacher’s APPR was classified into one
of four categories specified in regulations of the New York State Educa-
tion Commissioner. This scale was referred to as the “HEDI scale,” based
on an acronym of the individual categories of Highly Effective, Effective,
Developing, and Ineffective. Although the point value classifications were
identical for every school district (NYSED, 2013; see Table 1), districts had the freedom to assign different point values to HEDI scales for student growth and student achievement.
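As a rough illustration of this scoring arithmetic, the following Python sketch combines the three subscores and applies the statewide HEDI cut scores shown in Table 1. The function and parameter names are hypothetical, and district-specific conversion rules for the individual subscores are not modeled.

```python
def composite_score(growth: int, local: int, other: int) -> int:
    """Combine the three APPR subscores into a Composite Effectiveness
    Score (CES): student growth (0-20), locally selected measures of
    student achievement (0-20), and other measures such as classroom
    observations (0-60). Names are illustrative, not the state's."""
    assert 0 <= growth <= 20 and 0 <= local <= 20 and 0 <= other <= 60
    return growth + local + other


def hedi_rating(ces: int) -> str:
    """Map a CES (0-100) onto the statewide HEDI categories using the
    cut scores listed in Table 1."""
    if ces >= 91:
        return "Highly Effective"
    if ces >= 75:
        return "Effective"
    if ces >= 65:
        return "Developing"
    return "Ineffective"


# Example: 15 growth points + 14 local points + 52 observation points
# yields a CES of 81, which falls in the "Effective" band (75-90).
ces = composite_score(15, 14, 52)
print(ces, hedi_rating(ces))
```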

Table 1.  Teacher Evaluation Rating Categories.

| Rating (CES range) | Student growth | Student achievement | Other measures |
| --- | --- | --- | --- |
| Highly Effective (91-100) | Results well above district goals. | Results are well above district-adopted expectations for student learning standards. | Overall performance and results exceed standards. |
| Effective (75-90) | Results meet district-set goals. | Results meet district-set goals for student learning standards. | Overall performance meets standards. |
| Developing (65-74) | Results are below district-set goals. | Results are below district-set goals for student learning standards. | Overall performance and results need improvement to meet standards. |
| Ineffective (0-64) | Results are well below district-set goals. | Results are well below district-set goals for student learning standards. | Overall performance and results do not meet standards. |

Student growth.  Student growth was the measure of the change in a student’s scores between two or more points in time (NYSED, 2015b). To measure student growth, objectives needed to be defined to show evidence that a student had learned more science. A Student Learning Objective (SLO) was an academic goal set by the educator at the beginning of each school year (Tyler, 2011). The SLO contained information on the student population, the learning content, the instructional time frame, the assessments used to measure the goal, the baseline level of the students in the class, the expected target by the end of the course, district-based HEDI ratings, and a rationale as to why the teacher chose such targets. SLOs for Regents-level courses were required to use the Regents exam results as evidence of student learning during the instructional time frame. The living environment and chemistry courses measured student growth using a baseline exam and the state Regents exam as a summative assessment.

Locally selected measures of student achievement.  New York State Regents
exam scores were required for the locally selected measure in the four major
science disciplines (biology, chemistry, Earth science, physics). This measure
differed from an SLO in that the teacher did not set the target for the students; rather, the school district set the target in a district-generated HEDI scale. The
HEDI scale was typically the same as the SLO without a baseline assessment
or teacher target score. For example, a school district may have used a sci-
ence Regents passing score as the target. The percentage of students meeting
that target was then converted to a number between 1 and 20.
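Because each district generated its own HEDI scale, the conversion from the percentage of students meeting the target to the 1-to-20 point value varied. The sketch below uses a simple linear scale purely as a hypothetical illustration of such a conversion; it is not any particular district's actual rule.

```python
def local_measure_points(percent_meeting_target: float) -> int:
    """Convert the percentage of students meeting a district-set target
    (e.g., a Regents passing score) into the 1-20 locally selected
    measure subscore. This linear scale is an illustrative assumption;
    actual districts defined their own HEDI conversion tables."""
    pct = max(0.0, min(100.0, percent_meeting_target))
    return max(1, round(pct / 5))  # 100% of students -> 20 points


# Example: 85% of students meeting the target maps to 17 of 20 points.
print(local_measure_points(85.0))
```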

Teacher observations.  In total, 60% of the teacher’s composite score was based upon other measures that awarded a point value from 1 to 60. Other
measures included classroom observations using a state-approved teacher
practice rubric. The school districts had some discretion when assigning the
breakdown of the 60 points. The majority of the composite score came from
two or more direct teacher observations—one was an announced observation
and one was an informal observation by a principal or trained administrator.
A formal observation was scheduled and usually required meetings before
and after the observation with the administrator. An informal observation
was one that was not scheduled but was typically followed by a post-obser-
vation discussion. The state noted that at least 31 points of the 60 possible
other measure points had to come from unannounced observations (NYSED,
2015b). The NYSED selected four acceptable teacher observation rubrics,
giving school districts the opportunity to choose the rubric that best suited
their needs. The participating districts used Danielson’s (2008) Framework
for Teaching or the New York State United Teachers (NYSUT, 2011) Practice
Rubric. The rubrics were generalized and applicable to all subject areas but
lacked discipline-specific metrics.

Participants.  The study included five secondary science teachers with varied years of experience and five administrators responsible for their evaluations across Suffolk County, New York. The geographical area comprises the eastern end of Long Island, measuring 2,400 square miles with 69
different school districts. The county encapsulates the characteristic fea-
tures of a much larger school district in a small area and has a long tradition
of state-mandated test-based accountability. There is a great range of vari-
ance in school tax revenue, expenditures, and educational attainment due to
segregation, the fractured structure of education, and the major differences
in property taxes (Long Island Index, 2009). However, schools in the area
typically experienced organizational stability with low teacher turnover due
to relatively high teacher salaries.
Participants were chosen based upon their years of experience, school dis-
trict demographics, and position as a secondary living environment or chem-
istry teacher or school science administrator/supervisor. The goal of the
maximum variation sampling process (Patton, 1990) was to gather teachers
with a range of educational experience among different school districts to
obtain a representative view of teacher and administrator perceptions. The
use of such purposeful sampling allowed the researchers to employ cross-
case analysis to identify key motivational themes that were consistent among
a variety of school contexts.
The study required participants to have experienced the change in teacher
evaluation law, and science teachers with more than 5 years of experience
would have been teaching during this transitional period. Science teachers
were also chosen based on their content certifications. Science teachers who
taught the living environment course were chosen because the accompanying
high-stakes Regents exam was required for graduation. To broaden the per-
spectives of the participants and to make the study more generalizable, sci-
ence teachers who taught Regents chemistry were chosen because their courses
were not technically required for graduation but also culminated in a Regents
exam. Consequently, qualitative data revealed unique descriptions of teacher
and administrator perspectives, as well as shared patterns elicited from a het-
erogeneous sample (Patton, 1990).
The five participating pairs, with each pair including a teacher and the
administrator responsible for his or her evaluation, were employed by the same
school district. Because it was the administrator’s responsibility to explain
the evaluation process to the teachers, these pairs were purposely chosen
to analyze the relationship between the evaluator perceptions and the
teacher perceptions. The participating districts had a variety of procedures for
teacher evaluation. The administrators reported a range of 40 to 78 teach-
ers they were directly responsible for supervising. Most of the administra-
tors had undergraduate majors and certifications in science, with one
exception. The two chemistry teachers who participated in this study were
observed and evaluated by administrators who did not have chemistry
majors or certifications. The teacher, administrator, and district descrip-
tions are summarized in Table 2.

Data Collection and Analysis


This study included five secondary science teachers with varied years of
experience and the administrators responsible for their evaluations. Because
this study relied on their expressed motivations and views of the evaluation
system, semi-structured interview protocols (Appendices A and B) allowed
for the questions to be broad and general so the participants could construct
the meaning of the interview, allowing for more discussion (Creswell, 2013).
The participants typically provided significant details when responding to
questions, and the interviewers provided follow-up prompts to elicit addi-
tional relevant information. When conducting phenomenological research,
Polkinghorne (1989) recommended researchers interview five to 25 individu-
als who have experienced the phenomena. A total of 10 participants were
interviewed based on research related to subject saturation and variability
(Guest, Bunce, & Johnson, 2006). After one set of interviews in the pilot
study and 10 additional interviews, no additional themes were generated that
would warrant further recruitment.

Development of coding scheme.  Provisional coding, which included a predetermined set of codes that were anticipated categories or responses, was
utilized in this study based upon prior research and a pilot study (Miles &
Huberman, 1994). These interpretations were based upon ways in which
teacher and administrator responses indicated how their motivation and professional lives were affected by the newly implemented APPR policy.

Table 2.  Teacher and Administrator Descriptions and Respective School Characteristics.

Science teachers

| Participant | Jack | Annie | Christine | Robert | Sarah |
| --- | --- | --- | --- | --- | --- |
| Undergraduate degree | Genetics | Biology | Environmental Studies | Biology | Chemistry |
| Certifications | Earth Science, Biology, General Science | Biology, General Science | Biology, General Science | Biology, Chemistry, General Science | Chemistry, Biology |
| Content taught | Living Environment | Living Environment | Living Environment | Chemistry | Chemistry |
| Years of experience | 18 | 10 | 16 | 20 | 10 |
| Number of colleagues in content area | 6 | 17 | 8 | 9 | 8 |
| Number of science teachers | 23 | 41 | 30 | 27 | 38 |
| Free and reduced lunch | 51% | 57% | 50% | 12% | 9% |
| ESL | 7% | 5% | 11% | 2% | 1% |
| Underrepresented minorities | 60% | 41% | 42% | 9% | 25% |

Administrators responsible for supervision

| Participant | Jane | Adam | Charles | Rich | Stacy |
| --- | --- | --- | --- | --- | --- |
| Undergraduate degree | Science—Med Tech | Elementary Education | Biology | Physics | Environmental Science |
| Certifications | Biology, General Science, Chemistry, Administrator | Kindergarten and Grades 1-6, Administrator | Biology, General Science, Administration | Physics, General Science, Math 7-12, Administrator | Earth Science, Biology, General Science, Administration |
| Type of administrator | Science & Technology Director | Assistant Principal | Science & Technology Director | Science & Technology Chairperson | Science Director |
| Years of experience (teaching + admin) | 36 | 19 | 28 | 17 | 24 |
| Teachers directly supervised | 78 | 50-60 | 40-50 | 49 | 42 |

Note. ESL = English as a second language.


Similar to the studies of Tuytens and Devos (2009) and Jiang et al. (2015),
this study employed a priori theoretical constructs from Fullan (2001) as a
way to delineate factors influencing teacher motivation. Fullan (2001)
identified two significant reform implementation themes that often affect
how teachers change their practices and beliefs in response to new policies:
clarity and practicality. Clarity addresses how policy goals are explicitly
operationalized, and practicality refers to whether teachers consider the
new policy feasible to implement (Fullan, 2001). Four major codes related
to motivation, clarity, practicality, and impression were identified; these
provisional codes were expanded into multiple categories and then com-
bined with inductively generated codes to develop major explanatory con-
structs concerning the effectiveness of the APPR science teacher evaluation
system in promoting motivation.
Based on the provisional codes, versus coding was used to identify binary
terms that were in direct conflict with each other (Saldaña, 2013). Fullan
(2001) suggested that teachers change their pedagogical practices and beliefs
about education when reform efforts are clearly explained and are practical to
implement. Based on these constructs, teachers were asked to explain how
their evaluation scores were generated. Science teachers and administrators
were also probed during the interviews to discuss their motivation related to
the evaluation process and how this affected their teaching practices or
administrative responsibilities.
This study simultaneously followed a modified grounded theory approach
to collecting and categorizing interview data to formulate an explanatory
framework (Glaser & Strauss, 1967). Rather than collecting data and quanti-
fying them through a traditional deductive approach, grounded theory com-
pares and classifies data inductively based on their own characteristics.
Different fragments of qualitative interview data were assigned codes, or
phrases that captured essential meanings (Saldaña, 2013). Four different
stages of coding were used to categorize data: open, axial, selective, and the-
oretical. Open coding organized the questions and responses into initial cat-
egories and common themes (Strauss & Corbin, 1990). Interview transcripts
were coded line-by-line following the guidelines of Corbin and Strauss
(2014). These initial, open codes were comparative and tentative, yet concep-
tualized and situated within existing research in teacher evaluation.
Subsequent coding phases searched for causal explanations and theoretical
constructs in the data that fit within the motivation framework (Miles &
Huberman, 1994). Axial coding reorganized the data to reveal new catego-
ries. Descriptive codes were systematically placed into specific categories,
and emerging links among these categories were identified. Selective coding
created a story from emerging categories by linking information between and
across discrete categories. Finally, theoretical coding aligned the data with
what is known in the field by specifying relationships, integrating partici-
pants’ perspectives on common experiences, and allowing construction of a
linear narrative to provide thematic insights.

Validity and reliability.  Standardized criteria to ensure rigor in qualitative research were employed to make the results of this study more credible (Bar-
usch, Gringeri, & George, 2011). The researchers utilized four of the eight
strategies presented by Creswell (2013) for establishing objectivity in quali-
tative studies: reflexivity, analyst and data triangulation, member checking,
and thick description. Reflexivity was practiced by engaging in critical self-
reflection about potential biases; this was necessary because the researchers
were the primary “instrument” for data analysis (Watt, 2007). Analyst trian-
gulation was employed to minimize the impact of researcher bias. The two
researchers analyzed the same interview transcripts, then compared their
cross-case findings (Patton, 1999). Data triangulation was also achieved by
incorporating the views of teachers, administrators, and the researchers in a
comparative case study design. Alignments between coding schemes were
constructed and disagreements resolved through extended discussions to
achieve interrater reliability.
Member checking was utilized as a means to validate the researchers’
interpretations of participant experiences (Barusch et al., 2011). The research-
ers asked clarifying questions during the discussions and contacted the par-
ticipants during analysis of the post-interview transcripts to verify perceptions
and reduce ambiguities. Finally, the researchers provided a detailed descrip-
tion of the participants, their school districts, and the relationships between
the science teachers and administrators. Thick description, a method to
improve external validity for a qualitative study, is a detailed depiction of
field experiences in which the researcher makes clear the study context
(Holloway, 1997; Lincoln & Guba, 1985). Extended quotes were presented to
give “voice” to the participants.

Findings
Data revealed that the policy goal to foster teacher motivation and profes-
sional development through accountability met with mixed results. Teachers
expressed evidence of intrinsic motivation; however, there were clear chal-
lenges related to accountability metrics and lack of stakeholder agency. The
secondary science teachers in this study were particularly affected because of
content specialization and the high-stakes exams their students were required
to pass. Teacher and administrator perceptions of these challenges provided
insights into reform implementation and its potential effectiveness. Three
important themes were elicited from the interviews: (a) teacher professional
development was facilitated through intrinsic motivators; (b) professional
motivation and improvement were challenged by issues related to lack of
clarity, perceived unfairness, and lack of agency; and (c) stakeholders pro-
vided insightful suggestions for strengthening the APPR policy to maximize
its intended potential and improve teacher motivation.

Fostering Professional Development Through Motivation


The design of teacher evaluation policy in New York has been influenced by
extrinsic and intrinsic motivation theories to reward great teachers, remediate
ineffective teachers, and develop sound teaching practices (NYSED, 2015b).
With these incentives in mind, science teachers were asked to discuss what
motivated them in their careers. All teachers interviewed were motivated
intrinsically. They reported self-reflection, the desire to be a better educator,
passion for content, and satisfaction in making a difference in the lives of
their students. Robert shared that his professional motivation came from two
main sources—his sincere interest in his discipline and the rewarding nature
of working with high school students:

I think it’s the passion for the subject in terms of what I’m teaching. I’ve taken
such an interest in chemistry. I’ve also taken such an interest in learning new
things through the science research program. The second part is the interaction
with the students and just working with kids—students—that interaction is
incredible. It’s fun and it’s the reason I like coming here every day. (Robert)

The data also indicated the science teachers recognized the importance
of evaluation and accountability as extrinsic motivations, and felt the
observation process was the most effective of the three APPR categories.
They generally felt the dialogues with administrators were practical and
constructive, and they demonstrated reasonable knowledge of the observa-
tion process and identified the rubric used to generate their observation
scores. Both science teachers and administrators found the conversations
about science lessons had improved as a result of APPR and the required
observation rubrics. The teachers found that the observation rubrics pro-
vided the instrumentality or the explicit means to succeed with clear lesson
expectations, and they sometimes received more productive feedback
when administrators used the rubric as a discussion guide after the lesson.
Christine felt as if the conversations during the post-conferences with her
administrator, Charles, were useful and provided her with information that
could improve her practice:

. . . we have a pre-conference, we have the observation, and then we have the
post-conference. I come to the pre-conference with my plan, and we discuss it,
and they suggest changes I make. They ask me why do you do this or do you
think this might help or they’ll suggest things, and I’ll take those into
consideration when I’m actually finalizing my lesson plan for the observation.
Then afterwards it’s like that next step how do you think it went, what could
you have done differently. That’s everything—there is a dialogue. I do feel that
so far it’s been helpful. They give constructive criticism, it’s never back
shaming—in my experience. I can only talk about myself. But I’ve had only
positive feedback. (Christine)

The expectations were made clear and she found the dialogue formative,
which was an important aspect of her buying into the process. Despite their
generally positive feelings about observations, all participating teachers desired
more content-specific recommendations. It is notable that the science teach-
ers observed by administrators who did not share the same teaching certifica-
tion voiced this concern more often. For example, Annie, a teacher whose
certification was different from her supervisor’s, stated, “I am a strong
believer in [evaluation]. And I think it should always be external. I do think
that it needs to be someone who has experience in teaching and possibly even
experience teaching that subject.”
Some teachers in this study voiced concerns with extrinsic motivators,
such as publicly available ratings, stating that these would lead to teacher competition rather than teacher collaboration. One of the goals of APPR was to develop excellent teachers; however, categorical rating comparisons to within-school and out-of-school colleagues had consequences.
When external evaluation promotes competitiveness and ranks teachers in
relation to their colleagues, these same teachers may compete for a higher
composite score and demonstrate unwillingness to share pedagogical
tools. Jack described this potential threat to the spirit of professional
collaboration:

Colleagues should really work together. Part of teaching, part of the idea of
education is mentoring, learning from other individuals, sharing knowledge
and I feel as though when you put that competition aspect into it you’re not
going to inspire people to be better, you’re going to inspire people to kind of be
greedy and money always leads to that and when you bring that variable into
an equation, who knows where it goes, who knows how it’s going to fracture
education. (Jack)
Other teachers concurred that the APPR did not nurture their motivation; rather, the extrinsic incentives attached to APPR were not congruent with
their teaching beliefs. The dynamic interaction between intrinsic and extrin-
sic motivation suggested that teachers were committed to educating children
and improving their practice through collegial interactions, but they expressed
some concerns about extrinsic rewards that might foster competition rather
than collaboration. This and other constructs are explored in more detail in
the next section.

Accountability Challenges
The teachers and administrators cited several challenges that impacted
their perceptions of APPR and professional motivation, including time
constraints, lack of clarity, perceptions of unfairness, and lack of agency.
Administrators found the observations to be important; however, the increased volume of observations reduced the amount of time they could
dedicate to being teacher leaders. They voiced concerns about the commit-
ment required, which included a pre-observation meeting, the observation,
and a post-observation conference. As Jane stated, “So I think with some-
one who has a large staff like myself it’s very difficult to get through. I
think I do 78 observations.” The recommendations made by the adminis-
trators focused on reducing the amount of time associated with formal
observations, as they believed the formal observation provided a limited
view of the teacher’s overall practice; the science teachers also mentioned
formal observations being a “show” of teaching. The administrators
believed increasing the amount of informal classroom visits was a more
practical way to gain information about the professional needs of their
teaching staffs. The time involved in the observation process was signifi-
cant in terms of practicality, and they preferred devoting attention toward
staff development and improvement rather than the typically rote and cur-
sory observation process. Charles described the increase in workload on
science administrators and how it shifted his priorities:

Everybody is very sympathetic towards the teachers, and rightfully so;
however, it’s the administrators that are carrying the weight . . . you have to do
all those observations. Managing time became very critical, and it kind of took
away from things that you’d normally do. (Charles)

Some teachers voiced concerns about the observation process when they
felt they were not getting productive feedback from observers. Annie com-
mented that the delay time between the observation and written feedback was
too long for the evaluation to be useful. She also felt there was little clarity
regarding what could have been improved in the lesson. In this sense, she
could not understand how the observer differentiated between “effective” and
“highly effective” and felt “short changed” to receive a lower rating:

. . . my observation, I didn’t get back for weeks later, even though I know it’s
supposed to be 48 hours . . . and I feel like they’re just—the administration is
so overworked with doing observations that I think they write just the basic
stuff that they I think it’s almost like a form letter. But yet it’s just like the old
way of observations . . . so I got an “effective,” but yet there was nothing in
there that you would’ve improved about the lesson. (Annie)

This teacher was disappointed not only with the excessive time to receive feedback, but also with the cursory manner in which the administrator used the rubric as a list of items or behaviors to verify. She believed the observer
should have exerted greater effort in evaluating her work to produce insight-
ful instructional guidance. Lack of differentiation was also problematic.
Sarah explained that the science teachers within her district were all classified as “highly effective,” which painted an inaccurate picture of the true makeup of her colleagues. She felt that some teachers within her department were less skilled than others but still received a highly effective rating:

I could work my butt off, and someone could come in, leave, lecture every day,
do no activities, not even meet the lab requirements, they’re still highly
effective. So it’s just, it’s unfair. The whole system’s unfair. I don’t feel like it
achieved its goal. (Sarah)

Lack of clarity and perceptions of unfairness were major impediments to
teacher buy-in of the APPR process and sometimes resulted in diminished
motivation. These constructs were evidenced by several situational con-
straints. The interview data revealed that teachers lacked explicit understand-
ing and largely disagreed with the use of student test scores in their evaluation
composite. Some of the participating science teachers were aware that their overall evaluations were composed of three pieces but could not articulate each
component. Christine could not differentiate between the classroom SLO,
which was based upon her own designated target to measure student growth,
and the locally selected measure (LAM), which was a district-wide testing goal set by the local
administration:

I have the observations, and then that growth score from my students based on
the end of the year, their Regent scores—how they’ve grown. Then there’s
another component. I don’t think it’s either the state or the local, and get very
confused on this so I’m really sorry. This is the part that I don’t always—I’m
not always sure of. But it’s component-based on I think something with the
school. I don’t even—I don’t know. (Christine)

When Jack was questioned about the difference between the achievement on
exams and the improvement on exams, he seemed unclear and wondered
what the expectations were for the testing components, stating, “That’s one of
the areas I’ve actually always felt as though it was very muddled. I don’t see
much of a difference there at all and I don’t understand what the expectations
of the state or the expectations of my district are.”
Another issue was the use of baseline student performance data in the
calculation of the SLO. Of the five teachers interviewed, all but one used a
science assessment as a baseline. One district used the seventh-grade lan-
guage arts exam score as a baseline, while another used an eighth-grade sci-
ence exam score. The other teachers used practice Regents exams that were
distributed at the start of the academic year. Students were aware that base-
line exams were used to evaluate the teacher and did not affect their grades;
consequently, many students were reportedly less serious about the pretests.
In each case, teachers found the use of these scores problematic because the
validity and reliability of the measures were called into question. The SLO
was intended to measure the teacher’s influence on student growth, yet it
became impractical and was perceived as unfair if it did not measure what
was intended. The reliance on unreliable metrics often negatively impacted
motivation. Science teachers generally expressed that the observation pro-
cess was important in evaluation and development, however, they believed
that using student test scores as a means of evaluating teacher performance
should be eliminated or adjusted.
The variation in student population among different course levels and from year to year was a commonly articulated concern. Jack pointed out equity issues
with the APPR, as he taught the living environment course to a group of
English as a second language (ESL) students. They were held to the same
standards as matriculated English-speaking students in terms of Regents
exam performance. He found this part of the evaluation troublesome because the performance of nontraditional students was not considered fairly when the evaluation system was changed; consequently, he reverted his teaching to what he had done before the policy was implemented:

The curriculum is far too wide for ESL students to begin with and then that’s
never taken into account. And then the fact that I felt as though I was going to
be scored negatively on it I felt as though I had to rush through a curriculum
and I’m sure that led to my students really not enjoying the class and not even
comprehending things as well as they should have. From that perspective that
was really, really difficult for me and after the first year of APPR, I stepped
back, I re-evaluated myself, and I went back to some old strategies. (Jack)

Other teachers pointed out the inequity between classes and between differ-
ent school years. Some teachers taught higher level students when students
were tracked. Students were placed in different levels, based on academic
ability, within the same content course. An example of this would be an
enriched living environment course, taught with additional topics not tested
on the exam. A teacher with higher level students might typically have received a higher composite score. Also, in some academic years, students
were higher performing than others. Because of this variability, the teachers
felt that quantitative composite scores should not be used to judge teaching
performance. Several administrators also regretted that there was no mechanism
to adjust for student differences when looking at aggregate test scores for
individual science teachers. Some teachers and administrators felt APPR tar-
gets were unattainable due to lack of consideration of student variability,
which lessened their buy-in and motivation.
A final theme that emerged was the rapid timeline for policy implementa-
tion that diminished teacher and administrator agency. As APPR was intro-
duced, Common Core Standards (Common Core State Standards Initiative,
2010) were implemented simultaneously. The top-down nature of this reform
and hurried enactment of the policy did not allow stakeholders to fully
understand the process. Adam shared the impacts of this decision:

The rollout was a problem statewide because it came out when Common Core
was first introduced. And there was a lot of research that clearly outlined the
fact that you had so many initiatives under the Race to the Top—because APPR
was part of the Race to the Top—when you secured those funds, then you had
to implement or jump through all these hoops. And at the same time, Common
Core, we had the shift in the standards. And, because in New York State, we
were actually tested on the new standards and then evaluated a year before we
had to do it. So there was definitely a poor timeline as far as implementation.
There wasn’t enough time for the teachers to understand the new standards in
order to be evaluated correctly. (Adam)

He attributed his science teachers’ lack of clarity about the new evalua-
tion policy to the aggressive time frame. He believed that efforts to receive
federal funds minimized valuable discussions on how teachers should be
evaluated, suggesting that science teachers may have resisted the APPR
policy because they were not included in the preliminary discussions.
This lack of agency resulting from a focus on top-down directives was consistently expressed when discussing issues related to teacher accountability and motivation.

Strengthening Teacher Evaluation Policy


The conversations with teachers and administrators highlighted potential fac-
tors that might serve to strengthen the APPR policy. These suggestions built
upon the more positive aspects of the system. Teachers and administrators
found the dialogues that occurred during the observation process were bene-
ficial. They also generally expressed agreement in principle with the neces-
sity of accountability as extrinsic motivation for professional development.
However, they offered different visions of how APPR might shift from an
evaluative process to a more collaborative and developmental experience that
would motivate them to improve science teaching.
Both teachers and administrators desired more science staff collaboration
and developmental opportunities to improve content-specific pedagogical
practice, and suggested these be incorporated into professional evaluations.
The administrators emphasized the desire to exercise their leadership skills
by developing their staff to meet specific needs. Some lamented that the
observation process was often a poor representation of a teacher’s effective-
ness because it was based upon one scripted lesson. For example, Jane rec-
ommended “portfolio work” as a means to develop more creative laboratory
activities. Charles, like some other administrators, found more value in fre-
quent informal visits to the classroom, in terms of setting both individual and
departmental performance goals. Stacy discussed the use of professional growth plans and peer evaluation as alternative evaluation mechanisms:

I would love to put something in there that includes a peer evaluation component. Specifically though with lots of training that would include
instructional rounds. And I think that that part is very important. I think that
teachers should be able to consistently and frequently visit each other’s
classrooms. I think that doing it as a team with administrators would promote
incredible dialogue that doesn’t exist right now. But there would have to be
very, very clear protocols put in place, lots of professional development before
you rolled it out. And I think that that would take us from where we are to the
next level. (Stacy)

Rich concurred that implementing peer-to-peer observations would provide him with more time dedicated to professionally developing his staff. He
viewed this strategy as a means of localizing control of policy implementa-
tion while maintaining the spirit of the policy’s intent. In doing so, the state
policy would be more aligned with his vision of instructional leadership.
His thoughts were consistent with Stacy's emphasis on agency through local autonomy:

When we first started the transition to the new APPR, and there were some
initial committee meetings to discuss the negotiation of the new framework,
many of us, myself included, said let’s start thinking outside the box. Let’s
get a little more creative. Let’s work back to something that still satisfies
these mandates, but at the same time affords us the opportunity to work
more effectively with our staff, and instead of maintaining this performance
based show in some regards—but let’s free up that time so that perhaps we
can form—I would love to institute peer-to-peer observation. That’s talked
about all the time, rarely implemented effectively, and certainly not in our
district. (Rich)

Rich emphasized the level of professional trust that should exist between the
state and school administrators. He discussed how the administration should
have some level of accountability to the state, and the state should trust the
district professionals that they are doing the right thing with regard to evalu-
ating and developing their staff. He also felt that the policy as written was not
congruent with his professional beliefs regarding instructional supervision,
as teachers were receiving evaluation scores based upon categories in a rubric
that could not be demonstrated in every lesson. This added to the problem of using
invalid metrics to quantify teacher impacts.
Most teachers also expressed that they would prefer another form of evaluation that would give them meaningful feedback. The science teachers discussed how targeted professional development would better motivate them intrinsically to refine their teaching practice. Robert commented on creating a peer review system:

It’s almost like a peer review system would work very well where you can have
teachers come in and they can observe at any point in time and you can ask
them to evaluate you intently. I know that takes time, I know not everybody is
comfortable doing that, you have to have a rapport with that person but I think
that’s a much better evaluation system than having an independent consultant
come in or an administrator who hardly knows you come in and see you in an
either observed or planned observation. (Robert)

The data from this study suggested that student growth and achievement
should be carefully considered in a science teacher’s evaluation because of the
variability among students, lack of seriousness among students taking pre- and
posttests, and the limited reliability and validity of baseline measures. Some of the participants believed that high-stakes state exams did motivate teachers to perform and that teachers should be held accountable for student outcomes to some degree. Others argued that test scores should be used only to help science teachers identify problematic content and refine their teaching practice. Administrators reported that the politically driven policy changes
rushed the implementation, creating a stressful transition, lack of teacher trust,
and, in some cases, diminished motivation. The administrators found the one-
size-fits-all model of APPR did not take into account economic and demo-
graphic differences within school districts. They recommended educator
involvement when creating new science teacher evaluations that could be used
as guidelines for districts to promote science teacher development. The over-
arching goal of improving teacher practice through accountability and extrin-
sic motivation was facilitated by aligned intrinsic motivational factors yet
challenged by weaknesses in policy design. These tensions could be mediated
by localized control that improves stakeholder agency, peer learning commu-
nities, and the adoption of more reliable metrics. Key thematic elements are
illustrated in a new explanatory framework for the role of motivation in pro-
fessional improvement, represented in Figure 2.

Limitations
The study has several limitations. The context where this research study took
place was unique compared with the organization of other school districts in
the United States. The research area captured the characteristic features of a
much larger school district and had a long tradition of state-mandated test-
based accountability. The results from this qualitative study may not be gen-
eralizable outside of the state of New York in school districts with different
standardized testing cultures, or in rural and urban school districts. Although
the perspectives of urban and rural classroom teachers and administrators
were not included, their views would likely present different understandings
and broader explanations of APPR policy implementation. The small number
of interviews conducted and the results gathered might not be generalizable
beyond the teachers and administrators interviewed. The interview protocol,
developed with a priori constructs, may have missed key variables related to
localized policy implementation.
The researchers took steps to minimize their biases during discussions
with each other and external colleagues, yet their views and judgments were
ever present during data collection and analysis. This could have influenced
participant responses during the interviews and further constrained generaliz-
ability. Several biases were uncovered through discussions with academic
colleagues through the critical friend model (Fetterman & Wandersman,
2005), yet additional misinterpretations were possible.
Figure 2. Explanatory framework for science teacher and administrator perceptions of APPR.
Note. APPR = Annual Professional Performance Review.

Discussion
This study attempts to fill the gap between educational research and practice
by generating findings that are geared to influence science teacher motivation
and practice through the evaluation process. Research has raised concerns about whether new evaluation systems will improve instruction and increase student learning despite the significant resources devoted to them (Sipple, Killen, & Monk, 2003). Educational research has
indicated that the perceptions of key stakeholders are important in under-
standing policy success and failures (Datnow & Castellano, 2000). This
research study builds upon Firestone’s (2014) work with evaluation and
motivation theories by questioning whether the APPR policy was designed to
leverage teacher motivation to improve practice and student performance.
Louis et al. (2005) suggested that science and mathematics teachers are most
affected by accountability, and these findings shed light upon building the
capacity of science teachers to exercise agency in changing educational sys-
tems. This work also furthers the work of de Jesus and Lens (2005) by
Mintz and Kelly 27

identifying contextual factors that impact professional constraints and teacher


self-determination during policy implementation. In doing so, differences
between the intended policy and enacted policy were identified and analyzed.
This study contributes new understandings to science teacher evaluation
design by exploring the symbiotic relationships among science teachers,
administrators, and state policy through motivational perspectives.

Motivation
The APPR policy in New York State was designed to foster teacher profes-
sional development through accountability in an effort to increase student
achievement. The teacher-administrator pairs in this study revealed insights
regarding the tensions between intrinsic and extrinsic motivation in meeting
the policy goals. The science teachers in this study demonstrated a passion
for teaching science and having a positive impact on student learning, which
was the main motivation for their careers. They agreed the reform should emphasize the aspects of teaching that intrinsically motivate them rather than rely on a quantified extrinsic motivator with unreasonable metrics. This may explain the teachers' and administrators' responses to the policy, which did not reflect an understanding of what motivates teachers.
Intrinsic motivation is influenced by teacher efficacy and trust (de Jesus &
Lens, 2005; Gigante & Firestone, 2008), so it is essential that assessments of
effectiveness are perceived as fair, clear, and reliable. If student test scores
are to be used in teacher evaluation, their weight should be given careful
consideration along with variations in student population. Student test scores
should be used formatively as a way to adjust teaching practice in areas in
which students have difficulty. Student performance is an important part of
educator accountability and should be utilized to inform teachers about suc-
cessful and unsuccessful teaching practices. However, this needs to be based
upon reliable and valid metrics. Research has suggested that value-added
measures are not sufficient alone and often are not aligned with observation
ratings (Hill et al., 2011; Kimball & Milanowski, 2009), and the participants
in this study concurred that data must be cumulative to moderate the effects
of student differences from year to year.
Teacher learning and accountability were often perceived as mutually
exclusive constructs, which may be construed as tension between intrinsic
(inherent desire for learning) and extrinsic (accountability) motivations.
Intrinsically motivating teachers through professional developmental strate-
gies would fulfill one of the goals of Race to the Top—maximizing student
learning through improved teacher effectiveness. The policy would have received a more positive reception if science teachers had been clear about its requirements and cognizant of its useful information and practicality. The extrinsic nature of a
quantified composite score that encapsulated a teacher’s professional practice
was viewed as disconnected from their intrinsic motivation; however, this did
not need to be the case. Many of the teachers in this study expressed sensitiv-
ity to their status in their professional community, and they were validated by
positive feedback from their supervisors. They wanted to become better edu-
cators, but were suspicious of a policy that lacked positive incentives. They
identified downside effects such as the potential professional embarrassment
from a low HEDI score. This compromised their sense of professionalism as
the unfairness and uncertainty of certain metrics seemed to marginalize their
efforts. Science teachers expressed commitment to professional improvement,
although this was not because of the policy but in spite of it.

Intended Versus Enacted Policy


A major finding of this study was that the APPR system was often not implemented in school districts as the state intended. There was little teacher
“buy-in” because of the lack of teacher and administrator input into the pol-
icy before the teacher evaluation law was changed. For reform efforts to be
implemented successfully, the policy must be clearly outlined for teachers to
meet the spirit of the reform. The policy should be practical, providing teach-
ers with valuable information to improve their teaching practice. Educator
involvement in the design of teacher evaluations will lead to a smoother tran-
sition and science teacher agency.
State activism and local control have a delicate balance, and these findings
shed light upon how local contextual factors influenced policy implementa-
tion. Malen (2003) suggested that educational policies often have less impact
when district-level administrators bring state mandates in line with their own
professional views and constraints. Because the supervisors did not have adequate resources to meet the demands of additional teacher observations, the feedback they provided was sometimes cursory and delayed. They also felt deprofessionalized and diminished in their roles as instructional leaders. They did their best to meet policy specifications, but teachers
mostly maintained their status quo as extraordinary efforts were not incentiv-
ized and good faith efforts may have been undervalued. The external motiva-
tion of accountability would have had greater impact if it were more consistent
with the educators’ values and beliefs.
For policy changes to be accepted by the educational community and
implemented properly, reform efforts need to be a blend of bottom-up and top-
down approaches (Fullan, 1994). Researchers have concluded that change occurs when centralized mandates and local initiatives unite. Fullan (1994) concluded that systems change when individuals and small groups find commonality both locally and centrally. In one sense, teacher evaluation is already bottom-up, as ineffective teachers are removed before tenure is granted.
However, the top-down approach of the state evaluation system had limited
impact because administrators were not given the resources to manage
increased workload, and teachers often viewed their composite scores as
unfair. Teacher evaluations can only improve instruction and student learning
if there is trust between the teachers being evaluated and administration
(Firestone, 2014). Trust in professional relationships has been found to con-
tribute to increased motivation and collaboration, leading to improved student
learning in science (Smetana, Wenner, Settlage, & McCoach, 2016).

Implications
This research suggests strategies for the state to improve the evaluation
policy and process, and, consequently, science teacher motivation. A one-
size-fits-all approach to science teacher evaluations is not appropriate for
states with large numbers of school districts with varied ranges of socio-
economic diversity. The results and themes generated by this research lend
themselves to a bigger question, that is, how should science teachers be
evaluated? Structural changes in the format of science teacher evaluation
are necessary to accomplish the goals set by Race to the Top and New York
State, namely, to provide objective teacher evaluation results that foster
motivation, professional growth, and student learning. ESSA released
states from reporting “highly effective” teachers and accompanying stu-
dent performance metrics; however, New York maintained the requirement
for the inclusion of student test scores in the teacher evaluation process.
State- and local-level experts in teacher evaluations and pedagogical con-
tent knowledge should be included in the revamping of science teacher
evaluation policy to meet the needs of specific stakeholders. Science
teachers and administrators, as practitioners of policy, have insightful rec-
ommendations to offer, and fostering their agency will facilitate profes-
sional engagement in the process.
When it comes to specialized content areas, such as the sciences, educa-
tional leadership loses some of its generalizability. Science teachers are con-
tent specialists, and their supervisors sometimes do not share the same content
expertise. Teachers need credible sources of knowledge to benefit from evalu-
ation (Murphy et al., 2013), so it is essential that administrators provide the
requisite content expertise to make effective recommendations. The certifica-
tion of science administrators does not need to be changed or become more specialized; rather, teacher leaders and peer learning communities could share
disciplinary skills and strategies while lessening the time burden for science
administrators. The administrator’s role could be adjusted to become a facili-
tator for peer observations, teacher collaborative communities, and educa-
tional rounds. The administrator could change his or her role from strictly an
evaluator to an instructional leader, seeking out meaningful professional
development to fulfill the needs of faculty. These innovations would strengthen
the value of peer validation, a desirable component of extrinsic motivation
(Finnigan & Gross, 2007).
The results of this study indicate that the professional development of
science teachers should be the focus of teacher evaluations. The partici-
pants found accountability to be an important component of evaluation and
an extrinsic motivator, yet these evaluations may not need to be conducted
every year. Composite scores calculated over a longer period of time could
provide a more reliable representation of the teacher’s quality. Developing
individual professional learning plans is another mechanism for evaluating
science teachers. Multiple assessments with specific growth targets within
the academic year could provide the information needed to demonstrate
professional improvement.
Preservice teacher programs could also benefit from this study. These pro-
grams could introduce content-specific observation rubrics to preservice
teachers to familiarize them with metrics for lesson evaluations. Highly moti-
vated veteran teachers are calling for more content-specific professional
development, so it seems logical that preservice teachers would also require
the same training in disciplinary pedagogical content knowledge. Bridging
the gap between preservice science teacher training and professional science
teacher development could improve the quality and self-reflection of novice
science teachers as they enter the profession.

Conclusion
Based on the information gathered in this study, successful implementation of
teacher evaluation reform should correspond with positive teacher percep-
tions and focus on intrinsically motivating science teachers to improve
instruction. Overall, this study found positive potential in the evaluations of
teachers in New York under the APPR system, yet negative stakeholder reac-
tions regarding the perceived unfair, chaotic, and punitive nature of the pol-
icy. The goal of science teacher evaluations should be to provide the
framework for developing excellent science educators through localized con-
trol. This study has established some important considerations when design-
ing and implementing evaluation policy.
Several important points were learned as a result of this study: (a) science
teachers and administrators value the importance of teacher evaluations, (b)
conversations between administrators and teachers have improved as a result
of APPR’s implementation of observation rubrics, (c) the condensed timeline
and top-down approach of APPR policy may have contributed to teacher
resistance because of lack of policy clarity, (d) science teachers and adminis-
trators found APPR did not provide them with practical and reliable informa-
tion to improve science teacher practice with the use of student test scores,
and (e) science educators believed professional development should be the
main focus of teacher evaluations. The cautious nature of teacher perceptions
associated with student test scores and the positive perceptions associated
with the observation process indicate that policy makers should address this
dichotomy in future evaluation reform efforts. Motivational constructs pro-
vide an insightful lens for designing policy to leverage teachers’ commitment
to improving practice and raising student achievement.
The insights gathered from this study add to the limited literature regarding
science teacher and administrator perspectives of evaluation policy. As policy
makers have increased the focus on teacher evaluation, examining these per-
ceptions in a motivational context provides the foundation for further research
in this area. This study calls for future qualitative and large-scale randomized
controlled studies regarding the effects of professional development on teacher
motivation, as well as the impacts of science teacher collaborative networks,
peer review, and educational rounds on science teaching and learning. Future
reform efforts regarding science teacher evaluations should include educator
input, explicit and valid metrics, and possess practicality for science teachers
and administrators. The evaluation policy should emphasize disciplinary pro-
fessional development and teacher professionalism to motivate teachers
intrinsically and foster continuous pedagogical growth.

Appendix A
Science Teacher Semi-Structured Interview Protocol

1. Comment on your experiences with the new observation process.
2. Did your school district explain the new process in detail?
3. Talk about your perceptions of the local achievement measure.
4. Talk about your perceptions of the student learning objectives.
5. Discuss the observation process, including the rubric(s) used to eval-
uate your lesson.
6. What is your general impression of the overall system?
7. Do you feel APPR [Annual Professional Performance Review] fosters a culture of continuous professional growth?
8. Discuss what motivates you in your career.
9. Which science content area and level do you teach?
10. What levels of science courses does your district offer (AP, IB, honors level, Regents level)?
11. Do you feel as though you are an effective science teacher? Why or
why not?
12. What qualities do you think make up a great teacher?
13. Do you participate in any activities outside the school day for the
school district you work in?
14. Do you think student test scores should be used to evaluate teacher
quality?
15. Describe your feelings about the new Annual Professional
Performance Reviews.
a. Do you think the current system is a fair way to evaluate a
teacher?
16. What would be, in your opinion, the ideal way to evaluate science
teachers?
17. Have you changed anything about the way you teach because of the
new APPR system of evaluation?
a. Do you do more Regents test preparation?
18. Would you be opposed to a student/parent survey as a part of your
evaluation?
19. How do you feel about the addition of a teacher self-evaluation as a
part of your evaluation?

Appendix B
Science Administrator Semi-Structured Interview Protocol

1. Discuss your educational background and work experience.
2. What certifications do you hold?
3. Discuss what motivates you in your career.
4. How are you evaluated?
5. What is your understanding of how to evaluate science teachers in
your district?
6. How many teachers are you responsible for?
7. Comment on your experiences with the new observation process.
8. Talk about your perceptions of the local achievement measure.
9. Talk about your perceptions of the student learning objectives.
10. Do you think student test scores should be used to evaluate teacher
quality? Why or why not?
11. Discuss your general impression of the overall evaluation system.
12. Do you think teachers have changed anything about the way they
teach because of the new APPR [Annual Professional Performance
Review] system of evaluation?
13. Do you feel APPR fosters a culture of continuous professional
growth?
14. Discuss the qualities that characterize a “great” teacher.
15. What are your views on tenure?
16. Describe your feelings about the new Annual Professional
Performance Reviews.
17. What would be, in your opinion, the ideal way to evaluate teachers?

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publi-
cation of this article.

ORCID iDs
Jessica A. Mintz https://orcid.org/0000-0002-8789-9164
Angela M. Kelly https://orcid.org/0000-0003-1393-1296

References
American Recovery and Reinvestment Act (ARRA) of 2009, Pub. L. No. 111-5, 123
Stat. 115, 516 (February 19, 2009).
Amrein-Beardsley, A. (2008). Methodological concerns about the education value-
added assessment system. Educational Researcher, 37, 65-75.
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn,
R. L., . . . Shepard, L. A. (2010). Problems with the use of student test scores to
evaluate teachers (vol. 278). Washington, DC: Economic Policy Institute.
Barusch, A., Gringeri, C., & George, M. (2011). Rigor in qualitative social work
research: A review of strategies used in published articles. Social Work Research,
35(1), 11-19.
Bill and Melinda Gates Foundation. (2011). Learning about teaching: Initial findings
from the measures of effective teaching project. Bellevue, WA: Author.
Clipa, O. (2015). Roles and strategies of teacher evaluation: Teachers' perceptions. Procedia—Social and Behavioral Sciences, 180, 916-923.
Coburn, C. E., Hill, H. C., & Spillane, J. P. (2016). Alignment and accountability
in policy design and implementation: The Common Core State Standards and
implementation research. Educational Researcher, 45, 243-251.
Common Core State Standards Initiative. (2010). Common Core State Standards for
English Language arts & literacy in history/social studies, science, and technical
subjects. Washington, DC: Council of Chief State School Officers.
Conley, S., & Glasman, N. S. (2008). Fear, the school organization, and teacher eval-
uation. Educational Policy, 22, 63-85.
Corbin, J., & Strauss, A. (2014). Basics of qualitative research: Techniques and pro-
cedures for developing grounded theory. Thousand Oaks, CA: SAGE.
Creswell, J. W. (2013). Qualitative inquiry & research design: Choosing among five
approaches (3rd ed.). Los Angeles, CA: SAGE.
Cuevas, R., Ntoumanis, N., Fernández-Bustos, J. G., & Bartholomew, K. (2018).
Does teacher evaluation based on student performance predict motivation, well-
being, and ill-being? Journal of School Psychology, 68, 154-162.
Danielson, C. (2008). The handbook for enhancing professional practice: Using
the framework for teaching in your school. Alexandria, VA: Association for
Supervision and Curriculum Development.
Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012).
Evaluating teacher evaluation: Popular modes of evaluating teachers are
fraught with inaccuracies and inconsistencies, but the field has identified better
approaches. Phi Delta Kappan, 93(6), 8-15.
Datnow, A., & Castellano, M. (2000). Teachers’ responses to success for all:
How beliefs, experiences, and adaptations shape implementation. American
Educational Research Journal, 37, 775-799.
de Jesus, S. N., & Lens, W. (2005). An integrated model for the study of teacher moti-
vation. Applied Psychology, 54, 119-134.
Derrington, M. L., & Campbell, J. W. (2015). Implementing new teacher evaluation
systems: Principals’ concerns and supervisor support. Journal of Educational
Change, 16, 305-326.
Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. Annual
Review of Psychology, 53, 109-132.
Every Student Succeeds Act (ESSA) of 2015, Pub. L. No. 114-95, 114 Stat. 1177
(2015-2016).
Fetterman, D. M., & Wandersman, A. (Eds.). (2005). Empowerment evaluation prin-
ciples in practice. New York, NY: Guilford Press.
Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence
teacher motivation? Lessons from Chicago’s low-performing schools. American
Educational Research Journal, 44, 594-629.
Firestone, W. A. (2014). Teacher evaluation policy and conflicting theories of moti-
vation. Educational Researcher, 43, 100-107.
Firestone, W. A., Nordin, T. L., Shcherbakov, A., Blitz, C. L., & Kirova, D. (2014).
New Jersey’s Pilot Teacher Evaluation Program: Year 2 final. New Brunswick,
NJ: Center for Effective School Practices.
Forman, K., & Markson, C. (2015). Is “effective” the new “ineffective”? A crisis
with the New York state teacher evaluation system. Journal for Leadership and
Instruction, 14(2), 5-11.
Fullan, M. G. (1994). Coordinating top-down and bottom-up strategies for edu-
cational reform. In R. J. Anson (Ed.), Systemic reform: Perspectives on per-
sonalizing education (pp. 7-22). Washington, DC: U.S. Government Printing
Office.
Fullan, M. G. (2001). The new meaning of educational change (3rd ed.). New York,
NY: Teachers College Press.
Gigante, N. A., & Firestone, W. A. (2008). Administrative support and teacher lead-
ership in schools implementing reform. Journal of Educational Administration,
46, 302-331.
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies
for qualitative research. Chicago, IL: Aldine.
Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effective-
ness: A research synthesis. Washington, DC: National Comprehensive Center
for Teacher Quality.
González, R. A., & Firestone, W. A. (2013). Educational tug-of-war: Internal and
external accountability of principals in varied contexts. Journal of Educational
Administration, 51, 383-406.
Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? An
experiment with data saturation and variability. Field Methods, 18, 59-82.
Hanushek, E. A., Kain, J. F., O’Brien, D. M., & Rivkin, S. G. (2005). The market for
teacher quality (No. w11154). Washington, DC: National Bureau of Economic
Research.
Harris, D. N. (2009). Would accountability based on teacher value added be smart
policy? An examination of the statistical properties and policy alternatives.
Education Finance and Policy, 4, 319-350.
Harris, D. N., & Herrington, C. D. (2015). The use of teacher value-added measures in
schools: New evidence, unanswered questions, and future prospects. Educational
Researcher, 44, 71-76.
Harris, D. N., Ingle, W. K., & Rutledge, S. A. (2014). How teacher evaluation meth-
ods matter for accountability: A comparative analysis of teacher effectiveness
ratings by principals and teacher value-added measures. American Educational
Research Journal, 51, 73-112.
Hill, H., & Grossman, P. (2013). Learning from teacher observations: Challenges and
opportunities posed by new teacher evaluation systems. Harvard Educational
Review, 83, 371-384.
Hill, H., Kapitula, L., & Umland, K. (2011). A validity argument approach to evalu-
ating teacher value-added scores. American Educational Research Journal, 48,
794-831.
Holloway, I. (1997). Basic concepts for qualitative research. London, England: Wiley-Blackwell.
Hopkins, P. (2016). Teacher voice: How teachers perceive evaluations and how lead-
ers can use this knowledge to help teachers grow professionally. NASSP Bulletin,
100(1), 5-25.
Jennings, J. (2012). The effects of accountability system design on teachers’ use of
test score data. Teachers College Record, 114(11), 1-23.
Jiang, J. Y., Sporte, S. E., & Luppescu, S. (2015). Teacher perspectives on evaluation reform: Chicago's REACH students. Educational Researcher, 44, 105-116.
Kimball, S. M., & Milanowski, A. (2009). Examining teacher evaluation validity
and leadership decision making within a standards-based evaluation system.
Educational Administration Quarterly, 45, 34-70.
Kraft, M. A., & Gilmour, A. F. (2017). Revisiting the widget effect: Teacher
evaluation reforms and the distribution of teacher effectiveness. Educational
Researcher, 46, 234-249.
Leithwood, K., Steinbach, R., & Jantzi, D. (2002). School leadership and teachers’
motivation to implement accountability policies. Educational Administration
Quarterly, 38, 94-119.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Thousand Oaks, CA: SAGE.
Long Island Index. (2009). Long Island index report. Garden City, NY: Rauch
Foundation.
Louis, K. S., Febey, K., & Schroeder, R. (2005). State-mandated accountability in
high schools: Teachers’ interpretations of a new era. Educational Evaluation and
Policy Analysis, 27, 177-204.
Malen, B. (2003). Tightening the grip? The impact of state activism on local school
systems. Educational Policy, 17, 195-216.
Martínez, J. F., Schweig, J., & Goldschmidt, P. (2016). Approaches for combining
multiple measures of teacher performance: Reliability, validity, and implications
for evaluation policy. Educational Evaluation and Policy Analysis, 38, 738-756.
Marzano, R. J. (2012). The two purposes of teacher evaluation. Educational
Leadership, 70(3), 14-19.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis (2nd ed.).
Thousand Oaks, CA: SAGE.
Murphy, J., Hallinger, P., & Heck, R. H. (2013). Leading via teacher evaluation: The
case of the missing clothes? Educational Researcher, 42, 349-354.
New York State Education Department. (2013). Approved APPR plans. Albany:
Author.
New York State Education Department. (2015a). General education and diploma
requirements. Albany: Author.
New York State Education Department. (2015b). Guidance on New York State’s
annual professional performance review for teachers and principals to imple-
ment educational law 3012-c and the commissioners regulations. Albany:
Author.
New York State Legislature. (2010). Article 61. Teachers and supervisory and administrative staff (Section 3012-c Annual professional performance review of classroom teachers and building principals). Retrieved from http://public.leginfo.state.ny.us/menuf.cgi
New York State United Teachers. (2011). The NYSUT teacher practice rubric.
Latham: Author.
NORC at the University of Chicago. (2018). State-administered HS end of course (EOC) science assessments, intended uses, 2016-17. Retrieved from http://stem-assessment.org/table/pages/table10.aspx
Pallas, A. M. (2012). The fuzzy scarlet letter. Educational Leadership, 70, 54-57.
Papay, J. P. (2012). Refocusing the debate: Assessing the purposes and tools of teacher evaluation. Harvard Educational Review, 82, 123-141.
Patton, M. Q. (1990). Qualitative evaluation and research methods. Newbury Park,
CA: SAGE.
Patton, M. Q. (1999). Enhancing the quality and credibility of qualitative analysis.
Health Services Research, 34(5 Pt. 2), 1189-1208.
Polkinghorne, D. E. (1989). Phenomenological research methods. In R. S. Valle
& S. Halling (Eds.), Existential-phenomenological perspectives in psychology
(pp. 41-60). New York, NY: Plenum Press.
Ryan, R. M., & Deci, E. L. (2000). Intrinsic and extrinsic motivations: Classic defi-
nitions and new directions. Contemporary Educational Psychology, 25, 54-67.
Ryan, R. M., & Deci, E. L. (2006). Self-regulation and the problem of human auton-
omy: Does psychology need choice, self-determination, and will? Journal of
Personality, 74, 1557-1586.
Sadler, D. R. (1985). Evaluation, policy analysis, and multiple case studies:
Aspects of focus and sampling. Educational Evaluation and Policy Analysis,
7, 143-149.
Saldaña, J. (2013). The coding manual for qualitative researchers. Thousand Oaks,
CA: SAGE.
Sipple, J. W., Killen, K., & Monk, D. H. (2003). Adoption and adaptation: School
district responses to state imposed learning and graduation requirements.
Educational Evaluation and Policy Analysis, 26, 143-168.
Smetana, L. K., Wenner, J., Settlage, J., & McCoach, D. B. (2016). Clarifying and
capturing “trust” in relation to science education: Dimensions of trustworthiness
within schools and associations with equitable student achievement. Science
Education, 100, 78-95.
Springer, M. G., Ballou, D., & Peng, A. (2008). Impact of the teacher advancement
program on student test score gains: An independent appraisal. Nashville, TN:
National Center on Performance Incentives.
Steinberg, M. P., & Garrett, R. (2016). Classroom composition and measured teacher
performance: What do teacher observation scores really measure? Educational
Evaluation and Policy Analysis, 38, 293-317.
Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory
procedures and techniques. Newbury Park, CA: SAGE.
Strong, M., Gargani, J., & Hacifazlioglu, O. (2011). Do we know a successful teacher
when we see one? Experiments in the identification of effective teachers. Journal
of Teacher Education, 62, 367-382.
Stronge, J. H., Ward, T. J., Tucker, P. D., & Hindman, J. L. (2007). What is the rela-
tionship between teacher quality and student achievement? An exploratory study.
Journal of Personnel Evaluation in Education, 20, 165-184.
Tuytens, M., & Devos, G. (2009). Teachers’ perception of the new teacher evalua-
tion policy: A validity study of the Policy Characteristics Scale. Teaching and
Teacher Education, 25, 924-930.
Tyler, J. H. (2011). Designing high quality evaluation systems for high school teach-
ers: Challenges and potential solutions. Washington, DC: Center for American
Progress.
U.S. Department of Education. (2001). The No Child Left Behind Act of 2001 [public
law]. Retrieved from https://www.congress.gov/bill/107th-congress/house-bill/1
U.S. Department of Education. (2009). Race to the Top program executive summary. Washington, DC: Author. Retrieved from https://www2.ed.gov/programs/racetothetop/executive-summary.pdf
Watt, D. (2007). On becoming a qualitative researcher: The value of reflexivity. The
Qualitative Report, 12, 82-101.
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. Brooklyn, NY: New Teacher Project. Retrieved from https://tntp.org/publications/view/the-widget-effect-failure-to-act-on-differences-in-teacher-effectiveness
Weiss, J. A. (2012). Data for improvement, data for accountability. Teachers College
Record, 114, 110307.
Xu, X., Grant, L. W., & Ward, T. J. (2016). Validation of a statewide teacher evalua-
tion system: Relationship between scores from evaluation and student academic
progress. NASSP Bulletin, 100, 203-222.
Yin, R. K. (1994). Case study research: Design and methods (2nd ed.). Thousand Oaks, CA: SAGE.

Author Biographies
Jessica A. Mintz was awarded the PhD in Science Education from Stony Brook
University, NY, in 2017. She is a New York State Master Teacher and a high school
science teacher in Bay Shore, NY. Her research interests include science teacher
accountability practices and chemistry teacher professional development.
Angela M. Kelly is an associate director of Science Education at the Institute for
STEM Education, and an associate professor of Physics at Stony Brook University.
Her research interests include inequities in physical science and engineering educa-
tion; reformed teaching practices in science; and sociocognitive influences on STEM
access and participation.
