

Feasibility of Using RATE with Teacher Candidates

Ann Bullock, Kristen Cuthrell, Elizabeth A. Fogarty, Joy N. Stapleton

East Carolina University
With heightened accountability in teacher education, it is increasingly important to
develop or adopt a teacher candidate assessment system that comprises multiple measures
of effectiveness. Such a system needs to be comprehensive and provide multiple data points on
individual teacher candidates. Drill-down studies of existing program performance assessment
data found that our home-grown assessments of teacher candidate performance, including teacher
candidate-developed portfolios, observational assessments from the internship, and dispositional
assessments, were not valid or reliable measures of teacher candidate performance (Henry et al.,
2013). The inability to make reliable inferences about our candidates based on available
data drove both program leadership and faculty to seek a more reliable method for assessing
the readiness of teacher candidates (Cuthrell et al., 2014).
Close examination of data related to the current observation instrument highlighted how
ineffective the instrument was at producing reliable and valid data to help us evaluate our teacher
candidates and improve our program. Follow-up conversations with university supervisors
revealed that they were reluctant to score a teacher candidate down in any area on the instrument
because they were trying to be supportive; in effect, they were using the observation instrument
not as an evaluation tool but as a developmental tool with teacher candidates. Using the
observation instrument for these different purposes led to grade documentation issues with teacher
candidates. For example, at the end of the semester the scores on the observation instrument
often showed no deficits even though the supervisor wanted to give the teacher candidate a lower
grade. Without evidence of reduced performance from the observation instrument, giving the
intern a lower grade was not always possible. This data-driven knowledge
led us to investigate other reliable and valid observation instruments.
The Rapid Assessment of Teacher Effectiveness (RATE) has been validated in use with
practicing teachers. RATE is a process for evaluating teachers through the observation of
classroom lesson videos with items that focus on aspects of teaching that have been proven to be
highly predictive of student performance gains on measures of learning. Its primary goal is to
enable targeted teacher support early in the teaching cycle in order to maximize the possibility
for student and teacher success. RATE is designed to be not only predictive of student learning
but also efficient and practical to use.
This paper describes the collaboration between East Carolina University's Department of
Elementary Education and Middle Grades Education and the developers of the RATE instrument,
through which a proof of concept study was conducted to norm the RATE instrument for teacher
candidates. After summarizing initial findings from the RATE team, we explore the feasibility of
using the RATE instrument for teacher candidates' recorded and live observations.
Related Literature
Various frameworks have emerged that provide research-based, comprehensive
approaches to describing and identifying effective teaching. Widely accepted frameworks include
Danielson's (2011) Framework for Teaching; the TeachingWorks framework (Sawchuk, 2011);
the Gates Foundation's Measures of Effective Teaching (MET) project (Danielson, 2010); and
Marzano's Classroom Instruction that Works (Marzano, Pickering, & Pollock, 2009).
Due to their wide acceptance, these frameworks are being used in the development of
observation and evaluation instruments. Danielson's work was used in a collaborative
partnership with the Educational Testing Service that resulted in the development of the Teachscape
observation instrument for teachers (Teachscape, 2010). Another project, through the Bill &
Melinda Gates Foundation and in collaboration with Stanford University, the University of Virginia,
Harvard University, the RAND Corporation, the Educational Testing Service, and others, designed
the Measures of Effective Teaching (MET) study (Gates Foundation, 2009). This project placed
emphasis on how to measure effective teaching and teachers using a wide range of data
collection. Researchers collected multiple points of data to identify effective teaching and
support teachers in reaching the highest levels of instruction. The initial report concluded,
"Reinventing the way we evaluate and develop teachers will eventually require new
infrastructure, perhaps using digital video to connect teachers with instructional coaches,
supervisors and their peers" (p. 31).
RATE was designed by researchers Gargani and Strong (2014). When comparing their
observation instrument to others, Strong, Gargani, and Hacifazlioglu (2011) found that using
shorter video clips of instruction, fewer observations, a simplified scoring rubric, and less
training produced scores with greater reliability and predictiveness. Overall, this resulted
in a less expensive approach to teacher observation. RATE was designed to predict the ability
of teachers to raise the achievement of their students and has been found to be valid and
reliable (Gargani & Strong, 2014; Strong, Gargani, & Hacifazlioglu, 2011). RATE has been
studied in various settings and has undergone refinements. The current RATE design consists
of six items relating to the lesson objective, instructional delivery mechanisms, teacher
questioning strategies, clarity of presentation of concepts, time on task, and level of
student understanding. Each item is assessed on a 3-point scale, and evaluators are
encouraged to jot down notes to support their scores. RATE, like all other observation
instruments, has critics. Good and Lavigne (2015) argue that the work surrounding the testing
and use of RATE is inadequate and unconvincing as a contribution to teacher improvement.
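The six-item design described above can be sketched as a simple data structure. This is an illustrative model only, not the actual RATE instrument: the item field names are paraphrased from the text, and scoring each item 1-3 (so sum scores range from 6 to 18) is an assumption about how the 3-point scale is numbered.

```python
# Hypothetical sketch of a RATE score sheet. Item names are paraphrased
# from the paper; the 1-3 numbering of the 3-point scale is an assumption.
RATE_ITEMS = [
    "lesson_objective",
    "instructional_delivery",
    "questioning_strategies",
    "clarity_of_presentation",
    "time_on_task",
    "student_understanding",
]

def sum_score(scores):
    """Sum score across the six items; with each item scored 1-3,
    sums range from 6 to 18."""
    assert set(scores) == set(RATE_ITEMS), "one score per item required"
    assert all(s in (1, 2, 3) for s in scores.values()), "3-point scale"
    return sum(scores.values())

# Example: a candidate scoring 2 on five items and 3 on the lesson objective.
example = {item: 2 for item in RATE_ITEMS}
example["lesson_objective"] = 3
print(sum_score(example))  # -> 13
```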
In this project, faculty at East Carolina University (ECU) partnered with Strong and
Gargani to gain access to the RATE observation instrument for use with preservice teachers. This
proof of concept study benefited both groups: faculty at ECU needed a valid and
reliable observation instrument that was feasible to use with large populations of interns, while
Strong and Gargani were able to explore the reliability and validity of the observation
instrument with preservice teachers, a population not yet studied with this
instrument. The overarching questions for this proof of concept study were:

- Does the RATE observation instrument's reliability and validity hold when used with preservice teachers?
- Is the RATE observation instrument feasible for use by large teacher education programs?
- Must two raters be used for rater scores to remain reliable and valid?

Methods

As a program requirement, teacher candidates submitted a final teaching video at the end
of their internship semester. Full lesson videos were submitted to Taskstream, an electronic
portfolio platform. Upon submission, candidates were asked to identify their subject, topic, and
20 consecutive minutes of direct instruction captured in the full lesson video. These videos were
the subject of this research project.
In total, 179 university videos were used for scoring. Of the 179, 150 were scored by both
members of a rater pair. Of those 150 videos, only 70 were watched with no flags. Flags
indicated issues such as the video being too long or too short, the rater knowing the teacher
in the video, or audio problems. For the purposes of this study, some analyses used all 150
videos while others used only the 70 unflagged videos.
In March 2015, Strong and Gargani provided an orientation session on the RATE
instrument for 38 ECU faculty (across all content areas), university supervisors, school district
instructional coaches, and college of education administrators. This session focused on
introducing faculty to the RATE instrument and how it might be helpful for teacher candidates.
Strong and Gargani returned in May 2015. On the first day, thirty-eight university faculty (across
all content areas) participated in eight hours of orientation and training on the RATE instrument.
On the second day, each participant scored eight videos in 30-minute segments (a four-hour
period plus breaks). Each 30-minute segment included viewing the video clip, recording the
scores on the RATE instrument, and then conferring with a partner who had viewed the same video.
During the conferring, scores were reconciled. Each of the 150 videos was scored twice, using
both rater pairs and solo raters. Off site, trained RATE personnel evaluated all videos for
comparison purposes.
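A study like this depends on quantifying how often paired raters agree. As an illustrative sketch only (not the authors' actual analysis), the two standard measures, percent agreement and Cohen's kappa, can be computed for one RATE item scored on a 3-point scale; the video scores below are invented for demonstration.

```python
# Illustrative sketch: inter-rater agreement for paired scores on one
# RATE item. The scores are hypothetical; this is not the RATE team's
# analysis pipeline.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of videos on which the two raters gave the same score."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b, categories=(1, 2, 3)):
    """Chance-corrected agreement for two raters on a categorical scale."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Expected agreement if both raters scored independently at their base rates.
    p_expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical independent scores from one rater pair on a single item
# across ten videos.
rater_1 = [2, 3, 1, 2, 2, 3, 1, 2, 3, 2]
rater_2 = [2, 3, 2, 2, 1, 3, 1, 2, 3, 3]

print(f"percent agreement: {percent_agreement(rater_1, rater_2):.2f}")  # 0.70
print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.2f}")       # 0.53
```

Kappa corrects raw agreement for the agreement two raters would reach by chance alone, which matters on a 3-point scale where chance agreement is high.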
ECU faculty tended to rate teacher candidates higher on the lesson objective and student
thinking portions of the videos. They tended to score teacher candidates lower on checking for
understanding, clarity of delivery, flow of instruction, and time on task.
See Table 1, comparing ECU scores versus RATE personnel scores, as reported in the final
project report by Strong and Gargani.
Table 1: Comparing ECU Faculty Independent Scores to RATE Personnel Independent Scores


While there was good agreement among raters in most areas, flow of instruction seemed
to show the biggest discrepancy. The cause of this discrepancy is not immediately apparent and
warrants further examination.
See Table 2, showing how often paired raters arrived at the same score, as reported in the
final project report by Strong and Gargani.
Table 2: Paired Raters Arriving at Same Scores

It appears that discussing ratings with another person brings extreme raters closer
back to the norm. The closer a score was to the norm, the less it moved; the farther it was
from the norm, the more it moved. This finding holds especially for ECU raters. RATE team
scores were less variable across raters. See Figure 1, as reported in the final project report
by Strong and Gargani.

Figure 1. Means and Standard Deviations of Independent (Open Circle) and Revised (Closed
Circle) Sum Scores for ECU Raters
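The reconciliation pattern can be illustrated with a toy model: if conferring pulls each independent sum score partway toward the group norm, scores far from the norm move more in absolute terms than scores near it. The norm value and pull factor below are invented for illustration and are not taken from the study's data.

```python
# Toy model of score reconciliation (hypothetical numbers throughout).
NORM = 12.0   # invented group-mean sum score (six items, assumed 1-3 each)
PULL = 0.5    # invented fraction of the gap to the norm closed by conferring

def reconcile(independent_score, norm=NORM, pull=PULL):
    """Move a score `pull` of the way toward the norm."""
    return independent_score + pull * (norm - independent_score)

for score in [6.0, 9.0, 11.0, 13.0, 17.0]:
    revised = reconcile(score)
    print(f"independent {score:>4} -> revised {revised:>5}  "
          f"(moved {abs(revised - score):.1f})")
```

Under this model the extreme scores (6 and 17) move by 3.0 and 2.5 points, while scores near the norm (11 and 13) move by only 0.5, matching the direction of the pattern reported above.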

Scientific or Scholarly Significance of the Study

Findings from this RATE proof of concept study indicate that the RATE scale can be used
with preservice teachers. Even though discrepancies existed, rater reliability was higher in this
study than in RATE's (VS7) validation study and the MET study conducted by the Gates
Foundation. The biggest discrepancies occurred in the ECU faculty's ratings of clarity and
flow of the lesson. Further exploration into the faculty's interpretation of these categories is
needed before those discrepancies can be explained.


The RATE instrument is shorter and less complex than the program's current observation
instrument and requires only a 20-minute observation. One concern coming into the proof of
concept study was that two raters were needed for each video. Given the large number of
individuals within the program, having two individuals rate each candidate seemed prohibitive.
However, the data indicate that having two raters for each video tends to hold weakly justified
extreme scores in check. Since raters are trained to complete each session in a 30-minute
period, the process may actually take the same amount of time, if not less.
It is too early to tell if the instrument will remain predictive of increased student learning
outcomes. As these teacher candidates enter the workforce, further data will be collected to
track the predictive nature of the instrument.
Continuing research is being conducted on which observation practices are effective and
whether RATE is an observation instrument that can best support candidate growth in
challenging clinical experiences. Being able to use the RATE instrument in live settings with
teacher candidates would give teacher education programs a reliable and valid outcome
measure of their teachers. This measure, in addition to other reliable outcome measures such as
the edTPA, would strengthen programs' ability to predict the success of their graduates.



References

Cuthrell, K., Stapleton, J., Bullock, A., Lys, D., Smith, J., & Fogarty, E. (2014). Mapping the
journey of reform and assessment for an elementary education teacher preparation
program. Journal of Curriculum and Instruction, 8(1).
Gargani, J., & Strong, M. A. (2014). Can we identify a successful teacher better, faster, and
cheaper? Evidence for innovating teacher observation systems. Journal of Teacher
Education. Advance online publication.
Good, T., & Lavigne, A. (2015). Response to "Rating teachers cheaper, faster, and better: Not
so fast": It's about evidence. Journal of Teacher Education. Advance online publication.
Henry, G. T., Campbell, S. L., Thompson, C. L., Patriarca, L. A., Luterbach, K. J., Lys, D. B., &
Covington, V. M. (2013). The predictive validity of measures of teacher candidate
programs and performance: Toward an evidence-based approach to teacher preparation.
Journal of Teacher Education, 64, 439-453.
Sawchuk, S. (2011). University of Michigan project scales up "high leverage" teaching practices.
Education Week. Retrieved August 31, 2012.



Strong, M., Gargani, J., & Hacifazlioglu, O. (2011). Do we know a successful teacher when we
see one? Experiments in the identification of effective teachers. Journal of Teacher
Education, 62(4), 367-382.
Strong, M. (2011). The Highly Qualified Teacher: What is Teacher Quality and How Do We
Measure It? New York: Teachers College Press.