
Grading Speaking Performance: Teachers' Attitudes towards Two Rating Scales and Subjectivity Issues
Nurfitri Habibi
Universitas Pendidikan Indonesia
nurfitrihabibi@gmail.com
Introduction
Many teachers consider speaking the most difficult skill to test, since it involves multiple criteria to be measured, such as vocabulary, pronunciation, grammar, fluency, and pragmatics (Lasito, 2015; Babaii, Taghaddomi, & Pashmforoosh, 2015), and it takes substantial time and effort to attain valid and reliable results (Li, 2011). Owing to this difficulty, teachers tend to avoid speaking tests altogether or administer them with considerable uncertainty. Knight (1992) notes that the difficulties teachers face in testing speaking frequently lead them to use inappropriate speaking tests or not to conduct speaking tests at all.
However, the difficulty of speaking tests is caused not only by the factors above, but also by subjectivity in grading students' performance. Teng (2007, p. 3) claims that one of the problems associated with speaking tests is that they are subjective in nature. Subjectivity here means that teachers or raters may hold different perspectives when scoring students' speaking ability. These differing interpretations lead to unfairness in testing and scoring speaking, since they trigger biased judgments (Kim, 2006; Chou, 2013). Biased judgment may also be caused by the rating scale or scoring rubric, whether holistic or analytical, that teachers use in assessing speaking: the method used may influence teachers' decisions about students' speaking ability (Chou, 2013). Kim (2006) further argues that speaking assessment can be affected by the rating scale, since there may be an interaction effect between the rating criteria and raters' performance.
In regard to these issues, this study examines subjectivity and rating scales in assessing speaking. Issues of subjectivity and rating scales in speaking assessment are by no means new; several studies have investigated teachers' interpretations and rating scales, such as Orr (2002), Oscarson and Apelgren (2011), Wang (2010), Kim (2006), Tuan (2012), and Chou (2013). These works focus on the validity and reliability of speaking assessment. However, recent work on teachers' attitudes towards two rating scales in speaking tests, and on the factors affecting their decisions when grading speaking ability, remains scarce. Therefore, this study investigates teachers' perspectives on a performance data-driven scale (analytical scale) and a rating checklist (holistic scale) for grading speaking performance, as well as the factors influencing their subjective judgments in scoring students' speaking ability, by answering the following questions:
1. How do teachers perceive a performance data-driven scale (analytical scale) and a rating checklist
(holistic scale) in grading speaking performance?
2. What factors influence teachers' subjectivity in testing speaking in the Indonesian context?
In accordance with these questions, the present study is expected to make a meaningful contribution to the practice of speaking assessment. Understanding the factors that influence teachers' subjectivity in grading speaking, and teachers' perspectives on rating scales, will allow educators to minimize rater bias in scoring and to adjust their rating scales so as to facilitate fairer judgments of students' speaking performance.
Literature Review
Analytical vs. Holistic Rating Scales
Speaking is a productive skill that involves not only production but also comprehension (Karim & Haq, 2014). When it is tested, several considerations need to be taken into account, such as the goal of skill development and how the skill is measured (Li, 2011; Karim & Haq, 2014). Karim and Haq (2014) argue that speaking skill is developed in order to communicate successfully in a particular language across various contexts and situations; thus, every speaking test should be able to measure how effectively language is used to maintain interaction between speakers (Li, 2011).
To assess speaking, teachers should be able to provide an appropriate rating scale. Developing a scoring rubric or rating scale standardizes the grading process (Pufpaff, Clarke, & Jones, 2014), so bias can be minimized if the scale is followed rigorously (Hitt & Helms, 2009). In addition, it helps teachers achieve the goals set for the language test (Lasito, 2015). There are two approaches to creating a rating scale: the analytical approach and the holistic approach.

An analytical scale, or performance data-driven scale, is an assessment method in which various features of a performance are assessed separately using several subscales (Carr, 2000; Tuan, 2012; Wiseman, 2012; Iwashita & Grove, 2003). It provides more information about test takers' competence (Tuan, 2012; Wiseman, 2012) by describing various levels of ability across different features of students' speaking performance (Carr, 2000; Iwashita & Grove, 2003). The analytical scale has been found particularly useful for teachers or raters who are not very experienced (Tuan, 2012). In addition, it is considered a more reliable rating scale (Wiseman, 2012; Iwashita & Grove, 2003; Tuan, 2012), since its results are more consistent (Iwashita & Grove, 2003; Carr, 2000). However, Tuan (2012) and Chou (2013) show that the analytical scale has some disadvantages in terms of practicality: it is time consuming and expensive, because it is difficult for raters to observe test takers' performance, read all of the detailed descriptors, and mark everything at the same time.
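To make the mechanics of analytic scoring concrete, the minimal sketch below represents a small analytic rubric as per-criterion bands that are combined into one mark. The criteria, band range, and weights are assumptions made purely for illustration; they are not the instrument used in this study.

```python
# Illustrative sketch of analytic scoring: each criterion is rated
# separately on a band scale, then combined into one weighted score.
# The criteria, band range, and weights below are assumed for illustration.

ANALYTIC_RUBRIC = {
    # criterion: weight (weights sum to 1.0)
    "pronunciation": 0.2,
    "grammar": 0.2,
    "vocabulary": 0.2,
    "fluency": 0.2,
    "comprehension": 0.2,
}

def analytic_score(band_ratings: dict[str, int], max_band: int = 5) -> float:
    """Combine per-criterion band ratings (1..max_band) into a 0-100 score."""
    total = 0.0
    for criterion, weight in ANALYTIC_RUBRIC.items():
        band = band_ratings[criterion]
        if not 1 <= band <= max_band:
            raise ValueError(f"{criterion}: band {band} outside 1..{max_band}")
        total += weight * (band / max_band)
    return round(total * 100, 1)

# Example: one rater's bands for one student
print(analytic_score({"pronunciation": 4, "grammar": 3, "vocabulary": 4,
                      "fluency": 3, "comprehension": 5}))  # -> 76.0
```

The point of the sketch is simply that analytic scoring forces the rater to make one explicit judgment per criterion before any total is produced, which is where both its richer information and its time cost come from.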
On the other hand, a holistic scale, or rating checklist, uses a single global numerical rating to assess test takers' performance (Iwashita & Grove, 2003; Wiseman, 2012). It does not consist of long descriptions of various criteria (Chou, 2013; Wiseman, 2012); sometimes it consists only of yes/no choices. The advantage of the holistic scale is its simplicity and efficiency (Carr, 2000; Iwashita & Grove, 2003; Wiseman, 2012): it takes less time to mark test takers' performance, it is easier to administer, and it is more economical. However, Carr (2000) and Iwashita and Grove (2003) find that the main problem of the holistic scale lies in its validity: it remains unclear what a holistic score actually measures and whether it can adequately capture the whole of a test taker's performance.
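By contrast, a rating checklist can be sketched as a handful of yes/no observations that collapse into a single global band. The checklist items and the band mapping below are hypothetical examples for illustration, not the checklist examined in this study.

```python
# Illustrative sketch of a holistic checklist: a few yes/no judgments
# are converted into one global band. The items and mapping are assumed.

CHECKLIST_ITEMS = [
    "message is intelligible",
    "responds to questions appropriately",
    "maintains the interaction without long breakdowns",
    "uses vocabulary adequate for the task",
]

def holistic_band(checks: list[bool]) -> int:
    """Map the number of 'yes' checks onto a single global band (1-5)."""
    # 0 'yes' answers -> band 1, all 4 'yes' answers -> band 5
    return 1 + sum(checks)

print(holistic_band([True, True, False, True]))  # 3 of 4 items -> band 4
```

The contrast with the analytic sketch above shows why the checklist is faster to apply but also why it is harder to say exactly what construct the single resulting band measures.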
Subjectivity issues
As stated in the introduction, speaking tests are subjective in nature (Teng, 2007). Understanding the nature of speaking tests is significant because it not only helps describe the construct in question but also makes it possible to identify factors influencing speaking assessment (Babaii, Taghaddomi, & Pashmforoosh, 2015). In assessing speaking, raters' or teachers' interpretations may therefore differ, because every rater or teacher has their own perception. Hsu (2015) states that raters' knowledge and perceptions might influence their scoring judgments when assessing speaking.
This is borne out by studies investigating raters' interpretations of speaking performance. Teachers have been found to struggle to score students' ability (Oscarson & Apelgren, 2011) and to focus on different aspects of the assessment criteria when assessing speaking (Orr, 2002), leading to unfair judgments. Therefore, to ensure that test takers' performance is interpreted accurately and appropriately, raters must judge speaking ability fairly and without bias (Chou, 2013).
In order to provide fair and unbiased judgments of speaking ability, the factors influencing raters' subjectivity in scoring speaking performance should be brought to the fore. Wang (2010) finds that rater variability and rater tendency are factors influencing judgments when scoring speaking performance. Moreover, Taylor (2009, in Chou, 2013) reveals that socio-cultural sensitivities and susceptibility to preconceptions seem to influence the decisions raters or teachers make while scoring. Beyond these, there may be other factors influencing raters' subjectivity in scoring speaking performance, as the present study will reveal.
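One practical way to make such rater variability visible is to quantify how often two raters award the same band, for example with percent agreement and Cohen's kappa (cf. Wang, 2010, on rater agreement). The sketch below uses invented band scores purely for illustration.

```python
# Illustrative sketch: percent agreement and Cohen's kappa for two raters.
# The band scores below are invented for illustration only.
from collections import Counter

rater_a = [4, 3, 5, 2, 4, 3, 4, 5]
rater_b = [4, 2, 5, 2, 3, 3, 4, 4]

def percent_agreement(a, b):
    """Proportion of students to whom both raters gave the same band."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # expected agreement if both raters assigned bands independently
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")  # 0.62
print(f"kappa     = {cohens_kappa(rater_a, rater_b):.2f}")       # 0.49
```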
Methods
To answer the research questions, namely (1) how teachers perceive a performance data-driven scale (analytical scale) and a rating checklist (holistic scale) in grading speaking performance, and (2) what factors influence teachers' subjectivity in testing speaking in the Indonesian context, four teachers were chosen purposively to serve a specific intention (Yin, 2011): uncovering teachers' perceptions and the subjectivity factors that influence the assessment of speaking performance. These teachers were selected for two reasons: (1) they are still actively teaching, and (2) they have sufficient knowledge of assessing speaking (Sugiyono, 2013), since they have already used analytical and holistic rating scales in scoring speaking.
All of the data were collected through a close-ended questionnaire and semi-structured interviews. The questionnaire, consisting of 14 questions adapted from Chou (2013), was used to gain information about teachers' perceptions of the two rating scales: a performance data-driven scale (analytical scale) and a rating checklist (holistic scale). In addition, the semi-structured interviews were conducted to find out the factors influencing the assessment of students' speaking performance.
After all of the necessary data were collected, the questionnaire data were categorized by question, while the interview data were transcribed and classified by answer. Both the questionnaire and the interview data were then analyzed and interpreted using relevant theories underlying speaking assessment.
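As an illustration of the categorization step for the close-ended questionnaire, the sketch below tallies yes/no responses per question. The response data are invented placeholders and do not reproduce the study's actual records.

```python
# Illustrative sketch: tallying close-ended questionnaire answers per question.
# The responses below are invented placeholders, not the study's actual data.
from collections import Counter

# responses[teacher] = list of "Yes"/"No" answers (only five questions shown)
responses = {
    "Teacher 1": ["Yes", "Yes", "Yes", "No",  "Yes"],
    "Teacher 2": ["Yes", "Yes", "Yes", "Yes", "Yes"],
    "Teacher 3": ["Yes", "Yes", "Yes", "No",  "Yes"],
    "Teacher 4": ["Yes", "Yes", "Yes", "Yes", "Yes"],
}

def tally_by_question(data: dict[str, list[str]]) -> list[Counter]:
    """Return one Counter of Yes/No answers for each question."""
    n_questions = len(next(iter(data.values())))
    return [Counter(answers[q] for answers in data.values())
            for q in range(n_questions)]

for number, counts in enumerate(tally_by_question(responses), start=1):
    print(f"Q{number}: {counts['Yes']} Yes, {counts['No']} No")
```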
Findings and Discussion
Since this study aims to answer the research questions proposed in the Introduction, the findings and discussion cover two topics: teachers' attitudes towards the two rating scales, a performance data-driven scale (analytical scale) and a rating checklist (holistic scale), and the subjectivity factors influencing students' speaking scores.
Teachers' attitudes towards a performance data-driven scale (analytical scale) and a rating checklist (holistic scale)
Teachers' or raters' attitudes toward rating scales determine the way they score speaking performance; hence, it is crucial to know their attitudes towards the two rating scales. The findings on teachers' perceptions of the two rating scales, obtained from the close-ended questionnaire, are presented in Table 1.
Question | Answers
Performance Data-Driven Scale (analytical method)
1 | 4 teachers
2 | 4 teachers
3 | 4 teachers
4 | 2 teachers Yes, 2 teachers No
5 | 4 teachers
6 | 1 teacher Yes, 3 teachers No
7 | 4 teachers
Rating Checklist (holistic method)
8 | 1 teacher Yes, 3 teachers No
9 | 4 teachers
10 | 4 teachers
11 | 4 teachers
12 | 3 teachers Yes, 1 teacher No
13 | 4 teachers
14 | 4 teachers
Table 1. Teachers' attitudes towards the two rating scales.
Based on Table 1, the teachers hold varied perceptions of the two scales: some responses are similar, while others differ. Both the analytical and the holistic scale are considered easy to use and helpful, which suggests that the teachers regard both rating scales as usable for assessing students' speaking ability.
With regard to teachers' attitudes towards the performance data-driven scale (analytical scale), as shown in Table 1, most of the teachers state that it provides a more detailed and comprehensive description, so it is easier for them to score students' speaking performance accurately in different contexts. This finding indicates that the performance data-driven scale provides more information about test takers' competence (Tuan, 2012; Wiseman, 2012) by precisely describing various levels of ability across different features of students' speaking performance (Carr, 2000; Iwashita & Grove, 2003). However, there is still debate over its practicality. In terms of time, only two of the four respondents argue that the performance data-driven scale is time consuming, so the claim in previous research that this scale is time consuming (Chou, 2013; Tuan, 2012) remains debatable; the other two teachers claim that it does not take much time to assess speaking performance. In terms of cost, three of the teachers state that the performance data-driven scale is cheap to apply, while one says it is expensive. This finding largely contradicts Chou's (2013) and Tuan's (2012) work stating that the performance data-driven scale is an expensive method of scoring.
Concerning the rating checklist (holistic scale), as presented in Table 1, all of the teachers investigated claim that the checklist scale involves subjectivity in grading students' speaking performance. This finding is consistent with Carr's (2000) and Iwashita and Grove's (2003) work stating that a checklist scale raises confusion about what the checklist score actually measures and whether it can adequately capture the whole of a test taker's performance. In addition, the teachers all agree that a rating checklist contains simpler and shorter descriptions, so it does not take much time to observe and mark students' speaking performance and costs less to implement. This finding supports Carr (2000), Iwashita and Grove (2003), and Wiseman (2012), who describe the rating checklist as efficient in terms of time and cost.
Teachers' subjectivity factors
Identifying teachers' subjectivity factors might reduce unfair judgment in assessing speaking. The findings of the present study reveal several subjectivity factors influencing students' speaking scores: socio-cultural background, accent, communication strategy, and the way ideas are expressed. These findings are shown in Table 2.
Question 1: 2 teachers Yes, 2 teachers No
Teacher 1: "As teachers we have to consider students' socio-cultural background. Students' ability in mastering the language will be different; sometimes students' socio-cultural background will be shown in the way they express their arguments. Since I teach in a vocational high school, students will have different socio-cultural backgrounds, and as a teacher it will be fair for them if I also consider their condition based on their socio-cultural background."
Teacher 2: "It is because socio-cultural background is one factor that influences students' performance. I can consider it in students' performance score."
Question 2: 2 teachers Yes, 2 teachers No
Teacher 1: "As long as their English is eligible I will consider that they have achieved the basic competence stated in the syllabus."
Teacher 2: "It is because each student is different and has their own characteristics, which I consider when assessing their performance."
Question 3: 2 teachers Yes, 2 teachers No
Teacher 3: "Because the way we communicate means the way we think. We know who they are through how they communicate with us. So, it affects their score."
Teacher 2: "It is because the way people use language and communicate determines their level of thinking and their education level. It often influences my grading."
Teacher 1: "Although it is important, I will not give much attention to it."
Teacher 2: "It is necessary, but my attention to this aspect is not big enough."
Question 4: 2 teachers Yes, 2 teachers No
Teacher 4: "It is because when a student is able to articulate his or her idea smoothly and fluently, it indicates something, right? Of course I will give a different score to students who are able to say what they have in mind clearly and fluently."
Teacher 3: "I will not give big attention to this aspect, but it is still crucial."
Teacher 2: "There are very few students who are brave enough to deliver their ideas, so it is one of the things I pay attention to, but I will not give a higher weight to this aspect."
Teacher 1: "If my students can elaborate something properly and it is not out of context, I will consider their capability acceptable."
Question 5: 4 teachers Yes
All teachers agree that it is an important aspect in assessing speaking, since it is one of the indicators that distinguishes low achievers from high achievers.
Table 2. Teachers' subjectivity factors.


As presented in Table 2, two of the four teachers state that socio-cultural background might influence their grading decisions; one of them notes that students from different backgrounds communicate differently in English. The other two teachers do not consider this aspect. This split supports the view that socio-cultural background contributes to teachers' subjectivity in grading students' speaking ability (Taylor, 2009, in Chou, 2013): some teachers argue that there is no relationship between social background and speaking performance, while others take the opposite view.

Opinion is similarly divided on students' accents. Some teachers think the accent a student has might influence the way they pronounce words, while others argue that accent does not influence students' performance as long as their speech is intelligible. In addition, teachers give different scores for the way students communicate and articulate ideas. Two of the four teachers argue that communication style and expressing ideas logically represent a student's level of intelligence, so they give this aspect a higher weight, while the others claim that knowing students' communication style and how they articulate their ideas is important yet do not give it much attention. These divergent scoring decisions, according to Chou (2013) and Wang (2010), indicate that rater variability and tendencies in perception lead to prejudiced judgments.
Conclusion
From the discussion above, it can be seen that every teacher has their own attitude towards the two rating scales; however, their attitudes towards the performance data-driven scale (analytical scale) appear more divided than those towards the rating checklist (holistic scale). In addition, teachers' decisions are shaped by several subjectivity factors, the most influential being socio-cultural background, accent, the way ideas are expressed, and communication style, because teachers hold their own perceptions of these factors.
Furthermore, this study offers several implications for the practice of testing speaking and for researchers who want to conduct similar studies. In terms of testing practice, it is imperative that teachers understand the nature of speaking and the rating scales they use, so that fairer judgments can be made. As for further research, future studies could offer a more comprehensive view of different perspectives and foci, with a more representative sample, when investigating the practice of speaking assessment.
References
Babaii, E., Taghaddomi, S., & Pashmforoosh, R. 2015. Speaking self-assessment: Mismatches between learners' and teachers' criteria. Sage, 43, 1-27.
Carr, N. T. 2000. A comparison of the effects of analytic and holistic rating scale types in the context of composition tests. IAL.
Chou, M.-H. 2013. Teacher interpretation of test scores and feedback to students in EFL classrooms: A comparison of two rating methods. Higher Education Studies, 3(2), 86-95.
Hitt, A. M., & Helms, C. E. 2009. Best in show: Teaching old dogs to use new rubrics. The Professional
Educator, 33(1).
Hsu, T. H.-L. 2015. Removing bias towards World Englishes: The development of a rater attitude instrument using Indian English as a stimulus. SAGE, 1-23.
Iwashita, N., & Grove, E. 2003. A comparison of analytic and holistic scales in the context of a specific purpose speaking test. Prospect, 18(3), 25-35.
Karim, S., & Haq, N. 2014. An assessment of IELTS speaking test. International Journal of Evaluation and Research in Education (IJERE), 3(3), 152-157.
Kim, H. J. 2006. Issues of rating scale in speaking performance assessment. Teachers College, Columbia University Working Papers in TESOL & Applied Linguistics, 6(2), 1-3.
Knight, B. 1992. Assessing speaking skills: A workshop for teacher development. ELT Journal, 46(3), 294-302.
Lasito. 2015. Speaking test for medical internship program: A construct analysis and test development. The 7th COTEFL International Conference (pp. 81-84). Purwokerto: Universitas Muhammadiyah Purwokerto.
Li, W. 2011. The validity considerations in designing an oral test. Academy Publisher, 2(1), 267-269.
Orr, M. 2002. The FCE speaking test: Using rater reports to help interpret test scores. System, 143-154.
Oscarson, M., & Apelgren, B. M. 2011. Mapping language teachers' conceptions of student assessment procedures in relation to grading: A two-stage empirical inquiry. System, 2-16.
Pufpaff, L. A., Clarke, L., & Jones, E. R. 2014. The effects of rater training on inter-rater agreement. Mid-Western Educational Researcher.
Sugiyono. 2013. Metode penelitian pendidikan: pendekatan kuantitatif, kualitatif, dan R&D. Bandung:
Alfabeta.
Teng, H.-C. 2007. A study of task type for L2 speaking assessment. Honolulu: ERIC.
Tuan, L. T. 2012. Teaching and assessing speaking performance through analytic scoring approach.
Academy Publisher, 2(4), 673-679.
Wang, B. 2010. On rater agreement and rater training. English Language Teaching, 3(1), 108-112.

Wiseman, C. S. 2012. A comparison of the performance of analytic vs. holistic scoring rubrics to assess L2 writing. Iranian Journal of Language Testing, 2(1), 59-92.
Yin, R. K. 2011. Qualitative research from start to finish. New York: The Guilford Press.
