

(based on evaluation of the English Language Test for Saudi Arabian students)

"Evaluation should be for the benefit of learners."

Eric Pearse


According to Rea-Dickins and Germaine (1992), “no important enterprise should just go on and on without some kind of evaluation”. Teaching and learning are no exception. There is certainly a close relationship between teaching and testing, but they are definitely not the same thing. “Most teaching should not be testing, and should not be seen as a test by the learners” (Scrivener, 1994). But we should be evaluating the learners’ performance and progress, and our own teaching, constantly. Evaluation is essential in teaching, and tests are “the main instruments for evaluation” (Harmer, 2001) in most teaching situations. For this reason, this investigation focuses on testing, using the English Language Test for Saudi Arabian students (high school, pre-intermediate level) as a sample.


The researcher agrees with Davies and Pearse (2000) that "the purpose of English language tests is to gather reliable evidence of what learners can do in English and what they know of English", and believes that tests should be designed for specific "teaching-learning situations". Nevertheless, it is widely acknowledged that creating good tests is an extremely difficult task. Any test-writer should keep in mind that a test should be "valid, reliable and practical" (Fraenkel and Wallen, 2003). However, over several years of teaching at school, the researcher has faced the problem of tests being irrelevant to, and sometimes inappropriate for, the course syllabus. Taking as an example an English language test for Saudi Arabian students (high school, pre-intermediate level, 1st session), the researcher will investigate the validity and reliability of the progress test used in Saudi Arabian secondary schools and, drawing on previous research in this field and on his own teaching experience, will make some recommendations for developing better tests in the future.

Several reasons can be given to justify the need for this particular test:
- measuring what the students know;
- checking general progress and obtaining feedback;
- identifying problem areas for further attention;
- giving each student a session grade;
- assessing the teacher's efficiency;
- evaluating (to some extent) the course structure;
- reinforcing students' motivation.
The test, which lasts 3 hours, is intended to evaluate reading and writing skills, as well as the grammar and vocabulary knowledge acquired during the 1st session of Year 6. Following Davies and Pearse (2000), the researcher would classify this test as a progress test (short-term achievement test) used to "check how well learners are doing after each lesson or unit, and provide consolidation or remedial work if necessary. They usually focus on the language that has recently been introduced and practised". However, following Rea-Dickins and Germaine (1992), it could also be classified as a course test (longer-term achievement test) used to "find out how the students have done over a whole course". In the researcher's view, a session is clearly longer than a unit (it normally includes several units), but it can hardly be considered 'a course' either. In any case, such a test provides the basis for the marks which teachers give learners at the end of a session; these marks are rather significant for the students, who feel very nervous before the test. At the same time, the test is a serious matter for teachers, since it also evaluates their work.

During the first session these students study topics such as 'Vacation Time', 'Travelling' and 'Places of interest to visit', and grammar such as 'articles with proper and geographic names', 'passive forms', 'conditionals', etc. The test covers these same items. The following task types are used for testing purposes: multiple choice (grammar, vocabulary), one-word gap-filling, error correction, scanning/skimming/detailed reading, matching words and definitions, 'open the brackets', writing a narrative, etc. Such variety should provide good opportunities for composing a good test, but students criticise it for confusing instructions, difficulty and excessive length, whereas teachers are dissatisfied with the lack of listening testing and with some irrelevant content. They agree, though, that the test reflects the items taught, covers the structures, functions and vocabulary practised, and pays attention to some macro skills and to accuracy. The question therefore arises whether the existing test is really good (valid, reliable and practical).

An achievement language test is considered valid if:

- it contains only forms and uses the learners have practised in the course;
- it employs only exercises and tasks that correspond to the general objectives and methodology of the course (Davies and Pearse, 2000).
The first type of validity, called 'content validity' (Davies and Pearse, 2000), means that the grammar, vocabulary and functional content of any test should be carefully selected and based on the course syllabus. This is both true and logical: if the learners have not practised an item (whether grammar, vocabulary or a skill), they cannot be tested on it. The language content of the test may go beyond the syllabus only when it is not essential for completing the task, for example in a reading exercise where students can guess the meaning of some words from the context.
The second type of validity, called 'construct validity' (Davies and Pearse, 2000), means that the exercises and tasks used in the test should be similar to those used in the course and consistent with its general aims and approach. If the students have never written an essay, they cannot be asked to write one in the test. Likewise, if the teacher has introduced grammar only through natural discourse (in conversations), he cannot test this grammar by setting grammar-book exercises. If the test conforms to these principles, both teachers and students usually approve of it; if not, it will cause many problems.
In our case, the test meets both requirements and can be regarded as reasonably valid:
- it tests vocabulary and grammar studied and practised by students during the 1st semester;
- it includes tasks similar to those the students come across in their textbooks and workbooks (with the exception of 5A-4 and 5B-3, which relate not to the English language but to students' general knowledge).
The test is not boring: it makes the students think, and it offers a rich variety of tasks.
At the same time, the test includes no listening task, which may have a negative "backwash effect" (Davies and Pearse, 2000) on subsequent learning: students may lose interest in listening activities, which they find rather difficult but which are never tested. Another weak point is that speaking is tested rather artificially: task 1B ("complete the following conversation") does not involve actual speaking or communication.
All in all, although the test is valid in general (it covers the items studied, follows a realistic context and uses exercises the students are familiar with), it still has some shortcomings.


Reliability is another essential component of evaluating a test. It refers to how far teachers can trust the results of a test. Sometimes a teacher has two groups of students of very similar ability who nevertheless get different results on the same test after the same course. This is when one should start asking questions about reliability.
In evaluating the reliability of the test under review, the researcher cannot but agree with Davies and Pearse (2000) that "the reliability of a test depends on:
- its length (a long test is usually more reliable than a short one);
- its administration (teacher's help, invigilation);
- objective marking;
- clear instructions;
- absence of errors in the test;
- clear marking criteria", among other factors.
The test in question is long enough and has rich content, which contributes positively to its reliability. In the researcher's opinion, however, it would be advisable to shorten it to 2 hours instead of 3 and to include one vocabulary task and one grammar task instead of two of each.
The balance between "objectively marked" tasks (multiple choice, one-word fill-in exercises) and "subjectively marked" tasks (composition, 'conversation') (Heaton, 1990) also makes this test both reliable and practical. However, as mentioned above, listening and speaking should also be tested.
Although the instructions are reasonably clear, they still leave much to be desired. For example, 2A does not specify which mistakes are to be corrected: grammar, vocabulary or both. Similarly, in the reading task, students may be confused by the four different tasks given after the text, as they are used to the pre-reading, while-reading and post-reading structure of reading lessons.
There are no errors in the test itself, which also adds to its reliability, and one can presume that teachers will administer it properly (not helping students and preventing cheating). However, one more aspect of reliability remains doubtful: the marking criteria. Tasks 1A and 1B are rather difficult to mark, and it would be advisable to have the same number of tasks for each assessed skill.

It should not be forgotten that the newly designed English textbooks used in Saudi Arabian education today are mostly based on the communicative approach. And, according to research, "high validity [of a test], especially for communicative courses, usually means low reliability, and vice versa" (Hughes, 2003). To sum up the analysis of the validity and reliability of the test under discussion, the researcher acknowledges that the test-writer clearly tried to combine exercises from both ends of this scale in order to achieve a good balance between reliability and validity, although there is always room for improvement.
Taking the above analysis into consideration, the researcher would recommend the following ways of improving the existing English language test:
Firstly, all four language skills (reading, writing, listening and speaking) should be included in the test, as all of them are integral parts of the existing course syllabus.
Secondly, tasks 1A and 1B should be separated, 1B should be enlarged, and clear marking criteria should be worked out for both.
Thirdly, the instructions before the tasks should be clearer (see 2A), and the tasks themselves should all test English (not teach it, as in 5B-3, or test general knowledge, as in 5A-4).
Lastly, as already mentioned, the researcher would advise setting fewer tasks (combining grammar tasks 2 and 5, as well as vocabulary tasks 3 and 6) and allowing less time for completing the test (very long tests are tiresome: students lose concentration and do not always perform to their full potential).


Hughes (2003), in his 'Testing for Language Teachers', pointed out that "in general, tests that conform to the criteria for validity and reliability are better than those that don't". Reflecting on his own teaching experience, the researcher has concluded that tests should establish whether learners can use English in real life, not just whether they can do artificial exercises in unreal contexts. A valid test for a course with communicative objectives should include exercises and tasks in which the students use the language they have learned in a realistic way. For example, they could write a letter or role-play a dialogue. Such tasks would test their ability to use specific grammar and vocabulary, to use written language effectively, and to understand spoken English. The fact that tests can influence, positively or negatively, the way teachers teach and learners study places great responsibility on test-writers and demands continuous evaluation, improvement and development of tests based on teachers' and students' feedback.


Davies, P. & Pearse, E. (2000). Success in English Teaching. Oxford: OUP.

Fraenkel, J. & Wallen, N. (2003). How to Design and Evaluate Research in Education (5th ed.). New York: McGraw-Hill.

Harmer, J. (2001). The Practice of English Language Teaching. London:

Heaton, J. (1990). Classroom Testing. Harlow: Longman.

Hughes, A. (2003). Testing for Language Teachers. Cambridge: CUP.

Rea-Dickins, P. & Germaine, K. (1992). Evaluation. Oxford: OUP.

Scrivener, J. (1994). Learning Teaching: A Guidebook for English Language Teachers. Oxford: Heinemann.