Published in National Journal of Extensive Education and Interdisciplinary Research [ISSN: 2320-1460], Volume II, Issue IV, Oct.-Dec. 2014, pp. 61-66.
Abstract
The last three decades have seen developments in the exam production and
evaluation process, because language testing at any level is a highly complex
undertaking that must be based on theory as well as practice. Language testing
has traditionally been more concerned with the production, development and
analysis of tests. Recent critical and ethical approaches to language testing have
placed more emphasis on the qualities of a good test. Of the major features of a
good test involved in performance assessment, reliability, validity, practicality,
discrimination and authenticity in particular have been of great concern to
language testers and educators. In this regard, it is the intent of this paper to
briefly discuss these five qualities of a good test, with special reference to
language performance assessment.
Introduction:
A test's usefulness, according to Bachman and Palmer (1996), can be
determined by considering its measurement qualities, such as reliability,
validity, practicality, discrimination and authenticity. These qualities readily
describe a good language test's usefulness. Test usefulness is the most important
quality, or cornerstone, of testing. Bachman and Palmer state that test usefulness
provides a kind of metric by which we can evaluate not only the tests that we
develop and use, but also all aspects of test development and use.
A good test should have a positive effect on learning and teaching, and should
result in improved learning habits. Such a test will aim at locating the specific and
precise areas of difficulty experienced by the class or the individual student, so that
assistance in the form of additional practice and corrective exercises can be given.
The test should enable the teacher to find out which parts of the language program
cause difficulty for the class. In this way, the teacher can evaluate the effectiveness of
the syllabus as well as the methods and materials he or she is using. A good test
should also motivate students by measuring their performance without in any way setting
"traps" for them. A well-developed test should provide an opportunity for students to
show their ability to perform certain language tasks. A test should be constructed with
the goal of having students learn from their weaknesses. In this way a good test can be
used as a valuable teaching tool.
The five major features of a good test, namely reliability, validity,
practicality, discrimination and authenticity, will be discussed briefly.
1) Reliability:
A good test should be reliable. This means that the results of a test should be
dependable: they should be consistent, remaining stable and not producing different
results when the test is used on different days. If a similar group of students takes
the same test on two occasions and their results are roughly the same, the test can be
called reliable; if the results are very different, the test is not reliable.
A test is also reliable in the following cases:
a) If two comparable groups of students (students of similar abilities) score similar
marks even if the test is given to them on two different days (provided that the
students have not compared notes and prepared specially for it). If, on the other
hand, the results are so different that the students in one group score above-average
marks while the students in the other group fare badly, then the test is
unreliable.
b) A test is reliable if marking by different teachers does not produce widely
different marks.
c) Finally, a test is reliable if it has been properly administered. A 'perfect' test
administration is one that allows all examinees to perform at their best level
under identical conditions. Conditions outside the test itself (e.g., the seating
arrangement, bad acoustics, etc.) must not stop a student from performing at his or
her best level. Thus reliability has three aspects: the reliability of the test itself,
the reliability of the way in which it has been marked, and the reliability of the
way in which it has been administered.
Assessing the Three Aspects of Reliability
There are three aspects of reliability, namely: equivalence, stability and
internal consistency (homogeneity).
The first aspect, equivalence, refers to the amount of agreement between two
or more instruments that are administered at nearly the same point in time.
• Fluctuations in the Learner: A variety of changes may take place within the
learner that either will introduce error or change the learners’ true score from test
to test. Examples of this type of change might be further learning or forgetting.
Influences such as fatigue, sickness, emotional problems and practice effect may
cause the test taker’s score to deviate from the score which reflects his/her actual
ability.
• Fluctuations in Scoring: Subjectivity in scoring or mechanical errors in the scoring
process may introduce error into scores and affect the reliability of the test’s
results. These kinds of errors usually occur within a single rater (intra-rater) or
between different raters (inter-rater).
• Fluctuations in Test Administration: Inconsistent administrative procedures and
testing conditions may reduce test reliability. This is most common in institutions
where different groups of students are tested in different locations on different days.
Reliability is an essential quality of test scores, because unless test scores are
relatively consistent, they cannot provide us with information about the abilities we
want to measure. A common theme in the assessment literature is the idea that
reliability and validity are closely interlocked. While reliability focuses on the
empirical aspects of the measurement process, validity focuses on the theoretical
aspects and seeks to interweave these concepts with the empirical ones. For this
reason it is easier to assess reliability than validity.
Yet some scholars observe that there are four general classes of reliability estimates,
each of which estimates reliability in a different way and each of which is measured
by a different procedure. These types of reliability are:
i. Inter-Rater or Inter-Observer Reliability
Used to assess the degree to which different raters/observers give consistent
estimates of the same phenomenon.
ii. Test-Retest Reliability
Used to assess the consistency of a measure from one time to another.
iii. Parallel-Forms Reliability
Used to assess the consistency of the results of two tests constructed in the same
way from the same content domain.
iv. Internal Consistency Reliability
Used to assess the consistency of results across items within a test.
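As a purely illustrative sketch (not part of the original paper), two of these estimates can be computed directly from score data: test-retest reliability as the Pearson correlation between two administrations, and internal consistency as Cronbach's alpha. All scores and function names below are invented for illustration.

```python
from statistics import pvariance

# Hypothetical scores for five students; invented purely for illustration.
test_scores   = [12, 15, 9, 18, 14]   # first administration
retest_scores = [13, 14, 10, 17, 15]  # same test, given again later

def pearson_r(x, y):
    """Test-retest reliability: Pearson correlation between two administrations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def cronbach_alpha(item_scores):
    """Internal consistency: Cronbach's alpha (rows = items, columns = students)."""
    k = len(item_scores)                                # number of items
    totals = [sum(col) for col in zip(*item_scores)]    # each student's total score
    item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Three items scored for the same five students (rows = items).
items = [
    [3, 4, 2, 5, 4],
    [4, 5, 3, 6, 4],
    [5, 6, 4, 7, 6],
]

print(round(pearson_r(test_scores, retest_scores), 2))  # test-retest estimate
print(round(cronbach_alpha(items), 2))                  # internal-consistency estimate
```

A coefficient close to 1.0 indicates consistent results; values well below about 0.7 would usually prompt a closer look at the test or its administration.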
2) Validity:
The term validity refers to "the extent to which the test measures what it says
it measures" (Alderson and Hughes, 1981:135). In other words, test what you
teach, the way you teach it. Types of validity include face validity, content validity,
criterion-referenced validity and construct validity. For classroom teachers, content
validity means that the test assesses the course content and the outcomes using
formats familiar to the students. Construct validity refers to the 'fit' between the
underlying theories and methodology of language learning and the type of
assessment; for example, a communicative language learning approach must be
matched by communicative language testing. Face validity means that the test looks
as though it measures what it is supposed to measure. This is an important factor for
both students and administrators.
Types of Validity
Investigations of test validity are, in general, investigations into the extent to
which a test measures what it is supposed to measure. This is, however, a very general
definition of validity, and it is useful to distinguish among several different types of
validity. We will distinguish among four here.
Face validity
Face validity is the appearance of validity: the extent to which a test looks like
it measures what it is supposed to, but without any empirical evidence that it does.
There is no statistical measure of face validity, and there is no generally accepted
procedure for determining that a test does or does not demonstrate face validity.
Example: a grammar test should test grammar, not vocabulary. Thus, in
a grammar test, the vocabulary used should be easy, and vice versa.
Content validity
The second, and a much more important, type of validity is 'content validity'.
Content validity is the extent to which the selection of tasks one observes in a test
taking situation is representative of the larger set of tasks of which the test is assumed
to be a sample. A test needs to have a representative sample of the teaching/
instructional contents as defined and covered in the curriculum.
Criterion-referenced validity
This kind of validity relates to the closeness of agreement between the score obtained
on a test and criteria external to the test. It is divided into two types:
a) Concurrent validity: how well the test estimates current performance on
some valued measure other than the test itself.
b) Predictive Validity: how well the test predicts future performance on
some valued measure other than the test itself.
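Both kinds of criterion-referenced validity are typically reported as a correlation between test scores and an external criterion measure (a current measure for concurrent validity, a future one for predictive validity). The sketch below is a hypothetical illustration, not part of the paper; the placement scores and later course grades are invented.

```python
def pearson(x, y):
    """Pearson correlation between test scores and an external criterion."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Invented data: placement-test scores and the same students' later course grades.
placement_scores = [55, 62, 70, 48, 85, 90]
course_grades    = [58, 60, 75, 50, 80, 92]

# A high correlation here would suggest the placement test has predictive validity.
print(round(pearson(placement_scores, course_grades), 2))
```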
3) Practicality:
Classroom teachers are familiar with practical issues, but they need to think of
how practical matters relate to testing. A good classroom test should be 'teacher-
friendly': a teacher should be able to develop, administer and mark it within the
available time and with the available resources.
"Practicality is the relationship between the resources that will be required in
design, development, and use of the test and the resources that will be available for
these activities" (Bachman and Palmer, 1996:36). They illustrated that this quality is
unlike the others because it focuses on how the test is conducted. Moreover, Bachman
and Palmer (1996) classified the addressed resources into three types: human
resources, material resources, and time.
Based on this definition, practicality can be measured by the availability of the
resources required to develop and conduct the test. Our judgment of a language test
in this respect is therefore simply whether it is practical or impractical.
4) Discrimination:
All assessment is based on comparison, either between one student and
another, or between a student as he is now and as he was earlier. An important feature
of a good test is its capacity to discriminate among the performances of different
students, or of the same student at different points in time. The extent of the need to
discriminate will vary according to the purpose of the test.
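A common way to quantify this capacity for a single item is the upper-lower discrimination index, D = p_upper − p_lower, computed from the highest- and lowest-scoring students (conventionally about 27% of the group at each end). The sketch below is purely illustrative; the student data and the function name are invented.

```python
# Hypothetical (total_score, item_correct) pairs for ten students;
# item_correct is 1 if the student answered this item correctly, else 0.
students = [
    (95, 1), (90, 1), (88, 1), (75, 1), (70, 0),
    (65, 1), (60, 0), (55, 0), (50, 0), (40, 0),
]

def discrimination_index(students, fraction=0.27):
    """Upper-lower discrimination index for one item: proportion correct in the
    top-scoring group minus proportion correct in the bottom-scoring group."""
    ranked = sorted(students, key=lambda s: s[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))   # group size (top/bottom ~27%)
    p_upper = sum(item for _, item in ranked[:n]) / n
    p_lower = sum(item for _, item in ranked[-n:]) / n
    return p_upper - p_lower

# D ranges from -1 to 1; higher values mean the item separates stronger
# from weaker students more sharply.
print(discrimination_index(students))
```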
5) Authenticity:
Bachman (1991) defines authenticity as the appropriateness of a language user's
response to language as communication. Test items should be related to target
language use. Bachman and Palmer (1996) defined authenticity as the degree to
which the characteristics of a given language test task correspond to the features of a
target language use task. Authenticity thus relates a test's tasks to the domain of
generalization to which test score interpretations are intended to apply.