
Published in National Journal of Extensive Education and Interdisciplinary Research [ISSN: 2320-1460], Volume II, Issue IV, Oct.-Dec. 2014, pp. 61-66.

FIVE CHARACTERISTICS OF A GOOD LANGUAGE TEST


Mr. Hussein Ahmed Abdo Rajhy
Lecturer, College of Education-Abs, Hajjah University, Yemen
Ph.D. Researcher, Dept. of Education, Dr. B.A.M. University, Aurangabad
husseinragehy@yahoo.com

Abstract
The last three decades have seen developments in the exam production and
evaluation process, because language testing at any level is a highly complex
undertaking that must be based on theory as well as practice. Traditionally,
language testing has been concerned chiefly with the production, development and
analysis of tests. Recent critical and ethical approaches to language testing have
placed more emphasis on the qualities of a good test. Of the major features of a
good test involved in performance assessment, reliability, validity, practicality,
discrimination and authenticity in particular have been of great concern to language
testers and educators. In this regard, it is the intent of this paper to briefly
discuss these five qualities of a good test with special reference to language
performance assessment.

Key words: test, reliability, validity, practicality, discrimination, authenticity.

Introduction:
A test's usefulness, according to Bachman and Palmer (1996), can be
determined by considering its measurement qualities, such as reliability,
validity, practicality, discrimination and authenticity. These qualities readily
describe a good language test's usefulness. Test usefulness is the most important
quality, the cornerstone, of testing. Bachman and Palmer state that test usefulness
provides a kind of metric by which we can evaluate not only the tests that we
develop and use, but also all aspects of test development and use.
A good test should have a positive effect on learning and teaching, and
should result in improved learning habits. Such a test will aim at locating the
specific areas of difficulty experienced by the class or the individual student so that
assistance in the form of additional practice and corrective exercises can be given.
The test should enable the teacher to find out which parts of the language program
cause difficulty for the class. In this way, the teacher can evaluate the effectiveness of
the syllabus as well as the methods and materials he or she is using. A good test
should also motivate by measuring student performance without in any way setting
"traps" for them. A well-developed test should provide an opportunity for students to
show their ability to perform certain language tasks. A test should be constructed with
the goal of having students learn from their weaknesses. In this way a good test can be
used as a valuable teaching tool.
The five major features of a good test, namely reliability, validity,
practicality, discrimination and authenticity, will now be discussed briefly.
1) Reliability:
A good test should be reliable. This means that the results of a test should be
dependable: they should be consistent, remaining stable rather than varying from
one day of administration to another. If a similar group of students takes the same
test on two occasions and the results are roughly the same, the test can be called
reliable; if the results are very different, the test is not reliable.
A test is also reliable in the following cases:
a) If two comparable groups of students (students of similar abilities) score similar
marks even when the test is given to them on two different days (provided that the
students have not compared notes and prepared specially for it). If, on the other
hand, the results are so different that the students in one group score above-average
marks while the students in the other group fare badly, then the test is
unreliable.
b) A test is reliable if, when students are marked by different teachers, the marking
does not produce widely different marks.
c) Finally, a test is reliable if it has been properly administered. A 'perfect' test
administration is one that allows all examinees to perform at their best level
under identical conditions. Conditions outside the test itself (e.g., the seating
arrangement, bad acoustics, etc.) must not stop a student from performing at his or
her best level. Thus reliability has three aspects: the reliability of the test itself,
the reliability of the way in which it has been marked, and the reliability of the
way in which it has been administered.
Assessing the Three Aspects of Reliability
There are three aspects of reliability, namely: equivalence, stability and
internal consistency (homogeneity).
The first aspect, equivalence, refers to the amount of agreement between two
or more instruments that are administered at nearly the same point in time.
Equivalence is measured through a parallel forms procedure in which one administers
alternative forms of the same measure to either the same group or a different group
of respondents. This administration of the various forms occurs either at the same
time or after some time delay.
The second aspect of reliability, stability, is said to occur when the same or
similar scores are obtained with repeated testing with the same group of respondents.
In other words, the scores are consistent from one time to the next. Stability is
assessed through a test-retest procedure that involves administering the same
measurement instrument to the same individuals under the same conditions after some
period of time. Test-retest reliability is estimated with correlations between the scores at
Time 1 and those at Time 2 (to Time x). Two assumptions underlie the use of the test-
retest procedure. The first is that the characteristic that is measured does not change
over the time period. The second is that the time period is long enough that the
respondents' memories of taking the test at Time 1 do not influence their scores at the
second and subsequent test administrations.
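By way of illustration, the following minimal Python sketch estimates test-retest reliability as the Pearson correlation between two administrations of the same test; the scores are invented for ten hypothetical examinees, not drawn from any real study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented scores for the same ten examinees at Time 1 and Time 2.
time1 = [55, 62, 70, 48, 81, 66, 59, 74, 90, 52]
time2 = [58, 60, 72, 50, 79, 68, 57, 76, 88, 54]

r = pearson(time1, time2)
print(f"Test-retest reliability estimate: r = {r:.2f}")  # near 1 => stable scores
```

The same correlation, computed between two parallel forms rather than two administrations, would estimate the equivalence aspect discussed above.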
The third and last aspect of reliability is internal consistency (or
homogeneity). Internal consistency concerns the extent to which items on the test or
instrument are measuring the same thing. If, for example, you are developing a test to
measure organizational commitment, you should determine the reliability of each item.
If the individual items are highly correlated with one another, you can be highly
confident in the reliability of the entire scale.
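Internal consistency is commonly summarized with Cronbach's alpha, a statistic the passage above implies (inter-item correlation) but does not name. The sketch below, with invented ratings from five respondents on four items, is one standard way to compute it:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(item_scores)     # number of items
    n = len(item_scores[0])  # number of respondents

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    sum_item_var = sum(variance(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Invented responses of five test takers to four items on a 0-5 scale.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [3, 3, 4, 2, 5],
    [5, 3, 5, 2, 4],
]
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")  # higher => more homogeneous scale
```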
Coombe and Hubley (2003) observe that three important factors affect test
reliability. Test factors, such as the format and content of the questions and the length
of the exam, must be consistent. For example, testing research shows that longer
exams produce more reliable results than very brief quizzes; in general, the more
items on a test, the more reliable it is considered to be. Administrative factors are also
important for reliability. These include the classroom setting (lighting, seating
arrangements, acoustics, lack of intrusive noise, etc.) and how the teacher manages the
exam administration. Affective factors in the response of individual students can also
affect reliability; test anxiety, for instance, can be allayed by coaching students in good
test-taking strategies.
Henning (1987) describes a number of threats to test reliability. These factors
have been shown to introduce fluctuations in test scores and thus reduce reliability.

• Fluctuations in the Learner: A variety of changes may take place within the
learner that will either introduce error or change the learner's true score from test
to test. Examples of this type of change are further learning or forgetting.
Influences such as fatigue, sickness, emotional problems and practice effects may
cause the test taker's score to deviate from the score which reflects his or her actual
ability.
• Fluctuations in Scoring: Subjectivity in scoring or mechanical errors in the scoring
process may introduce error into scores and affect the reliability of the test’s
results. These kinds of errors usually occur within (intra-rater) or between (inter-
rater) the raters themselves.
• Fluctuations in Test Administration: Inconsistent administrative procedures and
testing conditions may reduce test reliability. This is most common in institutions
where different groups of students are tested in different locations on different days.
Reliability is an essential quality of test scores, because unless test scores are
relatively consistent, they cannot provide us with information about the abilities we
want to measure. A common theme in the assessment literature is the idea that
reliability and validity are closely interlocked. While reliability focuses on the
empirical aspects of the measurement process, validity focuses on the theoretical
aspects and seeks to interweave these concepts with the empirical ones. For this
reason it is easier to assess reliability than validity.
Some scholars observe that there are four general classes of reliability estimates,
each of which approaches reliability in a different way and is measured by a
different procedure (a simple illustration of the first class follows the list). These
types of reliability are:
i. Inter-Rater or Inter-Observer Reliability
Used to assess the degree to which different raters/observers give consistent
estimates of the same phenomenon.
ii. Test-Retest Reliability
Used to assess the consistency of a measure from one time to another.
iii. Parallel-Forms Reliability
Used to assess the consistency of the results of two tests constructed in the same
way from the same content domain.
iv. Internal Consistency Reliability
Used to assess the consistency of results across items within a test.
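As a simple illustration of the first of these classes, the sketch below (invented band-scale ratings from two hypothetical raters) reports inter-rater reliability as the proportion of exact agreements and of agreements within one band; fuller treatments would use a coefficient such as Cohen's kappa.

```python
# Invented scores (0-10 band scale) given by two raters to eight essays.
rater_a = [6, 7, 5, 8, 6, 9, 4, 7]
rater_b = [6, 6, 5, 8, 7, 9, 5, 7]

pairs = list(zip(rater_a, rater_b))
exact = sum(a == b for a, b in pairs) / len(pairs)
adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)

print(f"Exact agreement:         {exact:.0%}")
print(f"Agreement within 1 band: {adjacent:.0%}")
```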

2) Validity:
The term validity refers to "the extent to which the test measures what it says
it measures" (Alderson and Hughes, 1981: 135). In other words, test what you
teach, and how you teach it. Types of validity include face validity, content validity,
criterion-referenced validity and construct validity. For classroom teachers, content
validity means that the test assesses the course content and outcomes using
formats familiar to the students. Construct validity refers to the 'fit' between the
underlying theories and methodology of language learning and the type of
assessment; for example, a communicative language learning approach must be
matched by communicative language testing. Face validity means that the test looks
as though it measures what it is supposed to measure. This is an important factor for
both students and administrators.
Types of Validity
Investigations of test validity are, in general, investigations into the extent to
which a test measures what it is supposed to measure. This is, however, a very general
definition of validity, and it is useful to distinguish among several different types.
We will distinguish among four here.
Face validity
Face validity is the appearance of validity: the extent to which a test looks like
it measures what it is supposed to, but without any empirical evidence that it does.
There is no statistical measure of face validity, and there is no generally accepted
procedure for determining that a test does or does not demonstrate it.
Example: a grammar test should test grammar, not vocabulary. Thus, in
a grammar test, the vocabulary should be easy, and vice versa.
Content validity
The second, and much more important, type of validity is content validity:
the extent to which the selection of tasks one observes in a test-taking situation is
representative of the larger set of tasks of which the test is assumed to be a sample.
A test needs to contain a representative sample of the teaching/instructional content
as defined and covered in the curriculum.
Criterion-referenced validity
Another important but controversial type of validation is 'criterion-referenced
validity'. Criterion-referenced validity is the extent to which a test predicts something
that is considered important.
It is important to note that in criterion-referenced validity, knowing exactly
what a test measures is not crucial, so long as whatever is measured is a good
predictor of the criterion behaviour. For example, a score on a translation test from a
student's native language into English might be a very good predictor of how well a
student would do in courses in an English-medium university.
Construct validity
The fourth type of validity is the relation, between a test and the Psychological
abilities it measures. This characteristics called construct validity - the extent to which
a test, or a set of tests, yield scores which behave in the ways one would predict they
should if the researcher's theory of what is in the mind of the subject is correct. For
example, if it is claimed that a test measures 'knowledge of grammar', one should be
able to demonstrate that one can measure knowledge of grammar (as a psychological
property) to a certain extent independently of other purported psychological properties
such as 'knowledge of vocabulary', 'knowledge of the writing system', 'ability to
reason verbally', etc. (Alderson and Hughes, 1981).
Construct validation in the language testing field, then, is a process of
hypothesis formation and hypothesis testing that allows the investigator to slowly
zero in on the nature of the competence of the language user. As more and more
construct validation studies are completed, researchers can say with more and more
conviction that the evidence tends to support one position, and not another.
According to Alderson (2000), "the term construct validity is used to refer to
the general, overarching notion of validity". Therefore, the main focus in discussing
a test's validity is construct validity, in addition to some issues regarding its
content validity.
According to Bachman and Palmer (1996) and Weir (2005), the term construct
validity refers to the extent to which a given test score can be interpreted as an
indicator of the abilities, or constructs, that we want to measure. However, no test is
entirely valid, because validation is an ongoing process.
Empirical Validity
Empirical validity concerns the closeness of the relationship between the scores
obtained from a test and criteria outside that test. It is divided into two kinds
(a minimal illustration follows the list):
a) Concurrent validity: how well the test estimates current performance on
some valued measure other than the test itself.
b) Predictive validity: how well the test predicts future performance on
some valued measure other than the test itself.
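Both kinds are conventionally expressed as a correlation between test scores and the criterion measure. The sketch below, echoing the translation-test example used earlier under criterion-referenced validity, correlates invented test scores with invented first-year grade averages; it relies on statistics.correlation (Python 3.10+), and swapping in a criterion collected at the same time as the test would estimate concurrent rather than predictive validity.

```python
from statistics import correlation  # Pearson correlation, Python 3.10+

# Invented data: translation-test scores and later first-year grade averages
# for the same eight students (the external criterion).
test_scores = [62, 75, 58, 88, 70, 93, 55, 80]
first_year_gpa = [2.4, 3.0, 2.3, 3.6, 2.9, 3.8, 2.1, 3.2]

r = correlation(test_scores, first_year_gpa)
print(f"Predictive validity coefficient: r = {r:.2f}")
```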
3) Practicality:
Classroom teachers are familiar with practical issues, but they need to think of
how practical matters relate to testing. A good classroom test should be 'teacher-
friendly': a teacher should be able to develop, administer and mark it within the
available time and with the available resources.
"Practicality is the relationship between the resources that will be required in
design, development, and use of the test and the resources that will be available for
these activities" (Bachman and Palmer, 1996: 36). They note that this quality is
unlike the others because it focuses on how the test is conducted. Moreover, Bachman
and Palmer (1996) classified the resources required into three types: human
resources, material resources, and time.
Based on this definition, practicality can be assessed by the availability of the
resources required to develop and conduct the test; the judgment to be made about a
language test, therefore, is whether it is practical or impractical.
4) Discrimination:
All assessment is based on comparison, either between one student and
another, or between a student as he is now and as he was earlier. An important feature
of a good test is its capacity to discriminate among the performances of different
students, or of the same student at different points in time. The extent of the need to
discriminate will vary according to the purpose of the test (a minimal item-level
illustration follows).
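One common way to quantify this capacity at the item level, not spelled out in the text above, is the upper-lower discrimination index D: the proportion of high scorers who answer an item correctly minus the proportion of low scorers who do. The Python sketch below uses invented data; the function name discrimination_index is ours, chosen for illustration.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index D for a single dichotomous item.

    total_scores: each examinee's total test score.
    item_correct: 1 if that examinee answered the item correctly, else 0.
    fraction: share of examinees placed in each of the upper and lower groups.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank examinees by total score, descending.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower  # near +1: strong discrimination; near 0: none

# Invented data for ten examinees and one item.
totals = [91, 85, 78, 74, 70, 65, 60, 55, 48, 40]
item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
print(f"Discrimination index D = {discrimination_index(totals, item):.2f}")
```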
5) Authenticity:
Bachman (1991) defines authenticity as the appropriateness of a language user's
response to language as communication. Test items should therefore be related to
target language use. Bachman and Palmer (1996) define authenticity as the degree to
which the characteristics of a given language test task correspond to the features of a
target language use task. Authenticity relates a test task to the domain of generalization to
which we want our score interpretations to generalize. It potentially affects test
takers' perceptions of the test and their performance (Bachman, 2000).
In conclusion, what we need in order to evaluate a student properly is an
assessment tool having the features and qualities of a good test highlighted
above. A test should be constructed with the goal of having students learn from their
weaknesses: it will locate the exact areas of difficulty experienced by the class or the
individual student so that assistance in the form of additional practice and corrective
exercises can be given. The more of these good qualities a test has, the more accurate
the conclusions a tester can draw about the testees, and the sounder the decisions
that will be taken accordingly.
References
1) Adediran, A. Taiwo (1995). Fundamentals of Classroom Testing. New Delhi: Vikas Publishing House.
2) Aggarwal, J. C. (1997). Essentials of Examination System: Evaluation, Tests and Measurement. New Delhi: Vikas Publishing House.
3) Alderson, J. C. and Hughes, A. (eds.) (1981). Issues in Language Testing. ELT Documents 111. London: The British Council.
4) Alderson, J. C. (2005). Diagnosing Foreign Language Proficiency: The Interface between Learning and Assessment. London: Continuum.
5) Alderson, J. C., Clapham, C. M. and Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.
6) Bachman, L. F. and Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
7) Bloxham, S. and Boyd, P. (2007). Developing Effective Assessment in Higher Education: A Practical Guide. Maidenhead: Open University Press.
8) Boud, D. (1998, 4-5 November). Assessment and Learning: Unlearning Bad Habits of Assessment. Paper presented at the Effective Assessment at University Conference. Retrieved from http://damianeducationresearchlinks.wikispaces.com/file/view/unlearningassessment_Boud.pdf
9) Boud, D. and Falchikov, N. (2007). Rethinking Assessment in Higher Education: Learning for the Longer Term. London: Routledge.
10) Braine, G. (ed.) (2005). Teaching English to the World: History, Curriculum, and Practice. Mahwah, New Jersey: Lawrence Erlbaum Associates.
11) Bryan, C. and Clegg, K. (eds.) (2006). Innovative Assessment in Higher Education. Abingdon: Routledge.
12) Coombe, C. and Hubley, N. (2003). Fundamentals of Language Assessment: A Practical Guide for Teachers in the Gulf.
13) Weir, C. J. (2005). Language Testing and Validation: An Evidence-based Approach. Palgrave Macmillan.
14) Heiland, D. and Rosenthal, L. J. (eds.) (2011). Literary Study, Measurement, and the Sublime: Disciplinary Assessment. New York: The Teagle Foundation.
15) Fulcher, G. (2010). Practical Language Testing. London: Hodder Education.
16) Fulcher, G. and Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. New York: Routledge.
17) Hall, G. (2005). Literature in Language Education. New York: Palgrave Macmillan.
18) Andrade, H. and Cizek, G. J. (eds.) (2010). Handbook of Formative Assessment. New York: Routledge, Taylor & Francis Group.
19) Henning, G. (1987). A Guide to Language Testing. Cambridge, Mass.: Newbury House.
20) Joughin, G. (ed.). Assessment, Learning and Judgement in Higher Education. Wollongong, Australia: Springer.
21) Scott, D. and Morrison, M. (2007). Key Ideas in Educational Research. London: Continuum.
22) Sharma, R. A. (2008). Educational Research, Design of Research and Report Writing. Vinary Rakhaja.
23) Sharma, R. S. (2006). Measurement and Evaluation Techniques, Educational Perspectives. Jaipur: ABD Publishers.