FOR EDUCATIONAL USE ONLY

YESDİL Eskişehir'in LP Markası

http://www.yesdil.com/v1/pdf/testing_language_assessment.pdf
CHAPTER 2 PRINCIPLES OF LANGUAGE ASSESSMENT
There are five criteria for evaluating a test:
1. Practicality
2. Reliability
3. Validity
4. Authenticity
5. Washback

1. PRACTICALITY
A practical test is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring/evaluation procedure that is specific and time-efficient. Furthermore, for a test to be practical:
- administrative details should be clearly established before the test,
- students should be able to complete the test reasonably within the set time frame,
- the test should be able to be administered smoothly (without getting bogged down in procedure),
- all materials and equipment should be ready,
- the cost of the test should be within budgeted limits,
- the scoring/evaluation system should be feasible in the teacher's time frame,
- methods for reporting results should be determined in advance.

2. RELIABILITY
A reliable test is consistent and dependable. (If the same test is given to the same student on different occasions, it should yield the same results.) The reliability of a test may best be addressed by considering the factors that can contribute to its unreliability. Consider the following possibilities: fluctuations in the student (Student-Related Reliability), in scoring (Rater Reliability), in test administration (Test Administration Reliability), and in the test itself (Test Reliability).

Student-Related Reliability: Temporary illness, fatigue, a bad day, anxiety, and other physical or psychological factors may make an observed score deviate from one's true score. A test-taker's test-wiseness, or strategies for efficient test taking, can also be included in this category.

www.yesdil.com


Rater Reliability: Human error, subjectivity, lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases may enter into the scoring process.

Inter-rater unreliability occurs when two or more scorers yield inconsistent scores for the same test. (That is, different instructors give inconsistent scores for the same test when evaluating it.)

Intra-rater unreliability is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness. One solution to such intra-rater unreliability is to read through about half of the tests before rendering any final scores or grades, and then to cycle back through the whole set of tests to ensure an even-handed judgement. The careful specification of an analytical scoring instrument can increase rater reliability.

Test Administration Reliability: Unreliability may also result from the conditions in which the test is administered. Examples: street noise, photocopying variations, poor light, variations in temperature, and the condition of desks and chairs.

Test Reliability: Sometimes the nature of the test itself can cause measurement errors.
- Timed tests may discriminate against students who do not perform well on a test with a time limit.
- Poorly written test items (that are ambiguous or that have more than one correct answer) may be a further source of test unreliability.
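The inter-rater consistency described under Rater Reliability can be checked numerically. The sketch below is only an illustration and is not part of the original notes: it assumes two raters have each assigned a category (here, hypothetical letter grades) to the same set of papers, and it computes simple percent agreement together with Cohen's kappa, a commonly used agreement statistic that corrects for agreement expected by chance. The function names and the sample grades are made up for the example.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of papers on which both raters gave the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of papers.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same
    # category if each rater assigned categories at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        # Both raters used a single identical category for every paper.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical grades given by two raters to the same ten papers.
grades_rater_a = ["A", "B", "B", "C", "A", "B", "C", "C", "B", "A"]
grades_rater_b = ["A", "B", "C", "C", "A", "B", "C", "B", "B", "A"]

agreement = percent_agreement(grades_rater_a, grades_rater_b)
kappa = cohens_kappa(grades_rater_a, grades_rater_b)
print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa close to 1 suggests the raters apply the scoring criteria consistently; a value near 0 suggests their agreement is little better than chance, which is a signal to tighten the scoring criteria.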

3. VALIDITY
Arguably, validity is the most important principle. Here it is understood as the extent to which the assessment requires students to perform tasks that were included in the previous classroom lessons.

How is the validity of a test established? There is no final, absolute measure of validity, but several different kinds of evidence may be invoked in support. In some cases it may be appropriate to examine the extent to which a test calls for performance that matches that of the course or unit of study being tested. In other cases we may be concerned with how well a test determines whether or not students have reached an established set of goals or level of competence. In still other cases it could be appropriate to study statistical correlation with other related but independent measures. Other concerns about a test's validity may focus on the consequences of a test, beyond measuring the criteria themselves, or even on the test-taker's perception of validity. We will look at these five types of evidence below.


Content Validity: If a test requires the test-taker to perform the behaviour that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity.

Example: If you are trying to assess a person's ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgements does not achieve content validity. In contrast, a test that requires the learner actually to speak within some sort of authentic context does.

Additionally, for content validity to be achieved in a test, the following conditions should be met:
- Classroom objectives should be identified and appropriately framed. The first measure of an effective classroom test is the identification of objectives.
- Lesson objectives should be represented in the form of test specifications. In other words, a test should have a structure that follows logically from the lesson or unit you are testing.

If you clearly perceive the performance of test-takers as reflective of the classroom objectives, then you can argue that content validity has probably been achieved.

Another way of understanding content validity is to consider the difference between direct and indirect testing. Direct testing involves the test-taker in actually performing the target task. Indirect testing involves the test-taker in performing not the target task itself, but a task that is related to it in some way.

Example: When you test learners' oral production of syllable stress, having them mark stressed syllables in a list of written words is indirect testing, whereas requiring them actually to produce the target words orally is direct testing.

Consequently, it can be said that direct testing is the most feasible way to achieve content validity in classroom assessment.

Criterion-related Validity: It examines the extent to which the criterion of the test has actually been achieved. (That is, how well the tested skill, topic, or knowledge has actually been grasped.)

For example, a classroom test designed to assess a point of grammar in communicative use will have criterion validity if test scores are corroborated either by observed subsequent behaviour or by other communicative measures of the grammar point in question. (Either the test-taker's behaviour in relation to the tested topic is observed for consistency, or the test-taker is given a different test on the same topic and the two results are examined for consistency.)

Criterion-related evidence usually falls into one of two categories:


Concurrent (occurring at the same time) validity: A test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself. For example, the validity of a high score on the final exam of a foreign language course will be substantiated by actual proficiency in the language. (Success on the test should be reflected in real use of the language.)

Predictive validity: The assessment criterion in such cases is not to measure concurrent ability but to assess (and predict) a test-taker's likelihood of future success. For example, the predictive validity of an assessment becomes important in the case of placement tests, language aptitude tests, and the like. (For instance, forming homogeneous groups through a placement test in order to obtain more successful classes.)
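One common way to gather criterion-related evidence is to correlate test scores with an independent measure of the same ability. The sketch below is an illustration only and is not taken from the original notes: it assumes a set of hypothetical final-exam scores and hypothetical independent oral-proficiency ratings for the same six students, and it computes the Pearson correlation coefficient between them. The function name and all numbers are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two lists of equal length with at least two scores.")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Correlation is undefined when one list has no variation.")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: final-exam scores and independent oral ratings
# (on a 1-5 scale) for the same six students.
exam_scores = [55, 62, 70, 78, 85, 91]
oral_ratings = [2, 3, 3, 4, 4, 5]

r = pearson_r(exam_scores, oral_ratings)
print(f"Correlation between exam scores and oral ratings: {r:.2f}")
```

A strong positive correlation would count as one piece of supporting evidence for concurrent validity. For predictive validity, the second list would instead hold a measure collected later, such as end-of-course results for students who took a placement test.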

Construct Validity: Virtually every issue in language learning and teaching involves theoretical constructs. In the field of assessment, construct validity asks, "Does this test actually tap into the theoretical construct as it has been identified?" (That is, does this test really have the structural features needed to test the topic or skill I want to test?)

Example 1: Imagine that you have been given a procedure for conducting an oral interview. The scoring analysis for the interview includes several factors in the final score: pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. The justification for these five factors lies in a theoretical construct that claims those factors to be major components of oral proficiency. So if you were asked to conduct an oral proficiency interview that evaluated only pronunciation and grammar, you could be justifiably suspicious about the construct validity of that test.

Example 2: Let's suppose you've created a simple written vocabulary quiz, covering the content of a recent unit, that asks students to correctly define a set of words. Your chosen items may be a perfectly adequate sample of what was covered in the unit, but if the lexical objective of the unit was the communicative use of vocabulary, then the writing of definitions certainly fails to match a construct of communicative language use.

Note: Exams that we classify as large-scale standardized tests are not very satisfactory in terms of construct validity, because for reasons of practicality (that is, both time and cost) these tests cannot measure all the language skills that ought to be measured. For example, the absence of an oral production section in the TOEFL stands out as a major obstacle to construct validity.


Consequential Validity: Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test's interpretation and use.

McNamara (2000, p. 54) cautions against test results that may reflect socioeconomic conditions such as opportunities for coaching (private lessons, individual attention). For example, only some families can afford coaching, and children with more highly educated parents get help from their parents.

Teachers should consider the effect of assessments on students' motivation, subsequent performance in a course, independent learning, study habits, and attitude toward school work.

Face Validity: Face validity refers to the degree to which a test "looks right" and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the test-takers. (It concerns how well-formed, relevant, and useful the test-takers find the test.)

Face validity means that the students perceive the test to be valid. It asks the question, "Does the test, on the face of it, appear from the learner's perspective to test what it is designed to test?" Face validity is not something that can be empirically tested by a teacher or even by a testing expert. It depends on the subjective evaluation of the test-taker.

A classroom test is not the time to introduce new tasks. If a test samples the actual content of what the learner has achieved or expects to achieve, face validity will be more likely to be perceived. Content validity is a very important ingredient in achieving face validity.

Students will generally judge a test to be face valid if
- directions are clear,
- the structure of the test is organized logically,
- its difficulty level is appropriately pitched,
- the test has no surprises, and
- timing is appropriate.

To give an assessment procedure that is "biased for best" (that is, aimed at getting good results from students rather than at catching them out), a teacher offers students appropriate review and preparation for the test, suggests strategies that will be beneficial, and structures the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.

4. AUTHENTICITY
In an authentic test
- the language is as natural as possible,
- items are as contextualized as possible,
- topics and situations are interesting, enjoyable, and/or humorous,
- some thematic organization, such as through a story line or episode, is provided,
- tasks represent real-world tasks.


Reading passages are selected from real-world sources that test-takers are likely to have encountered or will encounter. Listening comprehension sections feature natural language with hesitations, white noise, and interruptions. More and more tests offer items that are "episodic" in that they are sequenced to form meaningful units, paragraphs, or stories.

5. WASHBACK
Washback includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback. (Interim quizzes given before formal exams, so that students can get themselves in order, have a washback effect.)

Formal tests can also have positive washback, but they provide no washback if the students receive only a simple letter grade or a single overall numerical score. Classroom tests should serve as learning devices through which washback is achieved. Students' incorrect responses can become windows of insight into further work. Their correct responses need to be praised, especially when they represent accomplishments in a student's interlanguage.

Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others.

One way to enhance washback is to comment generously and specifically on test performance. Washback implies that students have ready access to the teacher to discuss the feedback and evaluation the teacher has given. Teachers can raise the washback potential by asking students to use test results as a guide to setting goals for their future effort.
What is washback?
- In general terms: the effect of testing on teaching and learning.
- In large-scale assessment: the effects that tests have on instruction in terms of how students prepare for the test.
- In classroom assessment: the information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses.

What does washback enhance?
- Intrinsic motivation
- Autonomy
- Self-confidence
- Language ego
- Interlanguage
- Strategic investment

What should teachers do to enhance washback?
- Comment generously and specifically on test performance
- Respond to as many details as possible
- Praise strengths
- Criticize weaknesses constructively
- Give strategic hints to improve performance

- END OF CHAPTER 2 -


CHAPTER 2

EXERCISE 1: Decide whether the following statements are TRUE or FALSE.
1. An expensive test is not practical.
2. One of the sources of unreliability of a test is the school.
3. Students, raters, the test, and the administration of it may affect the test's reliability.
4. In indirect tests, students do not actually perform the task.
5. If students are aware of what is being tested when they take a test, and think that the questions are appropriate, the test has face validity.
6. Face validity can be tested empirically.
7. Diagnosing strengths and weaknesses of students in language learning is a facet of washback.
8. One way of achieving authenticity in testing is to use simplified language.

EXERCISE 2: Decide which type of validity each sentence belongs to.
a) Content validity
b) Criterion-related validity
c) Construct validity
d) Consequential validity
e) Face validity

1. It is based on subjective judgment. ---------------------
2. It questions the accuracy of measuring the intended criteria. ---------------------
3. It appears to measure the knowledge and abilities it claims to measure. ---------------------
4. It measures whether the test meets the classroom objectives. ---------------------
5. It requires the test to be based on a theoretical background. ---------------------
6. Washback is part of it. ---------------------
7. It requires the test-taker to perform the behavior being measured. ---------------------
8. The students (test-takers) think they are given enough time to do the test. ---------------------
9. It assesses a test-taker's likelihood of future success (e.g. placement tests). ---------------------
10. The students' psychological mood may affect it negatively or positively. ---------------------
11. It includes the consideration of the test's effect on the learner. ---------------------
12. Items of the test do not seem to be complicated. ---------------------
13. The test covers the objectives of the course. ---------------------
14. The test has clear directions. ---------------------

EXERCISE 3: Decide which type of reliability each sentence could be related to.
a) Student-related reliability
b) Rater reliability
c) Test administration reliability
d) Test reliability

1. There are ambiguous items.
2. The student is anxious.
3. The tape is of bad quality.
4. The teacher is tired but continues scoring.
5. The test is too long.
6. The room is dark.
7. The student has had an argument with the teacher.
8. The scorers interpret the criteria differently.
9. There is a lot of noise outside the building.
