
Language Testing in Theory and Practice
Aprendizaje y Enseñanza de la Lengua Extranjera I
Alfonsi Arcos Rus

Some warm-up expressions:

Image-in-the-mirror syndrome: the teacher's fear of giving exams, because the results of the tests also reflect the teaching; a test provides feedback on the teacher as well.
Evaluation: takes into account everything in a teaching programme.
Assessment: takes into account all the competences (all the learner's skills and knowledge).
Testing: takes into account the formal knowledge of what the students have learnt at a particular time.
Evaluation > Assessment > Testing (from the broadest to the narrowest in scope).

The main purpose of testing is to get feedback on what the students are learning.
Mode: the most common value in a group of scores.
Mean: a measure of central tendency, calculated as the sum of the scores divided by the number of scores.
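To make the two statistics concrete, here is a minimal sketch (the scores are invented for illustration) using Python's standard library:

```python
# A minimal sketch with invented scores: the mean and mode of a set of test results.
from statistics import mean, mode

scores = [12, 15, 15, 18, 20, 15, 17]  # hypothetical class scores

print("Mean:", mean(scores))  # sum of the scores / number of scores -> 16
print("Mode:", mode(scores))  # the most common value -> 15
```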

Analytic Scoring
Scoring in which evaluators first rate the separate parts or traits (dimensions) of an examinee's product or process, then sum these part scores to obtain a total score. A piece of writing, for example, may be rated separately on ideas and content, organization, voice, choice of words, sentence structure, and use of English mechanics. These separate ratings are then combined into an overall score. It is more objective than holistic scoring.
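As an illustration (a hypothetical rubric, not part of the notes), analytic scoring amounts to rating each trait separately and summing the part scores:

```python
# Hypothetical sketch: analytic scoring sums separate trait ratings into a total.
writing_rubric = {
    "ideas_and_content": 4,   # each trait rated 0-5 by the examiner
    "organization": 3,
    "voice": 4,
    "word_choice": 3,
    "sentence_structure": 5,
    "mechanics": 4,
}

total = sum(writing_rubric.values())    # analytic total: 23
maximum = 5 * len(writing_rubric)       # 30
print(f"Analytic score: {total}/{maximum}")
```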

Holistic Scoring
The word "holistic" means looking at the whole rather than at the parts; holistic scoring is a procedure for evaluating essays as complete units rather than as collections of constituent elements. Holistic scoring of student writing serves three purposes: to enable valid, quick and reliable evaluation of student essays. The score is based on the general impression the piece makes on the rater. It is better for beginners.


1. Kinds of Tests
1.1. Purpose:
1.1.1. Proficiency: a test that tries to measure the student's level in a given language, regardless of any previous experience.
1.1.2. Diagnostic: used to check on students' progress in learning particular elements of the course and to discover the students' weaknesses; diagnostic tests help learners to know whether they need further teaching. The exam is taken individually, but in order to draw useful conclusions we need to analyse the results of a large group in depth, because only then can we see whether many students share the same difficulties or have achieved the knowledge taught.
1.1.3. Achievement: a test intended to evaluate the learner's knowledge according to a given syllabus. There are two types: final achievement, which covers the whole course syllabus, and progress achievement, which covers the different parts of the syllabus as the course proceeds.
1.1.4. Aptitude: taken before the foreign-language course to check the students' strengths and weaknesses, taking into account factors such as intelligence, age, motivation, memory, phonological sensitivity and sensitivity to grammatical patterns.
1.1.5. Placement: gathers information about students' abilities in order to place them in the class most adequate for their level, i.e. at the stage of the teaching programme most appropriate to their abilities.
1.2. Frame of reference:
1.2.1. Norm-referenced: relates one learner's performance to that of other students. It does not tell teachers directly what the students are capable of doing with the language.
1.2.2. Criterion-referenced: classifies students according to whether they are able to perform certain tasks; what they can actually do in the language is measured (a short sketch contrasting the two frames follows below).
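As an illustration of the two frames of reference (a hypothetical sketch, not from the notes), a norm-referenced report ranks a learner against the rest of the group, while a criterion-referenced report only checks the score against a fixed cut-off:

```python
# Hypothetical sketch: norm-referenced vs. criterion-referenced reporting.
scores = {"Ana": 62, "Ben": 75, "Carla": 88, "David": 55, "Eva": 70}

def norm_referenced(name: str) -> float:
    """Rank the learner against the group: percentage of classmates scoring lower."""
    below = sum(1 for s in scores.values() if s < scores[name])
    return 100 * below / (len(scores) - 1)

def criterion_referenced(name: str, cutoff: int = 65) -> str:
    """Pass/fail against a fixed criterion, independent of how the others performed."""
    return "can perform the task" if scores[name] >= cutoff else "cannot yet perform the task"

print(norm_referenced("Eva"))       # 50.0 -> Eva outperforms half of her classmates
print(criterion_referenced("Eva"))  # "can perform the task"
```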


1.3. Scoring procedure:
1.3.1. Discrete-point vs. integrative: discrete-point (DP) testing tests one particular language element at a time, whereas integrative testing requires the candidate to combine many language elements in the completion of a task. Discrete tests favour objective scoring (for example, grammatical structures) and are always indirect. Integrative tests favour subjective scoring (for example, writing a composition, dictation, or making notes while listening to a lecture) and tend to be direct.
1.3.2. Objective vs. subjective: the distinction concerns the way of questioning. Objective tests contain closed or multiple-choice questions and require no judgement on the part of the scorer. Subjective tests contain open questions and do require judgement from the scorer; there are different levels of subjectivity (for example, the scoring of a composition may be considered more subjective than the scoring of short answers in response to a reading passage).
1.4. Content:
1.4.1. Direct testing: as authentic as possible; candidates have to perform the very skill being measured, e.g. in a writing test the candidates actually write a composition. A test is said to be direct when it tests the skills we wish to measure. It is used to test production skills, such as writing or speaking, but it is not so good for measuring reading or listening, because understanding is hard to observe directly. There is likely to be a helpful washback effect.
1.4.2. Indirect testing: measures the abilities which underlie the skills we are interested in, e.g.:
Testing writing: candidates identify mistakes in a text, or take a sample text and underline matching words.
Testing pronunciation: candidates use pen and paper, matching similar sounds without speaking.


2. Testing requirements
2.1. Content validity: the test must be coherent with the teaching itself; you have to test the contents you have taught in class. For example, if you have taught the uses of the verb to be, your exam will have to contain exercises focused on the verb to be, and it has to contain a proper sample of the relevant structures. To judge whether a test has content validity, we need a specification of the skills and structures it is meant to cover. The greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure.
2.2. Construct validity: a test is said to have construct validity if it measures just the ability it is supposed to measure, in line with the methodology used in teaching. The word construct refers to any underlying ability which is hypothesized in a theory of language ability. For example, the ability to read involves a number of sub-abilities (such as guessing the meaning of unknown words from the context), and all of them have to be taken into account.
2.3. Concurrent or empirical validity: established when the test and the criterion are administered at about the same time. It is obtained by comparing the results of the test with the results of some criterion measure:
To measure the test's concurrent validity:
An existing test, known or believed to be valid, given at the same time.
The teacher's ratings or any other such form of independent assessment given at the same time.
To measure the test's predictive validity:
The subsequent performance of testees on a certain task, measured by some valid test.
The teacher's ratings or any other such form of independent assessment given later.
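As an illustration (invented scores, not from the notes), concurrent validity is commonly reported as the correlation between the new test and a criterion measure taken at about the same time:

```python
# Hypothetical sketch: concurrent validity as the correlation between a new test
# and an established criterion measure administered at about the same time.
from statistics import correlation  # available from Python 3.10

new_test  = [55, 68, 72, 80, 90, 61]   # scores on the test being validated
criterion = [58, 65, 75, 78, 92, 60]   # scores on the established measure

r = correlation(new_test, criterion)
print(f"Concurrent validity coefficient: {r:.2f}")  # close to 1.0 means strong agreement
```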

2.4. Face validity: a test is said to have face validity when, on the surface, it looks like a proper test of what it claims to measure. It is advisable to show the test to colleagues in order to get different viewpoints on it; this may also reveal mistakes that we would not otherwise notice, such as absurdities or ambiguities. Some language tests may lack face validity depending on the country in which they are used. Students' motivation is maintained if the test has face validity.


2.5. Reliability: reliability is a necessary characteristic of any good test; a test must first of all be reliable as a measuring instrument. If the test is administered to the same candidates on different occasions and produces different results, the test is said to be unreliable; this is commonly known as test/re-test reliability. If the test is marked by different examiners and they award similar marks, this is known as mark/re-mark reliability (a short sketch of both follows at the end of this section). In order to be reliable, a test must be consistent in its measurements:
The larger the sample of items, the greater the probability that the test as a whole is reliable.
One way of checking a test's reliability is to administer the same test to different groups or at different times, especially for tests of oral production and listening comprehension.
The instructions must be clear.
One of the most important factors affecting reliability is the scoring of the test. Objective tests overcome this problem, but reliability in subjective tests is much more difficult to achieve; to make them more reliable, multiple marking or rating scales are used.
The candidates' skills are measured separately: listening, speaking, writing, reading.
2.6. Practicality/administration: a test must be practicable, which means that it must be fairly straightforward to administer. We have to take into account the time spent on the administration of the test, the reading of the test instructions, the collection of the answer sheets, etc. Another important factor is where the candidates write their answers, on the test sheet itself or on a separate sheet of paper; a separate answer sheet may be preferable when testing a large group of candidates. We must also bear in mind the equipment available at the centre where the test is going to take place; for example, there is no sense in recording voices or dialogues on tape if there is no cassette player in the centre.
2.7. Discrimination: an important feature of a test is its capacity to discriminate among the different candidates and to reflect the differences in the performance of the individuals in the group. The results of the test are examined to determine the extent to which it discriminates between individuals who are different; candidates must obtain different scores, otherwise there is no discrimination. Discrimination is useful for finding out which elements of the teaching syllabus have been mastered and which have not, and it can also be used to assess relative abilities and locate areas of difficulty. Complete precision is very difficult, indeed impossible, to obtain; this lack of precision is known as the margin of error of a test.
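As an illustration of section 2.5 (invented data, not from the notes), both forms of reliability mentioned above can be estimated as correlations:

```python
# Hypothetical sketch: estimating reliability as a correlation.
from statistics import correlation  # available from Python 3.10

# Test/re-test reliability: the same candidates sit the test on two occasions.
first_sitting  = [40, 55, 62, 70, 85, 90]
second_sitting = [42, 53, 65, 68, 88, 91]
print(f"Test/re-test reliability: {correlation(first_sitting, second_sitting):.2f}")

# Mark/re-mark reliability: two examiners mark the same set of scripts.
examiner_a = [12, 15, 9, 18, 14]
examiner_b = [11, 16, 10, 17, 14]
print(f"Mark/re-mark reliability: {correlation(examiner_a, examiner_b):.2f}")
```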


3. Testing Methods
3.1. Multiple-choice tests: a form of assessment in which respondents are asked to select the best possible answer (or answers) from a list of choices.
3.2. Cloze procedure: a technique in which words are deleted from a passage according to a word-count formula or various other criteria. The passage is presented to students, who insert words as they read in order to complete and construct meaning from the text. This procedure can be used as a diagnostic reading assessment technique.
3.2.1. Traditional cloze: words are deleted at regular intervals, typically every seventh or eighth word. The more frequent the deletions, the more difficult the test (a short sketch of this procedure follows at the end of this section).
3.2.2. Modified cloze: words are deleted at selected points, depending on what the testers want to test. All the deletions may test the same language point (e.g. past forms) or they may test different but specific language points that the testers are concerned with.
3.2.3. Multiple-choice cloze: an easier version of either traditional or modified cloze in which the learner is offered choices from which to select his/her answers.
3.2.4. Authentic cloze: a version of traditional cloze in which the tester simply cuts a number of letters off the beginning or end of each line, making the text look as if it has been clipped in a photocopier. Again, the more letters that are cut off, the more difficult the test.
3.2.5. C-testing: the most recent variation, in which the second half of every second word is deleted (for words with an odd number of letters, the extra letter is supplied). Giving the first half of each word makes the test easier, but deleting every second word restores the difficulty.
3.3. Guessing from context: candidates have to guess the meaning of underlined words from the context.
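As an illustration (a hypothetical sketch, not part of the notes), a traditional cloze passage can be generated by blanking out every n-th word of a text:

```python
# Hypothetical sketch: generating a traditional cloze test by deleting every n-th word.
def make_cloze(text: str, n: int = 7) -> tuple[str, list[str]]:
    """Replace every n-th word with a numbered gap and return the passage plus the answer key."""
    words, answers, output = text.split(), [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:
            answers.append(word)
            output.append(f"({len(answers)})______")
        else:
            output.append(word)
    return " ".join(output), answers

passage = ("Cloze procedure is a technique in which words are deleted from a passage "
           "according to a word-count formula or various other criteria.")
gapped, key = make_cloze(passage, n=7)
print(gapped)
print("Key:", key)
```

Lowering n deletes words more often, which, as noted above, makes the test more difficult.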


4. Designing classroom tests
Construction of a classroom test:
4.1. Planning stage (see the ppt):
4.1.1. Learners' need of a test.
4.1.2. Specification and sampling.
4.1.3. Construct and content validity, reliability, practicality.
4.2. Development stage:
4.2.1. Item construction (Input + Method):
Compilation of inputs or texts.
Methods.
Channel.
Strategy.
Levels of difficulty.
Avoidance of overlapping, tricky questions and ambiguity.
4.2.2. Instructions.
4.2.3. Design of layout/format.
4.2.4. Consideration of scoring.
4.3. Control and operational stage:
4.3.1. Administration of the test.
4.3.2. Performance of statistical tests (a sketch of a simple item analysis follows this list).
4.3.3. Washback (pedagogical effects):
A test will influence teaching and learning.
A test will influence what/how teachers teach and what/how learners learn.
A test will influence the RATE and SEQUENCE of teaching/learning.
A test will influence the DEGREE and DEPTH of teaching/learning.
A test will influence the ATTITUDES to the content, method, etc. of teaching/learning.
Tests that have important consequences will have washback; those that don't, won't.
Tests will have washback on ALL teachers and learners.
4.3.4. Presentation of the test to students (correction).
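A minimal sketch of the item analysis mentioned in 4.3.2 (the response data and group sizes are assumptions, not from the notes): the facility value shows how easy an item was, and the discrimination index shows how well it separates stronger from weaker candidates.

```python
# Hypothetical sketch: facility value and discrimination index for one test item.
# 1 = the candidate answered the item correctly, 0 = answered it incorrectly.
# Candidates are ordered from the highest to the lowest total test score.
item_responses = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

facility = sum(item_responses) / len(item_responses)   # proportion answering correctly

third = len(item_responses) // 3                       # compare the top and bottom thirds
top, bottom = item_responses[:third], item_responses[-third:]
discrimination = sum(top) / len(top) - sum(bottom) / len(bottom)

print(f"Facility value: {facility:.2f}")              # 0.50 -> an item of medium difficulty
print(f"Discrimination index: {discrimination:.2f}")  # 1.00 -> strongly separates the two groups
```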


5. Approaches to language testing
Spolsky (1975) identified three periods of language testing: the pre-scientific, the psychometric-structuralist and the psycholinguistic-sociolinguistic.
The pre-scientific period: testing in the pre-scientific era did not rely on linguistic theory, and reliability was considered less important than the production of a test that felt fair.
The psychometric-structuralist period: the name was intended to reflect the joint contribution of the structural linguists, who identified the elements of language to be tested, and the psychometrists, who produced objective and reliable methods of testing the candidates' control of those elements.
The psycholinguistic-sociolinguistic period: by the 1970s discrete-point testing was no longer felt to provide a sufficient measure of language ability, and testing moved into the psycholinguistic-sociolinguistic era, with the advent of global integrative testing. Oller (1979, cited in Weir 1990) argued that global integrative testing, such as cloze tests, which require candidates to insert suitable words into gaps in a text, and dictation, provided a closer measure of the ability to combine language skills in the way they are used in actual language use than discrete-point testing.
The communicative period: the fact that discrete-point and integrative testing only provided a measure of the candidate's competence rather than measuring the candidate's performance brought about the need for communicative language testing (Weir 1990). Before looking at the features which distinguish this form of testing, we outline the models of communicative competence on which it is based. According to Spolsky (1989: 140), "Language tests involve measuring a subject's knowledge of, and proficiency in, the use of a language. A theory of communicative competence is a theory of the nature of such knowledge and proficiency. One cannot develop sound language tests without a method of defining what it means to know a language, for until you have decided what you are measuring, you cannot claim to have measured it." The main implication this model had for communicative language testing was that, since there was a theoretical distinction between competence and performance, the learner had to be tested not only on his/her knowledge of language, but also on his/her ability to put it to use in a communicative situation (Canale and Swain, 1980).

For Shohamy, Donitsa-Schmidt, and Ferman (1996), washback is "the connections between testing and learning" (p. 298); for Gates (1995), it is "the influence of testing on teaching and learning" (p. 101); and for Messick (1996), washback is "the extent to which the introduction and use of a test influences language teachers and learners to do things they would not otherwise do that promote or inhibit language learning" (p. 241). Clearly, then, washback is, roughly speaking, the effect of testing on the teaching and learning processes. An example that often comes up is the effect of the Japanese university entrance examinations on high-school language teaching and learning.

Washback, whether positive or negative, can be a potential boon or threat to a language teaching curriculum (broadly defined) because, through washback, a test can steer a curriculum in one direction or another (in terms of teaching, course content, course characteristics, and/or class time), either with or against the better judgment of the administrators, teachers, students, parents, etc.

Thinking about washback can also lead us to think about the consequential basis of test validity, in terms of the social consequences of test use and the value implications of test interpretations.


6. Testing grammar and vocabulary
Grammar
Why test grammar?
It is necessary: grammar is the skeleton of the language.
A lack of grammatical ability is an obstacle to skills performance.
The format of grammar tests is familiar.
A large number of items can be administered and scored in a short period of time.
There is good cause to include a grammatical component in the achievement, placement and diagnostic tests of teaching institutions.
Recommendations
Make items sound as natural as possible.
Contextualize items.
Be clear about what each item is testing and award points for that only.
Tests
Multiple choice, error correction, rearrangement items, completion items, transformation items, items involving the changing of words, "broken sentence" items, pairing and matching, etc.
Vocabulary
Why test vocabulary?
Knowledge of vocabulary is essential to the development and demonstration of linguistic skills.
Recommendations
Contextualize the items.
Decide whether to test active production or passive recognition.
Lexical items can be selected from the syllabus, the textbook, reading material or the students' free writing.
Techniques
Recognition: multiple choice, definitions, gap-filling, matching, word formation, synonyms, sets (association of words).

Production: pictures, definitions, gap-filling, sets (association of words), synonyms, completion items.

7. Testing skills
Listening
Recommendations
A listening test usually involves a spoken stimulus.
Materials should be natural and authentic.
Recording quality should be good.
Do not put pressure on candidates.
Memorization should be avoided.
Make minimal demands on productive skills.
Texts should be shorter and questions easier than for reading.
Techniques
Multiple choice, short answer, completion, information transfer (labelling pictures, completing forms, showing routes on a map...), note-taking, dictation.
Speaking
Recommendations
Make the test as long as feasible and plan it carefully.
Give candidates as many fresh starts as needed.
Interviewers should be selected and trained (a second tester should also be present during the interview).
Set only tasks and topics that candidates can be expected to handle with their knowledge.
The interview should be carried out in a quiet room.
Put candidates at their ease.
Collect enough relevant information.
Don't talk too much.


Formats
Interview, interaction with peers, response to tape recordings.
Elicitation techniques
Questions and requests for information, pictures, role-play, interpreting, discussion.
Not recommended: prepared dialogue or reading aloud.
Marking
See the section on analytic vs. holistic scoring at the beginning of these notes.
Note: Neil McLaren says that we should bear in mind the following aspects when evaluating oral production: word order, vocabulary, pronunciation, fluency, appropriateness of expression, tone, accuracy, capacity to reason, initiative in asking for information, accent, range of expressions, flexibility, size.

Reading
Levels of comprehension
Literal, inferential, evaluative.
Types of texts to use
Textbook, novel, magazine, newspaper, academic journal, letter, timetable, poem...
Recommendations
Choose texts of appropriate length.
For acceptable reliability, include as many passages as possible in the test.
Choose texts which will interest candidates.
Avoid texts that can be answered from the candidates' general knowledge alone.
Don't choose culturally laden texts.
Don't use texts which candidates have already read.
Don't ask students to write too much.
Formats
Multiple choice, short answer, information transfer with the help of visuals, completion exercises, identifying the order of events, identifying referents, guessing meaning.

Writing
Recommendations
Instructions should not be too long.
Set as many tasks as feasible.
Test only writing ability.
Restrict candidates by defining the tasks well.
Avoid over-correction and negative marking; look for strengths as well as weaknesses.
Types of exercises
Controlled writing (you set one topic only), guided writing (you give ideas on what to write about), free writing.
Marking
Analytic vs. holistic.
