
Scales of Reference for Testing of Proficiency

However, in the two decades since these ratings were first suggested, applied linguists' views about what constitutes a true 'zero' or 'perfect' point, a 'native speaker', or an educated person's proficiency in a language have changed considerably (Chandee, 1997).

McNamara (1995) cautions that 'we cannot assume that native speakers will perform better than non-native speakers in the tasks on our tests, as native and non-native speakers may not easily be distinguished in terms of the nonlinguistic performance capacities that are involved in the tasks' (p. 165).

2.4.5 Face and Content Validity

Proficiency scales have high face validity: they look as if they are testing what they claim to be testing. This, however, is not validity in the technical sense (Anastasi, 1976, p. 139). Although the use of proficiency scales can help to guide teachers and learners in setting realistic goals, they raise a number of difficult issues inherent in the nature of language proficiency, with important implications for how it is measured (Hyltenstam & Pienemann, 1985, p. 222).

What is important to note here is that language educators may lack relevant professional training and may therefore (i) see language learning in terms of some, rather than all, aspects of language ability; (ii) treat language ability and language proficiency as identical, believing that proficiency testing provides an accurate and reliable method of assessing communicative competence; and/or (iii) perceive no essential difference between proficiency testing and a range of other assessment procedures (Chandee, 1997).

It is essential, therefore, to begin here by acknowledging the importance of teachers' awareness of empirical research in this area (Chandee, 1997).

2.4.6 The Problem of Validity in Language Testing

To be valid, a test must measure what it sets out to measure. For example, if listening and writing skills are to be tested, then the test items must involve listening and writing, which, as Anastasi suggests, may take the form of listening to lectures and writing reports, and both must contain authentic materials (Anastasi, 1961, p. 138).

Accordingly, Anastasi (1976) defines content validity as 'the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured'; this representative sample of the behaviour domain must closely reflect that domain in performance terms (pp. 134-135).

Many language test researchers have noted the inadequacy of face validity, content relevance, and predictive utility of language tests (Alderson, 1981; Bachman, 1988; Bachman & Savignon, 1986; Skehan, 1984; Stevenson, 1981, 1985; Upshur, 1979).

This poses problems for predictive validity since, as Bachman (1990) notes, an examination of predictive utility alone can largely ignore the question of what abilities are being measured (pp. 250-251).

The problem becomes evident with the use of, for example, multiple-choice grammar tests to measure an individual's writing ability or to place the individual in a writing course (Bachman, 1990, pp. 250-251).

Moreover, the conditions that determine the meanings of a speech act are complex and, for a test to be valid, test writers must take this into consideration (Chandee, 1997).

This is highlighted in Spolsky's (1986) comment that 'we can study the pragmatic value and sociolinguistic probability of choosing...structures in different environments...but the complexity is such that we cannot expect ever to come up with anything like a complete list from which sampling is possible' (p. 150).

2.4.7 Authenticity of Communicative Language Tests


It is problematic to define the term authenticity
in terms of samples of 'real-life' language use
since language use depends on different
contexts, purposes, topics, participants, speech events, and so forth (Bachman, 1990, p. 690; Morrow, 1991, p. 114; Nunan, 1988, p. 99; Widdowson, 1990, pp. 44-47).

Any testing situation is, therefore, unnatural and thus not authentic (Chandee, 1997). Language use in real life varies according to speakers' linguistic and communicative competences, the contexts in which the language is used, speakers' and listeners' background knowledge, and the cultural aspects both speakers and listeners bring with them.

This makes it difficult to distinguish 'real-life' from 'non-real-life' language use.

To make a test authentic, it must, inevitably, be one that reproduces a real-life situation in order to examine the student's ability to cope with it (Doyé, 1991, p. 105), and it must measure the interaction between the language user and the discourse (Widdowson, 1978, p. 80). Moreover, pragmatic criteria must be present; that is, language tests...must require the learner to understand the pragmatic interrelationship of linguistic context and extralinguistic contexts (Oller, 1979, p. 33).

This sort of authenticity is difficult to achieve in a test situation where both the tester and the test taker know that the only purpose of the interaction is to obtain an assessment of the test taker's language performance (Shohamy & Reves, 1985, p. 55).

Spolsky (1985) supports this view, maintaining that, however hard the tester might try to disguise his purpose, it is not to engage in genuine conversation with the candidate...but rather to find out something about the candidate in order to classify, reward, or punish him/her (p. 36).

Authenticity is, therefore, almost unachievable since, according to Klein-Braley (1985), if authenticity means real-life behaviour, then any language testing procedure is nonauthentic (p. 76). We are forced, with Spolsky, to conclude that testing is not authentic language behaviour, and that 'examination questions are not real, however much like real-life questions they seem' (p. 36).

Furthermore, an examinee needs to learn the special rules of examinations before he or she can take part in them successfully (Spolsky, 1985, p. 36).

Though tests are inevitably not authentic in the full sense, it should be possible to establish criteria which will approximate authenticity (Chandee, 1997).

Testing methods need, for example, to be modified so that they do not impinge on the language use observed (Chandee, 1997), and, as both Spolsky (1985) and Shohamy and Reves (1985) observe, the unobtrusive observation of language use in 'natural situations' is one way of achieving at least a partial solution to the question of authenticity (Shohamy & Reves, 1985, p. 55; Spolsky, 1985, p. 39).

Some theorists suggest that one authentic and direct testing situation is to observe an individual over a period of time (Jones, 1985, p. 81). The main problem, of course, with extensive naturalistic observation of non-test language use is that it is impractical, time-consuming, cumbersome and expensive, and hence not feasible in most language testing situations (Chandee, 1997).

It is certainly impossible in a country which does not use the target language in everyday life (Chandee, 1997).

A different, but perhaps equally important, problem pointed out by Spolsky (1989) is the serious ethical question raised by using information obtained surreptitiously, without individuals' knowledge, to make decisions about them.

Subjects who for various reasons do not test well (who become over-anxious, or who are unwilling to play the special game of testing, i.e. answering a question the answer to which is known better by the asker than the answerer) will not be accurately measured by any kind of formal test: there will be a large gap between their test and their real-life performance (Spolsky, 1989, p. 74).

This lack of authenticity in the material used in a test raises issues about the generalizability of results (Spolsky, 1985, p. 39).

To solve the dilemma of test authenticity, it might be possible to argue that language tests have an authenticity of their own (Chandee, 1997), since, as Alderson (1981a) suggests, authentic tasks are in principle impossible in a language testing situation and 'communicative language testing is in principle impossible' (p. 48).

The problem of authenticity might be resolved by accepting Widdowson's (1978) definition of authenticity as 'a characteristic of the relationship between the passage and the reader [that] has to do with appropriate response' (p. 80).

This notion of authenticity is very similar to Oller's (1979) description of a 'pragmatic' test,
that is, any procedure or task that causes the learner to process sequences of elements in a language
that conform to the normal contextual constraints of that language, and which requires the learner to relate sequences of linguistic elements via pragmatic mapping to extralinguistic context (p. 38).

2.4.8 Constructing Language Proficiency Tests

When all of the problems of test authenticity are taken into account, it is clear that it is very difficult to construct a test that will be authentic (Chandee, 1997). Even if the focus is on only one or a few components of language ability in a given testing context, Bachman (1990) notes that there is a need to be aware of the full range of language abilities when designing, developing and interpreting language test scores, and that design must be informed by a broader view of language ability (p. 682).

This view mirrors that of Spolsky (1989), who suggests that test authenticity may be achieved 'if all the distinguishing characteristics or features within a finite open set, consisting of a potentially infinite number of instances, are used in test construction' (p. 74).

However, this may be impractical (Chandee, 1997).

Problems in creating good tests of language ability are unavoidable, since language tests can be used only as an indirect way of making inferences about a test taker's language ability (Chandee, 1997).

Since language use involves the integration of multiple components and processes, it is unlikely that there will ever be a language test that will measure all the components of language ability (Chandee, 1997), or even a test, in Bachman's (1990) terms, that will elicit language test performance 'that is characteristic of language performance in non-test situations' (p. 19).

To be similar to 'normal', 'real-life' or 'non-test' language use, test tasks must essentially include the following elements: 'pragmatic' (Oller, 1979, pp. 16-19, 27, 33; 1991, p. 32; Spolsky, 1986, p. 150), 'functional' (Bachman, 1990, p. 301), 'communicative' (Bachman, 1990, p. 301; Canale & Swain, 1980, p. 31), 'performance' (Bachman, 1990, p. 301) and 'authenticity' (Bachman, 1990, p. 301; Morrow, 1991, pp. 112, 114; Spolsky, 1989, p. 74).

Every instance of authentic language use involves several abilities. For example, for taxi drivers to operate at the international airport in Bangkok, they need to know not only the conversational discourse involved, such as a request by the customer to be taken to a particular place, an agreement by the driver to take the customer, or a request for directions followed by an agreement, and finally a statement of the fare by the driver and a polite thank-you upon receipt of the fare, but also how to converse with the customer in situations where, for example, the fare is a point of bargaining or depends on the weather, the time of day or night, the condition of the streets, the traffic and so on (Bachman, 1990, p. 312).

Hence, as Bachman points out, there is probably an infinite variety of conversational exchanges that might take place between the taxi drivers and the customers (p. 312).

Furthermore, the very nature of language use is such that discourse consists of interrelated illocutionary acts expressed in a variety of related forms.

If language test scores are to reflect several abilities, and if authentic test tasks are, by definition, interrelated, then measurement models must be appropriate for analysing and interpreting these abilities.
