
Do all tests have washback?
J Charles Alderson
Lancaster University

Tests

Washback
Diagnosis

A four-letter word
Elicitation device: getting somebody to perform their competence
Description of performance
Procedure for making judgements based on criteria
Measurement, not the same as assessment
Not observation

Tests whose results are seen, rightly or wrongly, by students, teachers, administrators, parents or the general public, as being used to make important decisions that immediately and directly affect them.
(Madaus, 1988)

Relates to the effects of tests on classroom practices, particularly teaching and learning.
Can be positive or negative, to the extent that it either promotes or impedes the accomplishment of educational goals held by learners and/or programme personnel.
(Bailey, 1996)

Mismatch between the stated goals of instruction and the focus of assessment
May lead to the abandonment of instructional goals in favour of test preparation
Forces teachers to do things they would not normally do

If a test has positive washback, there is no difference between teaching the curriculum and teaching to the test.
(Weigle & Jensen, 1997, p. 205)

Tests can be a powerful, low-cost means of influencing the quality of what teachers teach and what learners learn at school.
(Heyneman & Ransom, 1992)

psychometric imperialism

Leads to cramming
Narrows the curriculum
Focuses the attention on skills that are easy to test
Restricts teacher and student creativity
Demeans the professional judgement of teachers

A test will influence teaching.


A test will influence learning.
A test will influence what teachers teach.
A test will influence what learners learn.
A test will influence how teachers teach.
A test will influence how learners learn.
(Alderson and Wall, 1993)

A test will influence the rate and sequence, and the degree and depth of teaching.
A test will influence the rate and sequence, and the degree and depth of learning.
A test will influence attitudes to the content, method, etc. of teaching and learning.
Tests will have washback on all teachers and learners.
Tests will have washback on some teachers and some learners but not on others.
(Alderson and Wall, 1993)

Wall & Alderson 1993: different amounts of washback on content, methods, means of assessment
Alderson & Hamp-Lyons 1996, Watanabe 1996: teachers are affected by tests in different ways
Shohamy, Donitsa-Schmidt & Ferman, 1996: the washback of tests can change over time
Tsagari, 2006: The complexity of washback: Participants' perceptions, material design and classroom applications
Virtually all studies relate to high-stakes tests

Curriculum
contents of curriculum, timetabling
Teaching materials
choice of textbooks, use of past papers, teacher-made materials
Teaching methods
choice of methods, teaching of test-taking skills
Attitudes and feelings
of learners and teachers
Learning
Do test results improve?
Does learning improve?
(Spratt, 2005)

The exam
Teacher beliefs
Teacher attitudes
Teacher training
Resources
The school
Cultural factors
(Spratt, 2005)

Very under-developed and under-theorised in language testing and teaching
Focus on learners' strengths and weaknesses; on their prediction, even explanation
Diagnosis requires a better understanding of what the nature might be of strengths and weaknesses in particular language skills
There are very few diagnostic SFL tests
(Alderson 2005, 2007; Huhta 2008)

NOT Proficiency
NOT Achievement
NOT Progress
NOT Placement
NOT Aptitude

BUT all the above could yield useful diagnostic information
HOWEVER, better is diagnosis by design

Bachman, 1990: 60
Virtually any test has some potential for providing diagnostic information

But he then goes on to say:
When we speak of a diagnostic test ... we are generally referring to a test that has been designed and developed specifically to provide detailed information about the specific content domains that are covered in a given program or that are part of a general theory of language proficiency. Thus, diagnostic tests may be either theory or syllabus-based

Yet Alderson (2005: 6) points out:
It would appear that we have a problem here: diagnosis (is said to be) useful, most language tests are (said to be) usable for diagnosis, it is common for universities to administer diagnostic tests (actually placement tests), and yet diagnostic tests are rare!

Two examples of diagnosis in action and in research into theory:
DIALANG and DIALUKI

DIALANG
Diagnosis in action

Diagnosis by design

Computer-based diagnostic language testing system
14 European languages
Delivers tests across the Internet
Supports language learners
Institutional or private use, free of charge
Still widely used throughout Europe and beyond, 8 years after launch

DIALANG is an application of the Common European Framework of Reference
DIALANG uses
Common European Framework scales
self-assessment statements (modified)
DIALANG provides some evidence of their validity

to provide language users and learners with diagnostic information about their strengths and weaknesses and to help them to find ways of improving their proficiency

to raise the learners' awareness of their own language proficiency, of language learning and proficiency in general, and of the role that language tests might have in the learning process
this takes place through the use of self-assessment and various kinds of feedback and information services

first large-scale system for diagnosis / feedback rather than certification
on-line, Internet-delivered, universally available, not restricted to a particular place or time

available for all kinds & levels of learners & can support them throughout their language learning career
multi-lingual (14 languages):
tests
interface (instructions, help screens)
self-assessment & advice / feedback

ASSESSMENT PROCEDURE
1 Client enters DIALANG
2 Vocabulary Size Placement Test
3 Selection of section: reading, writing, listening, structures, vocabulary
4 Self-assessment
5 Responding to tasks
6 Feedback
7 EXIT (Goodbye!) or selection of another section/language

Reading Comprehension (CEFR)


Listening Comprehension (CEFR)
Writing (CEFR)
Structures
Vocabulary
no overall section (nor grade & feedback)
from beginners to advanced

Danish
Dutch
English
Finnish
French
German
Greek

Icelandic
Irish
Italian
Norwegian
Portuguese
Spanish
Swedish

VSPT: score band and description
results (and self-assessment): CEFR scales and report on self-assessment
explanatory feedback: why self-assessment may not match test result
advisory feedback: what you can do and how to progress, based on CEFR
item review

http://www.lancs.ac.uk/researchenterprise/dialang/about

Validity relates to what the test is intended to measure
Design for diagnosis, don't retrofit
Diagnosis should relate to future treatment
Treatment should be teachable or learnable
Diagnosis should be based on theory: what we know about what affects learning

Informed by SLA research


Focus on weaknesses rather than strengths
Enable a detailed analysis and report
Give detailed feedback which can be acted on
Provide immediate results
Involve little anxiety
Based on content covered in instruction
Less authentic
Discrete-point rather than integrated
More likely to focus on low-level language skills than higher-order skills, which are more integrated
Likely to be enhanced by being computer-based

DIALUKI
Understanding Diagnosis
Researching Diagnosis

Diagnosing Reading and Writing in a Second or Foreign Language
Research project 2010-2013: work in progress
Funded by the Academy of Finland, the University of Jyväskylä and the UK Economic and Social Research Council (ESRC)
Cooperation between language testers, other applied linguists and psychologists (L1 reading)

The main research questions:
Can different L1 and L2 linguistic, psycholinguistic, motivation and background measures predict difficulties in SFL R/W?
How does SFL proficiency in R/W develop in psycholinguistic and linguistic terms?
Which features or combinations of features characterise different CEFR proficiency levels?

Study 1
A cross-sectional study with 850 students
Data collection: 2010-11
Exploring the value of a range of L1 & L2 measures in predicting L2 reading & writing, in order to select the best predictors for further studies

Study 2
Longitudinal study
Data collection: 2010-13
The development of literacy skills, and the relationship of this development to the diagnostic measures

Study 3
Intervention study
Data collection: 2012-13
The effects of training on SFL reading and writing

Finnish-speaking learners of English as FL
primary school, 4th grade (age 10; N = 210)
lower secondary school, 8th grade (age 14; N = 208)
Gymnasium 2nd year students (age 17; N = 218)

Russian-speaking learners of Finnish as SL
primary school (3-6th grade; N = 186)
lower secondary school (7-9th grade; N = 78)

Independent predictor variables in L1 and FL

Instruments in DIALUKI STUDY ONE
English as a Foreign Language
Group tasks

QUESTIONNAIRES
Grades: 4th grade (age 10-11), 8th grade (age 14-15), Gymnasium (age 17-18)
Parents' questionnaire
Students' questionnaire
Motivational questionnaire: 49 statements, 58 statements, 58 statements (by grade)
Self assessment: reading & writing L1 (2 x 18 items): DIALANG
Self assessment: reading & writing L2 (2 x 18 items): DIALANG
LINGUISTIC MEASURES
Grades: 4th grade (age 10-11), 8th grade (age 14-15), Gymnasium (age 17-18)

Reading L1
4th grade: ALLU (1 text, 12 items)
8th grade: PISA 2009 (3 texts, 11 items)
Gymnasium: PISA 2009 (3 texts, 11 items)

Reading L2
4th grade: Pearson Young Learners (20 items)
8th grade: Pearson PTE General (25 items), Dialang (30 items)
Gymnasium: Pearson PTE General (25 items), Dialang (30 items)

Writing L1
4th grade: An opinion: Mobile phones / Internet
8th grade: An opinion: School food / Summer job; A complaint
Gymnasium: An opinion: School food / Summer job; A complaint

Writing L2
4th grade: A message to a friend; How do you travel?
8th grade: An opinion: Mobile phones / Boys and girls in different classes
Gymnasium: An article; An opinion: Mobile phones / Boys and girls in different classes

Vocabulary L1
all three grades: Dialang (75 items)

Vocabulary L2
4th grade: Selected from 1000 most common English words (60 items)
8th grade: Selected from 3000 most common English words (90 words)
Gymnasium: Selected from 5000 most common English words + AWL (120 words)

Segmentation L1
4th grade: Text: Isoisä (Grandpa) (36 items)
8th grade: Text: Lilli (73 items)
Gymnasium: Text: Lilli (73 items)

Segmentation L2
4th grade: Text: Little pigs (51 items)
8th grade: Text: Australia (59 items)
Gymnasium: Text: Coffee (71 items)

Typing errors L1
NMI test (100 words / 3 min 30 sec) (two grade levels)

Dictation L2
4th grade: 12 units with 2-4 words (32 words)
8th grade: 10 units with 3-11 words (52 words)
Gymnasium: 12 units with 3-11 words (77 words)
Instruments in DIALUKI STUDY ONE
English as a Foreign Language
Individual tasks (1)

PSYCHOLINGUISTIC AND COGNITIVE TASKS
Grades: 4th grade (age 10-11), 8th grade (age 14-15), Gymnasium (age 17-18)

Backwards digit span L1
all three grades: 2-8 digits, 14 items (numbers 1-9)

Backwards digit span L2
all three grades: 2-5 digits, 8 items (numbers 1-6)

Rapidly presented words L1
all three grades: 14 words (2-8 letters)

Rapidly presented words L2
4th grade: 8 words (2-4 letters)
8th grade: 12 words (2-9 letters)
Gymnasium: 12 words (2-9 letters)

List reading L1
all three grades: 105 words, time limit 60 sec (Lukilasse)

List reading L2
all three grades: 105 words, time limit 60 sec

Non-word reading L1 (mlkenti)
all three grades: 10 non-words with 3-4 syllables

Non-word reading L2 (kipthirm)
10 non-words (Snowling et al 1996: Graded Nonword Reading Test)

Non-word repetition L1 (vrelyytti)
all three grades: 10 non-words with 2-5 syllables

Non-word repetition L2 (bassodoke)
all three grades: 10 non-words (selected from Gupta et al 2005)

Non-word spelling L1 (peunumiile)
all three grades: 12 non-words with 4 syllables

Phoneme deletion L1 (hamsa -> hama)
all three grades: 12 non-words with 1-3 syllables

Phoneme deletion L2 (nolcrid -> olcrid)
4th grade: 8 non-words
8th grade: 10 non-words
Gymnasium: 10 non-words

Common unit L1 (lauhkua - terike)
10 pairs of non-words

Common unit L2 (filk - maf)
10 pairs of non-words

Rapid automatic naming L1
all three grades: Mixed list of numbers, letters and colours (50 items)

Rapid automatic naming L2
4th grade: Mixed list of numbers, colours and objects (30 items)
8th grade: Mixed list of numbers, letters and colours (50 items)
Gymnasium: Mixed list of numbers, letters and colours (50 items)
Example Instruments

Reading rapidly presented words (three successive displays)
***
day
%&#

Cognitive and psycholinguistic tasks (2)

RAN Rapid Automatized Naming L1 and FL
Mixed stimuli:
numbers, letters and colours (L1)
numbers, objects and colours (FL)

Backward digit span memory test in L1 and FL
repeat the numbers you hear but backwards

Rapid reading (aloud) of a list of real L1 words
read as many as you can in one minute
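
The backward digit span task above can be sketched in a few lines. This is only an illustration, not the DIALUKI procedure: the instrument table's figures are read here as ranges (2 to 8 digits and 14 items in L1, 2 to 5 digits and 8 items in FL), which is consistent with two trials per sequence length, but that design, the no-repeated-digit rule, and the names `make_trials` and `is_correct` are all assumptions.

```python
import random


def make_trials(min_len, max_len, digits, trials_per_length=2, seed=0):
    """Build digit sequences of increasing length (hypothetical design).

    Assumes two trials per length and no repeated digit within a sequence;
    neither detail is stated in the slides.
    """
    rng = random.Random(seed)
    pool = list(digits)
    trials = []
    for length in range(min_len, max_len + 1):
        for _ in range(trials_per_length):
            trials.append(rng.sample(pool, length))
    return trials


def is_correct(presented, response):
    """A response is correct if it repeats the sequence in reverse order."""
    return list(response) == list(reversed(presented))


# L1 version: 2-8 digits drawn from 1-9; FL version: 2-5 digits drawn from 1-6.
l1_trials = make_trials(2, 8, range(1, 10))
fl_trials = make_trials(2, 5, range(1, 7))
print(len(l1_trials), len(fl_trials))  # 14 8
```

With two trials at each length, the item counts come out at 14 and 8, matching the figures in the instrument table.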

Non-word reading task


L1
1. viepere
2. larvaanto
3. mlkenti
4. seivolssi
5. euksatus

6. kylmnsi
7. hiemakkola
8. sertsapeivo
9. vaastiloima
10. ahkontalsi

L2
1. hast
2. mosp
3. prab
4. gromp
5. trolb

6. tegwop
7. molsmit
8. twamket
9. hinshink
10. kipthirm

Non-word repetition task

L1
1. seitu
2. ronksa
3. minksakka
4. kletsoma
5. vrelyytti
6. peunivatna
7. ysipulentti
8. restomeliitti
9. plotiskntsingis
10. intjirinanttiin

L2
1. bassim
2. peggut
3. bipup
4. gaypoom
5. bassodoke
6. kotiesote
7. doosennane
8. keegulol
9. beenodoofop
10. daysomaysice

Common Unit task

L1
1. lauhkua - terike
2. mustele - kyhinty
3. vommiras - thmykkyyn
4. tookselo - murlain
5. vapi - lumpe
6. vaaso - leikua
7. hirattu - vnkki
8. kanttuuso - vyyrt
9. aamestus - hilpialli
10. tlkys - angilme

L2
1. mip - pank
2. auk - honch
3. skey - twisp
4. brang - peb
5. kelpit - membro
6. madast - wordle
7. prinkle - mapgom
8. sloskon - nagar
9. larsk - mambron
10. filk - maf

Phoneme deletion task

L1
1. Tauk -> auk
2. Hok -> ok
3. Peuk -> euk
4. gooK -> goo
5. hamSa -> hama
6. pokRi -> poki
7. mesTo -> meso
8. puLke -> puke
9. kelaMpa -> kelapa
10. makalTo -> makalo
11. sinepTe -> sinepe
12. halneSko -> halneko

L2
1. kisP -> kis
2. Drant -> rant
3. Apren -> pren
4. balraS -> balra
5. Nolcrid -> olcrid
6. stanseRt -> stanset
7. dockOAn -> dockn
8. pronaTE -> prona
9. driggLE -> drigg
10. norCH -> nor
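
In the items above, the segment to be deleted appears to be marked with capital letters, and the second form is the expected answer. A minimal sketch of that reading follows; the function names are hypothetical, and exact-match scoring on spelling is an assumption (the actual task is presumably oral and scored by a rater).

```python
def expected_response(stimulus: str) -> str:
    """Drop the capitalised segment; what remains is the target answer."""
    return "".join(ch for ch in stimulus if not ch.isupper())


def is_correct(stimulus: str, response: str) -> bool:
    """Exact-match scoring against the target (an assumed scoring rule)."""
    return response.strip().lower() == expected_response(stimulus)


# A few items from the L1 and L2 lists above.
for item in ["Tauk", "hamSa", "kelaMpa", "dockOAn", "pronaTE", "norCH"]:
    print(item, "->", expected_response(item))
```

Applied to the listed stimuli, this reproduces the answers given on the slide (for example `hamSa` gives `hama` and `norCH` gives `nor`), which supports the capital-letter reading of the notation.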

Segmentation task in L2 (4th graders' version)
Example:
|thepigsweresohappytheysangthissong|
|the|pigs|were|so|happy|they|sang|this|song|
Task:
|sothenextdaythethreelittlepigslefthomethefirstpigmadeahomefromstrawthesecondpig|
|madeahomefromsticksbutthethirdpigwascleverhemadehishomefrombricksonedaythebig|
|badwolfcametothestrawhouseheknockedonthedoor|
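
One way such a segmentation response could be scored is by comparing the positions of the learner's word boundaries with those of a key. The sketch below is an assumption about scoring, not the DIALUKI method; `boundaries` and `score` are hypothetical names, and the learner response is invented for illustration.

```python
def boundaries(segmented: str) -> set:
    """Character offsets (in the unsegmented text) where a '|' was placed."""
    positions = set()
    offset = 0
    words = segmented.strip("|").split("|")
    for word in words[:-1]:  # no boundary is counted after the last word
        offset += len(word)
        positions.add(offset)
    return positions


def score(key: str, response: str) -> dict:
    """Count correct, missed and wrongly inserted boundaries."""
    if key.replace("|", "") != response.replace("|", ""):
        raise ValueError("key and response must contain the same letters")
    k, r = boundaries(key), boundaries(response)
    return {
        "hits": len(k & r),
        "misses": len(k - r),
        "false_alarms": len(r - k),
    }


key = "|the|pigs|were|so|happy|they|sang|this|song|"
response = "|the|pigs|were|sohappy|they|sang|thiss|ong|"  # invented learner answer
print(score(key, response))  # {'hits': 6, 'misses': 2, 'false_alarms': 1}
```

Reporting misses and false alarms separately keeps under-segmentation and over-segmentation apart, which may matter if the aim is diagnosis and not a single accuracy figure.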

L2 Vocabulary
Part 1 (OSA 1), level: 2000

1 birth, 2 dust, 3 operation, 4 row, 5 sport, 6 victory
___ urheilu
___ voitto
___ syntyminen

1 choice, 2 crop, 3 flesh, 4 salary, 5 secret, 6 temperature
___ lämpö
___ liha
___ palkka

1 cap, 2 education, 3 journey, 4 parent, 5 scale, 6 trick
___ koulutus
___ asteikko
___ matka

1 attack, 2 charm, 3 lack, 4 pen, 5 shadow, 6 treasure
___ aarre
___ lumous, viehätysvoima
___ puuttua, olla vailla jotakin

1 cream, 2 factory, 3 nail, 4 pupil, 5 sacrifice, 6 wealth
___ kerma
___ rikkaudet, varallisuus
___ oppilas

1 adopt, 2 climb, 3 examine, 4 pour, 5 satisfy, 6 surround
___ kiivetä, nousta
___ katsoa tarkasti
___ olla joka puolella

1 bake, 2 connect, 3 inquire, 4 limit, 5 recognize, 6 wander
___ yhdistää
___ kävellä ilman päämäärää
___ rajoittaa

1 burst, 2 concern, 3 deliver, 4 fold, 5 improve, 6 urge
___ särkyä, puhjeta
___ tehdä paremmaksi
___ viedä jotakin jollekulle

1 original, 2 private, 3 royal, 4 slow, 5 sorry, 6 total
___ alkuperäinen
___ yksityinen
___ yhteensä

1 brave, 2 electric, 3 firm, 4 hungry, 5 local, 6 usual
___ tavallinen
___ nälkäinen
___ urhea, rohkea

Motivation

English Self-concept
Intrinsic interest
Instrumentality
Motivational Intensity
Parental Encouragement
Self-regulation
Anxiety

ENGLISH SELF-CONCEPT
Compared to other students, I'm good at English
I have always done well in English.
Studying English is easy for me.
I get good marks in English.
I learn English quickly.
I'm better at English than most of my classmates.
Items dropped
I am hopeless when it comes to English
I am satisfied with how well I do in English.

Independent variables: Students

How much homework do you normally do during a normal school day?
o Not at all
o Half an hour or less a day
o From half an hour to an hour a day
o 1-2 hours a day
o Over 2 hours a day

How do you feel about reading in your free time?
o I like reading a lot
o I like reading somewhat
o I don't like reading

Independent variables: Students

How often do you read the following things in your free time?
Response options: Daily or nearly daily; 1-2 times a week; 1-2 times a month; Rarely or never

I read
a) text messages
b) email
c) Facebook or Twitter conversations
d) messages in chats (e.g. MSN, IRC)
e) internet chat forums
f) blogs or home pages
g) news or other newspaper articles online
h) online non-fiction texts (e.g.

Independent variables: Parents

Parents' education
Response options: Compulsory school; Vocational school or institute; Gymnasium; Bachelor's degree; Master's degree
a) Child's mother
b) Child's father

Parents' occupation
Response options: Housewife/husband; Unemployed; Working; Retired; Student
a) Child's mother
b) Child's father

Independent variables: Parents

Before the child learned to read, was somebody in the family engaged in the following activities with the child?
Response options: Rarely or never; 1-2 times a month; 1-2 times a week; Every day or nearly every day

a) Read books or told stories
b) Talked about everyday activities or events
c) Sang
d) Played with letter toys (e.g. blocks)
e) Played word games
f) Wrote letters or words
g) Read aloud signs or labels

Structural Equation Modelling (SEM)


Cognitive variables, 4th graders
Three latent variables (path model)

Structural Equation Modelling (SEM)


Cognitive variables, Gymnasium
Three latent variables (path model)

Dependent variable, adjusted R squared, % variance and independent variables (IVs) by grade

4th Grade
Dependent variable: Pearson Young Learners Test in English
Adjusted R Squared: .526 (53% variance)
First IV: Size of English Vocab (.664)
Second IV: Writing in L1 Finnish (.419)
Third IV: L2 segmentation accuracy (-.584)
Fourth IV: L1 Finnish Reading (ALLU) (.403)

8th Grade
Dependent variable: Pearson General + DIALANG Medium
Adjusted R Squared: .671 (67% variance)
First IV: Size of English Vocab (.740)
Second IV: Writing in English (.696)
Third IV: L2 segmentation accuracy (-.641)
Fourth IV: Size of Finnish Vocab (.282)

Gymnasium
Dependent variable: Pearson General + DIALANG Advanced
Adjusted R Squared: .708 (71% variance)
First IV: English dictation (.795)
Second IV: Size of English Vocab (.747)
Third IV: L2 segmentation accuracy (-.677)
Fourth IV: L1 Finnish Reading (PISA) (.418)
Fifth IV: Writing in English (.680)
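
The percentages of variance reported above are the adjusted R squared values rounded to whole percentages (.526 to 53%, .671 to 67%, .708 to 71%). For reference, the standard textbook definition of the adjustment is sketched below. The symbols n (number of learners in a model) and p (number of independent variables) are generic notation, not taken from the slides, which report neither the unadjusted R squared nor the per-model n, so no values are filled in.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Because the correction grows with p and shrinks with n, adding a fourth or fifth predictor only raises the adjusted figure if it explains enough extra variance to offset the penalty.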

Positive or negative?
On teaching?
On learning?
On content?
On method?
On rate and sequence of learning?
On degree and depth of learning?
On attitudes?
On all teachers and learners?
On some teachers and learners?

Curriculum
contents of curriculum, timetabling
Teaching materials
choice of textbooks, use of past papers, teacher-made materials
Teaching methods
choice of methods, teaching of test-taking skills
Attitudes and feelings
of learners and teachers
Learning
Do test results improve?
Does learning improve?

What might be the possible unintended negative consequences of diagnostic testing?
The fact is that so far we have no research into the washback or impact of diagnostic tests.

Empirical research is urgently needed:
How might such research be designed and conducted?

Thank you for your attention!


c.alderson@lancaster.ac.uk
