
English Language Assessment: Meeting Two
Criteria of a good test
• Validity: provides evidence for decisions
• Reliability: consistent and precise measurement
• Practicality: design, administration, scoring
• Content coverage: domain / construct represented
• Impact: test consequences are as expected and desired
Validity
• Tests are administered to produce evidence that informs real-world decisions
• Key question: Can this test support the decisions based on it?
• Previously split into content validity, construct validity, criterion validity, internal/external validity…
• Coherent validity frameworks:
– Messick (1989): construct validity
– Kane (2006, 2013): argument-based validation

Threats to validity
• Construct underrepresentation: only part of the target is tested, e.g.,
– Judging someone’s cooking skills based only on whether they can make omelettes
– Judging a learner’s speaking ability based on reading a text
• Construct-irrelevant variance: other factors that we don’t intend to measure affect scores, e.g.,
– Judging someone’s cooking skills while having them cook in the dark
– Judging a learner’s listening ability by giving them a text spoken in a heavy Scottish accent

Reliability
• Consistency of elicited performance and scoring
• Would a candidate get the same or a very similar result
– on a different version of the same test?
– if it were scored again by a different examiner?
– if the candidate were given a similar task, i.e. another task that aims to measure the same construct?
• Test tasks are sets of task characteristics, which should be stable across test versions.
• Reliability is often expressed as Cronbach’s α (typical range: 0–1)
• For tests enabling high-stakes decisions, α > .8 is desirable, though most aim for α > .9
• Reliability is necessary (but not sufficient) for validity
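
As a rough illustration of the α statistic mentioned above, the following minimal sketch computes Cronbach’s α from a candidates-by-items score table, using the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name cronbach_alpha and the sample scores are illustrative assumptions, not part of the original slides.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a table of item scores.

    scores: list of rows, one row per candidate, one column per item.
    (Hypothetical helper for illustration; not from the original slides.)
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 candidates")

    # Sample variance of each item (column) across candidates.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each candidate's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up scores: 5 candidates x 4 items, each item scored 0-5.
example_scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(f"alpha = {cronbach_alpha(example_scores):.2f}")
```

In this made-up data the items rank the candidates very similarly, so α comes out high; real test data would normally be noisier.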

Practicality
• How resource-intensive is the test?
• Design: Item specifications allow rapid development of parallel items
• Administration: Test can be administered in a reasonable time (no more than 2 hours)
• Scoring: Answers can be scored quickly and reliably with minimal staffing; fast turnaround of results

Content coverage
• Is the curriculum material adequately represented in the test?
• Are the target student learning outcomes covered?
• Is the weighting of knowledge and performance appropriate?
– Online vs. offline processing:
• in real time or without time pressure?
– Recognition vs. production vs. performance:
• Recognize the correct answer
• Produce the correct answer
• Perform in a situation
Impact
A test is a process involving activity before, during and after the moment of measurement, which affects a network of people and institutions with a vested interest (stakeholders)
• Candidates: money, time, test preparation & language learning, job opportunities, career, life decisions & opportunities
• Professional boards: skill shortages, quality control, patient safety
• Government: skill & training shortages, health system functioning
• Health system users (patients & families, other health professionals): effective communication in English
• Language schools: teacher planning, materials
