1.0 SYNOPSIS

CONTENT

1.3 INTRODUCTION
1.4.1 Test
1.4.2 Assessment
1.4.3 Evaluation
language ability as consisting of skills (listening, speaking, reading and
writing) and components (e.g. grammar, vocabulary, pronunciation) and
an approach to test design that focused on testing isolated ‘discrete
points’ of language, while the primary concern was with psychometric
reliability (e.g. Lado, 1961; Carroll, 1968). Language testing research was
dominated largely by the hypothesis that language proficiency consisted of a
single unitary trait, and by a quantitative, statistical research methodology
(Oller, 1979).
The beginning of the new millennium is another exciting time for anyone
interested in language testing and assessment research. Current
developments in the fields of applied linguistics, language learning and
pedagogy, technological innovation, and educational measurement have
opened up some rich new research avenues.
Malaysia Education Blueprint (2013-2025)
assessment, Malaysia lags far behind regional peers like Singapore, Japan,
South Korea, and Hong Kong in every category.
Poor performance in PISA is normally linked to students not being able
to demonstrate higher order thinking skills. To remedy this, the Ministry of
Education has started to implement numerous changes to the examination
system. Two out of the three nationwide examinations that we currently
administer to primary and secondary students have gradually seen major
changes. Generally, the policies are ideal and impressive, but a few
questions on feasibility have been raised by concerned parties.
Figure 2 below shows the development of educational evaluation in Malaysia
since pre-independence until today.
Pre-Independence: Examinations were conducted according to the needs of
the school or based on overseas examinations such as the Overseas School
Certificate.

Implementation of the Razak Report (1956): The Razak Report gave birth to
the National Education Policy and the creation of the Examination Syndicate
(LP). LP conducted examinations such as the Cambridge and the Malayan
Secondary School Entrance Examination (MSSEE), and the Lower Certificate
of Education (LCE) Examination.

Implementation of the Rahman Talib Report (1960): The Rahman Talib Report
recommended the following actions:
1. Extend the schooling age to 15 years old.
2. Automatic promotion to higher classes.
3. Multi-stream education (Aneka Jurusan).
The following changes in examination were made:
- The entry of elective subjects in the LCE and SRP.
- The introduction of the Standard 5 Evaluation Examination.
- The introduction of Malaysia's Vocational Education Examination.
- The introduction of the Standard 3 Diagnostic Test (UDT).

Implementation of the Cabinet Report (1979): The implementation of the
Cabinet Report resulted in the evolution of the education system to its
present state, especially with KBSR and KBSM. Adjustments were made in
examinations to fulfill the new curriculum's needs and to ensure they are in
line with the National Education Philosophy.

Implementation of the Malaysia Education Blueprint (2013-2025): The
emphasis is on School-Based Assessment (SBA), which was first introduced
in 2002. It is a new system of assessment and one of the new areas where
teachers are directly involved. The Blueprint calls for a revamp of the
national examination and school-based assessments in stages, whereby by
2016 at least 40% of the questions in the Ujian Penilaian Sekolah Rendah
(UPSR) and 50% in the Sijil Pelajaran Malaysia (SPM) are of higher order
thinking skills questions.
Figure 2: The development of educational evaluation in Malaysia
Source: Malaysia Examination Board (MES),
http://apps.emoe.gov.my/1pm/maklumatam.htm

By and large, the role of MES is to complement and complete the
implementation of the national education policy. Among its achievements are:
Exercise
Describe the stages involved in the development of
educational evaluation in Malaysia.
Read more: http://www.nst.com.my/nation/general/school-based-assessment-plan-may-need-tweaking-1.166386
Tutorial question
TOPIC 2 ROLE AND PURPOSES OF ASSESSMENT IN TEACHING AND LEARNING
2.0 SYNOPSIS
The topic covers the role and purposes of assessment in teaching and
learning: reasons / purposes of assessment; Assessment of Learning and
Assessment for Learning; and the types of tests (proficiency, learning /
achievement, diagnostic, aptitude, and placement tests).

CONTENT
assessment are most likely to maximise student learning and well being? How
best can we use assessment in the service of student learning and wellbeing?
We have a traditional answer to these questions. Our traditional answer says
that to maximise student learning we need to develop rigorous standardised
tests given once a year to all students at approximately the same time. Then,
the results are used for accountability, identifying schools for additional
assistance, and certifying the extent to which individual students are “meeting
competency.”
This summative assessment, the logic goes, will provide the focus to
improve student achievement, give everyone the information they need to
improve student achievement, and apply the pressure needed to motivate
teachers and students to work harder at teaching and learning.
Let us take a closer look at the two assessments below, i.e. Assessment
of Learning and Assessment for Learning.
Assessment for learning is more commonly known as formative and
diagnostic assessments. Assessment for learning is the use of a task or an
activity for the purpose of determining student progress during a unit or block
of instruction. Teachers are now afforded the chance to adjust classroom
instruction based upon the needs of the students. Similarly, students are
provided valuable feedback on their own learning.
Proficiency Tests
Achievement Tests
Diagnostic Tests
Aptitude Tests
This type of test no longer enjoys the widespread use it once had. An
aptitude test is designed to measure general ability or capacity to learn a
foreign language a priori (before taking a course) and to predict ultimate
success in that undertaking. Language aptitude tests were seemingly
designed to apply to the classroom learning of any language. In the United
States, two commonly used standardised language aptitude tests were the
Modern Language Aptitude Test (MLAT; Carroll & Sapon, 1958) and the
Pimsleur Language Aptitude Battery (PLAB; Pimsleur, 1966). Since there is
no research to show unequivocally that these kinds of tasks predict
communicative success in a language, apart from untutored language
acquisition, standardised aptitude tests are seldom used today with the
exception of identifying foreign language disability (Stansfield & Reed, 2004).
Progress Tests
These tests measure the progress that students are making towards
defined course or programme goals. They are administered at various stages
throughout a language course to see what the students have learned, perhaps
after certain segments of instruction have been completed. Progress tests are
generally teacher produced and are narrower in focus than achievement tests
because they cover a smaller amount of material and assess fewer objectives.
Placement Tests
These tests, on the other hand, are designed to assess students’ level
of language ability for placement in an appropriate course or class. This type
of test indicates the level at which a student will learn most effectively. The
main aim is to create groups that are homogeneous in level. In designing a
placement test, the test developer may choose to base the test content either
on a theory of general language proficiency or on learning objectives of the
curriculum. In the former, institutions may choose to use a well-established
proficiency test such as the TOEFL or IELTS exam and link it to curricular
benchmarks. In the latter, tests are based on aspects of the syllabus taught at
the institution concerned.
Discuss and present the various types of tests and assessment tasks
that students have experienced.
TOPIC 3 BASIC TESTING TERMINOLOGY
3.0 SYNOPSIS
The topic covers the types of tests: norm-referenced and criterion-referenced;
formative and summative; objective and subjective.

CONTENT
assessment can be seen as assessment for learning. It is part of the
instructional process. We can think of formative assessment as “practice.”
With continual feedback, teachers may assist students to improve their
performance. The teachers point out what the students have done wrong
and help them to get it right. This can take place when teachers examine the
results of achievement and progress tests. Based on the results of a formative
test or assessment, the teachers can suggest changes to the focus of the
curriculum or the emphasis on specific lesson elements. On the other hand,
students may also need to change and improve. Due to the demanding nature
of formative testing, numerous teachers prefer not to adopt it, although
giving back any assessed homework or achievement test presents both
teachers and students with healthy and valuable learning opportunities.
Table 3.1: Common formative and summative assessments in schools
Formative: quizzes and essays; diagnostic tests
Summative: national exams (UPSR, PMR, SPM, STPM); entrance exams
Let’s look at some important terminology when designing multiple-choice
questions. This objective test item comprises five terminologies namely:
2. Stem
Every multiple-choice item consists of a stem (the ‘body’ of the item
that presents a stimulus). The stem is the question or task in an item. It may
take the form of a complete or open sentence, phrased positively or
negatively. A stem must be short, simple, compact and clear. However, it
must not easily give away the right answer.
3. Options or alternatives
They are known as a list of possible responses to a test item.
There are usually between three and five options/alternatives to
choose from.
4. Key
This is the correct response. The response can either be correct
or the best one. Usually, in a good item, the correct answer is not obvious
compared with the distractors.
5. Distractors
This is known as a ‘disturber’ that is included to distract students from
selecting the correct answer. An excellent distractor closely resembles the
correct answer without being correct.
23
i. Design each item to measure a single objective;
ii. State both stem and options as simply and directly as possible;
iii. Make certain that the intended answer is clearly the only correct
one;
iv. (Optional) Use item indices to accept, discard or revise item.
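The anatomy described above (stem, options, key, distractors) and guidelines i-iii can be sketched as a small data structure. This is an illustrative sketch, not part of the module; the class and field names are invented.

```python
# Illustrative sketch of a multiple-choice item: stem, options, key,
# and distractors, with a simple well-formedness check.

from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    stem: str            # the question or task presented to the student
    options: list[str]   # the full list of alternatives (usually 3 to 5)
    key: str             # the correct (or best) response

    @property
    def distractors(self) -> list[str]:
        # every option that is not the key acts as a distractor
        return [o for o in self.options if o != self.key]

    def is_well_formed(self) -> bool:
        # echoes the guidelines: 3 to 5 options, exactly one correct answer
        return (3 <= len(self.options) <= 5
                and self.options.count(self.key) == 1)

item = MultipleChoiceItem(
    stem="Choose the word closest in meaning to 'rapid'.",
    options=["quick", "slow", "loud", "rare"],
    key="quick",
)
print(item.distractors)       # ['slow', 'loud', 'rare']
print(item.is_well_formed())  # True
```

Storing the key as one of the options lets the distractor list be derived rather than duplicated, so the two can never fall out of sync.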
Some have argued that the distinction between objective and subjective
assessments is neither useful nor accurate because, in reality, there is no
such thing as ‘objective’ assessment. In fact, all assessments are created with
inherent biases built into decisions about relevant subject matter and content,
as well as cultural (class, ethnic, and gender) biases.
Reflection
1. Objective test items are items that have only one answer or correct
response. Describe in depth the multiple-choice test item.
Discussion

TOPIC 4 BASIC PRINCIPLES OF ASSESSMENT
4.0 SYNOPSIS
2. Explain the differences between validity and reliability;
The topic covers the basic principles of assessment: reliability, validity,
practicality, authenticity, washback effect, objectivity, and interpretability.

CONTENT
SESSION FOUR (3 hours)
4.3 INTRODUCTION
4.4.3 Factors influencing Reliability
b. Teacher-Student factors
c. Environment factors
Because students' grades depend on the way tests are
administered, test administrators should strive to provide clear and
accurate instructions, sufficient time, and careful monitoring of tests to
improve the reliability of their tests. A test-retest technique can be
used to determine test reliability.
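The test-retest technique mentioned above is usually quantified as the correlation between scores from two administrations of the same test to the same group: the closer the coefficient is to 1, the more reliable the test. The sketch below is illustrative; the helper function and the score lists are invented, not taken from the module.

```python
# Estimating test-retest reliability as the Pearson correlation between
# two sittings of the same test. Scores are invented for illustration.

from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

first_sitting = [55, 68, 72, 80, 61, 90, 47]   # scores, administration 1
second_sitting = [58, 65, 75, 78, 64, 88, 50]  # same pupils, administration 2

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

A coefficient near 1 suggests the test ranks the same pupils consistently across administrations; a low coefficient points to unreliable administration or marking.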
e. Marking factors
potentially a large error introduced by individual markers. It is also
common that different markers award different marks for the same
answer even with a prepared mark scheme. A marker’s assessment
may vary from time to time and with different situations. This does not
happen with objective tests, since the responses are
fixed. Thus, objectivity is a condition for reliability.
4.5 VALIDITY
without taking into account its comprehensibility (clarity), rhetorical discourse
elements, and the organisation of ideas.
• Content validity: Does the assessment content cover what you want to
assess? Have satisfactory samples of language and language skills been
selected for testing?
• Construct validity: Are you measuring what you think you're measuring?
Is the test based on the best available theory of language and language
use?
• Concurrent (parallel) validity: Can you use the current test score to
estimate scores of other criteria? Does the test correlate with other existing
measures?
Figure 4.5: Types of Validity
4.5.2 Content validity
What are the different types of validity? Describe any three types and
cite examples.
http://www.2dix.com/pdf-2011/testing-and-evaluation-in-esl-pdf.php
4.5.6 Practicality
4.5.7 Objectivity
especially when they represent accomplishments in a student’s
developing competence. Teachers can have various strategies in
providing guidance or coaching. Washback enhances a number of
basic principles of language acquisition namely intrinsic motivation,
autonomy, self-confidence, language ego, interlanguage, and strategic
investment, among others.
Washback is generally said to be either positive or negative.
Unfortunately, students and teachers tend to think of the negative
effects of testing such as “test-driven” curricula and only studying and
learning “what they need to know for the test”. Positive washback, or
what we prefer to call “guided washback” can benefit teachers, students
and administrators. Positive washback assumes that testing and
curriculum design are both based on clear course outcomes, which are
known to both students and teachers/testers. If students perceive that
tests are markers of their progress towards achieving these outcomes,
they have a sense of accomplishment. In short, tests must be part of
learning experiences for all involved. Positive washback occurs when a
test encourages good teaching practice.
4.5.9 Authenticity
those target language tasks and for transforming them into valid test
items.
4.6.0 Interpretability
TOPIC 5 DESIGNING CLASSROOM LANGUAGE TESTS
5.0 SYNOPSIS
Topic 5 exposes you to the stages of test construction, the preparation of a
test blueprint / test specifications, the elements in the Test Specifications
Guidelines, and the importance of following the guidelines for constructing
test items. Then we look at the various test formats that are appropriate for
language assessment.
CONTENT
i. determining
ii. planning
iii. writing
iv. preparing
v. reviewing
vi. pre-testing
vii. validating
5.3.1 Determining
The essential first step in testing is to make oneself perfectly
clear about what it is one wants to know and for what purpose. When
we start to construct a test, the following questions have to be
answered.
5.3.2 Planning
The first form that the solution takes is a set of specifications for
the test. This will include information on: content, format and timing,
criteria, levels of performance, and scoring procedures.
In this stage, the test constructor has to determine the content by
answering the following questions:
Describing the purpose of the test;
Describing the characteristics of the test takers, the nature of the
population of the examinees for whom the test is being designed.
Defining the nature of the ability we want to measure;
Developing a plan for evaluating the qualities of test usefulness (the
degree to which a test is useful for teachers and students), which
includes six qualities: reliability, validity, authenticity, practicality,
interactiveness, and impact;
Identifying resources and developing a plan for their allocation and
management;
Determining format and timing of the test;
Determining levels of performance;
Determining scoring procedures
5.3.3 Writing
Although writing items is time-consuming, writing good items is an art.
No one can expect to be able consistently to produce perfect items.
Some items will have to be rejected, others reworked. The best way to
identify items that have to be improved or abandoned is through
teamwork. Colleagues must really try to find fault; and despite the
seemingly inevitable emotional attachment that item writers develop to
items that they have created, they must be open to, and ready to
accept, the criticisms that are offered to them. Good personal relations
are a desirable quality in any test writing team.
5.3.4 Preparing
One has to understand the major principles, techniques and experience
of preparing the test items. Not every teacher can make a good tester.
To construct different kinds of tests, the tester should observe some
principles. In the production-type tests, we have to bear in mind that no
comments are necessary. Test writers should also try to avoid test
items which can be answered through test-wiseness. Test-wiseness
refers to the capacity of the examinees to utilise the
characteristics and formats of the test to guess the correct answer.
5.3.5 Reviewing
Principles for reviewing test items:
The test should not be reviewed immediately after its construction,
but after some considerable time.
Other teachers or testers should review it. In a language test, it is
preferable if native speakers are available to review the test.
5.3.6 Pre-testing
After reviewing the test, it should be submitted to pre-testing.
The tester should administer the newly-developed test to a group of
examinees similar to the target group and the purpose is to analyse
every individual item as well as the whole test.
Numerical data (test results) should be collected to check the
efficiency of each item; the analysis should include item facility and
discrimination.
5.3.7 Validating
Item Facility (IF) shows to what extent the item is easy or difficult. The
items should neither be too easy nor too difficult. To measure the facility
or easiness of the item, the following formula is used:
IF = number of correct responses (Σc) / total number of candidates (N)
And to measure item difficulty:
ID = number of wrong responses (Σw) / total number of candidates (N)
The results of such equations range from 0 to 1. An item with a
facility index of 0 is too difficult, and one with an index of 1 is too easy.
The ideal item is one with a value of 0.5, and the acceptability range for
item facility is 0.37 to 0.63, i.e. less than 0.37 is difficult, and above 0.63
is easy.
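The item facility formula and its acceptability band can be sketched in a few lines. The response counts below are invented for illustration; only the formula and the 0.37-0.63 thresholds come from the text.

```python
# Sketch of item facility: IF = (number of correct responses) / N,
# classified against the 0.37-0.63 acceptability band described above.

def item_facility(correct_responses: int, candidates: int) -> float:
    # IF = Sum(c) / N
    return correct_responses / candidates

def classify(if_value: float) -> str:
    # below 0.37 the item is difficult; above 0.63 it is easy
    if if_value < 0.37:
        return "difficult"
    if if_value > 0.63:
        return "easy"
    return "acceptable"

for correct in (10, 18, 30):  # invented counts out of 40 candidates
    IF = item_facility(correct, 40)
    print(f"Sum(c)={correct}, N=40: IF={IF:.2f} ({classify(IF)})")
```

Run over a whole pre-test, this flags items whose facility falls outside the acceptable band as candidates for revision or rejection.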
Thus, tests which are too easy or too difficult for a given sample
population, often show low reliability. As noted in Topic 4, reliability is
one of the complementary aspects of measurement.
Besides knowing the purpose of the test you are creating, you
are required to know as precisely as possible what it is you want to test.
Do not conduct a test hastily. Instead, you need to examine the
objectives for the unit you are testing carefully.
5.5 Bloom’s and SOLO Taxonomies
5.5.1 Bloom’s Taxonomy (Revised)
Bloom’s Taxonomy is a systematic way of describing how a
learner’s performance develops from simple to complex levels in the
affective, psychomotor and cognitive domains of learning. The Original
Taxonomy provided carefully developed definitions for each of the six
major categories in the cognitive domain. The categories were
Knowledge, Comprehension, Application, Analysis, Synthesis, and
Evaluation. With the exception of Application, each of these was
broken into subcategories. The complete structure of the original
Taxonomy is shown in Figure 5.1.
subcategories. The verb aspect was included in the definition given to
Knowledge in that the student was expected to be able to recall or
recognise knowledge. This brought uni-dimensionality to the framework
at the cost of a Knowledge category that was dual in nature and thus
different from the other Taxonomic categories. In the 1990s, Anderson
(former student of Bloom) eliminated this inconsistency in the revised
Taxonomy by allowing these two aspects, the noun and verb, to form
separate dimensions, the noun providing the basis for the Knowledge
dimension and the verb forming the basis for the Cognitive Process
dimension as shown in Figure 5.2.
In the revised Bloom’s Taxonomy, the names of six major
categories were changed from noun to verb forms. As the taxonomy
reflects different forms of thinking, and thinking is an active process,
verbs were used instead of nouns.
Level 1 – C1
Level 2 – C2
Categories & Cognitive Processes | Alternative Names | Definition
Exemplifying | Illustrating, Instantiating | Finding a specific example or illustration of a concept or principle
Classifying | Categorising, Subsuming | Determining that something belongs to a category
Summarising | Abstracting, Generalising | Abstracting a general theme or major point(s)
Inferring | Concluding, Extrapolating, Interpolating, Predicting | Drawing a logical conclusion from presented information
Comparing | Contrasting, Mapping, Matching | Detecting correspondences between two ideas, objects, and the like
Explaining | Constructing models | Constructing a cause-and-effect model of a system
Level 3 – C3
between a product and
external
criteria; determining
whether a product has
external consistency;
detecting the
appropriateness of a
procedure for a given
problem
5.5.2 SOLO Taxonomy
On the other hand, SOLO, which stands for the Structure of the
Observed Learning Outcome, taxonomy is a systematic way of
describing how a learner’s performance develops from simple to
complex levels in their learning. Biggs and Collis first introduced it in their
1982 study. There are five stages, namely Prestructural, Unistructural and
Multistructural, which are in the quantitative phase, and Relational and
Extended Abstract, which are in the qualitative phase.
Figure 5.3: SOLO Taxonomy
process explicit to the student. Use of HOTS (Higher Order Thinking Skills)
maps (Hook & Mills, 2011) can be used in English to scaffold in-depth
discussion, encouraging students to:
The most powerful model for understanding these three levels and
integrating them into learning intentions and success criteria is the
SOLO model.
5.6.5 Level of difficulty
What is the difference between test format and test type? For example,
when you want to introduce a new kind of test, for example a reading test
which is organised a little differently from the existing test items, what do
you say: test format or test type? Test format refers to the layout of
questions on a test. For example, the format of a test could be two essay
questions, 50 multiple-choice questions, etc. For the sake of brevity, I will
consider providing the outlines of some large-scale standardised tests.
UPSR
IELTS Test Format
IELTS is a test of all four language skills – Listening, Reading, Writing &
Speaking. Test-takers will take the Listening, Reading and Writing tests
all on the same day one after the other, with no breaks in between.
Depending on the examinee’s test centre, one’s Speaking test may be on
the same day as the other three tests, or up to seven days before or after
that. The total test time is under three hours. The test format is illustrated
below.
TOPIC 6 ASSESSING LANGUAGE SKILLS
CONTENT
6.0 SYNOPSIS
Topic 6 focuses on ways to assess language skills and language
content. It defines the types of test items used to assess language skills
and language content. It also provides teachers with suggestions on
ways a teacher can assess the listening, speaking, reading and writing
skills in a classroom. It also discusses concepts of and differences
between discrete-point tests, integrative tests and communicative tests.
to listen for names, numbers, grammatical category, directions (in a map
exercise), or certain facts and events.
iv. Extensive: listening to develop a top-down, global
understanding of spoken language. Extensive performance
ranges from listening to lengthy lectures to listening to a
conversation and deriving a comprehensive message or
purpose. Listening for the gist, or the main idea, and making
inferences are all part of extensive listening.
b. Speaking
In the assessment of oral production, both discrete feature
objective tests and integrative task-based tests are used. The first
type tests such skills as pronunciation, knowledge of what
language is appropriate in different situations, language required in
doing different things like describing, giving directions, giving
instructions, etc. The second type involves finding out if pupils can
perform different tasks using spoken language that is appropriate
for the purpose and the context. Task-based activities involve
describing scenes shown in a picture, participating in a discussion
about a given topic, narrating a story, etc. As in the listening
performance assessment tasks, Brown (2010) cited four categories
for oral assessment.
to participate in an interactive conversation. The only role of
listening here is in the short-term storage of a prompt, just long
enough to allow the speaker to retain the short stretch of
language that must be imitated.
2. Intensive. The production of short stretches of oral language
designed to demonstrate competence in a narrow band of
grammatical, phrasal, lexical, or phonological relationships.
Examples of intensive assessment tasks include directed
response tasks (requests for specific production of speech),
reading aloud, sentence and dialogue completion, limited picture-
cued tasks including simple sentences, and translation up to the
simple sentence level.
3. Responsive. Responsive assessment tasks include interaction
and test comprehension, but at a somewhat limited level: very
short conversations, standard greetings and small talk, simple
requests and comments, etc. The stimulus is almost always a
spoken prompt (to preserve authenticity) with one or two follow-up
questions or retorts:
participants. Interaction can be broken down into two types: (a)
transactional language, which has the purpose of exchanging
specific information, and (b) interpersonal exchanges, which have
the purpose of maintaining social relationships. (In the three
dialogues cited above, A and B are transactional, and C is
interpersonal).
5. Extensive (monologue). Extensive oral production tasks include
speeches, oral presentations, and storytelling, during which the
opportunity for oral interaction from listeners is either highly limited
(perhaps to nonverbal responses) or ruled out altogether. Language
style is more deliberative (planning is involved) and formal for
extensive tasks. It can include informal monologues such as
casually delivered speech (e.g., recalling a vacation in the
mountains, conveying recipes, recounting the plot of a novel or
movie).
c. Reading
Cohen (1994) discussed various types of reading and the meanings
assessed. He describes skimming and scanning as two different types
of reading. In the first, a respondent is given a lengthy passage and is
required to inspect it rapidly (skim) or read to locate specific
information (scan) within a short period of time. He also discusses
receptive reading or intensive reading which refers to “a form of
reading aimed at discovering exactly what the author seeks to
convey” (p. 218). This is the most common form of reading especially
in test or assessment conditions. Another type of reading is to read
responsively where respondents are expected to respond to some
point in a reading text through writing or by answering questions.
A reading text can also convey various kinds of meaning and reading
involves the interpretation or comprehension of these meanings. First,
grammatical meaning refers to meanings that are expressed through
linguistic structures such as complex and simple sentences and the
correct interpretation of those structures. A second meaning is
informational meaning which refers largely to the concept or messages
contained in the text. Respondents may be required to comprehend
merely the information or content of the passage and this may be
assessed through various means such as summary and précis writing.
Compared to grammatical or syntactic meaning, informational meaning
requires a more general understanding of a text rather than having to
pay close attention to the linguistic structure of sentences. A third
meaning contained in many texts is discourse meaning. This refers to
the perception of rhetorical functions conveyed by the text. One typical
function is discourse marking which adds cohesiveness to a text.
These words, such as unless, however, thus, therefore etc., are crucial
to the correct interpretation of a text and students may be assessed on
their ability to understand the discoursal meaning that they bring in the
passage. Finally, a fourth meaning which may also be an object of
assessment in a reading test is the meaning conveyed by the writer’s
tone. The writer’s tone – whether it is cynical, sarcastic, sad or etc.- is
important in reading comprehension but may be quite difficult to
identify, especially by less proficient learners. Nevertheless, there can
be many situations where the reader is completely wrong in
comprehending a text simply because he has failed to perceive the
correct tone of the author.
d. Writing
Brown (2004) identifies three different genres of writing, which are
academic writing, job-related writing and personal writing, each of
which can be expanded to include many different examples. Fiction,
for example, may be considered as personal writing according to
Brown’s taxonomy. Brown (2010) identified four categories of written
performance that capture the range of written production which can
be used to assess writing skill.
punctuation, and brief sentences. This category includes the
ability to spell correctly and to perceive phoneme-grapheme
correspondences in the English spelling system. At this stage the
learners are trying to master the mechanics of writing. Form is
the primary focus while context and meaning are of secondary
concern.
2. Intensive (controlled). Beyond the fundamentals of imitative
writing are skills in producing appropriate vocabulary within a
context, collocation and idioms, and correct grammatical features
up to the length of a sentence. Meaning and context are
important in determining correctness and appropriateness but
most assessment tasks are more concerned with a focus on form
and are rather strictly controlled by the test design.
3. Responsive. Assessment tasks require learners to perform at a
limited discourse level, connecting sentences into a paragraph
and creating a logically connected sequence of two or three
paragraphs. Tasks relate to pedagogical directives, lists of
criteria, outlines, and other guidelines. Genres of writing include
brief narratives and descriptions, short reports, lab reports,
summaries, brief responses to reading, and interpretations of
charts and graphs. Form-focused attention is mostly at the
discourse level, with a strong emphasis on context and meaning.
4. Extensive. Extensive writing implies successful management of all
the processes and strategies of writing for all purposes, up to the
length of an essay, a term paper, a major research project report,
or even a thesis. Focus is on achieving a purpose, organizing and
developing ideas logically, using details to support or illustrate
ideas, demonstrating syntactic and lexical variety, and in many
cases, engaging in the process of multiple drafts to achieve a final
product. Focus on grammatical form is limited to occasional
editing and proofreading of a draft.
Tests have been categorized in many different ways. The most
familiar terms are objective tests and subjective tests.
We normally associate objective tests with multiple choice
question tests and subjective tests with essays. However, to
be more accurate we will consider how the test is graded. Objective
tests are tests that are graded objectively while subjective tests are
thought to involve subjectivity in grading.
There are many examples of each type of test. Objective test types
include the multiple choice test, true-false items and matching items,
because each of these is graded objectively. In these examples of
objective tests, there is only one correct response and the grader
does not need to subjectively assess the response.
Two other terms, select type tests and supply type tests, are related
to objective and subjective tests. In most
cases, objective tests are similar to select type tests where students
are expected to select or choose the answer from a list of options.
Just as a multiple choice question test is an objective type test, it
can also be considered a select type test. Similarly, tests involving
essay type questions are supply type as the students are expected
to supply the answer through their essay. How then would you
classify a fill-in-the-blank test? For this type of test,
the students certainly need to supply the answer, but what is supplied is
merely a single word or a short phrase, which differs greatly
from an essay. It may therefore be helpful to once again picture a
continuum, with supply type items at one end and select type items at
the other.
It is not by accident that we find there are few, if any, test formats that are
either supply type and objective or select type and subjective. Select type
tests tend to be objective while supply type tests tend to be subjective.
In addition to the above, Brown and Hudson (1998) have also suggested
three broad categories to differentiate tests according to how students are
expected to respond. These categories are the selected response tests, the
constructed response tests, and the personal response tests. Examples of
each of these types of tests are given in Table 6.1.
Selected response assessments, according to Brown and Hudson (1998),
are assessment procedures in which “students typically do not create any
language” but rather “select the answer from a given list” (p. 658).
Constructed response assessment procedures require students to
“produce language by writing, speaking, or doing something else” (p.
660). Personal response assessments, on the other hand, require
students to produce language but also allow each student’s response to
differ from the others’ and let students “communicate what they
want to communicate” (p. 663). These three types of tests, categorised
according to how students respond, are useful when we wish to
determine what students need to do when they attempt to answer test
questions.
b. Communicative Test
As language teaching has emphasised the importance of
communication through the communicative approach, it is not surprising
that communicative tests have also been given prominence. A
communicative emphasis in testing involves many aspects, two of
which revolve around communicative elements in tests and meaningful
content. Both these aspects are briefly addressed in the following
subsections:
Integrating Communicative Elements into Examinations
Alderson and Banerjee (2002) report on various studies that seem to
point to the difficulty of achieving authenticity in tests. They cite Spence-
Brown (2001) who posits that “the very act of assessment changes the
nature of a potentially authentic task and compromises authenticity” and
that “authenticity must be related to the implementation of an activity,
not to its design” (p. 99). In her study, students were required to
interview native speakers outside the classroom and submit a tape-
recording of the interview. While this activity seems quite authentic, the
students were observed to prepare for the interview by “rehearsing the
interview, editing the results, and engaging in spontaneous, but flawed
discourse” (Alderson & Banerjee, 2002: 99), all of which are inauthentic
when viewed in terms of real life situations. Alderson himself argues
that because candidates in language tests are interested not in
communicating but in displaying their language abilities, the test
situation is a communicative event in itself and therefore cannot be
used to replicate any real world event (p. 98).
Fulcher, in turn, describes the communicative tests of the future as tests that:
• involve performance;
• are authentic; and
• are scored on real-life outcomes.
In short, the kinds of tests that we should expect more of in the future
will be communicative tests in which candidates actually have to
produce the language in an interactive setting involving some degree of
unpredictability which is typical of any language interaction situation.
These tests would also take the communicative purpose of the
interaction into consideration and require the student to interact with
language that is actual and unsimplified for the learner. Fulcher finally
points out that in a communicative test, “the only real criterion of
success … is the behavioural outcome, or whether the learner was able
to achieve the intended communicative effect” (p. 493). It is obvious
from this description that the communicative test may not be so easily
developed and implemented. Practical reasons may hinder some of the
demands listed. Nevertheless, a solution to this problem has to be
found in the near future in order to have valid language tests that are
purposeful and can stimulate positive washback in teaching and
learning.
Exercise 1
Discuss the four categories of written
performance as suggested by Brown (2004)
and relate them to academic writing,
job-related writing and personal writing.
7.0 SYNOPSIS
Topic 7 focuses on scoring, grading and assessment criteria. It provides
teachers with brief descriptions of the different approaches to scoring,
namely objective, holistic and analytic scoring.
CONTENT
on a scale of 1 to 4, or 1 to 6, or even 1 to 10 (Bailey, 1998: 187). Each
score on the scale is accompanied by general descriptors of ability.
The following is an example of a holistic scoring scheme based on a 6
point scale.
Rating Criteria
5 • Vocabulary is precise, varied, and vivid. Organization
is appropriate to writing assignment and contains
clear introduction, development of ideas, and
conclusion.
• Transition from one idea to another is smooth and
provides reader with clear understanding that
topic is changing.
• Meaning is conveyed effectively.
• A few mechanical errors may be present but do not
disrupt communication.
• Shows a clear understanding of writing and topic
development.
4 • Vocabulary is adequate for grade level. Events are
organized logically, but some part of the sample
may not be fully developed.
• Some transition of ideas is evident.
• Meaning is conveyed but breaks down at times.
• Mechanical errors are present but do not disrupt
communication.
• Shows a good understanding of writing and topic
development.
3 • Vocabulary is simple. Organization may be
extremely simple or there may be evidence of
disorganization.
• There are a few transitional markers or
repetitive transitional markers.
• Meaning is frequently not clear.
• Mechanical errors affect communication.
• Shows some understanding of writing and
topic development.
2 • Vocabulary is limited and repetitious. Sample
is comprised of only a few disjointed
sentences.
• No transitional markers.
• Meaning is unclear.
• Mechanical errors cause serious disruption in
communication.
• Shows little evidence of discourse
understanding.
1 • Responds with a few isolated words. No
complete sentences are written.
• No evidence of concepts of writing.
0 • No response.
The 6 point scale above includes broad descriptors of what a student’s essay
reflects for each band. It is quite apparent that graders using this scale are
expected to pay attention to vocabulary, meaning, organisation, topic
development and communication. Mechanics such as punctuation are
secondary to communication.
Bailey also describes another type of scoring related to the holistic approach
which she refers to as primary trait scoring. In primary trait scoring, a particular
functional focus is selected which is based on the purpose of the writing and
grading is based on how well the student is able to express that function. For
example, if the function is to persuade, scoring would be on how well the
author has been able to persuade the grader rather than how well organised
the ideas were, or how grammatical the structures in the essay were. This
approach to grading emphasises functional and communicative ability rather
than discrete linguistic ability and accuracy.
what is written? Is the essay meaningful? Similarly, we may also want to
consider the organisation of the essay. Does the writer begin the essay with
an appropriate topic sentence?
Are there good transitions between paragraphs? Other categories that we
may want to also consider include vocabulary, language use and mechanics.
The following are some possible components used in assessing writing
ability using an analytical scoring approach and the suggested weightage
assigned to each:
Components Weight
Content 30 points
Organisation 20 points
Vocabulary 20 points
Language Use 25 points
Mechanics 5 points
Each of the three scoring approaches has its own advantages and
disadvantages, as illustrated in Table 7.2.
(Table 7.2 is only partly legible here; the surviving fragment notes the
risk of focusing on weaknesses in writing without giving credit for what
students can do well.)
EXERCISE
8.0 SYNOPSIS
Topic 8 focuses on item analysis and interpretation. It provides teachers with
brief descriptions of basic statistical terminology such as mode, median, mean,
standard deviation and standard score, and the interpretation of data. It will
also look at item analysis dealing with item difficulty and item discrimination.
Teachers will also be introduced to distractor analysis in language assessment.
8.2 FRAMEWORK OF TOPICS
CONTENT
Let us assume that you have just graded the test papers for your class. You
now have a set of scores. If a person were to ask you about the performance
of the students in your class, it would be very difficult to give all the scores in
the class. Instead, you may prefer to cite only one score.
Or perhaps you would like to report on the performance by giving some
values that would help provide a good indication of how the students in your
class performed. What values would you give? In this section, we will look at
two kinds of measures, namely measures of central tendency and measures
of dispersion. Both these types of measures are useful in score reporting.
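As a quick sketch using Python's standard library (the scores here are invented purely for illustration), the common measures of central tendency, together with the range as the simplest measure of dispersion, can be computed as follows:

```python
import statistics

scores = [20, 25, 25, 25, 30]  # hypothetical class scores

mean = statistics.mean(scores)            # arithmetic average
median = statistics.median(scores)        # middle score when sorted
mode = statistics.mode(scores)            # most frequent score
score_range = max(scores) - min(scores)   # dispersion: highest minus lowest

print(mean, median, mode, score_range)
```

For this symmetrical set all three measures of central tendency coincide at 25, with a range of 10.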
Standard deviation refers to how much the scores deviate from the mean.
There are two methods of calculating standard deviation, the
deviation method and the raw score method, illustrated by the following
formulae.
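The formulae themselves are not reproduced in this copy. Assuming the sample (n − 1) divisor, which matches the worked value of 5 obtained below for the scores 20, 25 and 30, the two methods can be written as:

```latex
% Deviation method: square each score's deviation from the mean
s = \sqrt{\dfrac{\sum (x - \bar{x})^{2}}{n - 1}}

% Raw score method: work directly from the scores and their squares
s = \sqrt{\dfrac{\sum x^{2} - \dfrac{(\sum x)^{2}}{n}}{n - 1}}
```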
To illustrate this, we will use the scores 20, 25 and 30. Using the deviation
method, we come up with the following table:
Using the raw score method, we can come up with the following:
Table 8.2 : Calculating the Standard Deviation Using the Raw Score Method
Both methods result in the same final value of 5. If you are calculating
standard deviation with a calculator, it is suggested that the deviation
method be used when there are only a few scores and the raw score
method be used when there are many scores. This is because when
there are many scores, it will be tedious to calculate the square of the
deviations and their sum.
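Both methods can also be checked with a short Python sketch (a sample n − 1 divisor is assumed, since it reproduces the value of 5 for the scores 20, 25 and 30):

```python
import math

def sd_deviation_method(scores):
    """Standard deviation via squared deviations from the mean (n - 1 divisor)."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

def sd_raw_score_method(scores):
    """Standard deviation computed directly from raw scores and their squares."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x ** 2 for x in scores)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

scores = [20, 25, 30]
print(sd_deviation_method(scores))   # 5.0
print(sd_raw_score_method(scores))   # 5.0
```

Both functions return the same value, confirming that the two methods are algebraically equivalent.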
i. The Z score
The Z score is the basic standardised score. It is referred to as the
basic form because other standardised scores are computed from
the Z score. The formula used to calculate the Z score is as
follows:
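The formula itself is missing from this copy; the standard definition, which the module is assumed to use, is:

```latex
% z: how many standard deviations the raw score x lies from the mean
z = \dfrac{x - \bar{x}}{s}
```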
Table 8.3: Calculating the Z Score for a Set of Scores
Z score values are very small and usually range only from –2 to 2.
Such small values make the Z score inappropriate for score reporting,
especially for those unaccustomed to the concept. Imagine what a parent may
say if his child comes home with a report card with a Z score of 0.47 in
English Language! Fortunately, there is another form of standardised
score - the T score – with values that are more palatable to the
relevant parties.
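The T score conversion itself falls on a page not reproduced here; the sketch below assumes the conventional rescaling T = 10z + 50 (mean 50, standard deviation 10), and the raw score, mean and standard deviation are invented for illustration:

```python
def z_score(raw, mean, sd):
    # basic standardised score: distance from the mean in SD units
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    # assumed conventional T score: rescale z to mean 50, SD 10
    return 10 * z_score(raw, mean, sd) + 50

# a z of 0.47 becomes a far more palatable T of 54.7
print(z_score(59.4, 50, 20))
print(t_score(59.4, 50, 20))
```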
8.2.4 Interpretation of data
How can En. Abu solve this problem? He would need standardised
scores in order to decide. This would require the following
information:
Using the information above, En. Abu can find the Z score for each raw
score reported as follows:
Based on Table 8.4, both Ali and Chong have a negative Z score as
their total score for both tests. However, Chong has a higher Z score
total (i.e. –1.07 compared to – 1.34) and therefore performed better
when we take the performance of all the other students into
consideration.
find that while most will have an average height of perhaps 5 feet 4 inches,
there will be a few who will be relatively shorter and an equal number who
are relatively taller. By plotting the heights of all Malaysian men according to
frequency of occurrence, it is expected that we would obtain something
similar to a normal distribution curve. Similarly, test scores that measure any
characteristic such as intelligence, language proficiency or writing ability of a
specific population are also expected to give us a normal curve.
The following diagram illustrates what the normal curve looks like.
a. Item difficulty
Item difficulty refers to how easy or difficult an item is. The formula
used to measure item difficulty is quite straightforward. It involves
finding out how many students answered an item correctly and
dividing that number by the number of students who took the test. The formula is
therefore:
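In code, the calculation described above is a single division (the numbers here are illustrative):

```python
def item_difficulty(num_correct, num_test_takers):
    # proportion of test takers who answered the item correctly;
    # values near 1 indicate an easy item, values near 0 a difficult one
    return num_correct / num_test_takers

# e.g. 70 of 100 students answer the item correctly
print(item_difficulty(70, 100))  # 0.7
```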
b. Item discrimination
Let’s use the following instance as an example. Suppose you have just
conducted a twenty item test and obtained the following results:
As there are twelve students in the class, 33% of this total would be 4
students. Therefore, the upper group and the lower group will each consist
of 4 students. Based on their total scores, the upper group would
consist of students L, A, E, and G while the lower group would consist of
students J, H, D and I.
We now need to look at the performance of these students for each item
in order to find the item discrimination index of each item.
For item 1, all four students in the upper group (L, A, E, and G)
answered correctly while only student H in the lower group answered
correctly. Using the formula described earlier, we can plug in the
numbers as follows:
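The formula referred to above falls on a page not reproduced here; this sketch assumes the common form D = (U − L) / n, where U and L are the numbers of correct answers in the upper and lower groups and n is the size of each group:

```python
def item_discrimination(upper_correct, lower_correct, group_size):
    # assumed common form: D = (U - L) / n; positive values mean
    # more upper-group than lower-group students answered correctly
    return (upper_correct - lower_correct) / group_size

# Item 1: all four upper-group students (L, A, E, G) answered correctly,
# but only one lower-group student (H) did
print(item_discrimination(4, 1, 4))  # 0.75
```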
criterion referenced tests, item discrimination does not have as
important a role. Secondly, the use of 33.3% of the total number of
students who took the test in the formula is not inflexible, as it is possible
to use any percentage between 27.5% and 35% as the value.
c. Distractor analysis
Let us assume that 100 students took the test. If we assume that A is the
answer and the item difficulty is 0.7, then 70 students answered correctly.
What about the remaining 30 students and the effectiveness of the three
distractors? If all 30 selected D, then distractors B and C are useless in their
role as distractors. Similarly, if 15 students selected D and another 15
selected B, then C is not an effective distractor and should be replaced.
Therefore, the ideal situation would be for each of the three distractors to be
selected by an equal number of all students who did not get the answer
correct, i.e. in this case 10 students. Therefore the effectiveness of each
distractor can be quantified as 10/100 or 0.1, where 10 is the number of
students who selected the item and 100 is the total number of students
who took the test. This technique is similar to a difficulty index although the
result does not indicate the difficulty of each item, but rather the
effectiveness of the distractor. In the first situation described in this
paragraph, options A, B, C and D would have a difficulty index of 0.7, 0, 0,
and 0.3 respectively. If the distractors worked equally well, then the indices
would be 0.7, 0.1, 0.1, and 0.1. Unlike in determining the difficulty of an
item, the value of the difficulty index formula for the distractors must be
interpreted in relation to the indices for the other distractors.
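The situation described above can be checked with a short sketch; the option counts are the illustrative ones from the text (key A chosen by 70 of 100 students, the 30 incorrect answers spread evenly):

```python
def option_indices(option_counts, total_students):
    # difficulty-style index for every option: the proportion of all
    # test takers who chose it; for distractors this shows how often
    # each one attracted a response
    return {opt: n / total_students for opt, n in option_counts.items()}

balanced = option_indices({"A": 70, "B": 10, "C": 10, "D": 10}, 100)
print(balanced)  # {'A': 0.7, 'B': 0.1, 'C': 0.1, 'D': 0.1}
```

A distractor whose index is close to 0 is doing no work and is a candidate for replacement.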
From a different perspective, the item discrimination formula can also be
used in distractor analysis. The concept of upper groups and lower groups
would still remain, but the analysis and expectation would differ slightly from
the regular item discrimination that we have looked at earlier. Instead of
expecting a positive value, we should logically expect a negative value as
more students from the lower group should select distractors. Each
distractor can have its own item discrimination value in order to analyse how
the distractors work and ultimately refine the effectiveness of the test item
itself.
Table 8.5: Number of students choosing each option (* indicates key)
Item A B C D
Item 1 8* 3 1 0
Item 2 2 8* 2 0
Item 3 4 8* 0 0
Item 4 1 3 8* 0
Item 5 5 0 0 7*
For Item 1, the discrimination index for each distractor can be calculated
using the discrimination index formula. From Table 8.5, we know that all the
students in the upper group answered this item correctly and only one student
from the lower group did so. If we assume that the three remaining students
from the lower group all selected distractor B, then the discrimination index for
item 1, distractor B will be:
(0 − 3) / 4 = −0.75
This negative value indicates that more students from the lower group
selected the distractor compared to students from the upper group. This result
is to be expected of a distractor and a value of -1 to 0 is preferred.
EXERCISE
1. Calculate the mean, mode, median and range of the following set of
scores:
23, 24, 25, 23, 24, 23, 23, 26, 27, 22, 28.
2. What is a normal curve and what does it show? Does the final
result always show a normal curve, and how does this relate to
standardised tests?
TOPIC 9
REPORTING OF ASSESSMENT DATA
9.0 SYNOPSIS
Topic 9 focuses on reporting assessment data. It provides teachers with brief
descriptions on the purposes of reporting and the reporting methods.
CONTENT
9.2.1 Purposes of reporting
9.2.2 Reporting methods
iii. An outcomes approach
Acknowledges that students, regardless of their class or grade, can be
working towards syllabus outcomes anywhere along the learning
continuum.
• Is balanced, comprehensive and varied
Effective and informative assessment practice involves teachers
using a variety of assessment strategies that give students multiple
opportunities, in varying contexts, to demonstrate what they know,
understand and can do in relation to the syllabus outcomes.
Effective and informative reporting of student achievement takes a
number of forms including traditional reporting, student profiles,
Basic Skills Tests, parent and student interviews, annotations on
student work, comments in workbooks, portfolios, certificates and
awards.
• Is valid
Assessment strategies should accurately and appropriately assess
clearly defined aspects of student achievement. If a strategy does
not accurately assess what it is designed to assess, then its use is
misleading.
Valid assessment strategies are those that reflect the actual
intention of teaching and learning activities, based on syllabus
outcomes.
Where values and attitudes are expressed in syllabus outcomes,
these too should be assessed as part of student learning.
• Is fair
Effective and informative assessment strategies are designed to
ensure equal opportunity for success regardless of students' age,
gender, physical or other disability, culture, background language,
socio-economic status or geographic location.
• Engages the learner
Effective and informative assessment practice is student centred.
Ideally there is a cooperative interaction between teacher and
students, and among the students themselves.
The syllabus outcomes and the assessment processes to be used
should be made explicit to students. Students should participate in
the negotiation of learning tasks and actively monitor and reflect
upon their achievements and progress.
• Values teacher judgement
Good assessment practice involves teachers making judgements,
on the weight of assessment evidence, about student progress
towards the achievement of outcomes.
Teachers can be confident a student has achieved an outcome
when the student has successfully demonstrated that outcome a
number of times, and in varying contexts.
The reliability of teacher judgement is enhanced when teachers
cooperatively develop a shared understanding of what constitutes
achievement of an outcome. This is developed through cooperative
programming and discussing samples of student work and
achievements within and between schools. Teacher judgement
based on well defined standards is a valuable and rich form of
student assessment.
• Is time efficient and manageable
Effective and informative assessment practice is time efficient and
supports teaching and learning by providing constructive feedback to
the teacher and student that will guide further learning.
Teachers need to plan carefully the timing, frequency and nature of
their assessment strategies. Good planning ensures that assessment
and reporting is manageable and maximises the usefulness of the
strategies selected (for example, by addressing several outcomes in
one assessment task).
• Recognises individual achievement and progress
Effective and informative assessment practice acknowledges that
students are individuals who develop differently. All students must be
given appropriate opportunities to demonstrate achievement.
Effective and informative assessment and reporting practice is
sensitive to the self esteem and general well-being of students,
providing honest and constructive feedback.
Values and attitudes outcomes are an important part of learning that
should be assessed and reported. They are distinct from knowledge,
understanding and skill outcomes.
• Involves a whole school approach
An effective and informative assessment and reporting policy is
developed through a planned and coordinated whole school approach.
Decisions about assessment and reporting cannot be taken
independently of issues relating to curriculum, class groupings,
timetabling, programming and resource allocation.
• Actively involves parents
Schools and their communities are responsible for jointly developing
assessment and reporting practices and policies according to their local
needs and expectations.
Schools should ensure full and informed participation by parents in the
continuing development and review of the school policy on reporting
processes.
• Conveys meaningful and useful information
Reporting of student achievement serves a number of purposes, for a
variety of audiences. Students, parents, teachers, other schools and
employers are potential audiences. Schools can use student
achievement information at a number of levels including individual, class,
grade or school. This information helps identify students for targeted
intervention and can inform school improvement programs. The form of
the report must clearly serve its intended purpose and audience.
Effective and informative reporting acknowledges that students can be
demonstrating progress and achievement of syllabus outcomes across
stages, not just within stages.
Good reporting practice takes into account the expectations of the
school community and system requirements, particularly the need for
information about standards that will enable parents to know how their
children are progressing.
Student achievement and progress can be reported by comparing
students' work against a standards framework of syllabus outcomes,
comparing their prior and current learning achievements, or comparing
their achievements to those of other students. Reporting can involve a
combination of these methods. It is important for schools and parents to
explore which methods of reporting will provide the most meaningful and
useful information.
10.0 SYNOPSIS
Topic 10 focuses on the issues and concerns related to assessment in the
Malaysian primary schools. It will look at how assessment is viewed and used
in Malaysia.
10.1 LEARNING OUTCOMES
By the end of Topic 10, teachers will be able to:
CONTENT
SESSION TEN (3 hours)
century, and increased public and parental expectations of
education policy. Over the
course of 11 months, the Ministry drew on many sources of
input, from education experts at UNESCO, World Bank,
OECD, and six local universities, to principals, teachers,
parents, and students from every state in Malaysia. The
result is a preliminary Blueprint
that evaluates the performance of Malaysia’s education
system against historical starting points and international
benchmarks. The Blueprint also offers a vision of the
education system and students that Malaysia both needs and
deserves, and suggests
11 strategic and operational shifts that would be required to
achieve that vision. The Ministry hopes that this effort will
inform the national discussion on how to fundamentally
transform Malaysia’s education system, and will seek
feedback from across
the community on this preliminary effort before finalising the
Blueprint in December 2012.”
▪ School assessment refers to written tests that assess subject
learning. The test questions and marking schemes are developed,
administered, scored, and reported by school teachers based on
guidance from LP;
▪ Central assessment refers to written tests, project work, or
oral tests (for languages) that assess subject learning. LP develops
the test questions and marking schemes. The tests are, however,
administered and marked by school teachers;
▪ Psychometric assessment refers to aptitude tests and a
personality inventory to assess students’ skills, interests, aptitude,
attitude and personality. Aptitude tests are used to assess students’
innate and acquired abilities, for example in thinking and problem
solving. The personality inventory is used to identify key traits and
characteristics that make up the students’ personality. LP develops
these instruments and provides guidelines for use. Schools are,
however, not required to comply with these guidelines; and
▪ Physical, sports, and co-curricular activities assessment
refers to assessments of student performance and participation
in physical and health education, sports, uniformed bodies, clubs,
and other non-school sponsored activities. Schools are given the
flexibility to determine how this component will be assessed.
most subjects assessed through the national examination, and some subjects
through a combination of examinations and centralised assessments.
Knowledge
Learning objectives at this level: know common terms, know specific facts,
know methods and procedures, know basic concepts, know principles.
Question verbs: Define, list, state, identify, label, name, who? when? where?
what?
Comprehension
The ability to grasp the meaning of material. Translating material from one
form to another (words to numbers), interpreting material (explaining or
summarizing), estimating future trends (predicting consequences or effects).
This goes one step beyond the simple remembering of material, and
represents the lowest level of understanding.
Application
The ability to use learned material in new and concrete situations. Applying
rules, methods, concepts, principles, laws, and theories. Learning outcomes
in this area require a higher level of understanding than those under
comprehension.
Question verbs: How could x be used to y? How would you show, make use
of, modify, demonstrate, solve, or apply x to conditions y?
Analysis
The ability to break down material into its component parts. Identifying parts,
analysis of relationships between parts, recognition of the organizational
principles involved. Learning outcomes here represent a higher intellectual
level than comprehension and application because they require an
understanding of both the content and the structural form of the material.
Synthesis
The ability to put parts together to form a new whole. This may involve the
production of a unique communication (theme or speech), a plan of
operations (research proposal), or a set of abstract relations (scheme for
classifying information). Learning outcomes in this area stress creative
behaviors, with major emphasis on the formulation of new patterns or
structure.
Learning objectives at this level: write a well organized paper, give a well
organized speech, write a creative short story (or poem or music), propose a
plan for an experiment, integrate learning from different areas into a plan for
solving a problem, formulate a new scheme for classifying objects (or events,
or ideas).
Evaluation
The ability to judge the value of material (statement, novel, poem, research
report) for a given purpose. The judgments are to be based on definite
criteria, which may be internal (organization) or external (relevance to the
purpose). The student may determine the criteria or be given them. Learning
outcomes in this area are highest in the cognitive hierarchy because they
contain elements of all the other categories, plus conscious value judgments
based on clearly defined criteria.
Alternative assessment brings with it a set of perspectives that
contrast with those behind traditional tests and assessments.
Table 10.1 illustrates some of the major differences between traditional
and alternative assessments.
Traditional Alternative
Summative Formative
Intrusive Integrated
Judgmental Developmental
• Physical demonstration
• Pictorial products
• Reading response logs
• K-W-L (what I know/what I want to know/what I’ve learned) charts
• Dialogue journals
• Checklists
• Teacher-pupil conferences
• Interviews
• Performance tasks
• Portfolios
• Self assessment
• Peer assessment
Portfolios
peer assessment are especially found in formative stages of
assessment, in which the development of the students’ abilities
is emphasised.
Self appraisals are also thought to be quite accurate and are said
to increase student motivation. Puhl (1997) describes a case
study in which she believes self-assessment forced the students
to reread their essays and make necessary edits and corrections
before handing them in. Nevertheless, in order for self assessment
to be useful and not a futile exercise, learners need to be trained
and initially guided in performing their self assessment. This
training involves providing students with the rationale for self
assessment, how it is intended to work, and how it can help them.
EXERCISE
In your opinion, what are the advantages of using portfolios as
a form of alternative assessment?
REFERENCES
Biggs, J. B., & Collis, K. F. (1991). Multimodal learning and the quality of intelligent behaviour. In H. Rowe (Ed.), Intelligence: Reconceptualization and measurement (pp. 57-75). Hillsdale, NJ: Lawrence Erlbaum.
Carroll, J. B., & Sapon, S. M. (1958). Modern Language Aptitude
Test. New York, NY: The Psychological Corporation.
Davies, A., Brown, A., Elder, C., Hill, K., Lumley, T., & McNamara, T. (1999). Dictionary of language testing. Cambridge: University of Cambridge Local Examinations Syndicate and Cambridge University Press.
Moseley, D., Baumfield, V., Elliott, J., Gregson, M., Higgins, S., Miller, J., & Newton, D. (2005). Frameworks for thinking: A handbook for teaching and learning. Cambridge: Cambridge University Press.
Instruction, 19(3), pp. 259-271. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0959475208000558 (Retrieved March 26, 2013).
Smith, T. W., & Colby, S. A. (2007). Teaching for deep learning. The Clearing House, 80(5), pp. 205-211.
Stansfield, C., & Reed, D. (2004). The story behind the Modern Language Aptitude Test: An interview with John B. Carroll (1916-2003). Language Assessment Quarterly, 1, pp. 43-56.
Websites
http://www.catforms.com/pages/Introduction-to-Test-Items.html (Retrieved 9.8.2013)
http://myenglishpages.com/blog/summative-formative-assessment/ (Retrieved 10.8.2013)
http://www.teachingenglish.org.uk/knowledge-database/objective-test (Retrieved 12.8.2013)
http://assessment.tki.org.nz/Using-evidence-for-learning/Concepts/Concept/Reliability-and-validity
NAME: NURLIZA BT OTHMAN
othmannurliza@yahoo.com
QUALIFICATIONS:
• M.A. TESL, University of North Texas, USA
• B.A. (Hons) English, North Texas State University, USA
• Graduate Teacher Training Certificate (Ministry of Education, Malaysia)
WORK EXPERIENCE:
• 4 years as a secondary school teacher
• 21 years as a lecturer at a Teacher Education Institute (IPG)

NAME: ANG CHWEE PIN
chweepin819@yahoo.com
QUALIFICATIONS:
• M.Ed. TESL, Universiti Teknologi Malaysia
• B.Ed. (Hons.) Agricultural Science/TESL, Universiti Pertanian Malaysia
WORK EXPERIENCE:
• 23 years as a secondary school teacher
• 7 years as a lecturer at a Teacher Education Institute (IPG)