


Parkay, Anctil, and Hass (2006) state that assessment is the process of gathering
data and information to determine pupils’ knowledge, skills and attitudes in a certain area of the
school curriculum. It measures the effectiveness of a teaching and learning process once it is
over, because it evaluates what the learners have learned from the lessons and activities
carried out by their teachers at the end of a lesson, topic or term. Assessment matters to the
education system in several ways: it provides feedback, prompts improvement and prepares
pupils to face the real world. It also drives pupils’ learning and is used to decide on the type of
learning that should take place. In line with this, I have constructed and administered an English
language test paper for Year 4 pupils using a test specification table. This paper analyses the
items of the test, interprets the results and reports my findings based on the pupils’
performance data that has been collected.


Kubiszyn and Borich (2003) emphasized that the Table of Specifications is a two-way
chart or grid relating instructional objectives to instructional content: the columns of the chart
list the objectives or levels of skill to be addressed, while the rows list the key concepts or
content the test is to measure. The questions constructed must be based on what the teacher
has taught and what the pupils have learned, as well as what the teacher wants the pupils to
achieve. This Table of Specification is organised according to Bloom’s Taxonomy, a hierarchical
ordering of cognitive skills that can, among countless other uses, help teachers teach and pupils
learn (Heick, 2018). It encompasses six levels: remembering, understanding, applying,
analysing, evaluating and creating. In the test constructed, I have included six objective
questions based on grammar, four objective questions and one structured question based on a
comprehension passage, as well as one essay question, all based on the same theme in the
DSKP. The questions are distributed among the six levels to evaluate pupils’ understanding of
what they have learned and their achievement. The table below shows the distribution of
questions for the test.
Table 2.0 : Table of Specification

No. Topic Number of items Total
1 Simple present tense 4 4 4
2 The infinitive 1 1 1
2 Adjectives 1 1 1
3 Comprehension 4 1 2 1 4
4 question 1 1 1
5 Essay 1 1 1
Total 12 12


This test paper is composed of two major types of items, objective and subjective. The
different types of items enable the teacher to evaluate pupils’ capability. The questions are
ordered from the simplest to the most difficult skills. The test paper is divided into three
sections: Section A, Section B and Section C.

Section A and Section B contain 11 questions in total: 10 multiple choice questions and
one short answer question. The levels covered by the objective questions are remembering,
understanding, applying and analysing. The question at the remembering level requires pupils
to recall information from the comprehension passage. Next, pupils must interpret the
underlying meaning of the passage to respond to the questions at the understanding level, as
the questions and answers have been paraphrased. Furthermore, the questions on grammar
items fall under the applying level, where pupils need to use their previous knowledge and
information in another familiar situation. The question at the analysing level requires pupils to
break down information into parts and explore understandings and relationships in order to
complete the blank space in the text. The structured question in Section B falls under the
creating level: pupils need to generate new ideas, products or ways of viewing things to answer
it.

Section C, on the other hand, comprises one essay question. The question is set at the
evaluating level, where pupils need to justify a decision or course of action. All the questions in
the test paper come with clear instructions to guide pupils in answering them.


The test was conducted in SJK(C) Kota Emerald, Rawang. I sought approval from
the headmaster of the school, supported by a letter of permission provided by Institut Pendidikan
Guru Kampus Bahasa Antarabangsa, to carry out the test in a Year 4 class. The test was
administered in class 4 Merah, and all 21 pupils of the class took part. The time allocated for the
test was 45 minutes. Pupils were instructed to answer all 10 objective questions, the structured
question and the essay question within the time given.

I was also given some time to interview the English language teacher of the class to
gain insight into the pupils’ background. The pupils of 4 Merah are of intermediate level overall,
with a fair number of highly proficient pupils as well as pupils with low proficiency in the
language. This is crucial for the teacher in planning lessons that cater to all her pupils.


Following the administration, an item analysis was carried out to identify the quality of the
test items. McCowan (1999) states that item analysis is an important, and probably the most
important, tool for increasing test effectiveness. Each item contributes to making the test an
effective assessment of good quality. Item analysis is also a vital component of ongoing
improvement at lesson and curriculum level. The discrimination index and difficulty index of
each test item were calculated and recorded in the tables below, together with a
recommendation for each item.

5.1 Difficulty Index

Table 5.1 : The Difficulty Index of Each Test Item

Question Number Difficulty Index Recommendation
1 0.75 Retain
2 0.50 Retain
3 0.58 Retain
4 0.50 Retain
5 0.75 Discard
6 0.58 Retain
7 0.50 Retain
8 0.92 Discard
9 0.33 Retain
10 0.50 Retain
11 0.47 Retain
12 0.46 Retain

According to Wajiha, Saeed and Usman (2018), the difficulty value of an item may be
defined as the proportion of a certain sample of subjects who actually know the answer to the
item. The value can be calculated as:

$$p = \frac{U_s + L_s}{U + L}$$

$p$ = difficulty index

$U_s$ = the number of pupils in the upper 27% who responded correctly

$L_s$ = the number of pupils in the lower 27% who responded correctly

$U$ = the number of pupils in the upper group

$L$ = the number of pupils in the lower group
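As a quick check of the formula above, the following Python sketch computes the difficulty index from group counts. The function name and the example counts are illustrative assumptions rather than figures taken from the test data: they assume six pupils in each of the upper and lower groups (roughly 27% of a class of 21) and show one combination of counts that gives an index of 0.75.

```python
def difficulty_index(upper_correct, lower_correct, upper_size, lower_size):
    """Return p = (Us + Ls) / (U + L) for a single test item."""
    group_total = upper_size + lower_size
    if group_total == 0:
        raise ValueError("upper and lower groups must not both be empty")
    return (upper_correct + lower_correct) / group_total


# Hypothetical counts: 6 pupils per group, 6 correct answers in the
# upper group and 3 correct answers in the lower group.
p = difficulty_index(upper_correct=6, lower_correct=3, upper_size=6, lower_size=6)
print(f"{p:.2f}")  # 0.75
```

Because the index is a plain proportion, it always lies between 0 (nobody in either group answered correctly) and 1 (everybody did).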

The difficulty index is used to determine how easy or difficult a question is and its
relevance to the pupils’ level. The higher the value of the difficulty index, the easier the item: an
item is easy if most pupils answer it correctly and, on the contrary, difficult if most pupils
answer it incorrectly. The difficulty index should range between 0.2 and 0.8, and the ideal index
lies between 0.5 and 0.8 (Surender, 2014).
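The ranges quoted above can be applied mechanically to the values in Table 5.1. The sketch below is only an illustration: the thresholds come from the text (0.2 to 0.8 acceptable, 0.5 to 0.8 ideal), while the function name and the four labels are my own assumptions and not terms used by Surender (2014).

```python
def classify_difficulty(p):
    """Label a difficulty index using the 0.2 / 0.5 / 0.8 cut-offs."""
    if p > 0.8:
        return "too easy"
    if p < 0.2:
        return "too difficult"
    if p >= 0.5:
        return "ideal"
    return "acceptable"


# Difficulty indices as recorded in Table 5.1 (question number -> index).
difficulty = {
    1: 0.75, 2: 0.50, 3: 0.58, 4: 0.50, 5: 0.75, 6: 0.58,
    7: 0.50, 8: 0.92, 9: 0.33, 10: 0.50, 11: 0.47, 12: 0.46,
}

for question, p in difficulty.items():
    print(question, p, classify_difficulty(p))
```

With these cut-offs, only question 8 is labelled too easy, and questions 9, 11 and 12 are acceptable without being in the ideal band.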

From Table 5.1, it is clear that most of the questions are within the acceptable range:
9 out of 10 objective items fall between 0.2 and 0.8, and eight of them lie in the ideal range of
0.5 to 0.8, while both subjective items fall within the acceptable range as well. Only one
objective question shows a high difficulty index, at 0.92. This indicates that the question is too
easy for the pupils and is a poor item, as it is not challenging enough for them. This is one
weakness of the test constructed, as the distribution of difficulty indices for this test paper is not
as ideal as it should be, which may hinder the teacher in discriminating between pupils
according to their ability. To improve on this aspect, I will need to review the question and also
the pupils’ proficiency levels, which will help in determining their capability to answer the
questions. Moreover, I can produce a more detailed and precise table of specifications to
ensure a good range of easy to difficult questions according to Bloom’s Taxonomy of the
cognitive domain. There should be an appropriate number of questions for each of the six levels
of complexity, with the remembering level having the most and the creating level having the
least.

5.2 Discrimination Index

Table 5.2 : The Discrimination Index of Each Test Item

Question Number Discrimination Index Recommendation
1 0.50 Retain
2 1.00 Retain
3 0.50 Retain
4 0.67 Retain
5 0.17 Discard
6 0.50 Retain
7 0.33 Scope of
8 0.17 Discard
9 0.00 Discard
10 0.33 Scope of
11 0.67 Retain
12 0.47 Retain
The index of discrimination is the ability of an item to discriminate between superior and
inferior pupils (Wajiha, Saeed and Usman, 2018). The value is calculated based on the formula
below:

$$d = \frac{U_s - L_s}{0.5(U + L)}$$

$d$ = discrimination index

$U_s$ = the number of pupils in the upper 27% who responded correctly

$L_s$ = the number of pupils in the lower 27% who responded correctly

$U$ = the number of pupils in the upper group

$L$ = the number of pupils in the lower group
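The discrimination formula can be sketched in the same way. As before, the function name and the counts are hypothetical: they assume six pupils in each group and are chosen only to show how the sign and size of the index respond to the two groups' results.

```python
def discrimination_index(upper_correct, lower_correct, upper_size, lower_size):
    """Return d = (Us - Ls) / (0.5 * (U + L)) for a single test item."""
    half_total = 0.5 * (upper_size + lower_size)
    if half_total == 0:
        raise ValueError("upper and lower groups must not both be empty")
    return (upper_correct - lower_correct) / half_total


# Hypothetical counts with 6 pupils in each group.
print(discrimination_index(6, 3, 6, 6))  # 0.5, upper group does better
print(discrimination_index(2, 2, 6, 6))  # 0.0, both groups do equally well
print(discrimination_index(1, 4, 6, 6))  # -0.5, lower group does better
```

When the two groups are the same size, the denominator is simply the size of one group, so the index is the difference between the proportions of correct answers in the upper and lower groups.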

The numerical value of the discrimination index may range from -1.00 to 1.00 and
indicates the quality of an item. If a test item is answered correctly by upper group pupils and
incorrectly by lower group pupils, then the item is said to have positive discrimination power. If
a test item is answered correctly by lower group pupils and incorrectly by upper group pupils,
then the item is a negative discriminator. Surender (2014) states that an item with negative
discrimination decreases the validity of the test and thus must be discarded. If a test item is
answered correctly by an equal number of upper and lower group pupils, then it shows zero
discrimination.

As recorded in Table 5.2, the discrimination indices are a mix of satisfactory and
unsatisfactory values. Seven out of 12 questions are in the ideal index range, while the
remaining five are recommended to be discarded or improved. The quality of the test items that
fall outside the desired range is either average or poor. These items are unable to discriminate
between pupils according to their achievement levels, making it hard for the teacher to identify
pupils’ potential. Hence, conducting a pilot test before administering the test paper can help in
collecting feedback on the questions so that necessary changes can be made, which in turn
improves the accuracy of the questions in assessing the pupils’ cognitive ability.
5.3 Reliability

Table 5.3 : Reliability Coefficient Value

Reliability 0.18

According to Wells and Wollack (2003), reliability refers to how dependably or
consistently a test measures a characteristic. In other words, reliability indicates the
stability of test performance. They also state that a desirable reliability coefficient is
0.70 or above. The reliability of a test can be calculated using the formula below:

$$KR_{21} = \frac{n}{n-1}\left[1 - \frac{M(n - M)}{n \cdot \mathrm{Var}}\right]$$

$KR_{21}$ = reliability

$n$ = number of test questions

$M$ = mean score for the test

$\mathrm{Var}$ = variance for the test
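A minimal sketch of the KR 21 calculation is given below. It uses the standard form of the formula, in which the numerator of the fraction is the mean multiplied by (n − M). The function name and the input values are hypothetical and are not the figures behind the coefficient reported in Table 5.3, since the mean and variance of the pupils' scores are not listed in this report.

```python
def kr21(n_items, mean_score, variance):
    """Kuder-Richardson Formula 21 for a test of n_items scored 0/1."""
    if n_items < 2:
        raise ValueError("KR-21 needs at least two items")
    if variance <= 0:
        raise ValueError("variance of total scores must be positive")
    correction = n_items / (n_items - 1)
    ratio = mean_score * (n_items - mean_score) / (n_items * variance)
    return correction * (1 - ratio)


# Hypothetical inputs: 12 items, mean total score 6.0, variance 4.0.
reliability = kr21(n_items=12, mean_score=6.0, variance=4.0)
print(f"{reliability:.2f}")  # 0.27
```

The formula shows why a small spread of total scores lowers reliability: as the variance shrinks, the subtracted ratio grows and the coefficient falls.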

The Kuder-Richardson Formula 21 (KR 21) is a measure of the reliability of a test
with binary variables (Shevlin, Miles, Davies & Walker, 2000). Table 5.3 shows that the
test paper does not reach the desired level of reliability: at only 0.18, it falls far below
the value of 0.70. This implies that if a pupil were to take the same test paper again, he
or she might not get a similar score. A test that yields similar scores for a person who
repeats the test is said to measure a characteristic reliably. Thus, the test paper is
deemed unreliable and needs reviewing and modification.

5.4 Pearson

Table 5.4 : Pearson Correlation Coefficient

Pearson -0.20
The relationship between the difficulty index and the discrimination index of the test items
was determined by Pearson correlation analysis (Surender, 2014). The Pearson correlation
coefficient recorded in Table 5.4 shows a weak negative relationship between the difficulty index
and the discrimination index of the test items. This negative correlation signifies that as the
difficulty index increases, that is, as items become easier, the discrimination index tends to
decrease. This suggests that the easier items, as well as items that are too difficult, discriminate
poorly between high-achieving and low-achieving pupils. Consequently, after the item analysis,
five test items are recommended to be improved or discarded.
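The correlation between the two indices can be recomputed from the values recorded in Tables 5.1 and 5.2. The sketch below implements the usual Pearson formula by hand; the function name is my own, and because the tabulated indices are rounded to two decimal places, the result should come out close to the coefficient in Table 5.4 but may differ in the second decimal.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Indices for questions 1 to 12, as recorded in Tables 5.1 and 5.2.
difficulty = [0.75, 0.50, 0.58, 0.50, 0.75, 0.58,
              0.50, 0.92, 0.33, 0.50, 0.47, 0.46]
discrimination = [0.50, 1.00, 0.50, 0.67, 0.17, 0.50,
                  0.33, 0.17, 0.00, 0.33, 0.67, 0.47]

r = pearson_r(difficulty, discrimination)
print(f"{r:.2f}")
```

A negative value means that items with a higher difficulty index (easier items) tend to have a lower discrimination index, and a value this close to zero indicates that the tendency is weak.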


This task has given me hands-on experience in the process of creating test questions,
as well as analysing the reliability of the questions constructed. I am now aware that test
questions must be aligned with pupils’ learning needs and differences. Additionally, it is vital to
create questions based on Bloom’s Taxonomy, as it ensures an appropriate distribution of
questions according to the topics taught and learned. I now understand the significance of a
test as a basis for educational decision making and a source of motivation for pupils to learn
and achieve their best. Overall, the test paper that I have constructed is evidently unreliable and
not yet fit for use. However, I have come to realise that learning is a lifelong journey:
improvements will come from time to time, and it will never stop even when I am a qualified
teacher.


To conclude, assessment in education is a vital component of ongoing quality
improvement at lesson and curriculum level. Therefore, it is crucial for teachers to plan the
construction of a test carefully so that the objectives of the test can be achieved.

Heick, T. (2018). What is Bloom’s Taxonomy? A definition for teachers. Retrieved from

Kubiszyn, T. & Borich, G. D. (2003). Educational testing and measurement: Classroom
application and practice. Retrieved from

McCowan, R. J. (1999). Item analysis for criterion-referenced tests. Retrieved from

Parkay, F. W., Anctil, E. & Hass, G. (2006). Curriculum planning: A contemporary approach
(8th ed.). Boston: Allyn and Bacon.

Shevlin, M., Miles, J., Davies, M. & Walker, S. (2000). Coefficient alpha: A useful indicator of
reliability? Retrieved from

Surender Singh Rana (2014). Test item analysis and relationship between difficulty level and
discrimination index of test items in an achievement test in biology. Retrieved from

Wajiha Mahjabeen, Saeed Alam & Usman Hassan (2018). Difficulty index, discrimination index
and distractor efficiency in multiple choice questions. Retrieved from https://www.

Wells, C. S. & Wollack, J. A. (2003). An instructor’s guide to understanding test reliability.
Retrieved from