Introduction

which they combine with their subjective, commonsense impressions

to make better educational decisions when assessing their students.

The most accurate assessment combines objective measurement

data with subjective impressions.

This assignment analyses and assesses the quality and

utility of test items. Quantitative item analysis is used to examine the

usefulness of multiple-choice items and to interpret the test data.

students’ score data, and to provide reliable feedback on the learners’

performance, whether to the parents, the department or the principal.

administered to 25 students. The report describes the statistical

analyses that allow measurement of the effectiveness of test items.

3. Test Analysis

numbers, for example, percentages, numerals, and decimals. These

numbers are used to describe or summarise a larger set of numbers.

These numbers are the mean, mode, median and the standard

deviation (STDEV). Table 1 describes the statistics.

Mean 65.7876

Mode 65

Median 65

STDEV 23.314

Table 1: Descriptive statistics

The list

descending order:

100 85 65 55 45

90 85 65 55 31.58

90 75 65 50 31.58

90 70 65 50 15

Table 2: List of students scores in descending order

even the middle score. Listing has helped us organize this set of

scores.
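The statistics in Table 1 can be reproduced with Python's standard `statistics` module. The sketch below is illustrative only: it uses the 20 scores that are visible in Table 2, whereas the test was written by 25 students, so its results will not match Table 1 exactly. It also assumes the sample standard deviation (dividing by n - 1), which the report does not specify.

```python
import statistics

# The 20 scores visible in Table 2 (the full class had 25 students).
scores = [100, 90, 90, 90, 85, 85, 75, 70, 65, 65,
          65, 65, 55, 55, 50, 50, 45, 31.58, 31.58, 15]

mean = statistics.mean(scores)      # arithmetic average
mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score of the sorted list
stdev = statistics.stdev(scores)    # sample standard deviation (assumption)

print(f"Mean   {mean:.4f}")
print(f"Mode   {mode}")
print(f"Median {median}")
print(f"STDEV  {stdev:.3f}")
```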

those that are missing. A simple frequency distribution is not

useful in the average classroom, because it tends to be so

lengthy that it is difficult to make sense of the data.

distribution. A grouped frequency distribution is much more useful in the

classroom. Ranges (intervals) of scores are used to categorise

the data. It becomes apparent that most of the class obtained

scores of 55 % or higher. If we add the numbers in the

frequency column, we can see specifically that 18 of the 25

students in the class scored 55 % or higher.

interpretation the grouped frequency distribution helps us make

is that the test may have been too easy. What is important is

that once we construct a grouped frequency distribution, it

quickly becomes apparent that the class did well on the test

(almost too well). Table 3 shows the frequency related to the

test scores.
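A grouped frequency distribution of this kind can be tallied as in the following sketch. It assumes the lowest score (15) and interval width (9) reported in Table 3, and it uses only the 20 scores visible in Table 2, so the counts are illustrative and do not cover the whole class of 25. The function name `grouped_frequencies` is hypothetical.

```python
from collections import Counter

# The 20 scores visible in Table 2 (the full class had 25 students).
scores = [100, 90, 90, 90, 85, 85, 75, 70, 65, 65,
          65, 65, 55, 55, 50, 50, 45, 31.58, 31.58, 15]

def grouped_frequencies(scores, lowest, width, n_intervals):
    """Count how many scores fall into each interval of the given width."""
    counts = Counter()
    for score in scores:
        # Interval k covers lowest + k*width up to (not including) the next lower limit.
        counts[int((score - lowest) // width)] += 1
    table = []
    for k in range(n_intervals):
        lower = lowest + k * width
        table.append((f"{lower}-{lower + width - 1}", counts[k]))
    return table

table = grouped_frequencies(scores, lowest=15, width=9, n_intervals=10)
for label, count in table:
    print(f"{label:>7}  {count}")
```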

Lowest score (L) 15

Number of scores 25

Range 85

Number of intervals 10

Interval width 8.5

Rounded to the odd number 9

Table 3: Grouped frequency table

The highest score obtained in the test item is 100 %, and the

lowest score is 15 %. The range (R) was obtained by

subtracting the lowest score (L) from the highest score (H).

The range of scores for the 25 students is 85. The number of

intervals the teacher/researcher decided on was 10. The size

of the interval (i) is obtained by dividing the range (R) by the

number of intervals (10) and rounding to the nearest odd

number, which gives an interval width of 8.5 ≈ 9.
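The arithmetic behind Table 3 can be written out as a short sketch. The helper names `nearest_odd` and `interval_width` are hypothetical, and the tie-breaking rule (a value exactly between two odd numbers rounds up) is an assumption that the report does not address.

```python
import math

def nearest_odd(value):
    """Round to the nearest odd whole number (ties round up, by assumption)."""
    lower = math.floor(value)
    if lower % 2 == 0:
        lower -= 1          # step down to the odd number below
    upper = lower + 2       # next odd number above
    return lower if value - lower < upper - value else upper

def interval_width(highest, lowest, n_intervals):
    """Return (range, raw width, width rounded to the nearest odd number)."""
    score_range = highest - lowest          # R = H - L
    raw_width = score_range / n_intervals   # i = R / number of intervals
    return score_range, raw_width, nearest_odd(raw_width)

score_range, raw_width, width = interval_width(highest=100, lowest=15, n_intervals=10)
print(score_range, raw_width, width)
```

With the values from the report (H = 100, L = 15, 10 intervals) this gives a range of 85, a raw width of 8.5 and a rounded width of 9.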

used most frequently to convey statistical data. Figure 2 is

based on a graphed frequency distribution used to represent

the scores of the 25 students.

[Figure: Frequency Histogram; x-axis: Intervals (15-23, 24-32, 33-41, 42-50, 51-59, 60-68, 69-77, 78-86, 87-95, 96-104); y-axis: Frequency]

based on the multiple-choice test written.

3.2.4. The Frequency Polygon

as test scores. The frequency polygon is an alternative way of

representing a grouped frequency distribution. The middle

values were plotted against the frequency, and straight lines

were used to connect the middle values (MP) of each interval

rather than bars or columns to show the frequency with which

scores occur. The grouped frequency distribution with

midpoints and the frequency polygon shown in Figure 3

represent the test scores.
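The middle values used for the frequency polygon follow directly from the interval limits. The sketch below assumes the intervals start at the lowest score (15) and have a width of 9, as in Table 3; the helper name `interval_midpoints` is hypothetical.

```python
def interval_midpoints(lowest, width, n_intervals):
    """Return the middle value (MP) of each score interval."""
    midpoints = []
    for k in range(n_intervals):
        lower = lowest + k * width      # lower limit of the interval
        upper = lower + width - 1       # upper limit of the interval
        midpoints.append((lower + upper) / 2)
    return midpoints

midpoints = interval_midpoints(lowest=15, width=9, n_intervals=10)
print(midpoints)
```

The first interval, 15-23, has the middle value (15 + 23) / 2 = 19, and each later middle value is 9 higher, ending at 100 for the interval 96-104.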

[Figure: Frequency Polygon; x-axis: Middle values (19, 28, 37, 46, 55, 64, 73, 82, 91, 100); y-axis: Frequency]

3.3 Reliability coefficient

yields the same rank for individuals who take the test more than

once. A test is reliable if it consistently yields the same rankings, that is, if the test

scores are free from random errors of measurement. A test with a

reliability coefficient of 1.00 has a standard error of zero, which means that it is

perfectly reliable.

Reliability      Interpretation

                 the best standardized tests

.80 - .90        Very good for a classroom test

.70 - .80        Good for a classroom test; in the range of most. There are probably a few items which could be improved.

.60 - .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.

.50 - .60        Suggests need for revision of test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.

.50 or below     Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.

K                              20

K-1                            19

Variance                       499.553

Test Reliability Part 1        1.053

Test Reliability Part 2        -0.006

r (Reliability coefficient)    -0.006

Table 4: Reliability coefficients

The reliability coefficient for the 20-item multiple-choice

test was -0.006, which means that the reliability of

the test is questionable.
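The report does not name the formula behind Table 4. The quantities listed there (K, K-1, a variance and a two-part product) are consistent with the Kuder-Richardson 20 (KR-20) formula, r = (K / (K - 1)) × (1 - Σpq / variance), so the sketch below assumes KR-20. The response matrix is a made-up toy example, not the class data, and the function name `kr20` is hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for a list of per-student lists of item scores (1 = correct, 0 = wrong).

    Returns (part 1, part 2, reliability coefficient).
    """
    n_students = len(responses)
    k = len(responses[0])                                  # number of items (K)
    # Proportion of students answering each item correctly (p); q = 1 - p.
    p = [sum(student[i] for student in responses) / n_students for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(student) for student in responses]       # raw total scores
    variance = pvariance(totals)                           # population variance (assumption)
    part1 = k / (k - 1)
    part2 = 1 - sum_pq / variance
    return part1, part2, part1 * part2

# Toy data (hypothetical): 4 students, 3 items.
toy = [[1, 1, 1],
       [1, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
part1, part2, r = kr20(toy)
print(part1, part2, r)

# Part 1 for the 20-item test in the report: K / (K - 1) = 20 / 19.
print(round(20 / 19, 3))
```

Under this assumption, Part 1 in Table 4 corresponds to 20 / 19 ≈ 1.053, and the final coefficient is the product of Part 1 and Part 2. Note that the variance in this sketch is taken over raw total scores, in the same units as the item scores.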

4. Item analysis

employing student response alternatives or options.

4.1. Difficulty index

the item correctly, and indicating the difficulty of each test item.

If every student answered a certain test item correctly, it could

indicate that the test item was too easy, while if no student

answered a certain question correctly, you can assume that

the question could have been too difficult. Table 5 shows the

proportion of students who answered each item correctly (p).

Item   Number correct   Number of responses   Difficulty index (p)

1. 21 25 0.84

2. 22 25 0.88

3. 17 25 0.68

4. 12 25 0.48

5. 21 25 0.84

6. 17 25 0.68

7. 11 25 0.44

8. 12 23 0.52

9. 13 25 0.52

10. 8 24 0.33

11. 23 25 0.92

12. 19 25 0.76

13. 15 25 0.60

14. 21 25 0.84

15. 20 25 0.80

16. 22 24 0.92

17. 15 24 0.63

18. 8 24 0.33

19. 13 25 0.52

20. 16 25 0.64

Table 5: Difficulty index
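The values in Table 5 are consistent with dividing the number of correct answers by the number of responses for each item. The sketch below recomputes p from the table's own counts; the column meanings (item, number correct, number of responses) are inferred from the figures, and the function name `difficulty_index` is hypothetical.

```python
def difficulty_index(n_correct, n_responses):
    """Proportion of responding students who answered the item correctly."""
    return n_correct / n_responses

# (item, number correct, number of responses, p as reported in Table 5)
table5 = [
    (1, 21, 25, 0.84), (2, 22, 25, 0.88), (3, 17, 25, 0.68), (4, 12, 25, 0.48),
    (5, 21, 25, 0.84), (6, 17, 25, 0.68), (7, 11, 25, 0.44), (8, 12, 23, 0.52),
    (9, 13, 25, 0.52), (10, 8, 24, 0.33), (11, 23, 25, 0.92), (12, 19, 25, 0.76),
    (13, 15, 25, 0.60), (14, 21, 25, 0.84), (15, 20, 25, 0.80), (16, 22, 24, 0.92),
    (17, 15, 24, 0.63), (18, 8, 24, 0.33), (19, 13, 25, 0.52), (20, 16, 25, 0.64),
]

computed = {item: difficulty_index(correct, responses)
            for item, correct, responses, _ in table5}

for item, correct, responses, reported in table5:
    print(f"{item:>2}. {correct:>2} / {responses} = {computed[item]:.4f} (reported {reported:.2f})")
```

For example, item 8 was answered correctly by 12 of 23 responding students, giving p = 12 / 23 ≈ 0.52.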

The multiple-choice test consists of 20 questions.

Eight of the 20 questions were unacceptable, because they

were too easy. The rest (12 questions) were acceptable, as they

were fair questions. Table 6 interprets the difficulty of the

questions.

1. 0.84 Unacceptable Too easy

2. 0.88 Unacceptable Too easy

3. 0.68 Acceptable Fair

4. 0.48 Acceptable Fair

5. 0.84 Unacceptable Too easy

6. 0.68 Acceptable Fair

7. 0.44 Acceptable Fair

8. 0.52 Acceptable Fair

9. 0.52 Acceptable Fair

10. 0.33 Acceptable Fair

11. 0.92 Unacceptable Too easy

12. 0.76 Unacceptable Too easy

13. 0.60 Acceptable Fair

14. 0.84 Unacceptable Too easy

15. 0.80 Unacceptable Too easy

16. 0.92 Unacceptable Too easy

17. 0.63 Acceptable Fair

18. 0.33 Acceptable Fair

19. 0.52 Acceptable Fair

20. 0.64 Acceptable Fair

Table 6: Interpretation of the difficulty index
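The interpretation in Table 6 can be expressed as a simple rule. The sketch below assumes the cut-off stated in the report, namely that items with p above 0.75 are relatively easy. The report gives no cut-off for items that are too difficult, so none is applied here. The function name `interpret_difficulty` is hypothetical.

```python
EASY_THRESHOLD = 0.75  # p-levels above this are considered relatively easy

def interpret_difficulty(p):
    """Return (acceptability, interpretation) for a difficulty index p."""
    if p > EASY_THRESHOLD:
        return ("Unacceptable", "Too easy")
    return ("Acceptable", "Fair")

# Difficulty indices for items 1 to 20, as reported in Table 6.
p_values = [0.84, 0.88, 0.68, 0.48, 0.84, 0.68, 0.44, 0.52, 0.52, 0.33,
            0.92, 0.76, 0.60, 0.84, 0.80, 0.92, 0.63, 0.33, 0.52, 0.64]

results = [interpret_difficulty(p) for p in p_values]
too_easy = [item for item, (label, _) in enumerate(results, start=1)
            if label == "Unacceptable"]

print("Too easy:", too_easy)
print("Acceptable:", len(p_values) - len(too_easy))
```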

between high scoring students and low scoring students. When

p-levels (difficulty index) are above 0.75, the item is considered

relatively easy. An average p-level is 0.50 (50 %). There is no consensus

on how high a “good” discrimination index is, but ideally

you want your items to have as high a discrimination index as

possible. Any item with a positive

D-value is generally regarded as acceptable. There were 15 students in the upper

level, and 10 in the lower level.

Upper level   Lower level   Discrimination index (D)

15 6 0.60

15 7 0.53

14 3 0.79

8 4 0.50

15 6 0.60

12 5 0.58

9 2 0.78

10 2 0.80

10 3 0.70

8 0 1.00

14 9 0.36

14 5 0.64

12 3 0.75

15 6 0.60

14 6 0.57

15 7 0.53

12 3 0.75

5 3 0.40

12 1 0.92

11 5 0.55

Table 7: Discrimination index
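The report does not state the formula used for D. The values in Table 7 are consistent with D = (U - L) / U, where U and L are the numbers of correct answers in the upper and lower levels, so the sketch below assumes that calculation. The function name `discrimination_index` is hypothetical.

```python
def discrimination_index(upper_correct, lower_correct):
    """D as it appears to be calculated in Table 7: (U - L) / U."""
    if upper_correct == 0:
        raise ValueError("D is undefined when no upper-level student answered correctly")
    return (upper_correct - lower_correct) / upper_correct

# (correct in upper level, correct in lower level, D as reported), items 1 to 20.
table7 = [
    (15, 6, 0.60), (15, 7, 0.53), (14, 3, 0.79), (8, 4, 0.50), (15, 6, 0.60),
    (12, 5, 0.58), (9, 2, 0.78), (10, 2, 0.80), (10, 3, 0.70), (8, 0, 1.00),
    (14, 9, 0.36), (14, 5, 0.64), (12, 3, 0.75), (15, 6, 0.60), (14, 6, 0.57),
    (15, 7, 0.53), (12, 3, 0.75), (5, 3, 0.40), (12, 1, 0.92), (11, 5, 0.55),
]

d_values = [discrimination_index(upper, lower) for upper, lower, _ in table7]

for item, ((upper, lower, reported), d) in enumerate(zip(table7, d_values), start=1):
    print(f"{item:>2}. ({upper} - {lower}) / {upper} = {d:.4f} (reported {reported:.2f})")
```

Textbook definitions commonly divide the difference by the number of students in a group rather than by U, which would give different values from those in Table 7.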

means that the high scoring students were able to choose the key

and not the distracters in the multiple choice test item.

Common problems that item analysis can reveal in multiple-choice test items

are:

Miskeying

Guessing

Ambiguity

5. Conclusion

This multiple-choice test was too easy, with questionable

reliability. The test should not contribute heavily to the course

marks, and it needs revision. The analysis was worthwhile, because

it improves the teacher’s skills in test construction and points out

the shortcomings in the students’ knowledge of the specific work.


