
1. Introduction

Teachers set tests to obtain objective information about their students,
which they combine with their subjective, commonsense impressions
to make better educational decisions. The most accurate assessment
combines objective measurement data with subjective impressions.

This assignment analyses and assesses the quality and utility of the
items in a test. Quantitative item analysis is used to examine the
usefulness of multiple-choice items and to interpret the test data.

2. Purpose of the report

The purpose of this report is to disseminate information on the
students' score data and to provide reliable feedback on the learners'
performance to the parents, the department or the principal.

The analysis was done on 20 multiple-choice questions, which were
administered to 25 students. The report describes the statistical
analysis, which allows the effectiveness of the test items to be measured.

3. Test Analysis

3.1. Descriptive test analysis

Descriptive statistics, also known as summary statistics, are simply
numbers, for example percentages, numerals and decimals. These
numbers are used to describe or summarise a larger set of numbers.

The numbers reported here are the mean, mode, median and standard
deviation (STDEV). Table 1 lists these statistics.

Mean 65.7876
Mode 65
Median 65
STDEV 23.314
Table 1: Descriptive statistics
The list

The set of 25 scores obtained in the multiple-choice test, sorted in
descending order (reading down each column):

100    89.47    70    60    47.06
100    85       65    55    45
90     85       65    55    31.58
90     75       65    50    31.58
90     70       65    50    15
Table 2: List of students' scores in descending order

At a glance we can determine the highest score, the lowest score and
even the middle score. Listing has helped us organise this set of
scores.
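The summary statistics can be recomputed from the listed scores. The sketch below is a minimal illustration using Python's standard statistics module; the variable names are illustrative, and the use of the sample (n - 1) formula for the variance and standard deviation is an assumption, since the report does not say which formula was applied.

```python
import statistics

# Scores transcribed from Table 2 (percentages, 25 students).
scores = [
    100, 100, 90, 90, 90,
    89.47, 85, 85, 75, 70,
    70, 65, 65, 65, 65,
    60, 55, 55, 50, 50,
    47.06, 45, 31.58, 31.58, 15,
]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Assumption: sample formulas (divide by n - 1).
# statistics.pvariance / statistics.pstdev would divide by n instead.
sample_variance = statistics.variance(scores)
sample_stdev = statistics.stdev(scores)

print(f"Mean     {mean:.4f}")
print(f"Mode     {mode}")
print(f"Median   {median}")
print(f"Variance {sample_variance:.3f}")
print(f"STDEV    {sample_stdev:.3f}")
```

The mean, mode and median agree with Table 1. The variance and standard deviation depend on the formula chosen and on rounding of the inputs, so they may differ from the reported figures.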

3.2. Graphic representation of test scores

In statistics, the ideal distribution of scores follows the normal
distribution curve.

Figure 1: Normal distribution curve

3.2.1. Simple Frequency

This approach to tabulating data considers all scores, including
those that did not occur. A simple frequency distribution is not
useful in the average classroom, because it tends to be so
lengthy that it is difficult to make sense of the data.

3.2.2. Grouped Frequency

A grouped frequency table is a variation of the simple frequency
distribution and is much more useful in the classroom.
Ranges (intervals) of scores are used to categorise the
data. It becomes apparent that most of the class obtained
scores of 55 % or higher. If we add the numbers in the
frequency column, we can see specifically that 18 of the 25
students in the class scored 55 % or higher.

Since most of the class scored 55 % or higher, one
interpretation the grouped frequency distribution helps us make
is that the test may have been too easy. What is important is
that once we construct a grouped frequency distribution, it
quickly becomes apparent that the class did well on the test
(almost too well). Table 3 shows the values used to construct
the grouped frequency distribution of the test scores.

Highest score (H)                   100
Lowest score (L)                    15
Number of scores                    25
Range                               85
Number of intervals                 10
Interval width                      8.5
Rounded to the nearest odd number   9
Table 3: Grouped frequency table

The highest score obtained in the test is 100 %, and the
lowest score is 15 %. The range (R) was obtained by
subtracting the lowest score (L) from the highest score (H),
so the range of scores for the 25 students is 85. The
teacher/researcher decided on 10 intervals. The interval
width (i) is obtained by dividing the range (R) by the
number of intervals (10) and rounding to the nearest odd
number: 85 / 10 = 8.5 ≈ 9.
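The steps described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used for the report: the helper round_to_nearest_odd and the rule for placing non-integer scores such as 89.47 into an interval are assumptions made for the example.

```python
# Scores transcribed from Table 2 (percentages, 25 students).
scores = [
    100, 100, 90, 90, 90,
    89.47, 85, 85, 75, 70,
    70, 65, 65, 65, 65,
    60, 55, 55, 50, 50,
    47.06, 45, 31.58, 31.58, 15,
]

NUM_INTERVALS = 10  # chosen by the teacher/researcher (Table 3)


def round_to_nearest_odd(x):
    """Round x to the nearest odd integer, for example 8.5 -> 9."""
    return 2 * round((x - 1) / 2) + 1


highest, lowest = max(scores), min(scores)
score_range = highest - lowest                      # R = H - L
width = round_to_nearest_odd(score_range / NUM_INTERVALS)

# Intervals start at the lowest score: 15-23, 24-32, ..., 96-104.
lower = int(lowest)
intervals = [
    (lower + k * width, lower + (k + 1) * width - 1)
    for k in range(NUM_INTERVALS)
]

# Assumption: a score belongs to the interval whose lower limit it has
# reached, so 89.47 falls in 87-95 and 31.58 falls in 24-32.
frequencies = [0] * NUM_INTERVALS
for score in scores:
    index = min(int((score - lower) // width), NUM_INTERVALS - 1)
    frequencies[index] += 1

for (low, high), count in zip(intervals, frequencies):
    print(f"{low:>3}-{high:<3} {count}")
```

Adding the frequencies of the intervals from 51-59 upwards gives the number of students who scored 55 % or higher, because no score lies between 51 and 54.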

3.2.3. The Frequency histogram

The histogram, a type of bar graph, is the type of graph
used most frequently to convey statistical data. Figure 2 is
based on the grouped frequency distribution and represents
the scores of the 25 students.

[Figure: Frequency Histogram. Vertical axis: Frequency. Horizontal axis: Intervals (15-23, 24-32, 33-41, 42-50, 51-59, 60-68, 69-77, 78-86, 87-95, 96-104).]

Figure 2: Frequency Histogram

The histogram shows the distribution of the students' scores
on the multiple-choice test.

3.2.4. The Frequency Polygon

A frequency polygon is best used to present continuous data, such
as test scores. It is an alternative way of representing a
grouped frequency distribution. The midpoint (MP) of each
interval was plotted against its frequency, and straight lines
were used to connect the midpoints, rather than bars or
columns, to show the frequency with which scores occur. The
frequency polygon in Figure 3, based on the grouped frequency
distribution with midpoints, represents the test scores.

[Figure: Frequency Polygon. Vertical axis: Frequency. Horizontal axis: Middle values (19, 28, 37, 46, 55, 64, 73, 82, 91, 100).]

Figure 3: Frequency Polygon
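The middle values on the horizontal axis of the frequency polygon follow directly from the interval limits. The short sketch below shows the calculation; the function name midpoint is illustrative and not taken from the report.

```python
# Interval limits of the grouped frequency distribution
# (width 9, starting at the lowest score of 15).
intervals = [(15 + 9 * k, 15 + 9 * k + 8) for k in range(10)]


def midpoint(lower, upper):
    """Midpoint (MP) of an interval: halfway between its two limits."""
    return (lower + upper) / 2


midpoints = [midpoint(lower, upper) for lower, upper in intervals]
print(midpoints)
```

Each midpoint is then paired with the frequency of its interval to give one point of the polygon.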

3.3. Reliability coefficient

The reliability of a test refers to the consistency with which it
yields the same rank for individuals who take the test more than
once. A test is reliable if it consistently yields the same
results, that is, if the test scores are free from random errors
of measurement. A reliability coefficient of 1.00 corresponds to
a standard error of measurement of zero, which means that the
test is perfectly reliable.

Reliability      Interpretation

.90 and above    Excellent reliability; at the level of the best
                 standardized tests.
.80 - .90        Very good for a classroom test.
.70 - .80        Good for a classroom test; in the range of most.
                 There are probably a few items which could be
                 improved.
.60 - .70        Somewhat low. This test needs to be supplemented by
                 other measures (e.g., more tests) to determine
                 grades. There are probably some items which could
                 be improved.
.50 - .60        Suggests need for revision of test, unless it is
                 quite short (ten or fewer items). The test
                 definitely needs to be supplemented by other
                 measures (e.g., more tests) for grading.
.50 or below     Questionable reliability. This test should not
                 contribute heavily to the course grade, and it
                 needs revision.

K                              20
K-1                            19
Variance                       499.553
Test Reliability Part 1        1.053
Test Reliability Part 2        -0.006
r (Reliability coefficient)    -0.006
Table 4: Reliability coefficients

The reliability coefficient of the scores on the 20-item
multiple-choice test was -0.006, which means that the reliability
of the test is questionable.
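The report does not name the formula behind Table 4. The K and K-1 rows suggest a Kuder-Richardson type coefficient, so the sketch below assumes the Kuder-Richardson formula 20 (KR-20), r = (K / (K - 1)) * (1 - sum(p * q) / variance), where p is the proportion correct on an item, q = 1 - p, and the variance is that of the students' total raw scores. The function name kr20 and the small response matrix are hypothetical and are not the data analysed in this report.

```python
import statistics


def kr20(responses):
    """KR-20 reliability for dichotomous items (assumed formula).

    responses: one row per student, one 0/1 entry per item.
    Raises ZeroDivisionError if all students have the same total score.
    """
    k = len(responses[0])                      # number of items (K)
    n = len(responses)                         # number of students
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)    # sum of p * q over items
    totals = [sum(row) for row in responses]   # raw total score per student
    variance = statistics.pvariance(totals)    # population variance of totals
    return (k / (k - 1)) * (1 - sum_pq / variance)


# Hypothetical data: 5 students, 4 items (1 = correct, 0 = incorrect).
example = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(f"KR-20 for the hypothetical data: {kr20(example):.3f}")

# For a 20-item test the first factor is 20 / 19, the value listed as
# "Test Reliability Part 1" in Table 4.
print(f"K / (K - 1) for K = 20: {20 / 19:.3f}")
```

Under this assumption, the variance must be computed on the same scale as the item scores (number of items correct), not on percentages.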

4. Item analysis

Item analysis is a numerical method for analysing test items
using the response alternatives (options) that students chose.

4.1. Difficulty index

The difficulty index is the proportion of students who answered
the item correctly, and it indicates the difficulty of each test
item. If all the students answered a certain test item correctly,
it could indicate that the item was too easy, while if no student
answered it correctly, you can assume that the question could
have been too difficult. Table 5 shows the number of correct
answers, the number of students who answered and the resulting
proportion for each question.

Question   # Correct   # Answered   Proportion (p)
1. 21 25 0.84
2. 22 25 0.88
3. 17 25 0.68
4. 12 25 0.48
5. 21 25 0.84
6. 17 25 0.68
7. 11 25 0.44
8. 12 23 0.52
9. 13 25 0.52
10. 8 24 0.33
11. 23 25 0.92
12. 19 25 0.76
13. 15 25 0.60
14. 21 25 0.84
15. 20 25 0.80
16. 22 24 0.92
17. 15 24 0.63
18. 8 24 0.33
19. 13 25 0.52
20. 16 25 0.64

Table 5: Difficulty index
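The proportions in Table 5 follow from dividing the number of correct answers by the number of students who answered the question. A minimal sketch, with illustrative names:

```python
# (number correct, number who answered) for questions 1-20, from Table 5.
responses = [
    (21, 25), (22, 25), (17, 25), (12, 25), (21, 25),
    (17, 25), (11, 25), (12, 23), (13, 25), (8, 24),
    (23, 25), (19, 25), (15, 25), (21, 25), (20, 25),
    (22, 24), (15, 24), (8, 24), (13, 25), (16, 25),
]


def difficulty_index(correct, answered):
    """Proportion (p) of students who answered the item correctly."""
    return correct / answered


p_values = [difficulty_index(c, a) for c, a in responses]

for question, p in enumerate(p_values, start=1):
    print(f"{question:>2}. p = {p:.3f}")
```

Note that the denominator is the number of students who answered each question, which is below 25 for questions 8, 10, 16, 17 and 18.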

The difficulty level of each test item is analysed individually.
The multiple-choice test consists of 20 questions.
Eight of the 20 questions were unacceptable, because they
were too easy. The rest (12 questions) were acceptable, as they
were fair questions. Table 6 interprets the difficulty of the
questions.

Question   Proportion (p)   Interpretation   Reason
1.         0.84             Unacceptable     Too easy
2.         0.88             Unacceptable     Too easy
3.         0.68             Acceptable       Fair
4.         0.48             Acceptable       Fair
5.         0.84             Unacceptable     Too easy
6.         0.68             Acceptable       Fair
7.         0.44             Acceptable       Fair
8.         0.52             Acceptable       Fair
9.         0.52             Acceptable       Fair
10.        0.33             Acceptable       Fair
11.        0.92             Unacceptable     Too easy
12.        0.76             Unacceptable     Too easy
13.        0.60             Acceptable       Fair
14.        0.84             Unacceptable     Too easy
15.        0.80             Unacceptable     Too easy
16.        0.92             Unacceptable     Too easy
17.        0.63             Acceptable       Fair
18.        0.33             Acceptable       Fair
19.        0.52             Acceptable       Fair
20.        0.64             Acceptable       Fair
Table 6: Interpretation of the difficulty index
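The interpretation in Table 6 can be expressed as a simple rule. The sketch below assumes the cutoff of 0.75 mentioned in section 4.2 for relatively easy items, and assumes no lower cutoff, because the table treats even p = 0.33 as fair; both the constant name and the function name are illustrative.

```python
EASY_CUTOFF = 0.75  # section 4.2: p above 0.75 is considered relatively easy

# Proportions (p) for questions 1-20, from Table 6.
p_values = [
    0.84, 0.88, 0.68, 0.48, 0.84, 0.68, 0.44, 0.52, 0.52, 0.33,
    0.92, 0.76, 0.60, 0.84, 0.80, 0.92, 0.63, 0.33, 0.52, 0.64,
]


def interpret(p):
    """Return (interpretation, reason) for a difficulty index p."""
    # Assumption: no lower cutoff for items that are too difficult.
    if p > EASY_CUTOFF:
        return ("Unacceptable", "Too easy")
    return ("Acceptable", "Fair")


labels = [interpret(p) for p in p_values]
too_easy = [q for q, (_, reason) in enumerate(labels, start=1)
            if reason == "Too easy"]

print("Too easy:", too_easy)
print("Fair:", len(p_values) - len(too_easy), "questions")
```

Applied to the proportions above, the rule flags the same questions as Table 6.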

4.2. Discrimination index (D)

The discrimination index measures the extent to which a test item
discriminates between high-scoring and low-scoring students. When
the p-level (difficulty index) is above 0.75, the item is considered
relatively easy; an average p-level is 0.50. There is no agreement
on how high a "good" discrimination index is, but ideally
items should have as high a discrimination index as
possible. Any item with a positive D-value is generally
regarded as acceptable. There were 15 students in the upper
level and 10 in the lower level.

Question   # Correct     # Correct     Discrimination
           Upper level   Lower level   index (D)
1.         15            6             0.60
2.         15            7             0.53
3.         14            3             0.79
4.         8             4             0.50
5.         15            6             0.60
6.         12            5             0.58
7.         9             2             0.78
8.         10            2             0.80
9.         10            3             0.70
10.        8             0             1.00
11.        14            9             0.36
12.        14            5             0.64
13.        12            3             0.75
14.        15            6             0.60
15.        14            6             0.57
16.        15            7             0.53
17.        12            3             0.75
18.        5             3             0.40
19.        12            1             0.92
20.        11            5             0.55
Table 7: Discrimination index
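The report does not state the formula used for D. The values in Table 7 are reproduced by dividing the difference between the two groups by the number correct in the upper level, so the sketch below uses that ratio as an inferred assumption. A common variant, shown for comparison, divides the same difference by the number of students in a group; the function names are illustrative.

```python
# (correct in upper level, correct in lower level) for questions 1-20,
# followed by the D-values reported in Table 7.
counts = [
    (15, 6), (15, 7), (14, 3), (8, 4), (15, 6),
    (12, 5), (9, 2), (10, 2), (10, 3), (8, 0),
    (14, 9), (14, 5), (12, 3), (15, 6), (14, 6),
    (15, 7), (12, 3), (5, 3), (12, 1), (11, 5),
]
reported = [
    0.60, 0.53, 0.79, 0.50, 0.60, 0.58, 0.78, 0.80, 0.70, 1.00,
    0.36, 0.64, 0.75, 0.60, 0.57, 0.53, 0.75, 0.40, 0.92, 0.55,
]


def d_relative_to_upper(upper, lower):
    """(U - L) / U: the ratio that matches Table 7 (inferred, not stated).

    Undefined when no student in the upper level answered correctly.
    """
    return (upper - lower) / upper


def d_relative_to_group(upper, lower, group_size):
    """(U - L) / n: variant that divides by the size of a group."""
    return (upper - lower) / group_size


UPPER_GROUP_SIZE = 15  # students in the upper level, as stated in the text

for question, (u, l) in enumerate(counts, start=1):
    d_table = d_relative_to_upper(u, l)
    d_group = d_relative_to_group(u, l, UPPER_GROUP_SIZE)
    print(f"{question:>2}. D = {d_table:.2f}   (divided by group size: {d_group:.2f})")
```

Both versions have the same sign for every item, so the conclusion that all D-values are positive does not depend on the choice of denominator.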

The D-values of all the questions were positive. This means that
more high-scoring than low-scoring students chose the key rather
than the distracters in each multiple-choice item.
Common problems that item analysis can reveal in multiple-choice
items are:

Miskeying
Guessing
Ambiguity

5. Conclusion

This multiple-choice test was too easy, with questionable
reliability. The test should not contribute heavily to the course
marks, and it needs revision. The analysis was worthwhile, because
it improves the teacher's skills in test construction and points out
the shortcomings in the students' knowledge of the specific work.
