
Report on Test and Item analysis

By: Thandi Rakgoale


Table of contents:

Number Heading Page

List of tables ii
List of figures iii
1 Introduction 1
2 Purpose of the Report 1
3 Test Analysis 1
3.1 Descriptive statistics 1-2

3.2 Frequency Graphs 2

3.2.1 Histogram 3-4

3.2.2 Frequency polygon 5


3.2.
3 Ogive 5-6
3.3 Test Reliability 7
4 Item analysis 7
4.1 Difficulty Index 8
4.2 Discrimination Index (D) 9
5 Conclusion 10
6 References 11
7 Appendices 12-16

List of tables

Table Caption Page


1 Measures of Central tendency and Standard deviation 2
2 Scores, range and intervals 3
3 Cumulative frequency distribution 4
4 Reliability coefficient of a test 6
5 Interpretation of the difficulty level of the questions 8
6 Discrimination Index (D) 9

List of figures

Figure Caption Page


1 Normal distribution curve 1
2 Histogram 4
3 Frequency polygon 5
4 Ogive 5

1. Introduction

Tests are administered for different reasons, but the main one is to
inform decisions. Test compilers should therefore always strive to
compile tests that are relevant to their objectives.
Statistical analysis of tests and test items makes this possible in the
following ways: the internal consistency and reliability of a test can
be checked and analysed by using statistical measures of central
tendency, and tests and test items can be analysed by using the
difficulty and discrimination indices.

This report is based on the data taken from a multiple choice
question test which contained twenty items (questions) with four
options each. Twenty-five students answered this multiple choice
question test.

2. Purpose of the Report

The purpose of this report is to disseminate information about the
test and test item analysis of a multiple choice test with 20
questions and 4 options, written by 25 learners.

3. Test Analysis

3.1 Descriptive statistics

Descriptive statistics are merely numbers that describe or
summarize very large sets of data. They can be written as
percentages, numerals, fractions or decimals. Descriptive statistics
are very useful in quantitative data analysis, which is our focal point.
The following three measures of central tendency are also useful in
analysing data: the mean, median and mode. The standard
deviation will always accompany the mean in the description of a
distribution.
A table with the values of each of these measures of central
tendency will follow this brief explanation of what each one of them
entails:
The Mean is the average of the student responses to a test or item. It
is calculated by dividing the total of the test/item scores by the
number of students taking the test.
The Median is the central score in a distribution when all the scores
are put in order.
The Mode is the score that occurs most frequently in a distribution.
Standard deviation is a measure of the dispersion of students'
scores on a test/item. Table 1 below gives the values obtained from
the given test:

Table 1: Measures of Central tendency and Standard deviation

Description Values
Number of students who took the test 25
Total score 1645
Mean 65.79
Standard deviation 21.90
Mode 65
Median 65
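The measures in Table 1 can be reproduced with Python's statistics module. This is a minimal sketch using a hypothetical score list, since the raw numeric scores are not reprinted in the text (the actual responses appear in Appendix A in recoded letter form):

```python
import statistics

# Hypothetical percentage scores for illustration; the report's real
# raw scores are not listed numerically in the text.
scores = [65, 86, 43, 65, 65, 100, 15, 72, 88, 54]

mean = statistics.mean(scores)      # total of scores / number of students
median = statistics.median(scores)  # central score once the list is sorted
mode = statistics.mode(scores)      # most frequently occurring score
stdev = statistics.stdev(scores)    # dispersion of scores around the mean

print(mean, median, mode)  # -> 65.3 65.0 65
```

For this hypothetical sample the mean, median and mode nearly coincide, mirroring the near-normal distribution described in the text.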

From the table above we can see that the mean (65.79), the median
(65) and the mode (65) are approximately equal. Graphically we can
represent this by a normal distribution curve in this way:

Figure 1: Normal distribution curve

This curve indicates a symmetrical distribution of scores in
percentage. One standard deviation above 65 yields a score of
86.90; that is, 34% of the scores fall between 65 and 86.90.
Similarly, one standard deviation below 65 yields a score of 43.10,
which means that another 34% of the scores fall between 43.10 and 65.
We can therefore confidently say that 68% of the scores lie between
43.10 and 86.90.

3.2 Frequency Graphs

A test has a highest as well as a lowest score. Table 2 below
shows these scores obtained from our multiple choice question
test. The difference between the highest and the lowest score is
known as the range. We arbitrarily chose ten intervals with a size of
9; see the table below:

Table 2: Scores, range and intervals

Highest value 100.00
Lowest value 15.00
Range 85.00
No. of Scores 25
No. of Interval 10
Size of Interval 9
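The range and interval size in Table 2 follow from simple arithmetic; a quick sketch:

```python
import math

high, low = 100.0, 15.0  # highest and lowest scores from Table 2
rng = high - low         # range = highest - lowest = 85.0
n_intervals = 10         # number of intervals chosen in the report

# 85 / 10 = 8.5, rounded up to 9 as in Table 2
size = math.ceil(rng / n_intervals)

print(rng, size)  # -> 85.0 9
```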

3.2.1 Histogram

A histogram is a bar graph that is used to represent noncontinuous


data. To draw a histogram one needs to plot the values of the
interval against the values of frequency. Table 3 below shows the
values of the interval and frequency that were obtained by the
twenty-five learners in the twenty multiple choice questions test.
These values were used to draw the frequency polygon see figure. 2
below.


Table 3: Cumulative frequency distribution


Lower Value  Upper Value  Interval  Middle Value  Frequency  Cumulative Frequency
15           24           15-24     19.5          1          1
25           34           25-34     29.5          2          3
35           44           35-44     39.5          0          3
45           54           45-54     49.5          4          7
55           64           55-64     59.5          3          10
65           74           65-74     69.5          6          16
75           84           75-84     79.5          1          17
85           94           85-94     89.5          6          23
95           104          95-104    99.5          2          25
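The cumulative frequency column of Table 3 is simply a running total of the interval frequencies; a sketch that rebuilds it:

```python
# Interval frequencies from Table 3 (intervals 15-24 up to 95-104).
freqs = [1, 2, 0, 4, 3, 6, 1, 6, 2]

cumulative = []
running = 0
for f in freqs:
    running += f              # running total of learners counted so far
    cumulative.append(running)

print(cumulative)  # -> [1, 3, 3, 7, 10, 16, 17, 23, 25]
```

The final value, 25, confirms that all twenty-five learners are accounted for.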

Figure 2: Histogram

[Bar chart: frequency (0-7) plotted against the score intervals 15-24 through 95-104.]

From the above figure, we can easily see that most of the scores are
above the mean (65). This means that most students did well in this
test. If this information were to be indicated in a curve, we would
have a negatively skewed graph.

3.2.2 Frequency polygon

The frequency polygon is a line graph. The middle value, i.e. the
sum of the lower and upper values divided by two (see Table 3
above for the middle values), is plotted against the frequency.
Figure 3 below is a frequency polygon with two peaks at a frequency
of 6, at the middle values 69.5 and 89.5 respectively.

Figure 3: Frequency polygon

[Line graph: frequency plotted against the middle values 19.5 through 99.5, peaking at a frequency of 6 for middle values 69.5 and 89.5.]

3.2.3 Ogive

An Ogive is a cumulative frequency polygon that, in most cases,
represents data in the form of percentages. The upper-limit scores
(see Table 3) are plotted against the cumulative frequency. The
curve is usually S-shaped and is useful when one intends to
estimate percentiles. Figure 4 is a cumulative frequency graph
called an Ogive:

Figure 4: Ogive

[S-shaped curve: cumulative frequency (0-25) plotted against the upper values 24 through 104.]

3.3 Test Reliability

Test reliability refers to the extent to which a test is able to produce
consistent scores for individuals who take the test more than once.
The internal consistency of a test is determined by calculating the
reliability coefficient KR20:

KR20 = (k / (k - 1)) x (1 - total of pq / (Stdev)^2)

where:

k = the number of test items
k - 1 = the number of test items minus one
p = the proportion of students who answered an item correctly
q = the proportion of students who answered an item incorrectly
(Stdev)^2 = the standard deviation squared

Table 4 below gives the reliability coefficient. The closer the
coefficient is to 1, the more reliable the test; values close to 0
indicate poor reliability.

Table 4: Reliability coefficient of a test

K 20
K-1 19
TOTAL (pq) 3.83
STDEV 21.90
(STDEV)^2 479.57
KR20 1.04

The test reliability coefficient in this case is 1.04, an indication of a
highly reliable test.
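The coefficient in Table 4 can be checked directly from the KR20 formula; a minimal sketch using the report's own values:

```python
k = 20          # number of test items
sum_pq = 3.83   # total of p*q over all items (Appendix B)
stdev = 21.90   # standard deviation of the test scores (Table 1)

# KR20 = (k / (k-1)) * (1 - sum(pq) / stdev^2)
kr20 = (k / (k - 1)) * (1 - sum_pq / stdev**2)
print(round(kr20, 2))  # -> 1.04
```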

4. Item analysis
Item analysis is a process whereby students’ responses to test items
are examined to assess the quality and accuracy of such test items.
Items that need to be eliminated or improved are exposed during
this process. Decisions on item analysis are informed by the
discrimination and the difficulty indices.

4.1 Difficulty Index (p)

The difficulty index of a test item indicates the proportion of learners
who were able to answer the question correctly.
It is calculated by dividing the number of students who answered the
item correctly by the number of students who answered the item. A
higher p value means that most of the students answered the item
correctly. An item with a p value < 0.25 is too difficult and it cannot
be accepted. Similarly, an item with a p value > 0.75 is too easy and
cannot be accepted. Items with a p value > 0.25 but < 0.75 are
acceptable. Table 5 below shows the p value of each of the twenty
multiple choice questions, together with a comment and the reason
for each comment:

Table 5: Interpretation of the difficulty level of the questions

Question #Correct #Answered P Comment Reason
Q1 21 25 0.84 Unacceptable Too easy
Q2 22 25 0.88 Unacceptable Too easy
Q3 17 25 0.68 Acceptable Fine
Q4 12 25 0.48 Acceptable Fine
Q5 21 25 0.84 Unacceptable Too easy
Q6 17 25 0.68 Acceptable Fine
Q7 11 25 0.44 Acceptable Fine
Q8 12 23 0.52 Acceptable Fine
Q9 13 25 0.52 Acceptable Fine
Q10 8 24 0.33 Acceptable Fine
Q11 23 25 0.92 Unacceptable Too easy
Q12 19 25 0.76 Unacceptable Too easy
Q13 15 25 0.60 Acceptable Fine
Q14 21 25 0.84 Unacceptable Too easy
Q15 20 25 0.80 Unacceptable Too easy
Q16 22 24 0.92 Unacceptable Too easy
Q17 15 24 0.63 Acceptable Fine
Q18 8 24 0.33 Acceptable Fine
Q19 13 25 0.52 Acceptable Fine
Q20 16 25 0.64 Acceptable Fine

From the above information one can easily see that only twelve
items have a p value > 0.25 but < 0.75; these items are good and
can be retained. They are q3, q4, q6, q7, q8, q9, q10, q13, q17,
q18, q19 and q20. Items q1, q2, q5, q11, q12, q14, q15 and q16
have a p value > 0.75 and are therefore too easy.
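The classification rule used in Table 5 can be sketched as a small helper. `classify` is a hypothetical name; the thresholds 0.25 and 0.75 are those stated in the text:

```python
def classify(correct, answered):
    """Difficulty index p with the report's acceptability rule."""
    p = correct / answered  # proportion who answered correctly
    if p < 0.25:
        return p, "Unacceptable: too difficult"
    if p > 0.75:
        return p, "Unacceptable: too easy"
    return p, "Acceptable"

print(classify(21, 25))  # Q1 -> (0.84, 'Unacceptable: too easy')
print(classify(17, 25))  # Q3 -> (0.68, 'Acceptable')
```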

4.2 Discrimination Index (D)

The discrimination index of a test item is used to differentiate
between students who know the content and those who do not. It is
calculated by dividing the difference between the number of correct
responses in the upper group and in the lower group by the number
of students in the larger of the two groups. A positive discrimination
value means that the item in question can be kept. The three types
of discrimination indices are:

A positive discrimination index applies when students who did well
in the test answered an item correctly more often than those who
did not do well.

A negative discrimination index, on the other hand, applies when
students who did not do well in the test answered an item correctly
more often than those who did well.

A zero discrimination index applies when students who did well in
the test answered an item correctly with the same frequency as
those who did not do well. Table 6 illustrates the discrimination
values of the twenty items in the test in this way:

Table 6: Discrimination Index (D)

Question #Correct (upper group) #Correct (lower group) D
Q1 15 6 0.60
Q2 15 7 0.53
Q3 14 3 0.73
Q4 8 4 0.27
Q5 15 6 0.60
Q6 12 5 0.47
Q7 9 2 0.47
Q8 10 2 0.53
Q9 10 3 0.47
Q10 8 0 0.53
Q11 14 9 0.33
Q12 14 5 0.60
Q13 12 3 0.60
Q14 15 6 0.60
Q15 14 6 0.53
Q16 15 7 0.53
Q17 12 3 0.60
Q18 5 3 0.13
Q19 12 1 0.73
Q20 11 5 0.40

All the discrimination index values in the table above are positive,
which implies that the test items discriminate well between learners.
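The D values above can be reproduced from the group counts. In this sketch the divisor 15 is the size of the larger (upper) group given in Appendix C:

```python
def discrimination(upper_correct, lower_correct, larger_group=15):
    """Discrimination index: difference in correct responses between
    the upper and lower groups, divided by the larger group's size."""
    return (upper_correct - lower_correct) / larger_group

print(round(discrimination(15, 6), 2))  # Q1 -> 0.6
print(round(discrimination(8, 0), 2))   # Q10 -> 0.53
```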


5. Conclusion

The scores in this test are normally distributed, as all the measures
of central tendency are approximately equal. This implies that most
of the test items are meeting the objectives. Eight of the twenty
questions (q1, q2, q5, q11, q12, q14, q15 and q16) need
improvement or should be eliminated because they are too easy:
their difficulty index values are very high, although these items
discriminate positively (the values of their discrimination indices are
positive).

The reliability coefficient of the test is 1.04, which implies that the
test is consistent. If the above-mentioned eight questions are
improved, the test will meet the educator's objectives.

6. References

1. Kubiszyn, T., & Borich, G. (2007). Educational Testing and
Measurement: Classroom Application and Practice (8th ed.). USA:
Wiley/Jossey-Bass Education.
2. Interpreting Item Analysis.
http://www.uleth.ca/edu/runte/tests/iteman/interp/interp.html.
Retrieved 18 September 2007.

7. Appendices

Appendix A: Recoded scores

KEY C B D D B C D A
STUDENT
No. Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
1 C B B A C D A
2 C B D D B D A A
3 C B D D B C D A
4 C B D B B C B A
5 C B D C B C B A
6 C A D D C C A D
7 B B A B B C B B
8 C B D B B C B D
9 C B D A B C D D
10 C B B A B C D C
11 C B D D B C D A
12 C B D D B C D D
13 C B D A B C D A
14 C B D A B C D A
15 C B D D B B A A
16 C B D D B C D A
17 B B C C B A D D
18 C B B D B A D D
19 D C A D B A B A
20 C B D D B C D A
21 C A D D C C A D
22 B B A B B C B B
23 C B D B B C B D
24 C B B A C D A
25 C B D D B D A A

C B A C B D A A C D
Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
D D A D A A A A C B
C B A C B D A A C D
C B A C B D A A C B
C B A C A D C A C B
C D A C B D A A A B
C D A C A D A A A B
D D A C B D C A A D
B C A C B D A A C A
B D A C B D A A C B
D C A B A D D A C D
C B A C B D A A C D
D A A C A D A A C B
C B A C B D A A A B
C B A C B D A A A
B D A C D A A C B B
C B A C B D A A C D
C A D B D A C
D D A C A D A A C B
D C C D A A D B B B
C A C D B D A A C D
C D A C A D A A A B
D D A C B D C A A D
B C A C B D A A C A
D D A D A A A A C B
C B A C B D A A C D

B C
Q19 Q20
D B
B C
D C
C C
B C
D C
D C
B A
D A
B C
B C
B D
B C
B C
D D
B C
A D
B C
A B
B C
D C
D C
B A
D B
B C

Appendix B: Test Reliability

Question #Correct #Incorrect Prop Incorrect (q) Prop Correct (p) pq
Q1 21 4 0.16 0.84 0.13
Q2 22 3 0.12 0.88 0.11
Q3 17 8 0.32 0.68 0.22
Q4 12 13 0.52 0.48 0.25
Q5 21 4 0.16 0.84 0.13
Q6 17 8 0.32 0.68 0.22
Q7 11 14 0.56 0.44 0.25
Q8 12 11 0.48 0.52 0.25
Q9 13 12 0.48 0.52 0.25
Q10 8 16 0.67 0.33 0.22
Q11 23 2 0.08 0.92 0.07
Q12 19 6 0.24 0.76 0.18
Q13 15 10 0.40 0.60 0.24
Q14 21 4 0.16 0.84 0.13
Q15 20 5 0.20 0.80 0.16
Q16 22 2 0.08 0.92 0.08
Q17 15 9 0.38 0.63 0.23
Q18 8 16 0.67 0.33 0.22
Q19 13 2 0.48 0.52 0.36
Q20 16 9 0.36 0.64 0.23
Total pq 3.83

Appendix C: Number of students in Upper and Lower group

Number of students No. in lower group No. in upper group


25 10 15
