
Test and Item Analysis Report

By
F Rufetu

Presented as a Major Assignment


In
Computer-based Assessment (CIA 722)

September 2007

Table of contents

List of tables

List of figures

1. Introduction

2. Purpose of report

3. Test analysis
3.1 Descriptive statistics
3.2 Frequency graphs
3.3 Test reliability

4. Item analysis
4.1 Difficulty index
4.2 Discrimination index

5. Conclusion

References

Appendix A

Appendix B

Appendix C

List of tables

Table Description

Table 1 Mode, median, mean and standard deviation

Table 2 Grouped frequency table

Table 3 Cumulative frequency table

Table 4 Determining reliability coefficient (KR20)

Table 5 Calculation of difficulty index

Table 6 Calculation of discrimination index

Table 7 Number of students in upper and lower group

List of figures

Figure Description

Figure 1 Cumulative frequency graph

Figure 2 Frequency histogram

Figure 3 Frequency polygon

1. Introduction

This is a report on test and item analysis using descriptive statistics
(measures of central tendency and variability) for a given set of scores.
Twenty-five students wrote a multiple choice test containing twenty
questions with four answer options each (see appendix A).

2. Purpose of report

The purpose of this report is to disseminate information pertaining to test
and item analysis for a given set of scores.

3. Test analysis

Test analysis examines how the items perform as a set. According to
Kubiszyn and Borich (2007), “no test you construct will be perfect”,
meaning it may include invalid or deficient items. This necessitates
analysis.

3.1 Descriptive statistics

From the test data (see appendix B), the mode is the score that occurs
most frequently, the median is the score that splits the distribution in
half, the mean is the average of the group of scores, and the standard
deviation is the estimate of variability given by the square root of the
sum of (x-Mean)^2 divided by the number of students.

The mode, median, mean and standard deviation are given in table 1.
The table suggests an approximately normal distribution because the
mode, median and mean are almost the same.
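The four statistics can be recomputed from the scores in appendix B. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the population form of the standard deviation (`pstdev`, dividing by the number of students) is used because that is the definition given above.

```python
import statistics

# Percentage scores of the 25 students, copied from appendix B.
scores = [100.00, 100.00, 90.00, 90.00, 90.00, 89.47, 85.00, 85.00, 75.00,
          70.00, 70.00, 65.00, 65.00, 65.00, 65.00, 60.00, 55.00, 55.00,
          50.00, 50.00, 47.06, 45.00, 31.58, 31.58, 15.00]

mode = statistics.mode(scores)        # most frequent score
median = statistics.median(scores)    # middle score of the sorted list
mean = statistics.mean(scores)        # arithmetic average
std_dev = statistics.pstdev(scores)   # population SD: divides by N

print(f"mode={mode:.0f} median={median:.0f} mean={mean:.2f} sd={std_dev:.2f}")
# prints: mode=65 median=65 mean=65.79 sd=21.90
```

The mean (65.79) lies slightly above the mode and median (65), so the three values are close but not identical.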

Table 1: Mode, median, mean and standard deviation

Mode  Median  Mean   Standard deviation
65    65      65.79  21.90

3.2 Frequency graphs

The frequency graphs are determined by having a grouped frequency
table first, given in table 2.

Table 2: Grouped frequency table

Highest score (H)     100
Lowest score (L)      15
Range                 85
Number of intervals   10
Size of interval      8.5

The cumulative frequency graph is determined by upper values as x-axis
and cumulative frequency as y-axis. The cumulative frequency table is
shown in table 3.

Table 3: Cumulative frequency table

Lower Limit  Upper Limit  Middle Value  Frequency  Cumulative Frequency
15           24           19.5          1          1
25           34           29.5          2          3
35           44           39.5          0          3
45           54           49.5          4          7
55           64           59.5          3          10
65           74           69.5          6          16
75           84           79.5          1          17
85           94           89.5          6          23
95           104          99.5          2          25
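The frequencies and cumulative frequencies in table 3 can be reproduced from the appendix B scores. The sketch below assumes half-open intervals, so that a fractional score such as 89.47 falls in the 85-94 interval; this binning rule is an assumption, since the report does not state how fractional scores were assigned.

```python
from itertools import accumulate

# Percentage scores of the 25 students, copied from appendix B.
scores = [100.00, 100.00, 90.00, 90.00, 90.00, 89.47, 85.00, 85.00, 75.00,
          70.00, 70.00, 65.00, 65.00, 65.00, 65.00, 60.00, 55.00, 55.00,
          50.00, 50.00, 47.06, 45.00, 31.58, 31.58, 15.00]

lower_limits = list(range(15, 105, 10))  # 15, 25, ..., 95

# Assumed rule: a score x belongs to the interval with lower <= x < lower + 10.
frequencies = [sum(1 for x in scores if lower <= x < lower + 10)
               for lower in lower_limits]
cumulative = list(accumulate(frequencies))  # running total of frequencies

for lower, f, cf in zip(lower_limits, frequencies, cumulative):
    print(f"{lower}-{lower + 9}: middle={lower + 4.5} f={f} cf={cf}")
```

The last cumulative frequency equals the number of students (25), which is a quick check that no score was left out of the table.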

The cumulative frequency graph is given in figure 1. An ‘ogive’ shape is
formed.

Figure 1: Cumulative frequency graph
[Graph: cumulative frequency (y-axis) against upper values 24 to 104 (x-axis)]

The frequency histogram is determined by intervals (lower values)
as x-axis and frequency as y-axis. The frequency histogram is given in
figure 2.

Figure 2: Frequency histogram
[Histogram: frequency (y-axis) for the intervals 15-24 to 95-104 (x-axis)]

The frequency polygon is determined by middle values as x-axis and
frequency as y-axis. The frequency polygon is given in figure 3.

Figure 3: Frequency polygon
[Graph: frequency (y-axis) against middle values 19.5 to 99.5 (x-axis)]

3.3 Test reliability

The Kuder-Richardson formula 20 reliability coefficient (KR20) is an
appropriate index of test reliability for multiple choice tests. The
coefficient is determined by means of a formula which includes the
number of test items (k), student performance on every item (the sum of
pq; for pq values see appendix C) and the variance, that is the standard
deviation squared (stddev^2), of the set of student test scores.
The index ranges from 0.00 to 1.00. The larger the number, the more
reliable the student scores are. The KR20 is determined by means of the
values given in table 4.

Table 4: Determining reliability coefficient (KR20)

k          20
k-1        19
Total pq   3.83
stddev     21.90
stddev^2   479.57
KR20       1.04
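The formula described above is KR20 = (k / (k-1)) x (1 - sum of pq / stddev^2). A minimal sketch with the values from table 4 follows; the function and variable names are illustrative. The second calculation rests on an assumption that is not in the report: it converts the variance from percentage units to item units (treating 100% as 20 items) so that it is on the same scale as the sum of pq.

```python
def kr20(k, sum_pq, variance):
    """Kuder-Richardson formula 20 computed from summary values."""
    return (k / (k - 1)) * (1 - sum_pq / variance)

k = 20                 # number of items (table 4)
sum_pq = 3.83          # total pq (table 4)
variance_pct = 479.57  # variance of the percentage scores (table 4)

# Value as computed in table 4, with the variance in percentage units.
kr20_pct = kr20(k, sum_pq, variance_pct)

# Assumption (not in the report): rescale the variance to item units,
# treating 100% as k items, so it matches the scale of sum_pq.
variance_items = variance_pct * (k / 100) ** 2
kr20_items = kr20(k, sum_pq, variance_items)

print(f"{kr20_pct:.2f} {kr20_items:.2f}")
# prints: 1.04 0.84
```

The rescaling is only approximate for this data set, because a few students did not answer every item and their percentages are therefore not out of 20.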

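Table 4 gives KR20 = 1.04, which exceeds the maximum of 1.00. This
happens because the variance (479.57) is calculated from percentage
scores while the sum of pq (3.83) is on the item scale. If the standard
deviation is rescaled to item units (21.90% of 20 items is about 4.38
items, giving a variance of about 19.18), the formula gives a KR20 of
approximately 0.84. This is still a large value (close to 1.00), so the
student scores are reliable.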

4. Item analysis

Item analysis can be used to identify items that are deficient in some way
so as to improve or even eliminate them.

Matlock-Hetzel (2007) states that item analysis “investigates the
performance of items considered individually in relation to the remaining
items in the test”.

4.1 Difficulty index

The difficulty index indicates the proportion of students who answered
the item correctly. The proportion (p) equals the number of students with
the correct answer divided by the number of students who attempted the
item. If p<0.25 the item is too difficult, and if p>0.75 the item is too
easy; in both cases the item is unacceptable.

The calculation and interpretation of the difficulty index for each
question are given in table 5.

Table 5: Calculation of difficulty index

Questions  #Correct  #Answered  p     Interpretation  Reason
q1         21        25         0.84  Unacceptable    Too easy
q2         22        25         0.88  Unacceptable    Too easy
q3         17        25         0.68  Acceptable      Fine
q4         12        25         0.48  Acceptable      Fine
q5         21        25         0.84  Unacceptable    Too easy
q6         17        25         0.68  Acceptable      Fine
q7         11        25         0.44  Acceptable      Fine
q8         12        23         0.52  Acceptable      Fine
q9         13        25         0.52  Acceptable      Fine
q10        8         24         0.33  Acceptable      Fine
q11        23        25         0.92  Unacceptable    Too easy
q12        19        25         0.76  Unacceptable    Too easy
q13        15        25         0.60  Acceptable      Fine
q14        21        25         0.84  Unacceptable    Too easy
q15        20        25         0.80  Unacceptable    Too easy
q16        22        24         0.92  Unacceptable    Too easy
q17        15        24         0.63  Acceptable      Fine
q18        8         24         0.33  Acceptable      Fine
q19        13        25         0.52  Acceptable      Fine
q20        16        25         0.64  Acceptable      Fine
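The classification in table 5 can be reproduced with a short calculation. This is a minimal sketch using the #Correct and #Answered counts from the table and the 0.25 and 0.75 cut-offs stated above; the function name `classify` is illustrative.

```python
# (number correct, number who answered) for q1..q20, copied from table 5.
items = [(21, 25), (22, 25), (17, 25), (12, 25), (21, 25), (17, 25), (11, 25),
         (12, 23), (13, 25), (8, 24), (23, 25), (19, 25), (15, 25), (21, 25),
         (20, 25), (22, 24), (15, 24), (8, 24), (13, 25), (16, 25)]

def classify(p):
    """Apply the cut-offs from section 4.1 to a difficulty index p."""
    if p < 0.25:
        return "too difficult"
    if p > 0.75:
        return "too easy"
    return "acceptable"

difficulty = [correct / answered for correct, answered in items]
labels = [classify(p) for p in difficulty]
acceptable_share = labels.count("acceptable") / len(labels)

for number, (p, label) in enumerate(zip(difficulty, labels), start=1):
    print(f"q{number}: p={p:.2f} {label}")
print(f"acceptable items: {acceptable_share:.0%}")
```

Twelve of the twenty items fall in the acceptable range, which is the sixty percent referred to in the conclusion.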

4.2 Discrimination index

According to Special Connections (2007), the discrimination index (D) is a
“basic measure of item’s ability to discriminate between those who scored
high (#U) on the total test and those who scored low (#L)”.
If the D value is positive (closer to 1.00), there is a strong relationship
between performance on that item and overall test performance. This
means the discrimination is fine. If the D value is negative, this suggests
poor validity for the item, and its distracters must be looked into.

The calculation and interpretation of the discrimination index for each
question are given in table 6. In this instance all items indicate a positive
discrimination.

Table 6: Calculation of discrimination index

Questions  #U  #L  D     Interpretation
q1         15  6   0.60  Fine
q2         15  7   0.53  Fine
q3         14  3   0.73  Fine
q4         8   4   0.27  Fine
q5         15  6   0.60  Fine
q6         12  5   0.47  Fine
q7         9   2   0.47  Fine
q8         10  2   0.53  Fine
q9         10  3   0.47  Fine
q10        8   0   0.53  Fine
q11        14  9   0.33  Fine
q12        14  5   0.60  Fine
q13        12  3   0.60  Fine
q14        15  6   0.60  Fine
q15        14  6   0.53  Fine
q16        15  7   0.53  Fine
q17        12  3   0.60  Fine
q18        5   3   0.13  Fine
q19        12  1   0.73  Fine
q20        11  5   0.40  Fine

The discrimination index measures the ability of an item to discriminate
between students who have a high score on the test and those with a low
score on the test. It is based on the difference between the number of
correct responses in the upper group and the number of correct responses
in the lower group. The number of students in the upper and lower groups
is given in table 7.

Table 7: Number of students in upper and lower group

#Upper 15
#Lower 10
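The report does not state the formula for D explicitly. The values in table 6 are consistent with dividing the difference #U - #L by the size of the upper group (15, from table 7); the sketch below uses that inferred formula and should be read as an assumption rather than as the documented method.

```python
UPPER_GROUP_SIZE = 15  # number of students in the upper group (table 7)

# (#U, #L) for q1..q20, copied from table 6.
counts = [(15, 6), (15, 7), (14, 3), (8, 4), (15, 6), (12, 5), (9, 2),
          (10, 2), (10, 3), (8, 0), (14, 9), (14, 5), (12, 3), (15, 6),
          (14, 6), (15, 7), (12, 3), (5, 3), (12, 1), (11, 5)]

# Inferred formula: D = (#U - #L) / size of the upper group.
d_values = [round((upper - lower) / UPPER_GROUP_SIZE, 2)
            for upper, lower in counts]

all_positive = all(d > 0 for d in d_values)

for number, d in enumerate(d_values, start=1):
    print(f"q{number}: D={d:.2f}")
print("all items discriminate positively:", all_positive)
```

With this formula every item has a positive D, with q18 (0.13) the weakest and q3 and q19 (0.73) the strongest.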

5. Conclusion

In conclusion, since the KR20 indicates reliable scores, sixty percent of
the items have an acceptable difficulty index and the discrimination index
is positive on all items, the overall test can be considered valid.

Analysis of response options allows educators to fine tune and improve
items they may wish to use again with future classes. If items are too
difficult, teachers can adjust the way they teach. The greater the number
of plausible distracters, the more accurate, valid and reliable the test
becomes.

References

Kubiszyn, T. and Borich, G. (2007). Educational Testing and Measurement:
Classroom Application and Practice, pp. 204-326. Eighth edition. John
Wiley & Sons, Inc. USA.

Matlock-Hetzel, S. (2007). Basic Concepts in Item and Test Analysis. Texas
A & M University. Retrieved October 02 2007, from
http://ericae.net/ft/tamu/Espy.htm

Special Connections. (2007). Retrieved October 02 2007, from
http://www.Specialconnections.ku.edu/cgi-bin/cgiwrap/cpecconn/print.php?path=page/ass..

Appendix A

Key C B D D B C D A C B A C B D A A C D B C
St No Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20
1 C B B A C D A D D A D A A A A C B D B
2 C B D D B D A A C B A C B D A A C D B C
3 C B D D B C D A C B A C B D A A C B D C
4 C B D B B C B A C B A C A D C A C B C C
5 C B D C B C B A C D A C B D A A A B B C
6 C A D D C C A D C D A C A D A A A B D C
7 B B A B B C B B D D A C B D C A A D D C
8 C B D B B C B D B C A C B D A A C A B A
9 C B D A B C D D B D A C B D A A C B D A
10 C B B A B C D C D C A B A D D A C D B C
11 C B D D B C D A C B A C B D A A C D B C
12 C B D D B C D D D A A C A D A A C B B D
13 C B D A B C D A C B A C B D A A A B B C
14 C B D A B C D A C B A C B D A A A B C
15 C B D D B B A A B D A C D A A C B B D D
16 C B D D B C D A C B A C B D A A C D B C
17 B B C C B A D D C A D B D A C A D
18 C B B D B A D D D D A C A D A A C B B C
19 D C A D B A B A D C C D A A D B B B A B
20 C B D D B C D A C A C D B D A A C D B C
21 C A D D C C A D C D A C A D A A A B D C
22 B B A B B C B B D D A C B D C A A D D C
23 C B D B B C B D B C A C B D A A C A B A
24 C B B A C D A D D A D A A A A C B D B
25 C B D D B D A A C B A C B D A A C D B C

Appendix B

x Group x-Mean (x-Mean)^2


100.00 U 34.21 1170.49
100.00 U 34.21 1170.49
90.00 U 24.21 586.24
90.00 U 24.21 586.24
90.00 U 24.21 586.24
89.47 U 23.69 561.03
85.00 U 19.21 369.12
85.00 U 19.21 369.12
75.00 U 9.21 84.87
70.00 U 4.21 17.74
70.00 U 4.21 17.74
65.00 U -0.79 0.62
65.00 U -0.79 0.62
65.00 U -0.79 0.62
65.00 U -0.79 0.62
60.00 L -5.79 33.50
55.00 L -10.79 116.37
55.00 L -10.79 116.37
50.00 L -15.79 249.25
50.00 L -15.79 249.25
47.06 L -18.73 350.77
45.00 L -20.79 432.12
31.58 L -34.21 1170.23
31.58 L -34.21 1170.23
15.00 L -50.79 2579.38

Appendix C

Question #Correct #Answered Proportion correct (p) Proportion incorrect (q) pq
q1 21 25 0.84 0.16 0.13
q2 22 25 0.88 0.12 0.11
q3 17 25 0.68 0.32 0.22
q4 12 25 0.48 0.52 0.25
q5 21 25 0.84 0.16 0.13
q6 17 25 0.68 0.32 0.22
q7 11 25 0.44 0.56 0.25
q8 12 23 0.52 0.48 0.25
q9 13 25 0.52 0.48 0.25
q10 8 24 0.33 0.67 0.22
q11 23 25 0.92 0.08 0.07
q12 19 25 0.76 0.24 0.18
q13 15 25 0.6 0.4 0.24
q14 21 25 0.84 0.16 0.13
q15 20 25 0.8 0.2 0.16
q16 22 24 0.92 0.08 0.08
q17 15 24 0.63 0.38 0.23
q18 8 24 0.33 0.67 0.22
q19 13 25 0.52 0.48 0.25
q20 16 25 0.64 0.36 0.23
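The pq column and its total (the "Total pq" used for KR20) can be recomputed from the #Correct and #Answered counts. This is a minimal sketch with illustrative names; it keeps full precision for each product and rounds only the final sum.

```python
# (number correct, number who answered) for q1..q20, copied from appendix C.
items = [(21, 25), (22, 25), (17, 25), (12, 25), (21, 25), (17, 25), (11, 25),
         (12, 23), (13, 25), (8, 24), (23, 25), (19, 25), (15, 25), (21, 25),
         (20, 25), (22, 24), (15, 24), (8, 24), (13, 25), (16, 25)]

pq_values = []
for correct, answered in items:
    p = correct / answered   # proportion correct
    q = 1 - p                # proportion incorrect
    pq_values.append(p * q)

sum_pq = sum(pq_values)
print(f"sum of pq = {sum_pq:.2f}")
# prints: sum of pq = 3.83
```

Each product is at most 0.25, reached when half of the students answer an item correctly, so items with p near 0.5 contribute most to the total.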
