
Item Analysis

By

Dr. Moawia Ahmed Elbadri


Item Analysis

Item analysis provides a way of measuring the quality of questions: seeing how appropriate they were for the respondents and how well they measured the respondents' ability.
Item Analysis

Item analysis is a process which examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole.
Parts of an A-type MCQ Item
(1-Best Answer (OBA))

[Diagram: a near-perfect 1-Best Answer MCQ item, with its parts labelled]

The whole thing is called an Item.
Purpose of Item Analysis

Evaluates the quality of each item.

Rationale: the quality of the items determines the quality of the test (i.e., its reliability and validity).
Reliability and Validity

Reliability is the "consistency" or "repeatability" of your measures:
the degree to which an assessment tool produces stable and consistent results.
Reliability

Test-retest reliability
is a measure of reliability obtained by
administering the same test twice over a
period of time to a group of individuals.
Reliability

Inter-rater reliability (inter-rater agreement)
is the degree of agreement among raters.
Validity

Test validity is the extent to which a test accurately measures what it is supposed to measure.
Reliability and Validity

You want your test to be both reliable and valid.
Item Analysis
Item analysis information can tell us:

whether an item (i.e., the question) was too easy or too difficult (item difficulty)

how well it discriminated between high and low scorers on the test (item discrimination)

whether all of the alternatives (distractors) functioned as intended (distractor analysis)
Item Analysis
Difficulty

Discrimination

Reliability

Distractor analysis
Item Difficulty
difficulty level (p, or percentage passing)
Item Difficulty

In item analysis, the first step is to find the difficulty value of the item, also called the index of difficulty.
Item Difficulty (p value)

Definition:
a measure of the proportion of students who answered a test item correctly.

p = (number of students who answered the item correctly) / (total number of students who attempted the item)

The item difficulty is usually expressed as a decimal value (0 to 1) or as a percentage (%).
Interpretation of Item Difficulty

For medical school tests where there is an emphasis on mastery, MOST items should have a p-value between 0.31 and 0.69.
Interpretation of the Difficulty Index

Diff. index    Interpretation
<= 0.20        Very difficult (should be revised)
0.21 - 0.30    Difficult (retained in the Q. bank)
0.31 - 0.69    Average (retained in the Q. bank)
0.70 - 0.80    Easy (revised before re-use)
>= 0.81        Very easy (discarded or carefully reviewed)


Optimum Difficulty*
*corrected for guessing

0.75     True-False
0.67     MCQ, 3 alternatives
0.625    MCQ, 4 alternatives
0.60     MCQ, 5 alternatives
0.50     Essay test
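These optima are consistent with a common rule of thumb: the ideal p-value lies halfway between the chance score (1/k for k alternatives) and a perfect score. A minimal Python sketch (the function name is illustrative, not from the source):

def optimum_difficulty(n_alternatives):
    # Midpoint between the chance score (1/k) and a perfect score (1.0).
    chance = 1.0 / n_alternatives
    return (1.0 + chance) / 2.0

# Quick check against the table above:
print(optimum_difficulty(2))   # 0.75     (True-False)
print(optimum_difficulty(3))   # 0.666... -> 0.67 (MCQ, 3 alternatives)
print(optimum_difficulty(4))   # 0.625    (MCQ, 4 alternatives)
print(optimum_difficulty(5))   # 0.6      (MCQ, 5 alternatives)

For an essay test there is no guessing, so the chance score is 0 and the optimum stays at 0.50.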
Examples
Number of students who answered each item = 50

Item No.    No. Correct Answers    % Correct    Difficulty Level
1           15                     30           ???
2           25                     50           ???
3           35                     70           ???
4           45                     90           ???
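A minimal Python sketch of the p-value computation for the items above (names are illustrative):

def item_difficulty(n_correct, n_students):
    # p = proportion of students who answered the item correctly.
    return n_correct / n_students

# The four example items, each answered by 50 students:
for item_no, n_correct in [(1, 15), (2, 25), (3, 35), (4, 45)]:
    print(f"Item {item_no}: p = {item_difficulty(n_correct, 50):.2f}")
# Item 1: p = 0.30 ... Item 4: p = 0.90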
Discrimination Index (d)

The discrimination index distinguishes, for each item, between the performance of students who did well on the exam and students who did poorly.

It refers to the degree to which success or failure on an item indicates possession of the achievement being measured.
Formula: Item Discrimination

Student    Total Score (%)    Q-I    Q-II    Q-III
Ram        92                 1      0       1
Swetha     90                 0      0       1
Ajmal      85                 1      0       1
John       80                 1      0       1
Prabhu     78                 1      0       1
Rajesh     70                 1      1       0
Asif       65                 1      1       1
Manish     55                 1      0       0
Sam        48                 1      1       0
Manu       45                 0      0       0

Item    # Correct (Upper Gr)    # Correct (Lower Gr)    Difficulty Index    Discrimination Index
Q1      4                       4                       ?                   ?
Q2      0                       3                       ?                   ?
Q3      5                       1                       ?                   ?
Discrimination Index

D = U - L

U = (# in the upper group with a correct response) / (total # in the upper group)

L = (# in the lower group with a correct response) / (total # in the lower group)

The higher the value of D, the more adequately the item discriminates (the highest possible value is 1.0).
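A minimal Python sketch of this formula, applied to the Q-I to Q-III exercise above (the top five and bottom five scorers form the upper and lower groups; names are illustrative):

def discrimination_index(upper_correct, lower_correct, group_size):
    # D = U - L, the difference between the proportions of correct
    # responses in the upper and lower scoring groups.
    return upper_correct / group_size - lower_correct / group_size

print(discrimination_index(4, 4, 5))   # 0.0  -> Q-I does not discriminate
print(discrimination_index(0, 3, 5))   # -0.6 -> Q-II discriminates negatively
print(discrimination_index(5, 1, 5))   # 0.8  -> Q-III discriminates well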
Interpretation of the Discrimination Index

Disc. index    Interpretation
< 0.20         Poorly discriminating (discarded or reviewed carefully before re-use)
0.00           Not discriminating (reject the question)
< 0.00         Badly discriminating (reject the question; check the answer key first)
Examples
Number of students per group = 100

Item No.    Correct Answers (Upper 1/4)    Correct Answers (Lower 1/4)    Item Discrimination Index
1           90                             20                             0.7
2           80                             70                             0.1
3           100                            0                              1.0
4           100                            100                            0.0
5           50                             50                             0.0
6           20                             60                             -0.4


No. of pupils tested = 60; the upper and lower groups are each 27% of the pupils (16 pupils per group).

Item no.    Upper 27%    Lower 27%    Difficulty Index    Discrimination Index    Remarks
1           14           12           0.81                0.13                    Revised
2           10           6            0.50                0.25                    Retained
3           11           7            0.56                0.25                    Retained
4           9            2            0.34                0.44                    Retained
5           12           6            0.56                0.38                    Retained
6           6            14           0.63                -0.50                   Rejected
7           13           4            0.53                0.56                    Retained
8           3            10           0.41                -0.44                   Rejected
9           13           12           0.78                0.06                    Rejected
10          8            6            0.44                0.13                    Revised

Here the difficulty index is computed over the two groups combined (item 1: (14 + 12) / 32 = 0.81) and the discrimination index within the groups (item 1: (14 - 12) / 16 = 0.13).
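A minimal Python sketch reproducing this table's computations (assuming, as above, 16 pupils per 27% group):

def indices(upper_correct, lower_correct, group_size=16):
    # Difficulty over both groups combined; discrimination D = U - L.
    difficulty = (upper_correct + lower_correct) / (2 * group_size)
    discrimination = (upper_correct - lower_correct) / group_size
    return difficulty, discrimination

print(indices(14, 12))   # (0.8125, 0.125) -> item 1: 0.81 and 0.13 in the table
print(indices(6, 14))    # (0.625, -0.5)   -> item 6: 0.63 and -0.50 in the table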
Reliability
Test analysis

Test reliability is a measure of the accuracy, consistency, or precision of the test scores.
Reliability Coefficient
This is a measure of the likelihood of obtaining similar results if you re-administer the exam to another group of similar students.

1. Kuder-Richardson 20
2. Kuder-Richardson 21
3. Cronbach alpha

The most useful measure is generally the Kuder-Richardson Formula 20 (KR-20).
Reliability coefficients
Kuder-Richardson Formula (KR-20): specific for MCQs.

It measures inter-item consistency, or how well your exam measures a single construct.
Reliability coefficients

Internal consistency reliability is a measure of how well the items on the test measure the same construct or idea.
Reliability coefficients
Kuder-Richardson Formula (KR-20)
Ranges from 0 to 1 (the higher the better).
A value > 0.5 is considered good for a teacher-made test.
KR-20 works best when a test measures a unified body of content.
Lots of very difficult or poorly written items can skew it.
The higher the variability in scores, the higher the reliability.
Interpreting KR-20
The KR-20 statistic is influenced by:
the number of test items on the exam
student performance on each item
the variance of the set of student test scores

Range: 0.00 to 1.00
Values near 0.00 indicate a weak relationship among items on the test.
Values near 1.00 indicate a strong relationship among items on the test.
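The standard KR-20 formula is KR-20 = (k / (k - 1)) x (1 - Σ p_i q_i / σ²), where k is the number of items, p_i the proportion answering item i correctly, q_i = 1 - p_i, and σ² the variance of the total scores. A minimal Python sketch (the function name and sample data are illustrative):

import statistics

def kr20(score_matrix):
    # score_matrix: one row per student, each entry 1 (correct) or 0 (incorrect).
    k = len(score_matrix[0])                  # number of items
    n = len(score_matrix)                     # number of students
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in score_matrix) / n   # proportion correct on item i
        pq_sum += p * (1 - p)
    totals = [sum(row) for row in score_matrix]
    variance = statistics.pvariance(totals)   # variance of the total scores
    return (k / (k - 1)) * (1 - pq_sum / variance)

# Hypothetical 4-item test taken by 5 students:
scores = [[1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 1, 1],
          [0, 0, 0, 0],
          [1, 1, 1, 1]]
print(round(kr20(scores), 2))   # 0.7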
Interpretation of KR-20 (reliability)

KR-20          Interpretation
>= 0.90        Excellent reliability; at the level of the best standardized tests
0.71 - 0.89    Very good for a classroom test
0.61 - 0.70    Good; in the range of most classroom tests. Probably a few items could be improved.
0.51 - 0.60    Low reliability; the test needs revision and supplementation.
< 0.50         Questionable reliability


Distractor Analysis

Distractor analysis is an extension of item analysis, using techniques that are similar to item difficulty and item discrimination.

Distractor efficiency:
deals with the way a distractor lures test takers, especially those with low abilities.

Any distractor not picked by at least 5% of the students is NOT a good distractor.
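A minimal Python sketch of this 5% rule (the function, data, and output are illustrative):

def weak_distractors(choice_counts, correct_choice, threshold=0.05):
    # Return the distractors picked by fewer than `threshold` of all test takers.
    total = sum(choice_counts.values())
    return [choice for choice, count in choice_counts.items()
            if choice != correct_choice and count / total < threshold]

# Hypothetical item: A is the key; D attracts only 1 of 60 students (about 1.7%).
print(weak_distractors({"A": 38, "B": 12, "C": 9, "D": 1}, "A"))   # ['D']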
Distractor Analysis

An ideal item is one that all students in the upper group answer correctly and all students in the lower group answer incorrectly.

The responses of the lower group should be evenly distributed among the incorrect alternatives.
Distractor Analysis
The ideal situation would be for each of the 4 distractors to be selected by an equal number of all the students who did not get the answer correct.
Distractor Analysis & Item Discrimination
The item discrimination formula can also be used in distractor analysis.

The concept of upper and lower groups would still remain, but the analysis and expectation would differ.
Distractor Analysis & Item Discrimination
Instead of expecting a positive value, we should logically expect a negative value, as more students from the lower group should select the distractors.

Each distractor can have its own item discrimination value, in order to analyse how the distractors work and ultimately refine the effectiveness of the test item itself.
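A minimal Python sketch of per-distractor discrimination (names and data are illustrative): the same D = U - L formula is applied to each distractor, and a well-behaved distractor yields a negative value.

def distractor_discrimination(upper_picks, lower_picks, group_size):
    # D = U - L for a single distractor; negative is desirable here,
    # because more low scorers than high scorers should choose it.
    return (upper_picks - lower_picks) / group_size

# Hypothetical distractor chosen by 2 of 16 upper-group and 7 of 16 lower-group students:
print(distractor_discrimination(2, 7, 16))   # -0.3125 -> the distractor is working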
Item Analysis
Distractor Analysis

          Distractor A    Distractor B    Distractor C    Distractor D
Item 1    8               3               1               0
Item 2    2               8               2               0
Item 3    4               8               0               0
Item 4    1               3               8               0
Item 5    5               0               0               7
Distractor Analysis: Examples

Item 1 A* B C D E Omit
% of students in upper 20 5 0 0 0 0
% of students in the middle 15 10 10 10 5 0
% of students in lower 5 5 5 10 0 0

Item 2 A B C D* E Omit
% of students in upper 0 5 5 15 0 0
% of students in the middle 0 10 15 5 20 0
% of students in lower 0 5 10 0 10 0

(*) marks the correct answer.
