
LG675

Session 5: Reliability II

Sophia Skoufaki
sskouf@essex.ac.uk
15/2/2012

What is item analysis?

How can we conduct item analysis for


a) norm-referenced data-collection instruments?

(only statistical analyses available through SPSS)

b) criterion-referenced data-collection measures?

How can we examine the reliability of criterion-referenced data-collection instruments?

Work with some typical scenarios

Item analysis: definition


The kind of reliability analysis used to identify items in a data-collection instrument (e.g., questions in a questionnaire, tasks/questions in a language test) which do not measure the same thing as the other items.

It is conducted on data from the pilot study. The aim is to improve our data-collection instrument by removing any irrelevant items.

NB: This item analysis is different from the item analysis (also called analysis by items) which is part of data analysis in experiments. That analysis is done to ensure that the findings of an experiment are generalisable not only to people with similar characteristics to those who participated in the experiment but also to items similar to those in the experiment (Clark 1973).

If you plan to conduct an experiment, see Phil's discussion of this term and SPSS how-to: http://privatewww.essex.ac.uk/~scholp/statsquibs.htm#item

Reminder: Classification of data-collection instruments according to the basis of grading

Data-collection instruments divide into:

Norm-referenced

Criterion-referenced

Item analysis for norm-referenced measures


According to the traditional approach to item analysis, items are examined in terms of:

1. Item facility: a measure of how easy an item is; high facility means an easy item.
An easy way to assess it is to look at the proportion of people who answer each item correctly.
The data-collection instrument as a whole should have a facility of about 0.5, and most items should have a facility around that level.
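As an illustration outside SPSS, here is a minimal Python sketch of the facility computation, assuming a 0/1 score matrix (the data are invented):

```python
import numpy as np

# Rows = test-takers, columns = items scored 0 (wrong) or 1 (correct).
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
])

# Item facility = proportion of test-takers answering each item correctly.
item_facility = scores.mean(axis=0)    # [0.75, 0.75, 0.25, 0.75]

# Mean facility of the whole instrument (ideally near 0.5).
print(item_facility, item_facility.mean())
```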

Understanding item facility

This is an activity from http://www.caacentre.ac.uk/dldocs/BP2final.pdf
Input the file three_tests_IF.sav into SPSS.
This file shows the item facility for each question in three tests.
Examine the item facilities in each test and try to spot problematic ones.
Which test seems to be the best, in that it contains items which will be able to distinguish among students of various proficiency levels?

Item analysis for norm-referenced measures (cont.)

2. Item discrimination: a measure of how different performance on an item is in comparison to performance on the other items.
It can be assessed via a correlation between the item's score and the score of the whole measure.
It can also be assessed via Cronbach's alpha if item deleted.
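For illustration only (the handout's SPSS route is the course method), a Python sketch of both statistics under the same 0/1 score-matrix assumption; the item-total correlation is "corrected" by excluding the item from the total it is compared with:

```python
import numpy as np

def item_discrimination(scores):
    """Corrected item-total correlation and Cronbach's alpha if item
    deleted, for each item. scores: rows = test-takers, columns = items."""
    for i in range(scores.shape[1]):
        rest = np.delete(scores, i, axis=1)   # all items except item i
        rest_total = rest.sum(axis=1)         # total score without item i
        # Correlation between the item and the total of the other items.
        r = np.corrcoef(scores[:, i], rest_total)[0, 1]
        # Cronbach's alpha computed on the remaining items.
        k = rest.shape[1]
        alpha = (k / (k - 1)) * (1 - rest.var(axis=0, ddof=1).sum()
                                 / rest_total.var(ddof=1))
        print(f"item {i}: item-total r = {r:.2f}, alpha if deleted = {alpha:.2f}")
```

A low or negative item-total correlation, or an "alpha if deleted" higher than the overall alpha, flags an item that may not measure the same thing as the rest.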

SPSS: Item analysis for norm-referenced measures

Do the activity described in the box on pages 26-27 of Phil's "Simple statistical approaches to reliability and item analysis" handout.
Then do the activity described in the box on pages 29-30.
Also calculate item facility as a percentage of correct answers.

Item analysis for criterion-referenced measures (Brown 2003)

Difference Index (DI): item facility in the post-test minus item facility in the pre-test
B-index: item facility for students who passed the test minus item facility for those who failed it
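Before turning to SPSS, a minimal Python sketch of both indices with invented data:

```python
import numpy as np

# 0/1 answers to one item, pre- and post-instruction (invented data).
pre  = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
post = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 0])

# Difference Index: post-test facility minus pre-test facility.
di = post.mean() - pre.mean()            # 0.7 - 0.3 = 0.4

# B-index: facility among students who passed the whole test
# minus facility among those who failed it.
passed = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 0], dtype=bool)
b_index = post[passed].mean() - post[~passed].mean()

print(f"DI = {di:.2f}, B-index = {b_index:.2f}")
```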

SPSS: Item analysis for criterion-referenced measures

This is an activity from Brown (2003). He used Excel to calculate the DI and B-index on two data sets.
Download this article as a PDF file from http://jalt.org/test/bro_18.htm
Input the data from page 20 in SPSS.
Calculate the DI via Transform → Compute.

Reliability of criterion-referenced measures

There are two basic approaches:

1. Threshold loss agreement
This approach examines the proportion of people who consistently did better than the cut-off point (masters) and the proportion of those who consistently did worse (non-masters). It uses a test-retest method.
Example statistic: Cohen's kappa (AKA the kappa coefficient)
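To show what the statistic does, here is a small Python sketch with invented master/non-master decisions from two administrations. Kappa is the observed agreement corrected for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance.

```python
import numpy as np

# Master (1) / non-master (0) decisions on two administrations (invented).
admin1 = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
admin2 = np.array([1, 1, 0, 0, 0, 0, 1, 1, 1, 1])

p_o = np.mean(admin1 == admin2)        # observed agreement

# Chance agreement from the marginal proportions of masters/non-masters.
p1, p2 = admin1.mean(), admin2.mean()
p_e = p1 * p2 + (1 - p1) * (1 - p2)

kappa = (p_o - p_e) / (1 - p_e)        # here: (0.8 - 0.52) / 0.48
print(f"kappa = {kappa:.2f}")          # kappa = 0.58
```

scikit-learn's cohen_kappa_score gives the same result if you prefer a library call.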

The structure of Cohen's kappa table in this scenario (figure from Brown and Hudson 2002: 171)

Reliability of criterion-referenced measures (cont.)

2. Squared error loss agreement
These statistics are like the previous ones, but they also assess how consistent the degree of mastery/non-mastery is.
Example: the phi(lambda) dependability index
(Not available in SPSS; see Brown 2005: 206-207)
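Since SPSS does not offer it, here is a tentative Python sketch of Brennan's short-cut formula for phi(lambda) as it is commonly presented; verify the details (especially the variance denominator) against Brown 2005: 206-207 before relying on it:

```python
import numpy as np

def phi_lambda(scores, cut):
    """Phi(lambda) dependability index via Brennan's short-cut formula.
    scores: rows = test-takers, columns = items (0/1).
    cut: cut-score expressed as a proportion, e.g. 0.6."""
    n_items = scores.shape[1]
    props = scores.mean(axis=1)         # proportion score for each person
    m, v = props.mean(), props.var()    # population variance assumed here
    return 1 - (1 / (n_items - 1)) * (
        (m * (1 - m) - v) / ((m - cut) ** 2 + v)
    )
```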

SPSS: Assessing reliability of a criterion-referenced measure through Cohen's kappa

Go to page 172 at http://books.google.co.uk/books?id=brDfGghl3qIC&pg=PA169&source=gbs_toc_r&cad=3#v=onepage&q&f=false.
Input the data in SPSS.
Conduct the kappa test.

Next week

Statistics for validity assessment

ANOVA with one independent variable


References

Brown, J.D. 2003. Criterion-referenced item analysis (The difference index and B-index). Shiken: JALT Testing & Evaluation SIG Newsletter 7(3), 18-24.
Brown, J.D. 2005. Testing in language programs: a comprehensive guide to English language assessment. New York: McGraw-Hill.
Brown, J.D. and Hudson, T. 2002. Criterion-referenced language testing. Cambridge: Cambridge University Press.
Clark, H.H. 1973. The language-as-fixed-effect fallacy. Journal of Verbal Learning and Verbal Behavior 12, 335-359.
Scholfield, P. 2011. Simple statistical approaches to reliability and item analysis. LG675 Handout. University of Essex.

Suggested readings

On the statistics used for item analysis

Brown, J.D. 2003. Criterion-referenced item analysis (The difference index and B-index). Shiken: JALT Testing & Evaluation SIG Newsletter 7(3), 18-24.
Scholfield, P. 2011. Simple statistical approaches to reliability and item analysis. LG675 Handout. University of Essex. (pp. 24-33)

On the statistics used to assess the reliability of criterion-referenced measures

Brown, J.D. 2005. Testing in language programs: a comprehensive guide to English language assessment. New York: McGraw-Hill. (chapter 9)
Brown, J.D. and Hudson, T. 2002. Criterion-referenced language testing. Cambridge: Cambridge University Press. (chapter 5)
