Comparing CTT and IRT using the

Aptitude Test for High School


Adelaida de Perio
De La Salle University - Manila
Background
  Admission to tertiary education requires applicants to
pass the screening process set by schools.
  One of the assessment tools used to select potential
applicants is the aptitude test. Aptitude tests measure
one's fundamental intellectual abilities.
Background
  Abstract Reasoning is a non-verbal test measuring
one's ability to identify patterns in a series.
  Numerical ability, on the other hand, measures one's
ability to solve mathematical problems.
  Verbal Reasoning measures one's ability to understand
analogies and covers areas of the English language.
  Spatial ability measures one's ability to manipulate
shapes.
Background
  Mechanical reasoning measures one's knowledge of
physical and mechanical principles. Lastly, spelling
measures one's ability to detect errors in grammar,
punctuation, and capitalization (Magno & Quano, 2010).
Review of Related Literature
  Because many studies link aptitude with academic
performance, schools use the aptitude test to predict
future outcomes of students’ performance.
  Long-standing key predictors of academic success are
students' abilities as measured by the SAT or ACT, and
high school GPA (Covington, 1992; Lavin, 1965;
Willingham, Lewis, Morgan, & Ramist, 1990).
Review of Related Literature
  Garavila, Gredler & Margaret (1997) examined the extent
to which college students' learning strategies, prior
achievement, and aptitude predicted course achievement.
Analyses showed that each of the predictors was
significantly correlated with achievement. Together, these
variables accounted for 45% of the variance in course
achievement.
Review of Related Literature
  Garcia (1997) found similar results in a study
examining the relations of motivation, attitude, and
aptitude to second language achievement. The findings
revealed that aptitude (β = .43), motivation (β = .41),
and ethnic membership (β = .14) explained more than
50% of the variance in language achievement.
Review of Related Literature
  In secondary education, little has been done to screen
students before they enter high school. This is why some
students lack the necessary skills and arrive unprepared
for the demands and expectations of high school
education. An aptitude test therefore serves not only as
a screening tool; it also provides teachers with
information on the areas students need to improve.
Objective
  The present study therefore aims to compare CTT and
IRT results in evaluating the Aptitude Test developed for
High School in terms of item difficulty and item
discrimination.
Review of CTT and IRT
  The CTT model, also called "True Score Theory,"
espouses the idea that examinees' responses vary only
because of variation in the examinees' ability.
  In CTT, item difficulty is indicated by the frequency of
correct responses; item discrimination is indicated by the
item-total correlation; and the frequency of responses is
used to examine distracters (Impara & Plake, 1997).
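These two CTT statistics can be computed directly from a scored response matrix. The sketch below is illustrative only: the function name and the simulated data are not from the study, which used SPSS and Excel.

```python
import numpy as np

def ctt_item_stats(responses):
    """CTT statistics for a persons x items matrix of 0/1 scores."""
    x = np.asarray(responses, dtype=float)
    # Item difficulty (p-value): proportion of examinees answering correctly.
    difficulty = x.mean(axis=0)
    total = x.sum(axis=1)
    # Item discrimination: corrected item-total correlation, i.e. each item
    # correlated with the total score after removing that item from it.
    discrimination = np.array([
        np.corrcoef(x[:, j], total - x[:, j])[0, 1] for j in range(x.shape[1])
    ])
    return difficulty, discrimination

# Simulated data: 63 examinees x 30 items, matching the study's sample size.
rng = np.random.default_rng(1)
ability = rng.normal(0.0, 1.0, size=(63, 1))
easiness = rng.normal(0.0, 0.5, size=(1, 30))
prob = 1.0 / (1.0 + np.exp(-(ability + easiness)))
data = (rng.random((63, 30)) < prob).astype(int)
p, r = ctt_item_stats(data)
```

Note that both statistics depend on the particular sample: a more able group raises every p-value, which is exactly the sample dependence discussed on the next slide.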
Review of CTT and IRT
  Traditionally, CTT has been used as the method of
analysis in evaluating tests, although it has several
limitations.
  First, the person statistic, or observed score, is item
dependent. Second, the item statistics, difficulty level and
item discrimination, are examinee dependent. Item
Response Theory addresses these major limitations of
CTT.
Review of CTT and IRT
  The Rasch model, a one-parameter IRT model,
estimates the probability of a correct response to an
item as a function of the person's ability and the
difficulty of the item.
  In IRT, each item in a test has its own characteristic
curve, which describes the probability of answering the
item correctly as a function of the test taker's ability
(Kaplan & Saccuzzo, 1997).
Review of CTT and IRT
  IRT asserts that the easier the question, the more likely
a student will answer it correctly; and the more able the
student, the more likely he or she will answer the
question correctly compared with a less able student.
The Rasch model is based on the assumption that
guessing and item differences in discrimination are
negligible (Anastasi & Urbina, 2002).
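The relation on this slide can be written as a one-line function: under the Rasch model the probability of success depends only on the gap between person ability (theta) and item difficulty (b), both in logits. A minimal sketch, with the function name chosen for illustration:

```python
import math

def rasch_prob(theta, b):
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability of success is 0.5;
# raising ability, or lowering difficulty, raises that probability.
print(rasch_prob(0.0, 0.0))  # 0.5
```

Because the model has no discrimination or guessing parameters, every item shares the same S-shaped curve, merely shifted along the ability scale by its difficulty.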
Method
Participants
  A total of 63 incoming 1st year high school students,
both male and female, participated in the study. The
participants were grade 6 students from different
elementary schools in Manila who had finished the grade
6 level and were applying to a Science High School. Ages
ranged from 11 to 13 years.
Method
  Instrument
  The Aptitude Test for High School was developed to
measure fundamental intellectual abilities in abstract
reasoning, numerical reasoning, and verbal reasoning. The
instrument consists of a total of 100 multiple-choice
items: 30 items for abstract reasoning, 30 items for
numerical reasoning, and 40 items for verbal reasoning.
Method
  Psychometric properties of the test show the following
reliability estimates for each subtest: .70 for abstract
reasoning, .77 for numerical reasoning, and .78 for verbal
reasoning.
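The slides do not state which internal-consistency formula produced these coefficients; for dichotomously scored items, Cronbach's alpha (equivalent to KR-20 in that case) is the usual choice. A minimal sketch, with the function name and simulated data purely illustrative:

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for a persons x items score matrix."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)      # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative check on simulated data: 63 examinees x 30 items whose
# responses share a common latent "ability" signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=(63, 1))
items = (signal + rng.normal(size=(63, 30)) > 0).astype(int)
alpha = cronbach_alpha(items)
```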
Method
  Procedure
  The test was administered to incoming 1st year high
school students at a Science High School in Manila. The
test was given as one of the assessment tools in the
school's selection of potential applicants to be accepted
into the Science High School. A trained examiner
administered the test for one hour.
Data Analysis
  Data gathered were analyzed in terms of reliability
coefficients, item difficulty, and item discrimination using
both CTT and IRT.
  In terms of item difficulty and item discrimination
under the Rasch model, two samples were tested and
compared.
  The following computer software was used: SPSS
version 16, Microsoft Excel 2007, and Winsteps for the
IRT analyses.
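The two-sample comparison can be summarized by correlating the item-difficulty measures estimated from each sample; parameter invariance predicts a strong positive correlation. A sketch using the Abstract Reasoning measures reported in Table 4 (transcribed here; a full analysis would work from the raw Winsteps output):

```python
import numpy as np

# Abstract Reasoning item-difficulty measures (logits) from Table 4,
# estimated separately from the two samples.
sample1 = np.array([-0.27, 0.49, 0.96, -2.36, 0.32, -0.51, -0.27, 2.01,
                    1.70, -0.27, 0.80, 1.25, -0.27, -0.51, -0.79, 0.96,
                    -0.27, -1.14, -1.60, 0.32, -0.79, -3.56, 0.14, 0.32,
                    -1.14, 1.70, -0.27, 0.32, -0.51, -0.27])
sample2 = np.array([-0.02, 0.67, 0.17, -0.98, -0.22, -0.44, -0.44, 1.90,
                    1.44, -0.02, 0.17, 0.98, 0.51, -2.55, -0.44, -0.22,
                    -1.80, -0.98, -1.80, 0.34, -0.44, -2.55, 0.34, 2.22,
                    0.83, 2.78, -0.22, 1.29, -0.69, 0.17])

# Invariance predicts a strong positive correlation between the two sets
# of estimates, despite the samples differing in ability.
r = np.corrcoef(sample1, sample2)[0, 1]
```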
Results
  Reliability Indices
  Using Classical Test Theory, the reliability coefficients
for abstract reasoning, numerical reasoning, and verbal
reasoning were .70, .77, and .78, respectively.
Table 1
Summary of Person and Item Measure for Abstract Reasoning

Person    Score   Count   Measure   Error   IMNSQ   ZSTD   OMNSQ   ZSTD
Mean      21.8    30      1.33      0.50    1.00    0.1    0.94    0.1
SD        4.0     0       0.88      0.11    0.15    0.6    0.31    0.6
Real RMSE 0.51   True SD 0.72   Separation 1.39   Person Reliability 0.66

Item      Score   Count   Measure   Error   IMNSQ   ZSTD   OMNSQ   ZSTD
Mean      45.9    63      0.00      0.36    1.00    0.1    0.94    0.0
SD        10.2    0       1.09      0.14    0.11    0.8    0.22    0.9
Real RMSE 0.39   True SD 1.02   Separation 2.65   Item Reliability 0.88
Table 2
Summary of Person and Item Measure for Numerical Reasoning

Person    Score   Count   Measure   Error   IMNSQ   ZSTD   OMNSQ   ZSTD
Mean      19.5    30      0.82      0.47    1.00    0.1    0.97    0.0
SD        4.9     0       0.95      0.97    0.16    0.9    0.27    0.9
Real RMSE 0.47   True SD 0.82   Separation 1.74   Person Reliability 0.75

Item      Score   Count   Measure   Error   IMNSQ   ZSTD   OMNSQ   ZSTD
Mean      40.9    63      0.00      0.32    1.00    0.1    0.97    0.0
SD        10.5    0       0.93      0.04    0.11    0.9    0.20    1.0
Real RMSE 0.32   True SD 0.87   Separation 2.74   Item Reliability 0.88
Table 3
Summary of Person and Item Measure for Verbal Reasoning

Person    Score   Count   Measure   Error   IMNSQ   ZSTD   OMNSQ   ZSTD
Mean      21.8    40      0.33      0.40    1.01    0.0    0.99    0.0
SD        5.1     0       0.75      0.04    0.19    1.0    0.43    0.9
Real RMSE 0.40   True SD 0.63   Separation 1.59   Person Reliability 0.72

Item      Score   Count   Measure   Error   IMNSQ   ZSTD   OMNSQ   ZSTD
Mean      33.8    62      0.00      0.34    0.99    0.1    0.99    0.1
SD        14.8    0       1.41      0.12    0.09    0.8    0.29    1.0
Real RMSE 0.36   True SD 0.87   Separation 3.79   Item Reliability 0.94
Table 4
Summary of Item Difficulty for Abstract Reasoning using Two Samples
SAMPLE 1 SAMPLE 2
MEASURE SE MEASURE SE
ITEM 1 -0.27 0.47 -0.02 0.44
ITEM 2 0.49 0.41 0.67 0.4
ITEM 3 0.96 0.39 0.17 0.43
ITEM 4 -2.36 1.03 -0.98 0.56
ITEM 5 0.32 0.42 -0.22 0.46
ITEM 6 -0.51 0.51 -0.44 0.48
ITEM 7 -0.27 0.47 -0.44 0.48
ITEM 8 2.01 0.4 1.9 0.4
ITEM 9 1.7 0.39 1.44 0.39
ITEM 10 -0.27 0.47 -0.02 0.44
ITEM 11 0.8 0.39 0.17 0.43
ITEM 12 1.25 0.38 0.98 0.39
ITEM 13 -0.27 0.47 0.51 0.41
ITEM 14 -0.51 0.51 -2.55 0.03
ITEM 15 -0.79 0.55 -0.44 0.48
ITEM 16 0.96 0.39 -0.22 0.46
ITEM 17 -0.27 0.47 -1.8 0.75
ITEM 18 -1.14 0.62 -0.98 0.56
ITEM 19 -1.6 0.75 -1.8 0.75
ITEM 20 0.32 0.42 0.34 0.42
ITEM 21 -0.79 0.55 -0.44 0.48
ITEM 22 -3.56 1.81 -2.55 1.03
ITEM 23 0.14 0.43 0.34 0.42
ITEM 24 0.32 0.42 2.22 0.41
ITEM 25 -1.14 0.62 0.83 0.39
ITEM 26 1.7 0.39 2.78 0.46
ITEM 27 -0.27 0.47 -0.22 0.46
ITEM 28 0.32 0.42 1.29 0.39
ITEM 29 -0.51 0.51 -0.69 0.51
ITEM 30 -0.27 0.47 0.17 0.43
Table 5
Summary of Item Difficulty for Numerical Reasoning using Two Samples
SAMPLE 1 SAMPLE 2
MEASURE SE MEASURE SE
ITEM 1 -0.91 0.51 -0.61 0.44
ITEM 2 -2.76 1.03 -1.02 0.48
ITEM 3 2.45 0.46 1.47 0.41
ITEM 4 0.76 0.4 -0.8 0.45
ITEM 5 0.92 0.39 0.23 0.39
ITEM 6 -0.45 0.46 -1.26 0.51
ITEM 7 -0.45 0.46 -0.61 0.44
ITEM 8 1.23 0.4 0.99 0.39
ITEM 9 0.28 0.41 -0.08 0.4
ITEM 10 0.45 0.4 0.39 0.39
ITEM 11 -0.67 0.48 -1.02 0.48
ITEM 12 -1.2 0.56 -0.25 0.41
ITEM 13 -1.55 0.63 -1.26 0.51
ITEM 14 0.28 0.41 0.54 0.39
ITEM 15 0.61 0.4 0.23 0.39
ITEM 16 1.38 0.4 1.64 0.42
ITEM 17 -2.76 1.03 -1.26 0.51
ITEM 18 0.61 0.4 -0.61 0.44
ITEM 19 0.28 0.41 0.39 0.39
ITEM 20 -0.45 0.46 -0.42 0.42
ITEM 21 -0.45 0.46 0.84 0.39
ITEM 22 -0.91 0.51 -1.02 0.48
ITEM 23 -0.67 0.48 -0.25 0.41
ITEM 24 -0.06 0.43 -0.25 0.41
ITEM 25 -1.2 0.56 0.23 0.39
ITEM 26 2.06 0.43 1.82 0.43
ITEM 27 0.92 0.39 0.23 0.39
ITEM 28 -0.45 0.46 -0.42 0.42
ITEM 29 1.23 0.46 0.69 0.39
ITEM 30 1.07 0.39 1.47 0.41

Table 6
Summary of Item Difficulty for Verbal Reasoning using Two Samples
SAMPLE 1 SAMPLE 2
MEASURE SE MEASURE SE
ITEM 1 -0.43 0.4 -0.66 0.41
ITEM 2 0.17 0.38 0.69 0.38
ITEM 3 0.77 0.39 1.33 0.42
ITEM 4 -2.67 0.74 -2.53 0.74
ITEM 5 -3.41 1.02 -4.5 1.83
ITEM 6 1.55 0.44 1.7 0.45
ITEM 7 -2.21 0.62 -2.53 0.74
ITEM 8 -0.59 0.41 0.55 0.38
ITEM 9 1.64 0.45 0.55 0.38
ITEM 10 -0.94 0.43 -0.66 0.41
ITEM 11 -1.35 0.47 -2.08 0.62
ITEM 12 -1.13 0.45 -3.28 1.02
ITEM 13 0.77 0.39 0.69 0.38
ITEM 14 0.77 0.39 0.69 0.38
ITEM 15 -0.94 0.43 -2.53 0.74
ITEM 16 -1.58 0.51 -3.28 1.02
ITEM 17 -1.58 0.51 -2.08 0.62
ITEM 18 -1.35 0.47 -1.02 0.45
ITEM 19 1.64 0.45 1.33 0.42
ITEM 20 0.03 0.38 1 0.4
ITEM 21 1.86 0.48 1.16 0.41
ITEM 22 0.93 0.4 0.4 0.38
ITEM 23 -0.12 0.39 -1.02 0.45
ITEM 24 0.93 0.4 1.16 0.41
ITEM 25 1.86 0.48 1.5 0.43
ITEM 26 0.62 0.39 1.7 0.45
ITEM 27 1.64 0.45 1.91 0.48
ITEM 28 0.77 0.39 1.16 0.41
ITEM 29 -0.59 0.41 0.26 0.38
ITEM 30 -0.27 0.39 0.69 0.38
ITEM 31 -0.27 0.39 -0.49 0.4
ITEM 32 1.64 0.45 0.84 0.39
ITEM 33 -0.27 0.39 -0.18 0.39
ITEM 34 0.03 0.38 -0.18 0.39
ITEM 35 -0.43 0.4 0.26 0.38
ITEM 36 2.74 0.63 2.79 0.63
ITEM 37 -0.43 0.43 -0.49 0.4
ITEM 38 0.93 0.4 1.16 0.41
ITEM 39 -0.94 0.43 -1.02 0.45
ITEM 40 0.17 0.38 0.55 0.38
Discussion
  In terms of reliability, the estimates obtained using CTT
and IRT are moderately high. This suggests a good chance
that persons estimated with higher measures actually do
have higher measures than persons with low measures.
  Results also reveal that, in terms of item and person
separation, the sample can be separated into distinct
ability groups and the items into distinct difficulty strata.
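For readers unfamiliar with the Winsteps output in Tables 1-3: separation and reliability follow directly from the reported RMSE and adjusted ("true") SD, with separation = true SD / RMSE and reliability = true SD² / (true SD² + RMSE²). A quick check against the Table 1 person values (small differences from the published 1.39 and 0.66 come from rounding of the inputs):

```python
# Person values reported in Table 1 (Abstract Reasoning).
rmse, true_sd = 0.51, 0.72

separation = true_sd / rmse                        # ~1.41 (published: 1.39)
reliability = true_sd**2 / (true_sd**2 + rmse**2)  # ~0.67 (published: 0.66)
```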
Discussion
  In terms of item discrimination, the same items were
found to have poor discrimination indices for numerical
reasoning and verbal reasoning under both CTT and IRT.
These items should therefore be revised.
  For abstract reasoning, however, only 2 of the 5 items
considered poor under CTT were also considered poor
under IRT. In terms of item difficulty, similar items were
flagged as difficult under both models.
Discussion
  However, there is a discrepancy in the number of items
considered difficult under CTT and IRT. These findings
suggest a relative degree of stability across CTT and IRT
in terms of item discrimination.
  Overall, the results appear consistent across both CTT
and IRT.
Discussion
  However, this study also demonstrated one of the
advantages of IRT over CTT: the sample-free nature of
its results. Item parameters are invariant when estimated
from different groups of differing ability.
Thank you!