
Ability and Test Anxiety as Factors in "The Influence of Test Difficulty upon Study Efforts and Achievement"?

Author(s): Edward Earl Gotts
Source: American Educational Research Journal, Vol. 8, No. 3 (May, 1971), pp. 576-580
Published by: American Educational Research Association
Stable URL: http://www.jstor.org/stable/1161940
EDWARD EARL GOTTS


Indiana University

Marso (1969) has posed important questions regarding the use of
traditional psychometric criteria alone for classroom test item
selection, in the absence of student behavioral considerations. An
item difficulty of near 50 percent (Form W-difficult) was compared
in his study with one of near 70 percent (Form G-easy), using
measures of verbal ability and of test anxiety as control variables, each
blocked into three levels. Form W was designed to approximate
recommended psychometric standards; exposure to Form G, which
"approximates regular classroom practice", was hypothesized to
produce, in contrast to Form W, more study time and greater student
achievement upon final examination. Marso's results from two separate
experiments were reported, respectively, as supporting each of these
predictions. Marso directed careful and creative effort to excluding
experimentally from the studies any social expectancy bias effects.
His article is competently handled, in fact, until the reporting of
findings. But some reexamination is needed of Marso's reporting,
particularly of the effects of student ability, test anxiety, and E's
success in producing examinations to his criteria.
In examining Marso's Table 2 (1969, p. 625), it appears that his
combined parts 1 and 2 (93 items) are really his parts 1 and 3. This

Notes and Comments

at least would match his narrative as to the numbers of items included in
each part. If this is so, then his Table 2 might be reconstituted in part,
using his methods of calculation (Table 1). These 30 Part 2 items

TABLE 1

Reconstructed Table 2 for Marso (1969),
Correctly Showing Three Part Scores and Totals for Final Examination

                     Section #1          Section #2          Section #3          Total Exam
                     (58 items)          (30 items)          (35 items)          (123 items)
Form           N      M     S  Diff.      M     S  Diff.      M     S  Diff.      M     S  Diff.

W (diffic.)   77   34.2   6.7    .59   23.9     -    .80   24.9     -    .71   83.0  11.9    .67
G (easy)      69   35.2   5.9    .61   25.8     -    .86   25.4     -    .72   86.6  10.2    .70

were drawn randomly, 15 each, from prior course exams of the W and
G conditions. What now becomes apparent is that over one-half of the
difference in mean total score is attributable to test materials to which
Ss had previously been exposed (Table 2). Because the means are

TABLE 2

Differences of the Means between W and G
Test Conditions for the Final Examination

                   Section #1    Section #2    Section #3    Total Exam
                   (58 items)    (30 items)    (35 items)    (123 items)

Diff. of Means         1.2           1.9           0.5           3.6

always lower for the W


sign. Thus it is worthy
from prior testing com
they account for more
Possible direct rather th
raise doubts regarding M
treatments: levels of tes
significant only at the
of carry-over items, shrin
ing will undoubtedly
standard. This is contr


part did not influence the comparative performance of the two
experimental results.
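
The statement that more than half of the difference in total means comes from the previously seen items can be checked from the figures printed in Table 2. The short Python sketch below does only that arithmetic; the variable names are illustrative and do not come from Marso or from this note, and the differences are read as G minus W because the G means are the higher ones in Table 1.

```python
# Differences of means (G minus W) by section, as printed in Table 2.
section_diffs = {"part1": 1.2, "part2_carryover": 1.9, "part3": 0.5}
total_diff = 3.6  # difference on the total exam, also from Table 2

# Share of the total difference that falls on the 30 carry-over items.
carryover_share = section_diffs["part2_carryover"] / total_diff

# The section differences should add up to the reported total difference.
sum_of_sections = sum(section_diffs.values())

print(f"carry-over share of total difference: {carryover_share:.2f}")
print(f"sum of section differences: {sum_of_sections:.1f}")
```

On these figures the carry-over items account for slightly more than half of the 3.6-point difference, although they make up only 30 of the 123 items.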

Further, he contends that he was successful in maintaining a desired
level of difficulty on the final exam mid-way between that of the
averages for unit exams. What needs to be asked is how comparable are
the difficulties of the W and G final exams to their associated
treatment series, and what may have been the psychological consequences
of any deviations from exact comparability? A difficulty level mid-way
between the means for all W and G unit examinations would be 63.5,
whereas the mean difficulty level of W and G final exams was 68.5.
This is clearly more similar to the treatment condition experienced by
the easy group. Apart from the hypothesized treatment effect, and even
discounting the carry-over error already discussed, the greater
psychological similarity of difficulty of the final examination for the
easy than for the difficult group might by itself be sufficient to have
produced the remaining difference between means on the final
examination.
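
The two figures in this paragraph follow from the raw difficulty levels quoted elsewhere in this note (.54 and .73 for the unit examinations, .67 and .70 for the final). A minimal Python sketch of the arithmetic, with variable names chosen here purely for illustration:

```python
# Raw (uncorrected) difficulty levels quoted in this note, in percent.
unit_w, unit_g = 54.0, 73.0      # averages over the W and G unit examinations
final_w, final_g = 67.0, 70.0    # final examination, W and G groups

target_midpoint = (unit_w + unit_g) / 2    # level mid-way between unit averages
final_mean = (final_w + final_g) / 2       # level the final exam actually reached

# How far the final exam sits from each group's unit-exam experience.
gap_to_w = abs(final_mean - unit_w)
gap_to_g = abs(final_mean - unit_g)

print(f"mid-way target: {target_midpoint}, obtained final level: {final_mean}")
print(f"distance from W unit exams: {gap_to_w}, from G unit exams: {gap_to_g}")
```

The obtained level lies 4.5 points from the G unit average and 14.5 points from the W unit average, which is the asymmetry the paragraph describes.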

Another reporting problem is apparent in Marso's Table 5, in
which the verbal ability and test anxiety labels have apparently been
interchanged between the boxed headings and the inset headings,
making it possible to read either the left hand set of means or those on
the right as if they represent the correct or incorrect blocking
variables. For this reason, one cannot tell whether Marso's narrative
report accurately reflects the obtained results. Depending upon the
heading to which one attends in Marso's Table 5, one may agree or
disagree with the author's conclusions. In any event, since the author
obtained no significant interactions in the two associated analyses of
variance and made no planned comparisons, his formulating of conclusions
regarding variations of particular cell means seems highly questionable.
Besides this, the questions already raised about the criterion variable
render the ability and test anxiety analyses unjustified.
What is perhaps most disturbing about the report is that raw
difficulties (D), uncorrected for guessing, are reported throughout. To
remain consistent with Marso's practice, prior recalculations in the
present critique were carried out in the same fashion. However, the
present investigator also recalculated all of Marso's test difficulties
using the correction for guessing formulas of Appendix 1, which only
assume random guessing and thus provide no correction for partially
informed guessing. The latter possibility is a real one (Davis, 1966)
and means that the correction still probably underestimates how
difficult the tests really were.


When formula 5 was used to recompute Marso's D's for the
averages of all treatment-period tests from his Table 1 (Marso, 1969, p.
625), the D of his W group changed from .54 to .39 and for his G group
from .73 to .63. A similar recomputation of D for his final examination
changed his original Table 2 for group W from .67 to .55, and for group
G from .70 to .61. Or, perhaps more appropriately for evaluating his
final exam construction, the overall corrected D level is .58. These
findings suggest that, contrary to his report, Marso did not succeed in
producing the desired D level of .70 and .50 for his unit examinations,
although his final may more nearly have approached its goal, yet at the
expense of approximating more nearly the unit exam experience
provided for the G group and thus favoring their final examination
performance for reasons outside the hypotheses of the study.
The reason these better estimates of difficulty are important to
Marso's theorizing is that they can be expected more accurately to
reflect subjective probabilities of success, since individuals
undoubtedly experience as different (a) guessing an answer and (b)
knowing an answer. An individual's subjective probability of success
should more accurately be estimated by accounting for his probability
of having guessed. The same argument applies to group data. If one
combines with this the Atkinson-McClelland model of achievement
motivation (Atkinson, 1958), it becomes apparent that Marso's argument,
had he corrected for guessing, would be equivalent to saying
that achievement will be greater when the probability of success is .70
than when it is .50. A considerable body of findings regarding level of
aspiration and achievement motivation would suggest that he is wrong
on this count and that a corrected D level of .50 is optimal from the
human motivational viewpoint as well as from psychometric
considerations, particularly among achievement-oriented university
students. It is to be hoped that a full reanalysis of his data will
yield a future report on a par with Marso's careful conceptualization
and experimentation.
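
One way to see why a probability of success near .50 would be favored is the product form often associated with Atkinson's model, in which the tendency to approach success is taken as motive times probability of success times incentive, with incentive set equal to one minus the probability. That functional form is an assumption brought in here for illustration; this note cites the model but does not spell it out, and the function name below is hypothetical.

```python
def approach_tendency(p_success: float, motive: float = 1.0) -> float:
    """Motive x probability x incentive, with incentive ASSUMED to be 1 - p."""
    incentive = 1.0 - p_success
    return motive * p_success * incentive


# Corrected difficulty levels mentioned in this note, plus the two target levels.
for p in (0.39, 0.50, 0.58, 0.63, 0.70):
    print(f"Ps = {p:.2f} -> tendency = {approach_tendency(p):.4f}")
```

Under this assumed form the product peaks at a probability of .50 and falls off symmetrically on either side, so a level of .70 yields a smaller value than .50.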

APPENDIX 1

To make clear what is subjectively difficult or easy requires the
use of at least the guessing correction formula, of which the
score (X) form,

$$X = R - \frac{W}{A-1} \tag{1}$$

where R is the number of items completed correctly;
W is the number of items completed incorrectly;
A is the number of answer alternatives for an item,

can be rewritten for test means, given that the mean of the correct
answers equals the raw mean of the test and that the total guesses
made minus the mean equals the mean wrong:

$$M_c = M - \frac{T-M}{A-1} \tag{2}$$

where Mc is the mean corrected for guessing;
M is the mean of the test;
T is the mean total guesses.

The usual difficulty (D) index formula,

$$D = \frac{M}{N} \tag{3}$$

where M is an uncorrected mean for a test;
N is the total items in the test;

can be rewritten, in view of formula 2, as a corrected difficulty index
(DC),

$$D_C = \frac{M_c}{N} \tag{4}$$

or in a more usable computing formula by substituting from formula 2
for Mc,

$$D_C = \frac{M - \dfrac{T-M}{A-1}}{N} \tag{5}$$

This approximation formula assumes random guessing and provides no
correction for partially informed guessing.
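
Formula 5 is simple enough to sketch in a few lines of Python. The function below follows the formula term by term. The number of answer alternatives and the mean number of items attempted are not given in this note, so the example assumes four alternatives per item and that every item was attempted (T equal to N); both are assumptions made here only to have something to compute with.

```python
def corrected_difficulty(mean_right, mean_attempted, n_items, n_alternatives):
    """Formula 5: DC = (M - (T - M) / (A - 1)) / N."""
    if n_alternatives < 2:
        raise ValueError("at least two answer alternatives are required")
    mean_wrong = mean_attempted - mean_right                         # T - M
    corrected_mean = mean_right - mean_wrong / (n_alternatives - 1)  # formula 2
    return corrected_mean / n_items                                  # formula 4


# ASSUMPTIONS, not stated in the text: four alternatives per item and
# every item attempted, so that T equals N.
ASSUMED_ALTERNATIVES = 4
N_ITEMS = 100  # arbitrary test length; it cancels out when T == N

# label -> (raw D, corrected D as reported in the text)
reported = {
    "W unit exams": (0.54, 0.39),
    "G unit exams": (0.73, 0.63),
    "W final exam": (0.67, 0.55),
    "G final exam": (0.70, 0.61),
}

recomputed = {}
for label, (raw_d, reported_dc) in reported.items():
    dc = corrected_difficulty(raw_d * N_ITEMS, N_ITEMS, N_ITEMS, ASSUMED_ALTERNATIVES)
    recomputed[label] = dc
    print(f"{label}: raw {raw_d:.2f}, recomputed {dc:.2f}, reported {reported_dc:.2f}")
```

Under these assumptions each recomputed value falls within about .01 of the corrected value reported in the text. That is consistent with, but does not establish, the inputs that were actually used; small gaps are to be expected when starting from rounded D values.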

REFERENCES

ATKINSON, J. W. (Ed.) Motives in fantasy, action, and society. Princeton:
Van Nostrand, 1958.
DAVIS, F. B. Use of correction for chance success in test scoring. In C[...]
& H. G. Ludlow (Eds.), Readings in educational and psychological
measurement. Boston: Houghton Mifflin, 1969.
MARSO, R. N. The influence of test difficulty upon study efforts and
achievement. American Educational Research Journal, 1969, 6, 621-632.
