


METHODS AND TECHNIQUES


Holistic Grading of Written Work in Introductory Psychology: Reliability, Validity, and Efficiency
Robert J. Madigan
University of Alaska Anchorage

James J. Brosamer
Department of English
University of Alaska Anchorage

Written assignments in large classes can be made practical by the use of holistic scoring. Data are presented to show the reliability, validity, and efficiency of adapting holistic scoring to written work in an introductory psychology course in which the grading was done entirely by a teaching assistant (TA). Separate criteria were used to score the content and writing skill of paragraph-length essay-test answers and short-paper assignments. Judgments of writing skill made by psychology TAs using holistic criteria were reliable and compared favorably with writing-skill judgments made by experienced English composition instructors. We conclude that holistic scoring is a high-quality, low-cost approach to scoring written work in large classes.
Many college faculty agree that writing assignments are valuable additions to college classes. For Boyer (1987), the ability to communicate well in writing and speaking is one of the hallmarks of an educated person. He listed the development of writing skills among the major objectives of a college education. The growth and influence of the writing-across-the-curriculum movement is well attested (Griffin, 1982; Maimon, 1982; McLeod, 1988). Its relevance to psychology courses was explored in a special issue of this journal (Nodine, 1990). In addition to developing writing skills, there are other reasons to include written work in courses. Writing is a special mode of learning (Emig, 1977); it can improve critical thinking (Fulwiler, 1982) by encouraging writers to scrutinize the language expressing their thoughts. Written examinations may produce better mastery of course material than do multiple-choice tests (Frederiksen, 1984).

The cost of written work is that it must be read and graded. The prospect of the grading task, coupled with the availability of multiple-choice questions from textbook publishers, prompts many teachers to eliminate written work from large classes. Apart from the difficulties of grading, there are legitimate questions about the quality of grading by faculty members whose formal training in writing may not exceed that of their students. White (1989) reported that many faculty members overemphasize mechanical aspects of writing at the expense of the more important conceptual and organizational components.

Holistic grading may offer a way to address issues of work load and quality in evaluating written work. In holistic scoring, the grader places a writing sample in one of a small number of descriptive categories (typically five or six) based on the grader's overall impression of the work (White, 1985). Each category is described by three or four sentences that identify its salient characteristics.

Teaching of Psychology, Vol. 18, No. 2, April 1991

This article describes our adaptation of holistic scoring to written work in a medium-sized introductory psychology class in which all grading was done by teaching assistants (TAs). This situation required an efficient system that could easily be taught to new TAs. But a useful scoring system must also provide students with reliable and valid judgments of their written work. Intergrader reliability was a concern in our study because the graders had no prior training in the assessment of writing. It is natural to question whether psychology TAs can make valid judgments about the writing skills of freshmen.

In the following sections, we describe a holistic scoring system developed for this course and present data on its efficiency, reliability, and validity. The grading system was created for a General Psychology course taught in a lecture format to about 100 students. During the semester, a typical student wrote 4,000 words on essay exams and paper assignments. The primary grading was done by a TA and supervised by the course instructor.

The Scoring Approach

The holistic scoring methods developed for the course evolved from examples presented by White (1985) but differ from his in order to meet the pedagogical objectives of General Psychology. The course instructor intended the writing component of the course to foster content mastery and to support the development of students' writing skills, whereas White was concerned only with the assessment of writing skill. New holistic scoring categories were developed to allow each writing sample to be given two scores, one for content and another for quality of expression. This distinction between content and writing skill tells the students that the ability to write well is important, and it provides them with direct feedback about the quality of their written work. The relative magnitude of the content and writing-skill scores was determined by the instructor's sense of their relative pedagogical importance for the course. Content was weighted 80%, and writing skill was weighted 20%.
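The 80/20 weighting is carried entirely by the maximum points available for each score: on the essay tests described below, content is worth up to 20 points and writing skill up to 5, so content accounts for 80% of the total. A minimal sketch of how such a composite might be computed (the function and example values are ours, for illustration only):

```python
def combined_score(content: float, writing: float,
                   content_max: float, writing_max: float) -> float:
    """Combine content and writing-skill points into a percentage.

    The 80/20 weighting falls out of the maximum points themselves:
    when content_max is four times writing_max, content carries 80%
    of the possible total.
    """
    total_max = content_max + writing_max          # e.g., 20 + 5 = 25
    return 100 * (content + writing) / total_max   # percentage of possible points

# Essay-test example: 17/20 on content, 4/5 on writing skill
print(combined_score(17, 4, 20, 5))  # → 84.0
```

The same function covers the short papers (60 content points, 15 writing-skill points), since the 4:1 ratio is identical.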

Another difference between our approach and White's (1985) was to represent each holistic category by a range of numeric values for grading purposes. The specific numerical values associated with each category were determined by the instructor based on experience with the course and grading preferences. The TA assigned a score from the range to indicate the quality of the student's work within the category.

The course generated two different types of written work: (a) responses to short-answer essay-test questions and (b) short papers. Scoring criteria were developed for each.

Essay Test Scoring

Weekly tests consisted of eight multiple-choice and two essay questions. Ten essay questions were distributed in advance on a weekly study guide; two of the questions were selected for the weekly test. Students also received handouts describing the grading system and suggesting ways of doing well on the tests. Students were encouraged to prepare for the test by writing and revising trial answers to the 10 questions. A typical answer was 80 words. Madigan and Brosamer (1990) described the testing system in more detail.

Table 1 presents the scoring system for these essay-test answers. The essays were graded by giving up to 20 points for the content of the answer and up to 5 points for writing skill. The grader read the student's work and assigned points for content and writing skill according to the categories given in Table 1.

Table 1. Grading Criteria for Essay Test Questions

Content Score(a)  Criteria
  0-5    The answer is largely irrelevant to the question or shows major conceptual confusions.
  6-13   The answer does not address significant portions of the question or contains significant factual errors.
  14-19  The answer responds appropriately to all parts of the question but does not include sufficient depth, or it includes irrelevant material, or it contains minor factual errors.
  20     The answer addresses the question completely. It is concise and may contain original observations or examples. There is no irrelevant material.

Writing-Skill Score(b)  Criteria
  0      The answer is clearly inappropriate to the question.
  1      The answer is so lacking in coherence and unity that it is very difficult to follow, or the answer is so brief that it says almost nothing.
  2      The answer does not develop in a manner appropriate to the question; in addition, the material is not presented in a unified, coherent paragraph.
  3      Either the answer does not develop in a manner appropriate to the question or the answer is a poorly constructed paragraph.
  4      The answer is a unified, coherent paragraph that addresses the question appropriately. It adequately develops and expresses the writer's thoughts.
  5      The answer is a well-developed, unified, and coherent paragraph that skillfully addresses the question. The writing is more than adequate; it is good.

(a) Ideas, supporting facts, and reasoning. (b) Organizing and expressing ideas.
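A banded rubric like Table 1's content scale maps naturally onto a small lookup structure, which is one reason it is quick to teach to a new grader. A minimal sketch, with the bands transcribed from Table 1 (the function name and shortened labels are ours, not the article's):

```python
# Content-score bands from Table 1 (upper bound inclusive; labels abridged)
CONTENT_BANDS = [
    (5,  "largely irrelevant or major conceptual confusions"),
    (13, "misses significant portions or significant factual errors"),
    (19, "addresses all parts but lacks depth or has minor errors"),
    (20, "complete, concise, no irrelevant material"),
]

def content_category(score: int) -> str:
    """Return the Table 1 band description for a 0-20 content score."""
    if not 0 <= score <= 20:
        raise ValueError("content score must be between 0 and 20")
    for upper, label in CONTENT_BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: bands cover 0-20")

print(content_category(17))  # a score of 17 falls in the 14-19 band
```

The grader works in the opposite direction, of course — choosing a band first and then a point value within it — but the band boundaries are the same either way.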

Short-Paper Scoring
Students could also earn course points by writing optional homework assignments. Only the most commonly selected option is discussed here: two 500- to 1,000-word papers describing the application of psychological principles to the student's life. These two papers were structured around Holland's (1985) manual, which suggests self-improvement exercises based on psychological principles in such areas as study skills, progressive relaxation, and relating to children. The two short papers described the students' experiences with these projects.

Table 2 shows the scoring system for the papers. Each paper was given up to 60 points for its content and 15 points for writing skill.

Table 2. Grading Criteria for Short-Paper Assignments

Content Score  Criteria(a)
  0-44   The paper does not come to terms with the assignment. Content areas are ignored or misconstrued, or there are major errors in the interpretation of psychological principles central to the assignment.
  45-49  The paper ignores several content areas or shows a misunderstanding of the principles involved.
  50-54  The paper ignores more than one content area of the assignment but shows the writer's ability to appreciate key ideas and principles.
  55-59  The paper slights or ignores one of the content areas of the assignment. The writer demonstrates a clear understanding of the assignment and the psychological principles involved.
  60     The paper deals fully with all content areas specified for the assignment and shows a clear understanding of the underlying psychological principles.

Writing-Skill Score  Criteria
  1-8    There is little development of ideas or no clear progression from one part of the assignment to another. There may be serious, frequent errors in sentence structure, usage, and mechanics.
  9-11   The paper is organized enough to allow the reader to move through it, but there may be disjointedness or lack of focus in some sections. The paper may contain errors in mechanics, usage, and sentence structure.
  12-14  The paper is not as carefully organized or reasoned as the full-credit paper, but it is organized into unified, coherent sections and is largely free from serious errors in mechanics, usage, and sentence structure.
  15     The paper shows careful organization and orderly thinking. The transitions between content areas are smooth; the paper may organize the content areas creatively to improve the readability of the paper; and the author may make thoughtful comments. It is virtually free from errors in mechanics, usage, and sentence structure.

(a) Content requirements included an appropriate introduction, descriptions of the chosen exercise and the psychological principles involved, and an evaluation of the experience.

The TA was discouraged from writing comments on the papers. Evidence supporting the efficacy of instructor comments to improve student writing comes from studies in which students revised earlier drafts based on instructor feedback (Willingham, 1990; Ziv, 1984). This was not the case here. Furthermore, handwritten comments are exceedingly time-consuming. Students did receive a model answer when their tests were handed back, and they were encouraged to discuss tests and papers with the TA.

Scoring Reliability

Method

Reliability was assessed by comparing the grades assigned by a first-year graduate student and an undergraduate psychology senior, inexperienced as a TA. The graduate student had served as the TA for the course during two semesters. The TA spent about 4 hr training the psychology senior in the scoring methods. Two essay-test answers written in a previous class were available for 30 students. The answers were typed on separate sheets of paper and scored by the two assistants using the criteria in Table 1.

To assess the reliability of Table 2's scoring criteria for short papers, a set of 23 papers graded during the semester by the TA was also graded by the psychology senior. All papers were typed.
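Interscorer reliabilities of this kind are correlations between the two graders' scores over the same set of writing samples. A minimal sketch of the computation (the score lists below are hypothetical, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two graders' scores for the same answers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical content scores from two graders for five essay answers
grader_a = [18, 12, 20, 9, 15]
grader_b = [17, 13, 20, 8, 16]
print(round(pearson_r(grader_a, grader_b), 2))  # → 0.98
```

A high coefficient means the two graders rank and space the answers similarly, which is the sense of "reliability" used throughout this section.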

Results

The reliabilities for content scores given to essay-test answers were .91 and .95 for the two questions analyzed. The writing-skill scores for these questions had reliabilities of .61 and .75. The short-paper scores yielded a content reliability of .68 and a writing-skill reliability of .71.

Scoring Validity

Method
The validity analysis focused on whether the writing-skill scores given by the TAs were consistent with judgments of writing skill made by better trained evaluators of written work. Two experienced instructors of freshman composition read the same essay-test answers and short papers that had been previously scored by the TAs. The English instructors worked independently of each other and judged the essays separately from the papers. They were told to assign each writing sample to one of five categories based on its quality. Writing quality was defined as a subjective term that generally refers to a writer's ability to express thoughts in a clear, organized, and interesting manner. No other constraint was placed on their judgments except that they assign at least one essay to each of the five quality categories. They were unfamiliar with the grading systems presented in Tables 1 and 2. Data from the English instructors were analyzed to determine interscorer reliability, and then their judgments were pooled to arrive at a mean writing-quality judgment for each writing sample. These values were correlated with the pooled writing-quality scores assigned by the TAs. This analysis assessed the agreement between the writing-quality judgments of the TAs and those of experienced composition instructors.

Results

When the English instructors sorted the same writing samples into five writing-quality levels, their interscorer reliability was .68 for the test essays and .65 for the short papers. The average of the TAs' writing-skill scores was correlated with the corresponding judgments of the English instructors and yielded .73 for the essay answers and .48 for the short papers.

Scoring Efficiency

Method and Results
The third area of interest was the efficiency of the scoring system. Time required to score each essay-test answer was recorded over a semester. Scoring times were also obtained for a sample of 26 of the short papers.

The typical answer was 79.9 words. The average time required to score it was 58 s. The median short-paper length was 741 words; short papers required an average of 4 min, 5 s to grade. These grading times do not include activities such as totaling points and recording scores.
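These per-item times imply the per-student total reported in the General Discussion, where a typical student's semester workload is given as 30 essay-test answers and two short papers. A quick sanity check of that arithmetic (the per-item times and counts come from the article; the script itself is illustrative):

```python
# Per-item grading times reported in the article
ESSAY_SECONDS = 58            # average time to score one essay-test answer
PAPER_SECONDS = 4 * 60 + 5    # 4 min, 5 s per short paper

# Semester workload for a typical student (from the General Discussion)
essays_per_student = 30
papers_per_student = 2

total_s = (essays_per_student * ESSAY_SECONDS
           + papers_per_student * PAPER_SECONDS)
print(total_s / 60)  # total grading minutes per student (~37)
```

This lands within a minute of the article's rounded figure of about 38 min per student.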


General Discussion

The scoring systems presented here have acceptable interscorer reliabilities. As a point of reference, the reliability of the system developed for writing samples collected in the California State University English Equivalency Examination is estimated to be about .78 (White, 1985). That system has been carefully refined and is administered by experienced, trained judges.

The writing-skill judgments of the essay-test answers made by psychology TAs had impressive validity. We conclude that the psychology TAs are able to use holistic scoring criteria to make writing-skill judgments about paragraph-length test answers that are comparable to the judgments of professional raters who are better trained and more experienced.

However, the agreement between the TAs and the teachers of freshman composition was not as high for the short papers. We attribute this to the more complex nature of the short-paper assignments. The criteria used by the TAs to judge writing skill were made deliberately simple, and it may be that the English instructors took more factors into account when they judged writing quality. They had been instructed to judge writing quality based on the extent to which the writer's thoughts were expressed in clear, organized, and interesting ways. They may have understood this to imply aspects of diction and style not captured by the writing-skill categories in Table 2.

The use of somewhat different criteria to judge the short papers may explain why there was high reliability between the two psychologists and between the two English instructors, yet only moderate agreement across the two disciplines. The holistic scoring system of Table 2 focuses primarily on the organization of the short papers. We do not know if the criteria could be successfully expanded to include other important features of writing without compromising reliability. We suspect it would be difficult. Although the validity coefficient was not as high as we had hoped, judgments of the TAs were in substantial agreement with those of sophisticated professionals in English. Students received reliable and useful information about at least one major dimension of their writing.
The data show that grading efficiency, the third major concern of the project, is satisfactory. The total amount of time spent grading about 4,000 words written by a typical student in our introductory course is about 38 min (30 min to score 30 essay-test answers and 8 min for two 750-word short papers).

Our holistic grading approach appears to have reasonable reliability, validity, and cost efficiency. It has functioned well for 3 years, serving almost 600 students under three different TAs. Student feedback about the course has been solicited each semester, and not one complaint has been directed at the philosophy or mechanics of the grading system. Perhaps the strongest testimony supporting the practicality of the grading system is the fact that it continues to be in place and that each student in the course continues to write about 4,000 words during the term. Because the grading burden is shifted to a TA, the instructor serves as a supervisor, consultant, and quality-control monitor. The possibility that the demands of grading will lead the instructor to eliminate written work from the course is, therefore, reduced. Competent, committed TAs are essential, but they have not been difficult to find.
References

Boyer, E. L. (1987). College: The undergraduate experience in America. New York: Harper & Row.
Emig, J. (1977). Writing as a mode of learning. College Composition and Communication, 28, 122-128.
Frederiksen, N. (1984). The real test bias. American Psychologist, 39, 193-202.
Fulwiler, T. (1982). Writing: An act of cognition. In C. W. Griffin (Ed.), Teaching writing in all disciplines (pp. 15-26). San Francisco: Jossey-Bass.
Griffin, C. W. (Ed.). (1982). Teaching writing in all disciplines. San Francisco: Jossey-Bass.
Holland, M. K. (1985). Using psychology (3rd ed.). Boston: Little, Brown.
Madigan, R., & Brosamer, J. (1990). Improving the writing skills of students in introductory psychology. Teaching of Psychology, 17, 27-30.
Maimon, E. P. (1982). Writing across the curriculum: Past, present, and future. In C. W. Griffin (Ed.), Teaching writing in all disciplines (pp. 67-74). San Francisco: Jossey-Bass.
McLeod, S. H. (Ed.). (1988). Strengthening programs for writing across the curriculum. San Francisco: Jossey-Bass.
Nodine, B. F. (Ed.). (1990). Psychologists teach writing [Special issue]. Teaching of Psychology, 17(4).
White, E. M. (1985). Teaching and assessing writing. San Francisco: Jossey-Bass.
White, E. M. (1989). Developing successful writing programs. San Francisco: Jossey-Bass.
Willingham, D. B. (1990). Effective feedback on written assignments. Teaching of Psychology, 17, 10-13.
Ziv, N. D. (1984). The effect of teacher comments on the writing of four college freshmen. In R. Beach & L. S. Bridwell (Eds.), New directions in composition research (pp. 362-380). New York: Guilford.

Notes

1. A portion of this article was presented at the annual meeting of the American Psychological Association, Atlanta, GA, August 1988.
2. We thank Gloria Collins, Lynette Derrickson, Donna Kleppin, and Tamra Matlock for their help with the study and Jodi Madigan and Susan Johnson for helpful comments on an earlier draft of this article.
3. Requests for reprints should be sent to Robert Madigan, Department of Psychology, University of Alaska Anchorage, Anchorage, AK 99508.

Teaching Hypothesis Testing by Debunking a Demonstration of Telepathy


John A. Bates
Department of Educational Foundations & Curriculum
Georgia Southern University

Introductory psychology students were told that their instructor had telepathic powers. The instructor demonstrated these "psychic" abilities by transmitting various images into the minds of the students and a confederate. Students generated alternative hypotheses to account for the phenomena and designed methods to test their hypotheses. This article describes the methods used to perform the psychic acts and outlines the structure of the hypothesis-testing activity. By allowing students to experience firsthand the importance of the rules of science, this instructional method may encourage greater scientific skepticism than
