


METHODS AND TECHNIQUES


Holistic Grading of Written Work in Introductory Psychology: Reliability, Validity, and Efficiency
Robert J. Madigan
University of Alaska Anchorage

James J. Brosamer
Department of English
University of Alaska Anchorage

Written assignments in large classes can be made practical by the use of holistic scoring. Data are presented to show the reliability, validity, and efficiency of adapting holistic scoring to written work in an introductory psychology course in which the grading was done entirely by a teaching assistant (TA). Separate criteria were used to score the content and writing skill of paragraph-length essay-test answers and short-paper assignments. Judgments of writing skill made by psychology TAs using holistic criteria were reliable and compared favorably with writing-skill judgments made by experienced English composition instructors. We conclude that holistic scoring is a high-quality, low-cost approach to scoring written work in large classes.
Many college faculty agree that writing assignments are valuable additions to college classes. For Boyer (1987), the ability to communicate well in writing and speaking is one of the hallmarks of an educated person. He listed the development of writing skills among the major objectives of a college education. The growth and influence of the writing-across-the-curriculum movement is well attested (Griffin, 1982; Maimon, 1982; McLeod, 1988). Its relevance to psychology courses was explored in a special issue of this journal (Nodine, 1990). In addition to developing writing skills, there are other reasons to include written work in courses. Writing is a special mode of learning (Emig, 1977); it can improve critical thinking (Fulwiler, 1982) by encouraging writers to scrutinize the language expressing their thoughts. Written examinations may produce better mastery of course material than do multiple-choice tests (Frederiksen, 1984).

The cost of written work is that it must be read and graded. The prospect of the grading task, coupled with the availability of multiple-choice questions from textbook publishers, prompts many teachers to eliminate written work from large classes. Apart from the difficulties of grading, there are legitimate questions about the quality of grading by faculty members whose formal training in writing may not exceed that of their students. White (1989) reported that many faculty members overemphasize mechanical aspects of writing at the expense of the more important conceptual and organizational components.

Holistic grading may offer a way to address issues of work load and quality in evaluating written work. In holistic scoring, the grader places a writing sample in one of a small number of descriptive categories (typically five or six) based on the grader's overall impression of the work (White, 1985). Each category is described by three or four sentences that identify its salient characteristics.

Teaching of Psychology, Vol. 18, No. 2, April 1991

This article describes our adaptation of holistic scoring to written work in a medium-sized introductory psychology class in which all grading was done by teaching assistants (TAs). This situation required an efficient system that could easily be taught to new TAs. But a useful scoring system must also provide students with reliable and valid judgments of their written work. Intergrader reliability was a concern in our study because the graders had no prior training in the assessment of writing. It is natural to question whether psychology TAs can make valid judgments about the writing skills of freshmen.

In the following sections, we describe a holistic scoring system developed for this course and present data on its efficiency, reliability, and validity. The grading system was created for a General Psychology course taught in a lecture format to about 100 students. During the semester, a typical student wrote 4,000 words on essay exams and paper assignments. The primary grading was done by a TA and supervised by the course instructor.

The Scoring Approach

The holistic scoring methods developed for the course evolved from examples presented by White (1985) but differ from his in order to meet the pedagogical objectives of General Psychology. The course instructor intended the writing component of the course to foster content mastery and to support the development of students' writing skills, whereas White was concerned only with the assessment of writing skill. New holistic scoring categories were developed to allow each writing sample to be given two scores, one for content and another for quality of expression. This distinction between content and writing skill tells the students that the ability to write well is important, and it provides them with direct feedback about the quality of their written work. The relative magnitude of the content and writing-skill scores was determined by the instructor's sense of their relative pedagogical importance for the course. Content was weighted 80%, and writing skill was weighted 20%.
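The 80/20 weighting is carried entirely by the maximum points available for each score: on the essay tests described below, content is worth up to 20 points and writing skill up to 5, so content accounts for 80% of the total. A minimal sketch of how such a composite might be computed (the function and example values are ours, for illustration only):

```python
def combined_score(content: float, writing: float,
                   content_max: float, writing_max: float) -> float:
    """Combine content and writing-skill points into a percentage.

    The 80/20 weighting falls out of the maximum points themselves:
    when content_max is four times writing_max, content carries 80%
    of the possible total.
    """
    total_max = content_max + writing_max          # e.g., 20 + 5 = 25
    return 100 * (content + writing) / total_max   # percentage of possible points

# Essay-test example: 17/20 on content, 4/5 on writing skill
print(combined_score(17, 4, 20, 5))  # → 84.0
```

The same function covers the short papers (60 content points, 15 writing-skill points), since the 4:1 ratio is identical.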

Another difference between our approach and White's (1985) was to represent each holistic category by a range of numeric values for grading purposes. The specific numerical values associated with each category were determined by the instructor based on experience with the course and grading preferences. The TA assigned a score from the range to indicate the quality of the student's work within the category.

The course generated two different types of written work: (a) responses to short-answer essay-test questions and (b) short papers. Scoring criteria were developed for each.

Essay Test Scoring

Weekly tests consisted of eight multiple-choice and two essay questions. Ten essay questions were distributed in advance on a weekly study guide; two of the questions were selected for the weekly test. Students also received handouts describing the grading system and suggesting ways of doing well on the tests. Students were encouraged to prepare for the test by writing and revising trial answers to the 10 questions. A typical answer was 80 words. Madigan and Brosamer (1990) described the testing system in more detail.

Table 1 presents the scoring system for these essay-test answers. The essays were graded by giving up to 20 points for the content of the answer and up to 5 points for writing skill. The grader read the student's work and assigned points for content and writing skill according to the categories given in Table 1.

Table 1. Grading Criteria for Essay Test Questions

Content Score(a)  Criteria
  0-5    The answer is largely irrelevant to the question or shows major conceptual confusions.
  6-13   The answer does not address significant portions of the question or contains significant factual errors.
  14-19  The answer responds appropriately to all parts of the question but does not include sufficient depth, or it includes irrelevant material, or it contains minor factual errors.
  20     The answer addresses the question completely. It is concise and may contain original observations or examples. There is no irrelevant material.

Writing-Skill Score(b)  Criteria
  0      The answer is clearly inappropriate to the question.
  1      The answer is so lacking in coherence and unity that it is very difficult to follow, or the answer is so brief that it says almost nothing.
  2      The answer does not develop in a manner appropriate to the question; in addition, the material is not presented in a unified, coherent paragraph.
  3      Either the answer does not develop in a manner appropriate to the question or the answer is a poorly constructed paragraph.
  4      The answer is a unified, coherent paragraph that addresses the question appropriately. It adequately develops and expresses the writer's thoughts.
  5      The answer is a well-developed, unified, and coherent paragraph that skillfully addresses the question. The writing is more than adequate; it is good.

(a) Ideas, supporting facts, and reasoning. (b) Organizing and expressing ideas.
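A banded rubric like Table 1's content scale maps naturally onto a small lookup structure, which is one reason it is quick to teach to a new grader. A minimal sketch, with the bands transcribed from Table 1 (the function name and shortened labels are ours, not the article's):

```python
# Content-score bands from Table 1 (upper bound inclusive; labels abridged)
CONTENT_BANDS = [
    (5,  "largely irrelevant or major conceptual confusions"),
    (13, "misses significant portions or significant factual errors"),
    (19, "addresses all parts but lacks depth or has minor errors"),
    (20, "complete, concise, no irrelevant material"),
]

def content_category(score: int) -> str:
    """Return the Table 1 band description for a 0-20 content score."""
    if not 0 <= score <= 20:
        raise ValueError("content score must be between 0 and 20")
    for upper, label in CONTENT_BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: bands cover 0-20")

print(content_category(17))  # a score of 17 falls in the 14-19 band
```

The grader works in the opposite direction, of course — choosing a band first and then a point value within it — but the band boundaries are the same either way.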

Short-Paper Scoring
Students could also earn course points by writing optional homework assignments. Only the most commonly selected option is discussed here: two 500- to 1,000-word papers describing the application of psychological principles to the student's life. These two papers were structured around Holland's (1985) manual, which suggests self-improvement exercises based on psychological principles in such areas as study skills, progressive relaxation, and relating to children. The two short papers described the students' experiences with these projects.

Table 2 shows the scoring system for the papers. Each paper was given up to 60 points for its content and 15 points for writing skill.

Table 2. Grading Criteria for Short-Paper Assignments

Content Score  Criteria(a)
  0-44   The paper does not come to terms with the assignment. Content areas are ignored or misconstrued, or there are major errors in the interpretation of psychological principles central to the assignment.
  45-49  The paper ignores several content areas or shows a misunderstanding of the principles involved.
  50-54  The paper ignores more than one content area of the assignment but shows the writer's ability to appreciate key ideas and principles.
  55-59  The paper slights or ignores one of the content areas of the assignment. The writer demonstrates a clear understanding of the assignment and the psychological principles involved.
  60     The paper deals fully with all content areas specified for the assignment and shows a clear understanding of the underlying psychological principles.

Writing-Skill Score  Criteria
  1-8    There is little development of ideas or no clear progression from one part of the assignment to another. There may be serious, frequent errors in sentence structure, usage, and mechanics.
  9-11   The paper is organized enough to allow the reader to move through it, but there may be disjointedness or lack of focus in some sections. The paper may contain errors in mechanics, usage, and sentence structure.
  12-14  The paper is not as carefully organized or reasoned as the full-credit paper, but it is organized into unified, coherent sections and is largely free from serious errors in mechanics, usage, and sentence structure.
  15     The paper shows careful organization and orderly thinking. The transitions between content areas are smooth; the paper may organize the content areas creatively to improve the readability of the paper; and the author may make thoughtful comments. It is virtually free from errors in mechanics, usage, and sentence structure.

(a) Content requirements included an appropriate introduction, descriptions of the chosen exercise and the psychological principles involved, and an evaluation of the experience.

The TA was discouraged from writing comments on the papers. Evidence supporting the efficacy of instructor comments to improve student writing comes from studies in which students revised earlier drafts based on instructor feedback (Willingham, 1990; Ziv, 1984). This was not the case here. Furthermore, handwritten comments are exceedingly time-consuming. Students did receive a model answer when their tests were handed back, and they were encouraged to discuss tests and papers with the TA.

Scoring Reliability

Method

Reliability was assessed by comparing the grades assigned by a first-year graduate student and an undergraduate psychology senior, inexperienced as a TA. The graduate student had served as the TA for the course during two semesters. The TA spent about 4 hr training the psychology senior in the scoring methods. Two essay-test answers written in a previous class were available for 30 students. The answers were typed on separate sheets of paper and scored by the two assistants using the criteria in Table 1.

To assess the reliability of Table 2's scoring criteria for short papers, a set of 23 papers graded during the semester by the TA was also graded by the psychology senior. All papers were typed.
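Interscorer reliabilities of this kind are correlations between the two graders' scores over the same set of writing samples. A minimal sketch of the computation (the score lists below are hypothetical, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two graders' scores for the same answers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical content scores from two graders for five essay answers
grader_a = [18, 12, 20, 9, 15]
grader_b = [17, 13, 20, 8, 16]
print(round(pearson_r(grader_a, grader_b), 2))  # → 0.98
```

A high coefficient means the two graders rank and space the answers similarly, which is the sense of "reliability" used throughout this section.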

Results

The reliabilities for content scores given to essay-test answers were .91 and .95 for the two questions analyzed. The writing-skill scores for these questions had reliabilities of .61 and .75. The short-paper scores yielded a content reliability of .68 and a writing-skill reliability of .71.

Scoring Validity

Method
The validity analysis focused on whether the writing-skill scores given by the TAs were consistent with judgments of writing skill made by better trained evaluators of written work. Two experienced instructors of freshman composition read the same essay-test answers and short papers that had been previously scored by the TAs. The English instructors worked independently of each other and judged the essays separately from the papers. They were told to assign each writing sample to one of five categories based on its quality. Writing quality was defined as a subjective term that generally refers to a writer's ability to express thoughts in a clear, organized, and interesting manner. No other constraint was placed on their judgments except that they assign at least one essay to each of the five quality categories. They were unfamiliar with the grading systems presented in Tables 1 and 2. Data from the English instructors were analyzed to determine interscorer reliability, and then their judgments were pooled to arrive at a mean writing-quality judgment for each writing sample. These values were correlated with the pooled writing-quality scores assigned by the TAs. This analysis assessed the agreement between the writing-quality judgments of the TAs and those of experienced composition instructors.

Results

When the English instructors sorted the same writing samples into five writing-quality levels, their interscorer reliability was .68 for the test essays and .65 for the short papers. The average of the TAs' writing-skill scores was correlated with the corresponding judgments of the English instructors and yielded .73 for the essay answers and .48 for the short papers.

Scoring Efficiency

Method and Results
The third area of interest was the efficiency of the scoring system. Time required to score each essay-test answer was recorded over a semester. Scoring times were also obtained for a sample of 26 of the short papers.

The typical answer was 79.9 words. The average time required to score it was 58 s. The median short-paper length was 741 words; short papers required an average of 4 min, 5 s to grade. These grading times do not include activities such as totaling points and recording scores.
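These per-item times imply the per-student total reported in the General Discussion, where a typical student's semester workload is given as 30 essay-test answers and two short papers. A quick sanity check of that arithmetic (the per-item times and counts come from the article; the script itself is illustrative):

```python
# Per-item grading times reported in the article
ESSAY_SECONDS = 58            # average time to score one essay-test answer
PAPER_SECONDS = 4 * 60 + 5    # 4 min, 5 s per short paper

# Semester workload for a typical student (from the General Discussion)
essays_per_student = 30
papers_per_student = 2

total_s = (essays_per_student * ESSAY_SECONDS
           + papers_per_student * PAPER_SECONDS)
print(total_s / 60)  # total grading minutes per student (~37)
```

This lands within a minute of the article's rounded figure of about 38 min per student.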


General Discussion

The scoring systems presented here have acceptable interscorer reliabilities. As a point of reference, the reliability of the system developed for writing samples collected in the California State University English Equivalency Examination is estimated to be about .78 (White, 1985). That system has been carefully refined and is administered by experienced, trained judges.

The writing-skill judgments of the essay-test answers made by psychology TAs had impressive validity. We conclude that the psychology TAs are able to use holistic scoring criteria to make writing-skill judgments about paragraph-length test answers that are comparable to the judgments of professional raters who are better trained and more experienced.

However, the agreement between the TAs and the teachers of freshman composition was not as high for the short papers. We attribute this to the more complex nature of the short-paper assignments. The criteria used by the TAs to judge writing skill were made deliberately simple, and it may be that the English instructors took more factors into account when they judged writing quality. They had been instructed to judge writing quality based on the extent to which the writer's thoughts were expressed in clear, organized, and interesting ways. They may have understood this to imply aspects of diction and style not captured by the writing-skill categories in Table 2.

The use of somewhat different criteria to judge the short papers may explain why there was high reliability between the two psychologists and between the two English instructors, yet only moderate agreement across the two disciplines. The holistic scoring system of Table 2 focuses primarily on the organization of the short papers. We do not know if the criteria could be successfully expanded to include other important features of writing without compromising reliability. We suspect it would be difficult. Although the validity coefficient was not as high as we had hoped, judgments of the TAs were in substantial agreement with those of sophisticated professionals in English. Students received reliable and useful information about at least one major dimension of their writing.
The data show that grading efficiency, the third major concern of the project, is satisfactory. The total amount of time spent grading about 4,000 words written by a typical student in our introductory course is about 38 min (30 min to score 30 essay-test answers and 8 min for two 750-word short papers).

Our holistic grading approach appears to have reasonable reliability, validity, and cost efficiency. It has functioned well for 3 years, serving almost 600 students under three different TAs. Student feedback about the course has been solicited each semester, and not one complaint has been directed at the philosophy or mechanics of the grading system. Perhaps the strongest testimony supporting the practicality of the grading system is the fact that it continues to be in place and that each student in the course continues to write about 4,000 words during the term. Because the grading burden is shifted to a TA, the instructor serves as a supervisor, consultant, and quality-control monitor. The possibility that the demands of grading will lead the instructor to eliminate written work from the course is, therefore, reduced. Competent, committed TAs are essential, but they have not been difficult to find.
References

Boyer, E. L. (1987). College: The undergraduate experience in America. New York: Harper & Row.
Emig, J. (1977). Writing as a mode of learning. College Composition and Communication, 28, 122-128.
Frederiksen, N. (1984). The real test bias. American Psychologist, 39, 193-202.
Fulwiler, T. (1982). Writing: An act of cognition. In C. W. Griffin (Ed.), Teaching writing in all disciplines (pp. 15-26). San Francisco: Jossey-Bass.
Griffin, C. W. (Ed.). (1982). Teaching writing in all disciplines. San Francisco: Jossey-Bass.
Holland, M. K. (1985). Using psychology (3rd ed.). Boston: Little, Brown.
Madigan, R., & Brosamer, J. (1990). Improving the writing skills of students in introductory psychology. Teaching of Psychology, 17, 27-30.
Maimon, E. P. (1982). Writing across the curriculum: Past, present, and future. In C. W. Griffin (Ed.), Teaching writing in all disciplines (pp. 67-74). San Francisco: Jossey-Bass.
McLeod, S. H. (Ed.). (1988). Strengthening programs for writing across the curriculum. San Francisco: Jossey-Bass.
Nodine, B. F. (Ed.). (1990). Psychologists teach writing [Special issue]. Teaching of Psychology, 17(4).
White, E. M. (1985). Teaching and assessing writing. San Francisco: Jossey-Bass.
White, E. M. (1989). Developing successful writing programs. San Francisco: Jossey-Bass.
Willingham, D. B. (1990). Effective feedback on written assignments. Teaching of Psychology, 17, 10-13.
Ziv, N. D. (1984). The effect of teacher comments on the writing of four college freshmen. In R. Beach & L. S. Bridwell (Eds.), New directions in composition research (pp. 362-380). New York: Guilford.

Notes

1. A portion of this article was presented at the annual meeting of the American Psychological Association, Atlanta, GA, August 1988.
2. We thank Gloria Collins, Lynette Derrickson, Donna Kleppin, and Tamra Matlock for their help with the study and Jodi Madigan and Susan Johnson for helpful comments on an earlier draft of this article.
3. Requests for reprints should be sent to Robert Madigan, Department of Psychology, University of Alaska Anchorage, Anchorage, AK 99508.

Teaching Hypothesis Testing by Debunking a Demonstration of Telepathy


John A. Bates
Department of Educational Foundations & Curriculum
Georgia Southern University

Introductory psychology students were told that their instructor had telepathic powers. The instructor demonstrated these "psychic" abilities by transmitting various images into the minds of the students and a confederate. Students generated alternative hypotheses to account for the phenomena and designed methods to test their hypotheses. This article describes the methods used to perform the psychic acts and outlines the structure of the hypothesis-testing activity. By allowing students to experience firsthand the importance of the rules of science, this instructional method may encourage greater scientific skepticism than
