test. These sources are the basis of our data collection with regard to matching the TLU domain
in this assessment development project.
The rest of this paper begins with a description of the test, including the test's
purpose, type, interpretation of scores, TLU domain, construct definition, Table of
Specifications, and description of test tasks. Next, we discuss the pilot test procedure, including
participants, administration, and scoring procedures. Our test results section will explain our data
using item statistics, descriptive statistics, reliability, and a description of the criterion-referenced
interpretation. Finally, in the discussion, we evaluate the test, including item performance, an
evaluation of test usefulness, an overall estimation of whether the test achieved its purpose, and a
reflection on the personal significance of the test. Following this, we provide references and
appendices containing the test itself, instructions, Table of Specifications, scoring key, teacher
feedback rubric, and data, which are organized in tables.
Description of the Assessment
Purpose
The purpose of this assessment was to provide evidence of student proficiency in essay
writing according to criteria that parallel, as closely as possible, those used by the Accuplacer
WritePlacer and the newer Community College Placement Test (CCPT). Both tests include
an essay component on which students must perform well in order to be admitted to
and placed in courses at community colleges in Colorado. The scores and feedback on this
formative assessment will positively impact the students by helping them to recognize their
strengths and weaknesses in essay writing and to develop the necessary skills for passing the
Accuplacer WritePlacer and CCPT examinations. It will also assist instructors of the course by
highlighting ways to improve curriculum design based on these same considerations.
essay that had a narrow, direct, non-reciprocal response to the input. The scale of the current test
being administered at the local community college (CCPT) ranges from zero to six, with zero
preventing admittance and six achieving admittance into mainstream composition courses.
Scores higher than zero but lower than six place students into non-accredited
remedial writing courses at the college (Informant in Testing Field, personal communication,
March 14, 2016). See Appendix A for a table that analyzes our TLU domain in greater detail.
Construct Definition
The construct that we intended to measure with this test was the ability to type a 300-600
word, proctored, untimed essay that exhibits the characteristics of: Purpose and Focus,
Organization and Structure, Development and Support, Sentence Variety and Style, Mechanical
Conventions, and Critical Thinking (Accuplacer WritePlacer Guide, 2008, p. 1). Since the
prompts are either persuasive, reflective or argumentative, this construct also included
knowledge of manipulative language functions and the academic register. The ability to read and
comprehend WritePlacer and CCPT prompts is also an element of this construct. Speaking and
listening were not tested, nor were most aspects of functional or sociolinguistic knowledge, such
as cultural references, idiomatic expression, or dialect.
Design of Test
This essay test includes instructions, a prompt, a scoring rubric, and a feedback guide.
The scoring rubric and feedback guide are based on the following table of specifications, which
is also included in Appendix B.
Table 1
Table of Specifications for Essay Design
Task | Organization + Structure | Development + Support | Sentence Style | Mechanical Conventions | Critical Thinking | Total
Essay (Scoring 0-6) | 0-6 | 0-6 | 0-6 | 0-6 | 0-6 | 0-30
something positive you can do with it. Can any obstacle or disadvantage be turned into
something good? (Accuplacer WritePlacer Guide, 2008, p. 2).
Expected Response. The expected response was a typed reflective essay in English on
computers at the Adult Basic Education center. Responses should be a well-organized three- to
six-paragraph essay with a minimum of 300 words. The essay should have a clear introduction,
body, and conclusion paragraph (Informant in Testing Field, personal communication, March 14,
2016). Specific keywords related to the topic of the prompt (e.g., obstacle, disadvantage) were
assessed along with mastery of the following six elements of performance criteria: Purpose and
Focus; Organization and Structure; Development and Support; Sentence Variety and Style;
Mechanical Conventions; and Critical Thinking (Accuplacer WritePlacer Guide, 2008, p. 1).
Relationship between Input and Expected Response. The relationship between the
input and response was direct, narrow in scope, and nonreciprocal. Students were expected to
address the prompt fully but concisely, and they were not allowed to ask questions or otherwise
interact with the input. Some background knowledge about the topic was necessary to complete
the essay test. The Accuplacer WritePlacer and CCPT essay prompts are free of technical or
specific literary references requiring specialized knowledge, but they cover a number of general
fields and interests (Accuplacer WritePlacer Guide, 2008). Students must draw on a broad range
of experiences, learning and ideas to support their point of view on the issue in question
(Accuplacer WritePlacer Guide, 2008, p. 1).
Scoring. The tester printed and collected each essay and made a copy for a second
rater. Essays were scored according to the rater rubric in Appendix D. The six WritePlacer
dimensions were supplemented by analytic criteria that were developed to better
explain each dimension to students and to improve scoring reliability. Performance on each
criterion was assessed as having earned full credit, half credit, or no credit. Allowing for half
credit improves the accuracy of the assessment in differentiating performance, but we decided
not to break down the criteria beyond full, half, and no credit in order to keep the assessment
practical and save the raters time. Full definitions for performance that earns full credit,
half credit, and no credit on each criterion are included as part of the rater rubric.
Structure and Time of the Assessment. This assessment had only one part, with only
one item. When students took this assessment, they did not have access to the grading rubric, so
it was important that they were able to write an essay that satisfied all grading criteria despite the
fact that they were contained within this single item, the essay. Although the essay was untimed,
it was recommended to take a maximum of two hours to administer (Informant in Testing Field,
personal communication, March 14, 2016). All ELL participants completed the assessment in
less than an hour and a half. The assessment took the raters about ten minutes per essay to assign
a score.
Item Development. On the actual CCPT and Accuplacer essay exams, the computer
software rates the essays holistically, from zero to six, but the software places equal emphasis on
the six elements of the essay mentioned above (Accuplacer WritePlacer Guide, 2008). In our
assessment, equal emphasis was ensured through the use of an analytic rubric
that weights the dimensions equally. Each element was scored on its own from zero to six, and then
these scores were averaged for the final essay score. The advantages of analytic rubrics over
holistic ones are well documented, particularly when used for formative purposes. Miller, Linn,
and Gronlund (2009) suggest that analytic rubrics enhance assessment reliability and fairness
(p. 249), whereas Crusan (2015) emphasizes that feedback given on analytic rubrics tends to be
more organized and easier for learners to make use of. Since this assessment is for formative
purposes, usefulness of feedback for students is crucial to the success of this assessment.
The six elements are each quite broad, so they were difficult to systematize for the
purposes of giving useful formative feedback to our students. While they may serve
adequately for a computer rater, in our classroom they were likely to involve a great deal of rater
idiosyncrasy, which would be detrimental to the reliability of our assessment. For this reason, we
analyzed trends among the WritePlacer and CCPT sample essays and developed some narrower
criteria within each element. Each specific assessment criterion was rated from zero to two, so the
three criteria within a dimension add up to a dimension score of zero to six. For example, the
element of Mechanical Conventions comprises the criteria of: Indentation for paragraphs; Few or
no grammatical errors; Few or no spelling + punctuation errors. These issues occurred
consistently within WritePlacer example essays that received low scores for Mechanical
Conventions but were consistently absent on sample essays that received high scores on this
dimension.
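The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the credit labels, and the half-up rounding rule are ours, and the paper does not say how ties at .5 were rounded.

```python
import math

# Credit levels map to criterion points (full = 2, half = 1, none = 0).
CREDIT_POINTS = {"full": 2, "half": 1, "none": 0}


def dimension_score(credits):
    """Sum three criterion ratings (0-2 each) into a 0-6 dimension score."""
    if len(credits) != 3:
        raise ValueError("each dimension is assumed to have three criteria")
    return sum(CREDIT_POINTS[c] for c in credits)


def essay_scores(ratings):
    """Return (process_score, final_score) for one essay.

    `ratings` maps each dimension name to its three credit levels.
    The process score is the unrounded mean of the dimension scores;
    the final score rounds it to an integer (half-up rounding is assumed).
    """
    totals = [dimension_score(c) for c in ratings.values()]
    process = sum(totals) / len(totals)
    final = math.floor(process + 0.5)
    return process, final


# Hypothetical essay: full credit everywhere except one half-credit criterion.
DIMENSIONS = [
    "Purpose and Focus", "Organization and Structure",
    "Development and Support", "Sentence Variety and Style",
    "Mechanical Conventions", "Critical Thinking",
]
example = {name: ["full", "full", "full"] for name in DIMENSIONS}
example["Mechanical Conventions"] = ["full", "half", "full"]
process, final = essay_scores(example)
print(f"process score = {process:.1f}, final score = {final}")
```

Because every dimension contributes the same zero-to-six range, taking the plain mean of the six dimension scores is what gives each dimension equal weight in the final score.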
Deriving the criteria inductively, from samples, rather than deductively, from research,
meant that the design of this rubric still maintained a degree of subjectivity from us as raters.
Nevertheless, we reviewed the essays together and agreed that these were the criteria being
implicitly evaluated. The criteria on all other dimensions were derived in this same way. As
Nation (2009) points out, any assessment based on one piece of writing is not likely to be
reliable, so with this limitation in mind, the best we could do was to systematize our rating
process as much as the material allowed and to remain aware of the extant possibility of human
and computer program idiosyncrasy in assessing writing.
Administration
Administration of the assessment to the ELL participants was conducted during a
morning class by their ESL instructor in the program's computer lab. The tester (instructor) read
the instructions, passage and prompt aloud and gave each participant the handout with
instructions, passage and prompt taken from the WritePlacer Guide (see Appendix C for
handout). The tester next told participants they had two hours to complete the assignment.
Scratch paper was provided for students, but cell phones and dictionaries were not available for
use during the test. All three participants completed the assessment in less than one hour and
thirty minutes.
Scoring Procedures
Scoring was conducted by two raters: one was the administrator of the
assessment and English instructor of the participants, and the other was a fellow English
instructor working on the test project. Both raters scored the participants' essays as well as
sixteen sample essays from the WritePlacer Guide using the Rater Scoring Key (see Appendix
D). This scoring key included the following six performance criteria: Purpose and Focus,
Organization and Structure, Development and Support, Sentence Variety and Style, Mechanical
Conventions, and Critical Thinking (Accuplacer WritePlacer Guide, 2008). Raters scored the
essays analytically based on full credit, half credit, or no credit. Full definitions for performance
that earns full credit, half credit, and no credit on each criterion are included as part of the rater
rubric (see Appendix D). The scores for the sixteen sample essays were compared with
the essay scores of the three ELL participants. Scoring the samples also allowed the testers to
calculate inter-rater reliability between each other as well as their agreement with the automated
ratings of the samples from the WritePlacer Guide. This provided two reliability statistics.
Results of Assessment
Item Statistics and Descriptive Statistics
The results of our essay exam pilot, organized by scoring dimension, are included in
Table 2, below. The scores provided by each rater were included as separate data points but used
together to calculate the descriptive statistics. Averaging the two raters' scores could decrease
variation in cases when the raters disagreed, and treating them totally separately would give an
indication of rater tendencies but not of how the rubric performed overall.
Table 2
Descriptive Statistics for Essay Dimension Scores and Overall Scores
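Dimension | Mean | Standard Deviation | Range
Purpose and Focus | 3.7 | 2.1 |
Organization and Structure | 3.9 | 2.1 |
Development and Support | 3.3 | 2.2 |
Sentence Variety and Style | 2.9 | 2.1 |
Mechanical Conventions | 2.8 | 2.0 |
Critical Thinking | 3.3 | 2.1 |
OVERALL | 3.4 | 2.0 |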
Given that sixteen of our nineteen sample essays were from the Accuplacer WritePlacer
Guide and represented each possible score twice, it was highly likely that our overall mean, SD,
and range would indicate significant variation across essays. However, since Accuplacer
WritePlacer reports scores holistically, the mean, SD, and range on individual dimensions were
important for establishing the measurement validity of this rubric. It can be seen that all
dimensions were used at both their maximum and minimum, indicating that they were useful for
differentiating essays based on quality. Their SDs were also all very similar, indicating that no
one dimension tended to produce more homogeneous or heterogeneous scores than the others.
However, the dimension means range from 2.8 to 3.9, indicating that the Organization and
Structure of these essays tended to be scored more than an entire point higher than Mechanical
Conventions did. This may indicate that raters tended to assess the criteria in some dimensions
more harshly than others, which would need to be addressed by rater training for future uses of
this exam.
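The pooling decision described at the start of this section can be illustrated with a short Python sketch. The scores below are hypothetical and are not taken from the pilot data; the use of the sample standard deviation (n - 1) is also an assumption, since the paper does not state which formula was used.

```python
from statistics import mean, stdev


def pooled_descriptives(rater1, rater2):
    """Treat both raters' scores as separate data points and summarize them."""
    pooled = list(rater1) + list(rater2)
    return {
        "mean": mean(pooled),
        "sd": stdev(pooled),  # sample SD (n - 1); an assumption
        "range": (min(pooled), max(pooled)),
    }


def averaged_sd(rater1, rater2):
    """SD after first averaging the two raters' scores for each essay."""
    averaged = [(a + b) / 2 for a, b in zip(rater1, rater2)]
    return stdev(averaged)


# Hypothetical dimension scores for three essays; the raters disagree on two.
rater1 = [2, 4, 6]
rater2 = [4, 2, 6]

summary = pooled_descriptives(rater1, rater2)
print(summary)
print("SD of averaged scores:", averaged_sd(rater1, rater2))
```

In this toy example the averaged scores show a smaller standard deviation than the pooled scores, which is the loss of variation that pooling is meant to avoid.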
Rater Effects and Standard Error of Measurement
Our overall inter-rater differentials were 0.89 for the final (rounded) scores and 0.92 for
the process (unrounded) scores. Given that the rounding had little effect on overall reliability,
and Hallgren (2012) recommends using final scores for calculating inter-rater reliability (IRR),
we decided to do so. Hallgren (2012) also recommends using the statistical method of intra-class
correlation (ICC) for calculating IRR with interval data, as ICC allows for consideration of the
magnitude of disagreement more clearly than Cohen's (1960) kappa does and is not affected
by chance agreement the way that rater percentage of agreement is.
The ICC was calculated in SPSS for Mac v. 23 using a two-way mixed model for
absolute agreement, following the recommendations by Landers (2015) for calculating ICC
with consistent raters across the data set and data representing a sample of raters rather than a
fixed population. Given that our two raters were the same for all essays but that these findings
are intended to be useful for other raters as well, this seemed the most appropriate solution. The
resulting ICC was .923, indicating strong mean inter-rater reliability across all items. This indicates
that the rubric was effective for standardizing scores across raters.
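For readers without SPSS, a two-way ICC for absolute agreement can be computed from the standard ANOVA mean squares. The sketch below is not the authors' procedure; it is a plain-Python illustration of the textbook formulas, the function name is ours, and the toy ratings are hypothetical. It returns both the single-measures and the average-measures coefficient, since the paper does not state which of the two the reported value corresponds to.

```python
def icc_absolute_agreement(ratings):
    """Two-way ICC for absolute agreement.

    `ratings` is a list of rows, one per essay, each holding one score per rater.
    Returns (single_measures, average_measures).
    """
    n = len(ratings)      # number of essays
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for essays (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err)
    )
    average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return single, average


# Hypothetical ratings: the second rater is always exactly one point higher.
toy = [[1, 2], [2, 3], [3, 4]]
single, average = icc_absolute_agreement(toy)
print(f"single = {single:.3f}, average = {average:.3f}")
```

In the toy data the two raters rank the essays identically, yet the coefficients fall below 1 because absolute agreement penalizes the constant one-point offset between raters.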
Next, in order to calculate our rating accuracy against the WritePlacer sample essays, it
was necessary to make the grading scales equivalent. Since our scale was zero to six, and the
WritePlacer scale is one to eight, three steps were necessary to make the scores comparable. First,
we multiplied our raw (unrounded) scores by 1.17, which turned our seven-point continuous
scale into an eight-point continuous scale. Second, we rounded the scores, changing our
continuous scale into an interval scale. Finally, we added one point to each of our scores,
changing our minimum score from zero to one. This translated our scores from a seven-point
continuous scale that began at zero into scores on an eight-point interval scale beginning at one,
allowing us to meaningfully compare our scores to the WritePlacer scores and determine our
standard error of measurement (SEM). Given that we were using this rubric to approximate
WritePlacer automated scores, we used the WritePlacer scores as true scores and calculated the
average SEM of both raters against the WritePlacer scores as 0.89 on the eight-point scale,
indicating that (assuming normally distributed scores) our scores using the zero-to-six rubric are
within 0.76 points of the true score at a 68% confidence interval or within 1.52 points at a 90%
confidence interval. The complete data results of this test can be viewed in Appendix G.
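The three conversion steps, and the conversion of the SEM back to rubric units, can be sketched as follows. The function names are ours, and half-up rounding is an assumption, since the paper does not specify how values ending in .5 were rounded.

```python
import math

SCALE_FACTOR = 1.17  # multiplier reported in the text (roughly 7/6)


def to_writeplacer_scale(process_score):
    """Map an unrounded zero-to-six rubric score onto the one-to-eight scale."""
    stretched = process_score * SCALE_FACTOR   # step 1: stretch 0-6 to about 0-7
    rounded = math.floor(stretched + 0.5)      # step 2: round (half-up assumed)
    return rounded + 1                         # step 3: shift the minimum to 1


def sem_on_rubric_scale(sem_eight_point):
    """Convert an SEM on the eight-point scale back to zero-to-six rubric units."""
    return sem_eight_point / SCALE_FACTOR


for raw in (0.0, 3.5, 6.0):
    print(raw, "->", to_writeplacer_scale(raw))

sem = sem_on_rubric_scale(0.89)
print(f"SEM in rubric units: {sem:.2f}; doubled: {2 * sem:.2f}")
```

Dividing the reported SEM of 0.89 by the same factor of 1.17 reproduces the 0.76 figure, and doubling that gives the 1.52 figure quoted for the wider interval.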
Description of Mastery
None of the ELL participants achieved the top score of six, but all scored high enough on
the pilot test to be admitted into non-accredited remedial writing classes at the community
college, since essay scores less than six require remedial placement (Informant in Testing Field,
personal communication, March 14, 2016). Scores of six permit students admittance into
accredited composition courses, while scores of zero limit a student's admission to the
community college. Because the purpose of this test was formative assessment, complete
mastery of writing ability was not a consideration for the participants. The test results were used
to provide valuable feedback to the ELL participants regarding their writing abilities, as well as
to highlight to the instructor gaps in knowledge or recommendations in developing materials for
writing instruction.
Discussion
Critique of Item Performance
Since the essay test included only one item but was rated analytically, we can discuss
item performance in terms of the prompt used and in terms of the analytic criteria used. Since
most of our samples came from a stratified set provided by Accuplacer WritePlacer, we
cannot necessarily say whether or not this prompt succeeded in eliciting student responses that
could meaningfully differentiate students based on the intended construct. Our own students all
scored fours, which would be problematic with a larger and demonstrably heterogeneous pilot
group, but with only three students, we cannot draw any conclusions.
The test prompt used in our pilot was problematic for a number of reasons. First, the
passage that accompanied the prompt was quite lengthy and somewhat
unrelated to the prompt. As a result, some of the Accuplacer WritePlacer
sample essays responded to the passage rather than the actual prompt. When a long,
disconnected passage is included as part of the essay prompt, test-takers may be confused
about which reading to respond to, the passage or the prompt. Second, the
passage contained topic-specific vocabulary related to acting (e.g., cue, producer, drama, comedy),
which not all people are familiar with. All three ELL participants had a hard time comprehending
the vocabulary in the passage. Third, the prompt itself was too open-ended
and ambiguous in asking respondents, "Can any obstacle or disadvantage be turned into
something good?" (Accuplacer WritePlacer Guide, 2008, p. 2). By using the words "any" and
"something good," the prompt left open a wide range of options for respondents to address,
making it easier for respondents to go off topic. The occurrence of problematic test prompts
should encourage instructors to familiarize their students with all kinds of essay prompts, since
many of those used by the WritePlacer and CCPT are confusing and problematic.
For the analytic scoring dimensions, the descriptive statistics provided above show
that the dimensions differed slightly in average score but were all used across the full
proficiency scale of zero to six and had very similar standard deviations. This indicates
that the dimensions performed as intended, meaningfully differentiating essays at all levels.
Evaluation of Test Usefulness
Reliability. Since this assessment consisted of only one item and is intended to be used
formatively, but could be rated by a different rater in any given administration, the most
meaningful way to establish reliability for this assessment was to calculate the level of agreement
between raters on a common set of sample essays. As described above, the ICC was calculated at
.923, indicating strong inter-rater reliability. This calculation is limited by both the small
number of cases (samples) and the small number of variables (raters), but for the present data set,
the reliability is strong.
Construct Validity. Given that this test was administered as formative preparation for
the WritePlacer or CCPT exam, and the construct it measures is essentially the ability to succeed
on the WritePlacer or CCPT (write an untimed, proctored, 300+ word essay that shows mastery
of the six WritePlacer assessment dimensions), our correspondence with the TLU-domain is
extremely strong.
Measurement Validity. As mentioned above, the raters together were able to use this
rubric to predict automated WritePlacer essay scores within 0.76 points at a 68% confidence
interval or 1.52 points at a 90% confidence interval. An SEM below one at 90% would certainly
be preferable, but given that these raters were untrained and had not previously calibrated
themselves with this rubric, this SEM can be seen as acceptable for indicating valid
measurement. There is room for improvement in the design of the rubric's sub-criteria, and in the
way that the raters interpret those criteria, but the present scores are not markedly invalid. See the
personal reflection, below, for how the criteria could possibly be improved for future use.
Consequential Validity / Impact. The test proved useful in examining the challenges
that ELLs have in writing (typing) coherent and cohesive 300-word essays in response to a
prompt. It revealed, for example, that all three ELL participants struggled with developing
introductory and concluding paragraphs. This shows the instructor that more instructional time
should be spent teaching ELLs how to write full introductory and concluding paragraphs
with hooks, details and support, main claims, and points of view.
The Essay Feedback and Scoring Rubric (see Appendix E) provided ELL participants
with helpful feedback regarding specific performance criteria they could improve upon. For
example, some students recognized that they had not stated a clear point of view on the topic and
will hopefully remember to do so in future writings. All three ELL participants said they were
glad to receive feedback on their essay scores and found the entire experience positive and
insightful. Although we did not set up a pre-test and post-test administration of the proper
WritePlacer exam to measure the impact of our formative assessment on their writing abilities,
the subjective impressions of the participants indicate that this assessment was valid for its purpose.
Practicality. This assessment was practical to administer and grade. Although the test is
technically untimed, all pilot students finished in under an hour and a half. Since they were able
to take the test at the Adult Basic Education center, which has a computer lab, there was no
financial cost associated with the administration of this assessment. The scoring took about ten
minutes per essay, which is short, yet the students still found the feedback provided quite helpful.
Overall Success of Assessment. Despite the low number of participants, the pilot
administration of this assessment was quite successful. The assessment was practical to
administer, showed good inter-rater reliability and strong construct validity, and resulted in
positive washback with the students. The ultimate test of washback will be the ability of the
students and instructor to make use of the assessment feedback to improve students' essay
writing, but based on preliminary discussions, the students found the feedback specific, clear,
and useful.
Personal Reflection. Although this assessment was a success, there is still room for
improvement. The prompt was outside of our control, but the rubric could be
improved. Most notably, there were a number of criteria for which the descriptions were
difficult to interpret for short versus long essays. On the criterion of logical sequencing of
paragraphs, for example, a student who wrote only a single paragraph could more easily score
full marks than a student who had written many paragraphs. The same is true for the descriptors
of having "many" or "few" errors: if raters score based on a raw error count alone, they are likely
to wind up penalizing longer essays even if those essays show a lower occurrence of errors
relative to total word count. For the criterion of staying on topic, again, a student who wrote a
single-sentence response could theoretically score full marks, whereas a student who wrote a
much longer and more detailed piece that veered from time to time might be penalized.
It did not affect the present set of essays, but it also seemed questionable that attention to
topic was ultimately worth only 1/18 of the total essay score. This means that, theoretically, a
student could memorize a well-crafted essay on any topic they desired and replicate it during the
assessment to receive an aggregate score of 17/18, which would round to a perfect 6/6. It is
unclear from the WritePlacer literature whether the automated essay rater would be as bound by
the grading rubric in this way as human raters are; the suggestion that an essay that does not
address the prompt would be disqualified does not appear anywhere in the WritePlacer literature
or in the assessment instructions.
Nevertheless, these issues can mostly be solved by carefully rewording the rubric to give
raters a framework for evaluating essays without accidentally binding them to illogical scoring
methods, such as rating a single-sentence essay full marks for being on topic 100% of the time.
In general, this assessment was informative to develop and useful to administer, and with luck, it
will continue to prove beneficial for the ELL instructors involved, especially for as long as the
WritePlacer and CCPT essay examinations still exist in their present form.
References
Accuplacer Website (2008). College Board WritePlacer Guide with Sample Essays. Retrieved
from https://secure-media.collegeboard.org/digitalServices/pdf/accuplacer/accuplacer-tsiwriteplacer-sample-essays.pdf
Bachman, L., & Palmer, A. (2010). Language assessment in practice. Oxford University Press.
Community College of Aurora Testing Center (n.d.). CCPT Study Workbook. Retrieved from
https://www.ccaurora.edu/sites/default/files/cca_files/file/Getting_Started/Testing/CCPTstudyworkbook-1.16.pdf
Crusan, D. (2015). Dance, ten; looks, three: Why rubrics matter. Assessing Writing, 26, 1-4.
Hallgren, K. (2012). Computing inter-rater reliability for observational data: An overview and
tutorial. Tutorials in Quantitative Methods for Psychology, 8, 23-34.
Landers, R. N. (2015). Computing intraclass correlations (ICC) as estimates of interrater
reliability in SPSS. The Winnower 2, 1-4.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching
(10th ed.). Upper Saddle River, NJ: Merrill, Prentice Hall.
Nation, I. S. P. (2008). Teaching ESL/EFL reading and writing. Abingdon-on-Thames, U.K.:
Routledge.
Weigle, S. C. (2007). Teaching writing teachers about assessment. Journal of Second Language
Writing, 16, 194-209.
Appendices
Appendix A
Revised TLU Domain Description
 | TLU Characteristic | Assessment Characteristic
Setting | Testing room; unknown participants; proctored by stranger | Computer lab; regular classmates as participants; proctored by regular instructor
Scoring Rubric | Holistic, 8-point (WritePlacer) or 6-point (CCPT) rubric | Analytic, 6-point rubric with sub-categories
Input | Untimed administration | Untimed administration
Expected Response | |
Relationship between Input and Expected Response | Direct; Narrow; Non-Reciprocal | Direct; Narrow; Reciprocal (Teacher may also provide clarification as needed)
Appendix B
Revised Table of Specifications
Task | Organization + Structure | Development + Support | Sentence Style | Mechanical Conventions | Critical Thinking | Total
Essay (Scoring 0-6) | 0-6 | 0-6 | 0-6 | 0-6 | 0-6 | 0-30
Appendix C
Copy of Revised Essay Prompt and Instructions
Instructions:
Type a 300-600 word essay in response to the following prompt. You will have as much time as
you need to complete this essay. You will be allowed to use scratch paper during test in case you
wish to complete brief outlines or notes. Dictionaries and mobile phones are not allowed to be
used in the testing room. Do not ask for assistance from the tester or from your fellow test-takers.
Prepare a multiple-paragraph writing sample of about 300-600 words on the topic below. You
should use the time available to plan, write, review and edit what you have written. Read the
assignment carefully before you begin to write.
Passage:
An actor, when his cue came, was unable to move onto the stage. He said, "I can't get in, the
chair is in the way." And the producer said, "Use the difficulty. If it's a drama, pick the chair up
and smash it. If it's comedy, fall over it." From this experience the actor concluded that in any
situation in life that is negative, there is something positive you can do with it.
Prompt:
Can any obstacle or disadvantage be turned into something good?
Appendix D
Copy of Rater Scoring Key
__ 2 Full Credit: Paragraphs are in logical order and clearly separated by topic.
__ 1 Half Credit: Paragraphs are mostly in logical order and mostly separated by topic.
__ 0 No Credit: Paragraphs lack logical order and topical foci.
__ 2 Full Credit: Sentences are complete, connect ideas, and progress in logical order.
__ 1 Half Credit: Sentences are mostly complete, mostly connect ideas, and mostly progress in logical order.
__ 0 No Credit: Order of sentences is consistently jumbled and difficult to follow throughout essay.
__ 2
__ 1
__ 0
__ 2 Full Credit: Arguments are supported with specific details and relevant examples.
__ 1 Half Credit: Arguments have little support or few relevant examples provided.
__ 0 No Credit: Few or no arguments presented.
__ 2 Full Credit: Sentences are varied, involving both simple and compound/complex structures.
__ 1 Half Credit: Sentences are often simple, robotic and/or formulaic.
__ 0 No Credit: Little or no sentence variation.
__ 2
__ 1
__ 0
__ / 6 TOTAL for Sentence variety and Style (add section criteria scores)
Mechanical Conventions
__ 2
__ 1
__ 0
__ 2
__ 1
__ 0
__ 2
__ 1
__ 0
Critical Thinking
__ 2
__ 1
__ 0
__ 2
__ 1
__ 0
__ 2
__ 1
__ 0
Appendix E
__ 2 Full Credit: Clear point of view addresses purpose, audience and topic
__ 1 Half Credit: Purpose, audience, and topic are usually clear but may be inconsistent at times.
__ 0 No Credit: Point of view is confusing or never established.
__ 2 Full Credit: Topic and conclusion sentences are stated clearly
__ 1 Half Credit: Clear topic and conclusion sentences are sometimes but not always present.
__ 0 No Credit: There are very few clear topic or conclusion sentences.
__ 2 Full Credit: Stays on topic.
__ 1 Half Credit: Essay is mostly on topic but makes substantial digressions.
__ 0 No Credit: Essay is frequently off-topic.
__ / 6 TOTAL for Purpose and Focus (add section criteria scores)
__ 2 Full Credit: Paragraphs are in logical order and clearly separated by topic.
__ 1 Half Credit: Paragraphs are mostly in logical order and mostly separated by topic.
__ 0 No Credit: Paragraphs lack logical order and topical foci.
__ 2 Full Credit: Sentences are complete, connect ideas, and progress in logical order.
__ 1 Half Credit: Sentences are mostly complete, mostly connect ideas, and mostly progress in logical order.
__ 0 No Credit: Order of sentences is consistently jumbled and difficult to follow throughout essay.
__ 2
__ 1
__ 0
__ 2 Full Credit: Arguments are supported with specific details and relevant examples.
__ 1 Half Credit: Arguments have little support or few relevant examples provided.
__ 0 No Credit: Few or no arguments presented.
__ 2 Full Credit: Sentences are varied, involving both simple and compound/complex structures.
__ 1 Half Credit: Sentences are often simple, robotic and/or formulaic.
__ 0 No Credit: Little or no sentence variation.
__ 2
__ 1
__ 0
__ / 6 TOTAL for Sentence variety and Style (add section criteria scores)
Mechanical Conventions
__ 2
__ 1
__ 0
__ 2
__ 1
__ 0
__ 2
__ 1
__ 0
Critical Thinking
__ 2
__ 1
__ 0
__ 2
__ 1
__ 0
__ 2
__ 1
__ 0
Appendix F
Rater Scores for Pilot Essay Group
Essay | Final + Process Scores (Rater 1) | Final + Process Scores (Rater 2) | Inter-rater differential on Final Integer Score | Inter-rater differential on Decimal (Process) Score
#1 | 4 (4.2) | 5 (4.7) | | 0.5
#2 | 6 (6.0) | 6 (5.5) | | 0.5
#3 | 2 (2.2) | 3 (2.7) | | 0.5
#4 | 6 (6.0) | 6 (5.7) | | 0.3
#5 | 2 (1.8) | 3 (3.3) | | 1.5
#6 | 0 (0.0) | 0 (0.3) | | 0.3
#7 | 6 (6.0) | 5 (5.2) | | 0.8
#8 | 3 (3.0) | 3 (2.8) | | 0.2
#9 | 6 (5.6) | 6 (5.7) | | 0.1
#10 | 0 (0.0) | 2 (1.8) | | 1.8
#11 | 1 (1.0) | 1 (1.3) | | 0.3
#12 | 0 (0.0) | 0 (0.3) | | 0.3
#13 | 6 (5.8) | 4 (3.7) | | 2.1
#14 | 3 (3.2) | 1 (1.2) | | 0.9
#15 | 5 (4.6) | 3 (3.0) | | 1.6
#16 | 5 (4.8) | 4 (3.5) | | 1.3
 | 4 (3.5) | 4 (3.8) | | 0.3
 | 4 (3.5) | 4 (3.5) | | 0.0
 | 4 (4.1) | 3 (2.7) | | 1.4
 | | | Mean = 0.89 | Mean = .92
Appendix G
Standardized Scores for Pilot Essay Group (all scores out of 1-8)
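Essay | Score (Krista) | Score (John) | WritePlacer | Rater
1 | | | | 2.0
2 | | | | 0.5
3 | | | | 2.0
4 | | | | 0.0
5 | | | | 1.0
6 | | | | 0.0
7 | | | | 0.5
8 | | | | 0.5
9 | | | | 1.0
10 | | | | 1.0
11 | | | | 0.5
12 | | | | 0.0
13 | | | | 1.5
14 | | | | 1.5
15 | | | | 0.5
16 | | | | 1.0
 | | | | Mean = .84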