Running Head: Essay Test For Community College Admission 1

Essay Test for Community College Admission

John Whalen and Krista Boddy
Colorado State University
Essay Test for Community College Admission

For the adult English language learners (ELLs) at a local Adult Basic Education center
who are preparing to apply to community college, one of the most daunting admissions tasks
required of them is writing a 300-600 word timed essay. Since many ELLs in non-credit
community English language programs struggle with writing essays, we have developed a
practice essay test and rubric to assist these students in preparing for success on the essay portion
of their community college admissions test.
Some community colleges in Colorado still use the Accuplacer WritePlacer assessment
for admissions and placement purposes; however, many are transitioning to the Community
College Placement Test (CCPT), which utilizes virtually the same assessment criteria as the
WritePlacer but has a different automated rater and a slightly different rating scale. The CCPT
has been used experimentally in Colorado since fall 2015, and its pool of essays for scoring has
not been as fully developed as the pool for the Accuplacer WritePlacer, but the CCPT is meant to
ultimately replace the Accuplacer WritePlacer (Informant in Testing Field, personal
communication, March 14, 2016). Both admissions tests are computer-adaptive and score
essays automatically from a pool of sample essays in the software. A human rater then checks
the essay score against a holistic rubric to determine whether students should be placed into a
regular composition course or a remedial non-credit writing course (Informant in Testing Field,
personal communication, March 14, 2016). Most of our target population is interested in
attending a local community college, which uses the CCPT for admissions and placement.
Because there are presently few resources for the CCPT, and it is so closely related to the
WritePlacer, we have combined what we have learned from an interview with a Testing
Specialist with the WritePlacer materials available online to develop our formative practice essay
test. These sources are the basis of our data collection with regard to matching the TLU domain
in this assessment development project.
The rest of this paper begins with a description of the test, including the test's
purpose, type, interpretation of scores, TLU domain, construct definition, Table of
Specifications, and description of test tasks. Next, we discuss the pilot test procedure, including
participants, administration, and scoring procedures. Our test results section explains our data
using item statistics, descriptive statistics, reliability, and a description of the criterion-referenced
interpretation. Finally, in the discussion, we evaluate the test, including item performance, an
evaluation of test usefulness, an overall estimation of whether the test achieved its purpose, and a
reflection on the personal significance of the test. Following this, we provide references and
appendices containing our actual test, instructions, Table of Specifications, scoring key, teacher
feedback rubric, and data, which are organized in tables.
Description of the Assessment
Purpose
The purpose of this assessment was to provide evidence of student proficiency on essay
writing according to criteria that parallel those used by the Accuplacer WritePlacer and the
newer Community College Placement Test (CCPT) as closely as possible, as both tests require
an essay component that students will be expected to perform well on in order to be admitted to
and placed in courses at community colleges in Colorado. The scores and feedback on this
formative assessment will positively impact the students by helping them to recognize their
strengths and weaknesses in essay writing and to develop the necessary skills for passing the
Accuplacer WritePlacer and CCPT examinations. It will also assist instructors of the course by
highlighting ways to improve curriculum design based on these same considerations.

Type of Test
The untimed essay exam was a formative proficiency test intended to match the TLU
domain of the admissions and placement tests currently being used at community colleges
throughout Colorado. To fit the TLU domain as closely as possible, we used a test prompt from
the Accuplacer WritePlacer Study Guide. For our purposes, this was a formative use of a
proficiency test which community college testing centers use for placement and admissions into
their programs. The results of our assessment provided useful feedback to the instructor and
participants in emphasizing areas that need further development.
Interpretation of Scores
The interpretation of scores from this test was criterion-referenced, as this test was a
formative assessment of essay writing ability which matches the TLU domain of community
college admissions and placement essays (e.g., CCPT and WritePlacer). Scores were rated for
individual students on a scale from zero to six based on six performance criteria developed from
both the CCPT Study Guide and Accuplacer WritePlacer Guide. Criteria included Purpose and
Focus, Organization and Structure, Development and Support, Sentence Variety and Style,
Mechanical Conventions, and Critical Thinking (Accuplacer WritePlacer Guide, 2008, p. 1).
Description of TLU Domain
The TLU task for this essay test involved typing a three-hundred-word essay in a
proctored, untimed, computerized testing room (Informant in Testing Field, personal
communication, March 14, 2016). The input was a brief written prompt on a reflective topic
which was taken from the WritePlacer Guide. This prompt asked the test-taker to write a
response about whether an obstacle or disadvantage can be turned into something good
(Accuplacer WritePlacer Guide, 2008, p. 2). The expected response was a 300-600 word typed
essay that was a narrow, direct, non-reciprocal response to the input. The scale of the current test
being administered at the local community college (CCPT) ranges from zero to six, with zero
preventing admittance and six achieving admittance into mainstream composition courses.
Scores lower than six but higher than zero place students into non-accredited
remedial writing courses at the college (Informant in Testing Field, personal communication,
March 14, 2016). See Appendix A for a table that analyzes our TLU domain in greater detail.
Construct Definition
The construct that we intended to measure with this test was the ability to type a 300-600
word, proctored, untimed essay that exhibits the characteristics of: Purpose and Focus,
Organization and Structure, Development and Support, Sentence Variety and Style, Mechanical
Conventions, and Critical Thinking (Accuplacer WritePlacer Guide, 2008, p. 1). Since the
prompts are either persuasive, reflective or argumentative, this construct also included
knowledge of manipulative language functions and the academic register. The ability to read and
comprehend WritePlacer and CCPT prompts is also an element of this construct. Speaking and
listening were not tested, nor were most aspects of functional or sociolinguistic knowledge, such
as cultural references, idiomatic expression, or dialect.
Design of Test
This essay test includes instructions, a prompt, a scoring rubric, and a feedback guide.
The scoring rubric and feedback guide are based on the following table of specifications, which
is also included in Appendix B.
Table 1
Table of Specifications for Essay Design

Task: Essay

Objective / Area of Focus      Scoring
Purpose + Focus                0-6
Organization + Structure       0-6
Development + Support          0-6
Sentence Style                 0-6
Mechanical Conventions         0-6
Critical Thinking              0-6
Total                          0-30
Description of Test Tasks

Instructions. The instructions for the test were read to the students in live, spoken
English by the tester and then presented in written, hard copy form on a handout (see Appendix
C for instructions and prompt). The tester stated that test-takers were to type a 300-600 word
essay response to the prompt, which was also on the handout. Students had as much time as
needed to complete the essay, within the recommended two-hour window mentioned by an
informant in the testing field (personal communication, March 14, 2016). Test-takers were
allowed to use scratch paper during the test in case they wanted to complete brief outlines or
notes. Dictionaries and mobile phones were not allowed in the testing room or computer lab to
match the TLU domain (Informant in Testing Field, personal communication, March 14,
2016). Test-takers were not allowed to ask for assistance from the tester or fellow test-takers.
Input. The input was in typed, written English, which was provided as a handout to
each test-taker. Appendix C shows the actual essay prompt that students were given. The
selected prompt comes from the Accuplacer WritePlacer Guide with Sample Essays.
Essay Prompt: An actor, when his cue came, was unable to move onto the stage. He
said, "I can't get in, the chair is in the way." And the producer said, "Use the difficulty. If
it's a drama, pick the chair up and smash it. If it's comedy, fall over it." From this
experience the actor concluded that in any situation in life that is negative, there is
something positive you can do with it. Can any obstacle or disadvantage be turned into
something good? (Accuplacer WritePlacer Guide, 2008, p. 2).
Expected Response. The expected response was a reflective essay in English, typed on
computers at the Adult Basic Education center. Responses should be a well-organized three- to
six-paragraph essay with a minimum of 300 words. The essay should have a clear introduction,
body, and conclusion paragraph (Informant in Testing Field, personal communication, March 14,
2016). Specific keywords related to the topic of the prompt (e.g., obstacle, disadvantage) were
assessed along with mastery of the following six elements of performance criteria: Purpose and
Focus; Organization and Structure; Development and Support; Sentence Variety and Style;
Mechanical Conventions; and Critical Thinking (Accuplacer WritePlacer Guide, 2008, p. 1).
Relationship between Input and Expected Response. The relationship between the
input and response was direct, narrow in scope, and nonreciprocal. Students were expected to
address the prompt fully but concisely, and they were not allowed to ask questions or otherwise
interact with the input. Some background knowledge about the topic was necessary to complete
the essay test. The Accuplacer WritePlacer and CCPT essay prompts are free of technical or
specific literary references requiring specialized knowledge, but they cover a number of general
fields and interests (Accuplacer WritePlacer Guide, 2008). Students must draw on a broad range
of experiences, learning and ideas to support their point of view on the issue in question
(Accuplacer WritePlacer Guide, 2008, p. 1).
Scoring. The essays were printed and collected by the tester and copied for a second
rater. Essays were scored according to the rater rubric in Appendix D. The six WritePlacer
dimensions were supplemented with analytic criteria that were developed to better
explain each dimension to students and to improve scoring reliability. Performance on each criterion
was assessed as having earned full credit, half credit, or no credit. Allowing for half credit
improves the accuracy of the assessment in differentiating performance, but we decided not to
break down the criteria beyond whole-half-zero in order to maintain good practicality of the
assessment by saving the rater some time. Full definitions for performance that earns full credit,
half credit, and no credit on each criterion are included as part of the rater rubric.
Structure and Time of the Assessment. This assessment had only one part, with only
one item. When students took this assessment, they did not have access to the grading rubric, so
it was important that they were able to write an essay that satisfied all grading criteria despite the
fact that they were contained within this single item, the essay. Although the essay was untimed,
it was recommended to take a maximum of two hours to administer (Informant in Testing Field,
personal communication, March 14, 2016). All ELL participants completed the assessment in
less than an hour and a half. The assessment took the raters about ten minutes per essay to assign
a score.
Item Development. On the actual CCPT and Accuplacer essay exams, the computer
software rates the essays holistically, from zero to six, but the software places equal emphasis on
the six elements of the essay mentioned above (Accuplacer WritePlacer Guide, 2008). In our
assessment, this objective of equal emphasis was ensured through the use of an analytic rubric
that weighs the dimensions equally. Each element was scored on its own from zero to six, and then
these scores were averaged for the final essay score. The advantages of analytic rubrics over
holistic ones are well documented, particularly when used for formative purposes. Miller, Linn,
and Gronlund (2009) suggest that analytic rubrics enhance assessment reliability and fairness
(p. 249), whereas Crusan (2015) emphasizes that feedback given on analytic rubrics tends to be
more organized and easier for learners to make use of. Since this assessment is for formative
purposes, usefulness of feedback for students is crucial to the success of this assessment.
The six elements are each quite broad, so they were difficult to systematize for the
purposes of giving useful formative feedback to our students. While they may serve
adequately for a computer rater, in our classroom they were likely to involve a great deal of rater
idiosyncrasy, which would have been detrimental to the reliability of our assessment. For this
reason, we analyzed trends among the WritePlacer and CCPT sample essays and developed some narrower
criteria within each element. Each specific assessment criterion was rated from zero to two, so the
criteria add up to zero to six for the total dimension score. For example, the element of Mechanical
Conventions comprises the criteria of: Indentation for paragraphs; Few or no grammatical errors;
Few or no spelling + punctuation errors. These issues occurred consistently within WritePlacer
example essays that received low scores for Mechanical Conventions but were consistently
absent on sample essays that received high scores on this dimension.
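The scoring arithmetic described above can be sketched as follows. This is a minimal illustration rather than the scoring key itself: the function names are hypothetical, the sketch assumes three criteria per dimension (as in the Mechanical Conventions example), and the half-up rounding rule is an assumption, since the rounding convention is not stated.

```python
# Hypothetical sketch of the analytic scoring arithmetic; names are illustrative.
DIMENSIONS = [
    "Purpose and Focus",
    "Organization and Structure",
    "Development and Support",
    "Sentence Variety and Style",
    "Mechanical Conventions",
    "Critical Thinking",
]

def dimension_score(criteria):
    """Sum three criterion ratings (0 = no, 1 = half, 2 = full credit) into 0-6."""
    if len(criteria) != 3 or any(c not in (0, 1, 2) for c in criteria):
        raise ValueError("expected three ratings, each 0, 1, or 2")
    return sum(criteria)

def essay_score(ratings):
    """Return (unrounded, rounded) essay scores from a dict of criterion ratings."""
    scores = [dimension_score(ratings[name]) for name in DIMENSIONS]
    unrounded = sum(scores) / len(scores)   # equal weight for every dimension
    rounded = int(unrounded + 0.5)          # assumed rule: round half up
    return unrounded, rounded

# Example: full credit everywhere except one criterion scored zero.
example = {name: [2, 2, 2] for name in DIMENSIONS}
example["Critical Thinking"] = [2, 2, 0]
print(essay_score(example))
```

Averaging the six dimension scores is what gives each dimension equal weight in the final zero-to-six score.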
Deriving the criteria inductively, from samples, rather than deductively, from research,
meant that the design of this rubric still maintained a degree of subjectivity from us as raters.
Nevertheless, we reviewed the essays together and agreed that these were the criteria being
implicitly evaluated. The criteria on all other dimensions were derived in this same way. As
Nation (2009) points out, any assessment based on one piece of writing is not likely to be
reliable, so with this limitation in mind, the best we could do was to systematize our rating
process as much as the material allowed and to remain aware of the possibility of human
and computer program idiosyncrasy in assessing writing.
Pilot Test Procedure

Participants
The participants in this pilot were split into two groups. The first group included sixteen
test-takers who took the WritePlacer examination at a registered location and whose essays
Accuplacer makes available as samples for teachers, students and researchers. Since the purpose
of this assessment was to prepare students for success on the WritePlacer, using our rubric to rate
the WritePlacer samples blindly gave us the opportunity to check our rating accuracy against the
WritePlacer automated rater. If our ratings were consistently very different from those produced
by the WritePlacer automated rater, then it would be impossible for our assessment to achieve its
purpose without either revising the rubric or training the raters.
Second, it was also important to pilot this assessment with actual students so that we could
see how they responded to the testing environment, the input, and the feedback form. Participants
were three advanced-level ELLs who attend English classes four days a week at an Adult Basic
Education program in Northern Colorado. All three female participants tested above a level five
on the TABE Reading test, which placed them in the advanced ESL class. The student from
China, who is in her 30s, speaks Cantonese and Mandarin. She intends to enroll in a community
college in the near future. The participant from the Dominican Republic, who is in her 40s, speaks
Spanish as an L1. Having a young child at home, she does not intend to enroll in higher education
for a few years. The third participant comes from Brazil, is 24 years old, and speaks Portuguese
as an L1. She is currently an Engineering university student in her home country and is temporarily
living in the United States to learn English.
Administration
Administration of the assessment to the ELL participants was conducted during a
morning class by their ESL instructor in the program's computer lab. The tester (instructor) read
the instructions, passage and prompt aloud and gave each participant the handout with
instructions, passage and prompt taken from the WritePlacer Guide (see Appendix C for
handout). The tester next told participants they had two hours to complete the assignment.
Scratch paper was provided for students, but cell phones and dictionaries were not available for
use during the test. All three participants completed the assessment in less than one hour and
thirty minutes.
Scoring Procedures
Scoring procedures were conducted by two raters; one was the administrator of the
assessment and English instructor of the participants, and the other was a fellow English
instructor working on the test project. Both raters scored the participants' essays as well as
sixteen sample essays from the WritePlacer Guide using the Rater Scoring Key (see Appendix
D). This scoring key included the following six performance criteria: Purpose and Focus,
Organization and Structure, Development and Support, Sentence Variety and Style, Mechanical
Conventions, and Critical Thinking (Accuplacer WritePlacer Guide, 2008). Raters scored the
essays analytically based on full credit, half credit, or no credit. Full definitions for performance
that earns full credit, half credit, and no credit on each criterion are included as part of the rater
rubric (see Appendix D). The scores from the sixteen sample essays were used for comparison with
the essay scores from the three ELL participants. This allowed the testers to calculate
inter-rater reliability between each other as well as agreement with the automated ratings of the
samples from the WritePlacer Guide. This provided two reliability statistics.
Results of Assessment
Item Statistics and Descriptive Statistics
The results of our essay exam pilot, organized by scoring dimension, are included in
Table 2, below. The scores provided by each rater were included as separate data points but used
together to calculate the descriptive statistics, as averaging them could decrease variation in
cases when the raters disagreed, and treating them totally separately would give an indication of
rater tendencies but not how the rubric performed overall.
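The pooling choice described above can be illustrated with a short sketch. The ratings below are invented purely for illustration and are not pilot data; the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which formula was used.

```python
from statistics import mean, stdev

def pooled_stats(rater_a, rater_b):
    """Treat both raters' scores as separate data points in one pooled set."""
    pooled = list(rater_a) + list(rater_b)
    return {
        "mean": mean(pooled),
        "sd": stdev(pooled),                  # sample SD (assumed)
        "range": (min(pooled), max(pooled)),
    }

def averaged_sd(rater_a, rater_b):
    """SD after first averaging the two raters' scores for each essay."""
    return stdev([(a + b) / 2 for a, b in zip(rater_a, rater_b)])

# Invented scores for two essays on which the raters disagree in opposite directions.
rater_a = [2, 4]
rater_b = [4, 2]
print(pooled_stats(rater_a, rater_b))   # variation between ratings is preserved
print(averaged_sd(rater_a, rater_b))    # averaging hides the disagreement
```

In this invented case the averaged scores are identical for both essays, so their standard deviation is zero, while the pooled set still shows the spread in the ratings.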
Table 2
Descriptive Statistics for Essay Dimension Scores and Overall Scores

Dimension                      Mean    Standard Deviation    Range
Purpose and Focus              3.7     2.1
Organization and Structure     3.9     2.1
Development and Support        3.3     2.2
Sentence Variety and Style     2.9     2.1
Mechanical Conventions         2.8     2.0
Critical Thinking              3.3     2.1
OVERALL                        3.4     2.0

Given that sixteen of our nineteen sample essays were from the Accuplacer WritePlacer
Guide and represented each possible score twice, it was highly likely that our overall mean, SD,
and range would indicate significant variation across essays. However, since Accuplacer
WritePlacer reports scores holistically, the mean, SD, and range on individual dimensions were
important for establishing the measurement validity of this rubric. It can be seen that all
dimensions were used at both their maximum and minimum, indicating that they were useful for
differentiating essays based on quality. Their SDs were also all very similar, indicating that no
one dimension tended to produce more homogeneous or heterogeneous scores than the others.
However, the dimension means range from 2.8 to 3.9, indicating that the Organization and
Structure of these essays tended to be scored more than an entire point higher than Mechanical
Conventions did. This may indicate that raters tended to assess the criteria in some dimensions
more harshly than others, which would need to be addressed by rater training for future uses of
this exam.
Rater Effects and Standard Error of Measurement
Our overall inter-rater differentials were 0.89 for the final (rounded) scores and 0.92 for
the process (unrounded) scores. Given that the rounding had little effect on overall reliability,
and Hallgren (2012) recommends using final scores for calculating inter-rater reliability (IRR),
we decided to do so. Hallgren (2012) also recommends using the statistical method of intra-class
correlation (ICC) for calculating IRR with interval data, as ICC allows for consideration of the
magnitude of disagreement more clearly than Cohen's (1960) kappa does and is not affected
by chance agreement the way that rater percentage of agreement is.
The ICC was calculated through SPSS for Mac v. 23 using a two-way mixed model for
absolute agreement, following the recommendations by Landers (2015) for calculating ICC
with consistent raters across the data set and data representing a sample of raters rather than a
fixed population. Given that our two raters were the same for all essays but that these findings
are intended to be useful for other raters as well, this seemed the most appropriate solution. The
result was ICC = .923, indicating strong mean inter-rater reliability across all items. This indicates
that the rubric was effective for standardizing scores across raters.
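For readers without access to SPSS, an absolute-agreement ICC can be approximated by hand from two-way ANOVA mean squares. The sketch below is not the SPSS procedure used in this pilot; it is a minimal illustration using the standard mean-square formulas, the function name is hypothetical, and the example ratings are invented. It returns both the single-rater and the average-of-raters coefficient, since it is not stated here which of the two SPSS reported.

```python
def icc_absolute_agreement(ratings):
    """ratings: one row per essay, one column per rater (equal-length rows).

    Returns (single_measure, average_measure) absolute-agreement ICCs.
    """
    n = len(ratings)        # number of essays
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares for essays (rows), raters (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return single, average

# Invented example: rater B scores every essay one point higher than rater A.
invented = [[1, 2], [3, 4], [5, 6]]
print(icc_absolute_agreement(invented))
```

Because the coefficient is for absolute agreement, a rater who is consistently one point more lenient lowers the ICC even though the two raters rank the essays identically.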
Next, in order to calculate our rating accuracy against the WritePlacer sample essays, it
was necessary to make the grading scales equivalent. Since our scale was zero to six, and the
WritePlacer scale is one to eight, three steps were necessary to make the scores comparable. First,
we multiplied our raw (unrounded) scores by 1.17, which turned our seven-point continuous
scale into an eight-point continuous scale. Second, we rounded the scores, changing our
continuous scale into an interval scale. Finally, we added one point to each of our scores,
changing our minimum score from zero to one. This translated our scores from a seven-point
continuous scale that began at zero into scores on an eight-point interval scale beginning at one,
allowing us to meaningfully compare our scores to the WritePlacer scores and determine our
standard error of measurement (SEM). Given that we were using this rubric to approximate
WritePlacer automated scores, we used the WritePlacer scores as true scores and calculated the
average SEM of both raters against the WritePlacer scores as 0.89 on the eight-point scale,
indicating that (assuming normally distributed scores) our scores using the zero-to-six rubric are
within 0.76 points of the true score at a 68% confidence interval or 1.52 points at a 90%
confidence interval. The complete data results of this test can be viewed in Appendix G.
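The three conversion steps can be sketched as follows. The function names are hypothetical, and two details are assumptions: the half-up rounding rule, and the step of dividing the eight-point SEM by 1.17 to express it on the zero-to-six rubric, which is consistent with the reported 0.76 but is not spelled out above.

```python
import math

SCALE_FACTOR = 1.17  # multiplier reported in the text

def to_writeplacer_scale(raw_score):
    """Convert an unrounded 0-6 rubric score to the 1-8 WritePlacer scale."""
    stretched = raw_score * SCALE_FACTOR     # step 1: stretch the scale
    rounded = math.floor(stretched + 0.5)    # step 2: round half up (assumed rule)
    return rounded + 1                       # step 3: shift the minimum from 0 to 1

def sem_on_rubric_scale(sem_eight_point):
    """Express an SEM from the eight-point scale in zero-to-six rubric points (assumed step)."""
    return sem_eight_point / SCALE_FACTOR

for raw in (0, 3.5, 6):
    print(raw, to_writeplacer_scale(raw))
print(round(sem_on_rubric_scale(0.89), 2))
```

Under these assumptions, the lowest and highest rubric scores map onto the lowest and highest WritePlacer scores, and the 1.52 figure is twice the rescaled SEM.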
Description of Mastery
None of the ELL participants achieved the top score of six, but all scored high enough on
the pilot test to be admitted into non-accredited remedial writing classes at the community
college, since essay scores less than six require remedial placement (Informant in Testing Field,
personal communication, March 14, 2016). Scores of six permit students admittance into
accredited composition courses, while scores of zero limit a student's admission to the
community college. Because the purpose of this test was for formative assessment, complete
mastery of writing ability was not a consideration for the participants. The test results were used
to provide valuable feedback to the ELL participants regarding their writing abilities, as well as
to highlight to the instructor gaps in knowledge or recommendations in developing materials for
writing instruction.
Discussion
Critique of Item Performance
Since the essay test included only one item but was rated analytically, we can discuss
item performance in terms of the prompt used and in terms of the analytic criteria used. Since
most of our samples came from a stratified set provided by Accuplacer WritePlacer, we
cannot necessarily say whether or not this prompt succeeded in eliciting student responses that
could meaningfully differentiate students based on the intended construct. Our own students all
scored fours, which would be problematic with a larger and demonstrably heterogeneous pilot
group, but with only three students, we cannot draw any conclusions.
The test prompt used in our pilot was problematic for a number of reasons. One example
involves the passage that accompanied the prompt, which was quite lengthy and somewhat
unrelated to the prompt. As a result, some of the Accuplacer WritePlacer
sample essays responded to the passage rather than the actual prompt. When a long,
disconnected passage is included as part of the essay prompt, test-takers may experience confusion
in knowing which reading to respond to, the passage or the prompt. A second hindrance of the
passage was the topic-specific vocabulary related to acting (e.g., cue, producer, drama, comedy),
which not all people are familiar with. All three ELL participants had a hard time comprehending
the vocabulary in the passage. A third example was the prompt itself, which was too open-ended
and ambiguous in asking respondents, "Can any obstacle or disadvantage be turned into
something good?" (Accuplacer WritePlacer Guide, 2008, p. 2). By using the words "any" and
"something good," the prompt left open a wide range of options for respondents to address,
making it easier for respondents to go off topic. The occurrence of problematic test prompts
should encourage instructors to familiarize their students with all kinds of essay prompts, since
many being used by the WritePlacer and CCPT are confusing and problematic.
For the analytic scoring dimensions, based on the descriptive statistics provided above, it
appears that the dimensions differed slightly in average score but were all useful for the full
proficiency scale of zero to six and furthermore had very similar standard deviations. This indicates
that the dimensions performed as intended, meaningfully differentiating essays at all levels.
Evaluation of Test Usefulness
Reliability. Since this assessment consisted of only one item, and is intended to be used
formatively, but could be rated by a different rater in any given administration, the most
meaningful way to establish reliability for this assessment was to calculate the level of agreement
between raters on a common set of sample essays. As described above, the ICC was calculated at
.923, indicating strong inter-rater reliability. This calculation is limited by both the small
number of cases (samples) and the small number of variables (raters), but for the present data set,
the reliability is strong.
Construct Validity. Given that this test was administered as formative preparation for
the WritePlacer or CCPT exam, and the construct it measures is essentially the ability to succeed
on the WritePlacer or CCPT (write an untimed, proctored, 300+ word essay that shows mastery
of the six WritePlacer assessment dimensions), our correspondence with the TLU domain is
extremely strong.
Measurement Validity. As mentioned above, the raters together were able to use this
rubric to predict automated WritePlacer essay scores within 0.76 points at a 68% confidence
interval or 1.52 points at a 90% confidence interval. An SEM below one at 90% would certainly
be preferable, but given that these raters were untrained and had not previously calibrated
themselves with this rubric, this SEM can be seen as acceptable for indicating valid
measurement. There is room for improvement in the design of the rubric's sub-criteria, and in the
way that the raters interpret those criteria, but the present scores are not markedly invalid. See the
personal reflection, below, for how the criteria could possibly be improved for future use.
Consequential Validity / Impact. The test proved useful in examining the challenges
that ELLs have in writing (typing) coherent and cohesive 300-word essays in response to a
prompt. It revealed, for example, that all three ELL participants struggled with developing
introductory and concluding paragraphs. This shows the instructor that more class time should
be spent teaching ELLs how to write introductory and concluding paragraphs complete
with hooks, details and support, main claims, and points of view.
The Essay Feedback and Scoring Rubric (see Appendix E) provided ELL participants
with helpful feedback regarding specific performance criteria they could improve upon. For
example, some students recognized that they had not stated a clear point of view on the topic and
will hopefully remember to do so in future writings. All three ELL participants said they were
glad to receive feedback on their essay scores and found the entire experience positive and
insightful. Although we did not set up a pre-test/post-test administration of the proper WritePlacer
exam to measure the impact of our formative assessment on their writing abilities, the subjective
impressions of the participants indicate that this assessment was valid for its purpose.
Practicality. This assessment was practical to administer and grade. Although the test is
technically untimed, all pilot students finished in under an hour and a half. Since they were able
to take the test at the Adult Basic Education center, which has a computer lab, there was no
financial cost associated with the administration of this assessment. The scoring took about ten
minutes per essay, which is short, yet the students still found the feedback provided quite helpful.
Overall Success of Assessment. Despite the low number of participants, the pilot
administration of this assessment was quite successful. The assessment was practical to
administer, showed good inter-rater reliability and strong construct validity, and resulted in
positive washback with the students. The ultimate test of washback will be the ability of the
students and instructor to make use of the assessment feedback to improve students' essay
writing, but based on preliminary discussions, the students found the feedback specific, clear,
and useful.
Personal Reflection. Although this assessment was a success, there is still room for
improvement. The prompt was outside of our control, but the rubric could be
improved. Most notably, there were a number of criteria for which the descriptions were
difficult to interpret for short versus long essays. On the criterion of logical sequencing of
paragraphs, for example, a student who wrote only a single paragraph could more easily score
full marks than a student who had written many paragraphs. The same is true for the descriptors
of having "many" or "few" errors: if raters are to score based on error count alone, they are likely to
wind up penalizing longer essays even if those essays show a lower occurrence of errors
relative to total word count. For the criterion of staying on topic, again, a student who wrote a
single-sentence response could theoretically score full marks, whereas a student who wrote a
much longer and more detailed piece that veered from time to time might be penalized.
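One way to reason about the length problem is to express errors relative to essay length rather than as a raw count. The sketch below is only an illustration of that idea, not part of the rubric: the function name and the per-100-words convention are hypothetical, and the essays and error counts are invented.

```python
def errors_per_100_words(error_count, essay_text):
    """Normalize an error count by essay length (whitespace-separated words)."""
    word_count = len(essay_text.split())
    if word_count == 0:
        raise ValueError("essay has no words")
    return 100 * error_count / word_count

# Invented example: a long essay with more errors but a lower error rate.
short_essay = "word " * 50     # 50 words, 2 errors
long_essay = "word " * 300     # 300 words, 3 errors
print(errors_per_100_words(2, short_essay))
print(errors_per_100_words(3, long_essay))
```

On a raw count the longer essay looks worse (three errors versus two), while on a per-100-words basis it shows the lower occurrence of errors.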
It did not affect the present set of essays, but it also seemed questionable that attention to
topic was ultimately only worth 1/18 of the total essay score. This means that, theoretically, a
student could memorize a well-crafted essay on any topic they desired and replicate it during the
assessment to receive an aggregate score of 17/18, which would round to a perfect 6/6. It is
unclear from the WritePlacer literature whether the automated essay rater would be as bound by
the grading rubric in this way as human raters; the suggestion that an essay that does not address
the prompt would be disqualified does not appear anywhere in the WritePlacer literature or in the
assessment instructions.
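The arithmetic behind this scenario can be written out as follows, assuming (for illustration only) that the memorized essay earns full credit on every criterion except "Stays on topic," which is scored 0, and that the overall score is the average of the six section totals rounded to the nearest integer:

```latex
\[
\frac{34}{36} = \frac{17}{18}, \qquad
\frac{4 + 6 + 6 + 6 + 6 + 6}{6} = \frac{34}{6} \approx 5.67 \longrightarrow 6
\]
```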
Nevertheless, these issues can mostly be solved by carefully rewording the rubric to give
raters a framework for evaluating essays without accidentally binding them to illogical scoring
methods, such as awarding a single-sentence essay full marks for being on topic 100% of the time.
In general, this assessment was informative to develop and useful to administer, and with luck, it
will continue to prove beneficial for the ELL instructors involved, especially for as long as the
WritePlacer and CCPT essay examinations still exist in their present form.
References
Accuplacer Website (2008). College Board WritePlacer Guide with Sample Essays. Retrieved
from https://secure-media.collegeboard.org/digitalServices/pdf/accuplacer/accuplacer-tsiwriteplacer-sample-essays.pdf
Bachman, L., & Palmer, A. (2010). Language assessment in practice. Oxford University Press.
Community College of Aurora Testing Center (n. d.). CCPT Study Workbook. Retrieved from
https://www.ccaurora.edu/sites/default/files/cca_files/file/Getting_Started/Testing/CCPTstudyworkbook-1.16.pdf
Crusan, D. (2015). Dance, ten; looks, three: Why rubrics matter. Assessing Writing, 26, 1-4.
Hallgren, K. (2012). Computing inter-rater reliability for observational data: An overview and
tutorial. Tutorials in Quantitative Methods for Psychology, 8, 23-34.
Landers, R. N. (2015). Computing intraclass correlations (ICC) as estimates of interrater
reliability in SPSS. The Winnower 2, 1-4.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching
(10th ed.). Upper Saddle River, NJ: Merrill, Prentice Hall.
Nation, I. S. P. (2008). Teaching ESL/EFL reading and writing. Abingdon-on-Thames, U.K.:
Routledge.
Weigle, S. C. (2007). Teaching writing teachers about assessment. Journal of Second Language
Writing, 16, 194-209.
Appendices
Appendix A
Revised TLU Domain Description
Setting
TLU Characteristic: Testing room; unknown participants; proctored by stranger
Assessment Characteristic: Computer lab; regular classmates as participants; proctored by regular instructor

Scoring Rubric
TLU Characteristic: Holistic, 8-point (WritePlacer) or 6-point (CCPT) rubric
Assessment Characteristic: Analytic, 6-point rubric with sub-categories

Input
TLU Characteristic: Untimed administration; 1-2 sentence typed prompt presented on computer
Assessment Characteristic: Untimed administration; 1-2 sentence typed prompt presented on printed paper; teacher may also provide verbal clarification as needed

Expected Response
TLU Characteristic: 300-600 word essay; typed; academic register; ideational and manipulative
Assessment Characteristic: 300-600 word essay; typed; academic register; ideational and manipulative

Relationship between Input and Expected Response
TLU Characteristic: Direct; narrow; non-reciprocal
Assessment Characteristic: Direct; narrow; reciprocal (teacher may also provide clarification as needed)
Appendix B
Revised Table of Specifications
Task: Essay
Objective / Area of Focus and Scoring
Purpose + Focus: 0-6
Organization + Structure: 0-6
Development + Support: 0-6
Sentence Style: 0-6
Mechanical Conventions: 0-6
Critical Thinking: 0-6
Total: 0-30
Appendix C
Copy of Revised Essay Prompt and Instructions
Instructions:
Type a 300-600 word essay in response to the following prompt. You will have as much time as
you need to complete this essay. You will be allowed to use scratch paper during the test in case you
wish to complete brief outlines or notes. Dictionaries and mobile phones are not allowed to be
used in the testing room. Do not ask for assistance from the tester or from your fellow test-takers.
Prepare a multiple-paragraph writing sample of about 300-600 words on the topic below. You
should use the time available to plan, write, review and edit what you have written. Read the
assignment carefully before you begin to write.
Passage:
An actor, when his cue came, was unable to move onto the stage. He said, "I can't get in, the
chair is in the way." And the producer said, "Use the difficulty. If it's a drama, pick the chair up
and smash it. If it's comedy, fall over it." From this experience the actor concluded that in any
situation in life that is negative, there is something positive you can do with it.
Prompt:
Can any obstacle or disadvantage be turned into something good?
(Accuplacer WritePlacer Guide, 2008, p. 2)
Appendix D
Copy of Rater Scoring Key
Essay Scoring Rubric

Instructions for raters: Performance on the 6 major elements of this essay is assessed
according to a set of specific criteria within each element. Score descriptors are used to
indicate the characteristics of a typical essay that earns full, half, or no credit on each criterion.
Check the score corresponding to the description that best matches the essay you are rating.
Once you have assessed all of the criteria within a major element, add the corresponding
scores to get the TOTAL score for that element. Once you have a total for all elements,
average those scores to get the overall essay score.
This is not the feedback rubric that you will distribute to students; this is only for your
reference as a rater.
Purpose and Focus

__ 2 Full Credit: Clear point of view addresses purpose, audience and topic
__ 1 Half Credit: Purpose, audience, and topic are usually clear but may be inconsistent at times.
__ 0 No Credit: Point of view is confusing or never established.
__ 2 Full Credit: Topic and conclusion sentences are stated clearly
__ 1 Half Credit: Clear topic and conclusion sentences are sometimes but not always present.
__ 0 No Credit: There are very few clear topic or conclusion sentences.
__ 2 Full Credit: Stays on topic.
__ 1 Half Credit: Essay is mostly on topic but makes substantial digressions.
__ 0 No Credit: Essay is frequently off-topic.
__ / 6 TOTAL for Purpose and Focus (add section criteria scores)

Organization and Structure

__ 2 Full Credit: Paragraphs are in logical order and clearly separated by topic.
__ 1 Half Credit: Paragraphs are mostly in logical order and mostly separated by topic.
__ 0 No Credit: Paragraphs lack logical order and topical foci.
__ 2 Full Credit: Sentences are complete, connect ideas, and progress in logical order
__ 1 Half Credit: Sentences are mostly complete, mostly connect ideas, and mostly progress in logical order
__ 0 No Credit: Order of sentences is consistently jumbled and difficult to follow throughout essay
__ 2 Full Credit: Intro + Conclusion are present, effective and concise.
__ 1 Half Credit: Essay may be missing either the intro or the conclusion, or they may both be ineffective.
__ 0 No Credit: Introduction and conclusion are absent.
__ / 6 TOTAL for Organization and Structure (add section criteria scores)

Development and Support

__ 2 Full Credit: Ideas are clearly expressed and expanded upon.
__ 1 Half Credit: Most ideas are clear and presented with some support.
__ 0 No Credit: Ideas are unclear and/or confusing. Lacks sufficient supportive details.
__ 2 Full Credit: Arguments are supported with specific details and relevant examples.
__ 1 Half Credit: Arguments have little support or few relevant examples provided.
__ 0 No Credit: Few or no arguments presented.
__ 2 Full Credit: Plenty of supportive evidence.
__ 1 Half Credit: Some supportive evidence is present, but more is needed.
__ 0 No Credit: Few or no evidence support.
__ / 6 TOTAL for Development and Support (add section criteria scores)

Sentence Variety and Style

__ 2 Full Credit: Sentences are varied, involving both simple and compound/complex structures.
__ 1 Half Credit: Sentences are often simple, robotic and/or formulaic.
__ 0 No Credit: Little or no sentence variation.
__ 2 Full Credit: Academic vocabulary expressed.
__ 1 Half Credit: Essay presents some academic vocabulary, but could use more specific terms.
__ 0 No Credit: Few or no academic vocabulary expressed.
__ 2 Full Credit: Use of transitions.
__ 1 Half Credit: A few examples of transitions throughout essay, although more are needed.
__ 0 No Credit: One or no transitions used.
__ / 6 TOTAL for Sentence Variety and Style (add section criteria scores)
Mechanical Conventions

__ 2 Full Credit: Indentation for paragraphs.
__ 1 Half Credit: Paragraphs are mostly indented, with only a couple exceptions.
__ 0 No Credit: Several paragraphs are missing indents.
__ 2 Full Credit: Few or no grammatical errors.
__ 1 Half Credit: There are a handful of grammatical errors in the essay.
__ 0 No Credit: Essay contains a significant number of grammatical errors.
__ 2 Full Credit: Few or no spelling, punctuation, and capitalization errors.
__ 1 Half Credit: Essay contains some spelling errors, but they do not interfere with comprehensibility.
__ 0 No Credit: Enough spelling errors exist in the essay to significantly impact comprehensibility.
__ / 6 TOTAL for Mechanical Conventions (add section criteria scores)
Critical Thinking

__ 2 Full Credit: Topic is fully addressed.
__ 1 Half Credit: Topic is partially addressed, but missing key points.
__ 0 No Credit: Topic is unclear or briefly mentioned.
__ 2 Full Credit: Clear reasoning and complexity of thought expressed.
__ 1 Half Credit: Essay presents some logical reasoning, but lacks complexity of ideas.
__ 0 No Credit: Essay presents little or no logical reasoning/complex ideas.
__ 2 Full Credit: Clear relationships between ideas.
__ 1 Half Credit: Ideas are sometimes connected, but some relationships are not clear.
__ 0 No Credit: Ideas have little or no clear relationships.
__ / 6 TOTAL for Critical Thinking (add section criteria scores)
Overall Essay Score (Average of section totals) __/6
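
The scoring procedure described in the rater instructions (add the criterion scores within each element, then average the element totals) can be sketched in Python as follows. This is only an illustration: the function names and the sample scores are hypothetical, and the final rounding step (half rounds up) is an assumption, since the scoring key asks only for the average and does not state a rounding rule.

```python
from typing import Dict, List

# Element names follow the rubric; each element has three criteria scored 0, 1, or 2.
ELEMENTS = [
    "Purpose and Focus",
    "Organization and Structure",
    "Development and Support",
    "Sentence Variety and Style",
    "Mechanical Conventions",
    "Critical Thinking",
]


def element_totals(scores: Dict[str, List[int]]) -> Dict[str, int]:
    """Add the three criterion scores within each element (0-6 per element)."""
    totals = {}
    for name in ELEMENTS:
        criteria = scores[name]
        if len(criteria) != 3 or any(c not in (0, 1, 2) for c in criteria):
            raise ValueError(f"{name}: expected three scores of 0, 1, or 2")
        totals[name] = sum(criteria)
    return totals


def overall_score(scores: Dict[str, List[int]]) -> float:
    """Average the six element totals to get the overall essay score."""
    totals = element_totals(scores)
    return sum(totals.values()) / len(totals)


def final_integer_score(scores: Dict[str, List[int]]) -> int:
    """Round the average half up (assumed rule, not stated in the scoring key)."""
    total = sum(element_totals(scores).values())
    # Integer form of floor(total / 6 + 0.5), which avoids float rounding issues.
    return (2 * total + 6) // 12


# Hypothetical ratings for one essay.
example = {
    "Purpose and Focus": [2, 1, 2],
    "Organization and Structure": [2, 2, 1],
    "Development and Support": [1, 1, 1],
    "Sentence Variety and Style": [2, 1, 1],
    "Mechanical Conventions": [2, 1, 1],
    "Critical Thinking": [1, 1, 2],
}

print(f"Overall score: {overall_score(example):.1f}")
print(f"Final integer score: {final_integer_score(example)}")
```

Keeping the element totals separate from the average mirrors the rubric layout, so a rater can still report each __ / 6 section total alongside the overall score.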
Appendix E
Essay Feedback & Scoring Rubric
Essay Feedback + Scoring Rubric
Instructor Comments and Feedback
Purpose and Focus
The extent to which you present information in a unified and coherent manner, clearly addressing the issue.
__ 2 Full Credit: Clear point of view addresses purpose, audience and topic
__ 1 Half Credit: Purpose, audience, and topic are usually clear but may be inconsistent at times.
__ 0 No Credit: Point of view is confusing or never established.
__ 2 Full Credit: Topic and conclusion sentences are stated clearly
__ 1 Half Credit: Clear topic and conclusion sentences are sometimes but not always present.
__ 0 No Credit: There are very few clear topic or conclusion sentences.
__ 2 Full Credit: Stays on topic.
__ 1 Half Credit: Essay is mostly on topic but makes substantial digressions.
__ 0 No Credit: Essay is frequently off-topic.
__ / 6 TOTAL for Purpose and Focus (add section criteria scores)

Organization and Structure
The extent to which you order and connect ideas.
__ 2 Full Credit: Paragraphs are in logical order and clearly separated by topic.
__ 1 Half Credit: Paragraphs are mostly in logical order and mostly separated by topic.
__ 0 No Credit: Paragraphs lack logical order and topical foci.
__ 2 Full Credit: Sentences are complete, connect ideas, and progress in logical order
__ 1 Half Credit: Sentences are mostly complete, mostly connect ideas, and mostly progress in logical order
__ 0 No Credit: Order of sentences is consistently jumbled and difficult to follow throughout essay
__ 2 Full Credit: Intro + Conclusion are present, effective and concise.
__ 1 Half Credit: Essay may be missing either the intro or the conclusion, or both are ineffective.
__ 0 No Credit: Introduction and conclusion are absent.
__ / 6 TOTAL for Organization and Structure (add section criteria scores)

Development and Support
The extent to which you develop and support ideas.
__ 2 Full Credit: Ideas are clearly expressed and expanded upon.
__ 1 Half Credit: Most ideas are clear and presented with some support.
__ 0 No Credit: Ideas are unclear and/or confusing. Lacks sufficient supportive details.
__ 2 Full Credit: Arguments are supported with specific details and relevant examples.
__ 1 Half Credit: Arguments have little support or few relevant examples provided.
__ 0 No Credit: Few or no arguments presented.
__ 2 Full Credit: Plenty of supportive evidence.
__ 1 Half Credit: Some supportive evidence is present, but more is needed.
__ 0 No Credit: Few or no evidence support.
__ / 6 TOTAL for Development and Support (add section criteria scores)

Sentence Variety and Style
The extent to which you craft sentences and paragraphs demonstrating control of vocabulary, voice and structure.
__ 2 Full Credit: Sentences are varied, involving both simple and compound/complex structures.
__ 1 Half Credit: Sentences are often simple, robotic and/or formulaic.
__ 0 No Credit: Little or no sentence variation.
__ 2 Full Credit: Academic vocabulary expressed.
__ 1 Half Credit: Essay presents some academic vocabulary, but could use more specific terms.
__ 0 No Credit: Few or no academic vocabulary expressed.
__ 2 Full Credit: Use of transitions.
__ 1 Half Credit: A few examples of transitions throughout essay, although more are needed.
__ 0 No Credit: One or no transitions used.
__ / 6 TOTAL for Sentence Variety and Style (add section criteria scores)
Mechanical Conventions
The extent to which you express ideas using Standard Written English.
__ 2 Full Credit: Indentation for paragraphs.
__ 1 Half Credit: Paragraphs are mostly indented, with only a couple exceptions.
__ 0 No Credit: Several paragraphs are missing indents.
__ 2 Full Credit: Few or no grammatical errors.
__ 1 Half Credit: There are a handful of grammatical errors in the essay.
__ 0 No Credit: Essay contains a significant number of grammatical errors.
__ 2 Full Credit: Few or no spelling, punctuation, and capitalization errors.
__ 1 Half Credit: Essay contains some spelling errors, but they do not interfere with comprehensibility.
__ 0 No Credit: Enough spelling errors exist in the essay to significantly impact comprehensibility.
__ / 6 TOTAL for Mechanical Conventions (add section criteria scores)
Critical Thinking
The extent to which you communicate a point of view and demonstrate reasoned relationships among ideas.
__ 2 Full Credit: Topic is fully addressed.
__ 1 Half Credit: Topic is partially addressed, but missing key points.
__ 0 No Credit: Topic is unclear or briefly mentioned.
__ 2 Full Credit: Clear reasoning and complexity of thought expressed.
__ 1 Half Credit: Essay presents some logical reasoning, but lacks complexity of ideas.
__ 0 No Credit: Essay presents little or no logical reasoning/complex ideas.
__ 2 Full Credit: Clear relationships between ideas.
__ 1 Half Credit: Ideas are sometimes connected, but some relationships are not clear.
__ 0 No Credit: Ideas have little or no clear relationships.
__ / 6 TOTAL for Critical Thinking (add section criteria scores)
Overall Essay Score (Average of section totals) __/6
Appendix F
Rater Scores for Pilot Essay Group
Essay | Final + Process Scores (Rater 1) | Final + Process Scores (Rater 2) | Inter-rater differential on Final Integer Score | Inter-rater differential on Decimal (Process) Score
#1 | 4 (4.2) | 5 (4.7) | | 0.5
#2 | 6 (6.0) | 6 (5.5) | | 0.5
#3 | 2 (2.2) | 3 (2.7) | | 0.5
#4 | 6 (6.0) | 6 (5.7) | | 0.3
#5 | 2 (1.8) | 3 (3.3) | | 1.5
#6 | 0 (0.0) | 0 (0.3) | | 0.3
#7 | 6 (6.0) | 5 (5.2) | | 0.8
#8 | 3 (3.0) | 3 (2.8) | | 0.2
#9 | 6 (5.6) | 6 (5.7) | | 0.1
#10 | 0 (0.0) | 2 (1.8) | | 1.8
#11 | 1 (1.0) | 1 (1.3) | | 0.3
#12 | 0 (0.0) | 0 (0.3) | | 0.3
#13 | 6 (5.8) | 4 (3.7) | | 2.1
#14 | 3 (3.2) | 1 (1.2) | | 0.9
#15 | 5 (4.6) | 3 (3.0) | | 1.6
#16 | 5 (4.8) | 4 (3.5) | | 1.3
 | 4 (3.5) | 4 (3.8) | | 0.3
 | 4 (3.5) | 4 (3.5) | | 0.0
 | 4 (4.1) | 3 (2.7) | | 1.4
 | | | Mean = 0.89 | Mean = .92
Appendix G
Standardized Scores for Pilot Essay Group (all scores on a scale of 1-8)
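Essay | Score (Krista) | Score (John) | WritePlacer Rater | SEM of Raters vs. WritePlacer Automated Scores
1 | | | | 2.0
2 | | | | 0.5
3 | | | | 2.0
4 | | | | 0.0
5 | | | | 1.0
6 | | | | 0.0
7 | | | | 0.5
8 | | | | 0.5
9 | | | | 1.0
10 | | | | 1.0
11 | | | | 0.5
12 | | | | 0.0
13 | | | | 1.5
14 | | | | 1.5
15 | | | | 0.5
16 | | | | 1.0
 | | | | Mean = .84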
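
The reported mean can be checked against the SEM values listed in this appendix. The snippet below is a minimal Python sketch, assuming the sixteen SEM values correspond to essays 1 through 16 in the order they appear; the variable names are illustrative.

```python
from statistics import mean

# SEM values as they appear in Appendix G, assumed to be in essay order (1 through 16).
sem_values = [2.0, 0.5, 2.0, 0.0, 1.0, 0.0, 0.5, 0.5,
              1.0, 1.0, 0.5, 0.0, 1.5, 1.5, 0.5, 1.0]

mean_sem = mean(sem_values)
print(f"Mean SEM across {len(sem_values)} essays: {mean_sem:.2f}")
```

Under that assumption the values sum to 13.5 across sixteen essays, which agrees with the reported mean of .84 after rounding.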
