
Running head: ASSESSMENT DEVELOPMENT FOR ESL COMPOSITION

Assessment Development for ESL Composition: An Achievement Assessment for Formative

Purposes

Yuanyuan Sun, Courtney Van Evera & Kiley Miller

Colorado State University



Introduction

Writing is among the most important skills that ESL students need to develop. Meanwhile,

the ability to teach writing is a significant part of the expertise of a well-trained language teacher

(Hyland, 2003). As current and future teachers of writing, we value and administer reliable and

valid assessments in order to successfully teach students and to measure students' writing

development. More specifically, in this paper, we focus on developing a take-home achievement

test for formative purposes to benefit teaching and learning in an ESL composition course.

According to Weigle (2012), students who learn to write in a second language context

generally need to write for school purposes, and they have an immediate need to master the

genres and conventions of writing in the target language. Assessments of writing vary widely

and may serve different functions. The four main purposes for language tests in academic

settings are proficiency, placement, diagnostic, and achievement (Weigle, 2012). Among them,

achievement tests are mostly classroom-based assessments which help to determine whether

students have mastered specific skills or knowledge they've learned (Weigle, 2012). Miller, Linn, and Gronlund (2009) describe the achievement test design as one which demonstrates student learning and success, and typical performance assessment as capturing students' typical behavior, referring to what they can do now, rather than what they will do in the future. This rationale and

design is most appropriate for our assessment, since the take-home test is designed to evaluate

students' performance, demonstrating their learning outcomes by a certain point of instruction in

the course and to check their knowledge of course content so far, rather than predict their

achievement at the end of the course.

Furthermore, Cumming (2002) pointed out a significant issue in academic-purpose

assessments of second-language writing, that is, to improve the formative value of assessment

for students' learning. Formative assessment is defined in many ways and is also known as "Assessment for Learning," a label which helps to clarify the purpose and reason for

administering such assessments (Burner, 2016). In writing, the focus tends to be on the process,

as opposed to the product, and there is strong emphasis on writing in multiple disciplines,

especially in American contexts (Reising, 1997). While Burner (2016) acknowledges the content

difficulties that accompany any writing-based curriculum, value remains in teaching and

formatively assessing writing. Central to formative assessment are providing feedback,

understanding the purpose and trajectory of assessment, and involving student learners in the

development process (Burner, 2016; Becker, 2016). Such approaches support constructivist

theories of learning where students are more active in the learning process (Edens & Shields,

2015). First-year ESL composition courses at universities with homogeneous-ability student groups, then, are prime candidates for such assessments: they use writing to integrate and demonstrate understanding of rhetorical content, and they employ the higher levels of Bloom's Taxonomy by having students synthesize sources, rhetorically critique texts, conduct peer review and provide constructive criticism, and perform self-reflections and evaluations. Writing serves as the medium for accomplishing this wide variety of tasks.

With respect to the form of assessment, researchers have begun to investigate the effectiveness of integrated writing tasks, which are increasingly used in L2 writing assessment, as an alternative to traditional selected-response items (Gebril & Plakans, 2014). Many researchers claim

that integrated writing tasks replicate the actual practices in academic contexts where discourse

synthesis is a common exercise in university writing, since they require learners to synthesize

information from external sources in their writing product (Gebril & Plakans, 2014). Therefore,

such integrated assessment methods can augment authenticity and better elicit the academic writing construct (Gebril & Plakans, 2014).

This paper provides a detailed overview of an achievement assessment for formative

purposes that was developed for an ESL composition class at Colorado State University (CSU).

To begin with, a description of the test is provided. The description includes the purpose of the

test, the type of the test, the interpretation of scores, TLU domain, the construct of the test, the

Table of Specifications, and the description of test tasks. Next, the test procedure section provides information about the participants, administration, and scoring procedures. The test results are presented in the following section, which is followed by a discussion offering an overall critique and evaluation of the test. The discussion also provides an overall estimation of the test's effectiveness as well as a reflection on the personal significance of the test development process.

Description of the Test

Purpose of Test

We developed the take-home assessment specifically for CO 150-I, the international section of the first-year composition course offered by CSU as an option, but not a requirement, for multilingual ESL students. The take-home assessment works especially well with CO 150-I, since the major writing assignments are scaffolded and build on each other. CO 150-I is divided into four units (A1, A2, A3, and A4) based on the syllabus, and each unit concludes with a writing assignment that assesses the content taught in that unit:

A2 prioritizes research and results in an annotated bibliography; A3 focuses on the people

(stakeholders) involved in the issues that students selected for research and results in a

stakeholder analysis; and A4 focuses on making an informed argument to one of these groups

using all the students' prior knowledge, resulting in a researched argument. These writing

assessments, while summative in that they encompass the main ideas of the unit, are also

formative in that they build on each other and provide the instructor with information regarding

the gaps in instruction and can inform areas of focus for the next assessment (Miller, Linn &

Gronlund, 2009). The take-home test developed by my colleagues and me focuses on bridging A3 and A4 and is informed by student achievement on the first take-home assessment, which bridged A2 to A3; our test follows a similar format and serves a similar purpose to increase validity and

reliability. While this type of assessment could serve to link any summative assessment to

another, this is particularly relevant to scaffolded summative assessments, and it serves to

reinforce the course objectives as well as prepare students for their next unit.

The CO 150-I class has two major assignments which are also used as assessments:

assignment A3 is a stakeholder analysis and assignment A4 is an argumentative research

assignment. Our formative assessment is a bridge between A3 and A4, with the directive to

identify a stakeholder to whom to pitch an argument. The purpose of this assessment centers on

the CO 150-I course objective to understand writing as a rhetorical practice, and therein, choose

effective strategies for addressing purpose, audience, and context.

This assessment has several traceable impacts on students and teachers. First, it is a low-stakes assessment, so student anxiety should be low. Students are able to self-assess by viewing the rubric prior to the test and reviewing their work afterwards. There should also be positive washback of rhetorical concepts for students: because the assessment focuses on concepts presented and utilized in class, it prompts students to review class content and ensures that they have mastered the major rhetorical concepts. Finally, the teacher can use the resulting data to adjust instruction to make it more useful, and she can use the assessment and its results to connect assignments A3 and A4, adding to the cohesion of the course.

Type of Test

The take-home assessment is designed to be an alternative assessment and an achievement

test based on the CO 150-I class syllabus. It is used to measure and evaluate students' knowledge and understanding of the class content and instruction of units A1, A2, and A3, which students need in order to accomplish the upcoming unit, A4, writing an argument essay, successfully. Furthermore, the assessment is also used formatively to pinpoint students' major errors and to identify students' potential problems and difficulties. Since the class units build on each other, identifying these problems should help both students and the instructor avoid possible failure in the assignments that follow the test.

Score Interpretation

The results of the take-home assessment are interpreted in a criterion-referenced manner. More specifically, the assessment focuses on limited and clearly defined tasks. Students' individual performance on specific tasks is described and evaluated according to rubrics based on the concrete class objectives and writing skills outlined above; students' performances are not compared against each other. The points factor into the Process Work category of assessments, making up 10% of the overall course score.
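To illustrate how this criterion-referenced score feeds into the course grade, the following Python sketch converts a raw test score into its share of the overall course score; the helper name and the simplifying assumption that this test is the only Process Work item are ours, not part of the course materials.

def process_work_contribution(test_score, test_total=20, weight=0.10):
    # Convert a raw test score into course percentage points, assuming
    # (hypothetically) that this test is the only Process Work item.
    return (test_score / test_total) * weight * 100

print(process_work_contribution(16.87))  # mean final score -> about 8.4 of 10 course points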

Specific Description of the TLU Domain

The target language use (TLU) domain for this assessment is the CO 150-I classroom.

Typical tasks therein encompass pieces of rhetorical writing that are often take-home

assignments, such as homework or drafting segments of larger projects. Major assessments are

scored by rubrics, and align with course objectives; most homework assignments are evaluated

based on rubrics with a variety of point scales. Our writing assessment had components of all the cognitive skill levels of Bloom's Taxonomy (1956). First, knowledge, comprehension, and analysis

were assessed, where students defined terms and provided limited responses to concept-centered

questions. For synthesis and evaluation, students responded to a writing prompt, where they

applied course concepts by choosing an appropriate or inappropriate stakeholder for a situation

and providing a rationale. This assessment requires the rhetorical use of audience, voice, and

context, which are part of the course objectives and practiced often in in-class and take-home

assignments. A table with the breakdown of TLU task characteristics for a typical writing task is

offered in Appendix A.

Construct Definition

The CO 150-I course focuses primarily on understanding rhetorical elements to compose a variety of texts effectively; writing is used to assess this rhetorical knowledge. The course objectives include "developing critical reading practices to support research and writing; understanding writing as a rhetorical practice, i.e., choosing effective strategies for addressing purpose, audience, and contexts; learning important elements of academic discourse...[to write] effective...arguments; [and] developing effective research and writing processes." Thus, many aspects of writing were assessed, though assumptions were made and some skills were excluded from the test.

Assessed skills. This assessment specifically evaluated writing skills, which were evidenced through a variety of aspects of communicative language ability.

Organizational knowledge. Students' grammatical knowledge and understanding of vocabulary items in the selected response questions were assessed; these vocabulary items related to rhetorical concepts (e.g., stakeholder, purpose) and not to the students' selected topics for research. Students' syntax was assessed through their writing, though this was not the primary focus, as the rubric demonstrates. Within the short and extended responses, students' textual knowledge was assessed through cohesion within sentences. Coherence was demonstrated in their ability to present a topic sentence, related evidence, and an explanation connecting the topic sentence and evidence to fully answer the prompt, as also outlined in the rubric.

Pragmatic knowledge. Functional knowledge was assessed through students' ability to engage in ideational functions, for example, defining and utilizing rhetorical terms and concepts, which are required tasks in the classroom, or TLU domain (see Appendix A). Knowledge of manipulative functions was assessed because rhetoric inherently affects the audience and stakeholders of the writing, and the world at large. Knowledge of heuristic functions was also assessed; in this case, problem solving is the function at hand.

Assumed knowledge. Students were expected to write in formal, academic English, so

sociolinguistic knowledge (for this particular take-home assessment) was assumed since the

students had been writing for the instructor all semester and should understand these

expectations. Additionally, reading skills and topical knowledge of their chosen areas of research were also assumed, since students conducted their own research independently of each other. While all the topics chosen by students related to higher education, such as tuition, this knowledge was assumed because students had been working with these topics for several weeks by that point in the semester. Students used this assumed knowledge to articulate their understanding of the

rhetorical concepts of argumentation. This assessment served as one aspect of the writing

process: helping students to evaluate appropriate stakeholders to begin organizing their research

and eventually craft an effective argument. Listening skill was assumed, as part of the lecture

was given orally with the support of written and textual aids.

Elements not evaluated. Students' speaking skills were not assessed, as the instructor only

evaluated their written output.

Table of Specifications

The Table of Specifications offers the design for the assessment to ensure a

representative sample of tasks is included in the process of developing an assessment (Jamieson,

2011). The Table of Specifications for this take-home assessment is a two-way chart that relates

the instructional objectives to tasks (see Appendix B). The table indicates the total points and the

percentage of points allotted to various instructional objectives, which are categorized more broadly according to the levels of Bloom's taxonomy, in relation to each task. The percentage

indicates the amount of emphasis on each area in the assessment as well.

The Table of Specifications first lists the levels of Bloom's taxonomy (1956), further divided by the objectives the assessment is intended to measure, across the top row, with the tasks down the first column. The objectives are rephrased from the course objectives according to their application in this assignment and are matched with three main categories based on Bloom's taxonomy of educational objectives: knowledge and understanding, application and analysis, and synthesis and evaluation. Each main category from the taxonomy is represented by two subcategories of instructional objectives based on the course; for example, knowledge and understanding includes rhetorical concepts and style and convention. The last

row at the bottom shows the percentage allocation of points for each objective. Generally

speaking, the points are spread relatively evenly among objectives: rhetorical concepts covers

20% of the assessment, style and convention covers 10%, stakeholder values covers 20%,

applied knowledge of audience (manipulative functions) covers 20%, organization (cohesion,

coherence) covers 10% and development of evidence and explanations covers 20%. The last

column on the right shows the percentage allocation of points for the tasks of the assessment.

Altogether, three tasks are designed to measure the objectives, and they are weighted quite

differently. Definition makes up 10% of the assessment, T-charts make up 30%, and the

extended response makes up 60% of the score.

There are several implications for the assessment based on the Table of Specifications.

To begin with, the table indicates that the tasks measure different levels of complexity of

learning outcomes. While the task of definitions focuses on assessing students' knowledge of

rhetorical concepts, the extended response item is developed to measure the majority of the

objectives listed in the chart. Moreover, the table suggests a time allocation for each item. Taking the distinct weights into consideration, students are implicitly encouraged to spend most of their time constructing answers for the last two tasks. The extended response item in particular requires the most time investment, as it accounts for the majority of the points and the largest percentage of the score.
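As a quick consistency check on these weights, the short Python sketch below mirrors the point allocations from the Table of Specifications (Appendix B) and recomputes each task's share; the dictionary layout is purely illustrative.

points = {
    "Definition": {"Rhetorical Concepts": 2},
    "T-Charts": {"Rhetorical Concepts": 2, "Stakeholder Values": 2,
                 "Development of Evidence and Explanations": 2},
    "Extended Response": {"Style and Convention": 2, "Stakeholder Values": 2,
                          "Applied Knowledge of Audience": 4, "Organization": 2,
                          "Development of Evidence and Explanations": 2},
}
total = sum(sum(objs.values()) for objs in points.values())  # 20 points
for task, objs in points.items():
    pts = sum(objs.values())
    print(f"{task}: {pts} points ({pts / total:.0%})")
# Definition: 2 points (10%); T-Charts: 6 points (30%); Extended Response: 12 points (60%)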

Description of Test Tasks

This take-home assessment (see Appendix C) consisted of three tasks comprising three different parts: a definition, a graphic organizer to be filled in (specifically, T-charts), and finally a writing prompt to which students produce an extended response. The first two tasks were limited production items, while the third was an extended production item. Tasks were specifically constructed to follow Bloom's Taxonomy, progressing in difficulty; such tasks necessarily engage students in lower levels of thinking in order to enact higher levels of thought processes (Jensen, McDaniel, Woodard, & Kummer, 2014). The first task, the definition,

assessed fundamental knowledge and matched the first and second tiers of the taxonomy focused

on knowledge and understanding; the second task assessed comprehension through identifying

influences and consequences via the graphic organizer; the final task reached the highest levels

of the taxonomy, as students defended and evaluated their thought processes and choices, which can be categorized as analysis, synthesis, and application (Usova, 1997, p. 103). The order of the tasks similarly reflected their difficulty level.

The directions for the whole test and for each individual task were provided in written English. The first task, defining a term from the course objectives, required students to recall knowledge, or at

least to review course materials to find the answer, as students are specifically referred to these

materials in the instructions. The second task required students to first report information they

developed from the previous summative assessment. Filling in the T-charts required students to compare the gains and losses of each stakeholder group by applying knowledge of their material and thinking critically about the hypothetical situation they had been

researching. In the extended response, students had to choose a stakeholder and develop two well-

reasoned paragraphs to defend their choices. All the tasks were non-reciprocal. The relationship

between input and response was fairly indirect, because the tasks built upon previous tasks and

topical knowledge.

When it comes to scoring, the first two tasks allowed partial credit based on the test key (see Appendix C). The extended response item was scored according to a specific rubric (see Appendix C) outlining questions that specify the criteria; these questions are provided to guide students' self-evaluation and serve as prompts, fitting the description of both holistic and analytic rubrics

(Becker, 2017). Mathematically, the general descriptors in the rubric equate to: "excellent" earning full points; "good" earning around 83%, or a B-equivalent score; "satisfactory" equaling around 66%, which sits just above the cut score of 60%; and finally "unsatisfactory" receiving 50% or lower, likely indicating a lack of attention or a completely missing feature.

The specific scales are provided in the rubric within each descriptor and are scored according to the evaluation of each criterion in the rubric; the criteria further elaborate the categories outlined in the Table of Specifications. For example, "Synthesis and Evaluation" is further defined as "Development of Evidence and Explanations," worth two points. These are expressed in the rubric section "Organization and Development" as the criteria "Do paragraphs have clear topic sentences, evidence, and explanations?" and "Are references used appropriately to help develop the paragraphs?" Table 1 outlines this sample scenario. If a student has used thoughtful topic sentences, supplied evidence, and very clearly explained the connection between these features, as indicated in the expected response, the student would likely earn "excellent." If the student uses multiple references in a single paragraph but only one in another paragraph, this may still be considered a "good" use of references, but not "excellent," since the student has not demonstrated synthesis in both paragraphs. Each category is assessed in this way, using the marginal commentary as a basis for completing the rubric and leading the evaluators to mark an X following each criterion under the appropriate descriptor. Scoring for these markings is outlined according to the scale, or averaged between two point values in the case of an even distribution of descriptors. The example above would appear for the student as indicated below in Table 1.

Table 1

Modified Rubric Sample

                                                              E     G      S     U    Score
                                                              2   1.66   1.33   <1
Organization and    Do paragraphs have clear topic            X                       1.83
Development (4)     sentences, evidence, and explanations?
                    Are references used appropriately to            X
                    help develop the paragraphs?

This even distribution of descriptors necessitates averaging the two point values, resulting in a score of 1.83 points for this organization category. Commentary precedes the completion of the rubric, giving the evaluator an indication of the strength of the response and enabling the evaluator to respond fairly and thoroughly.
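A minimal Python sketch of this averaging rule follows, using the descriptor-to-point mapping from Table 1 for a 2-point criterion; the mapping and function name are illustrative only.

DESCRIPTOR_POINTS = {"E": 2.0, "G": 1.66, "S": 1.33, "U": 1.0}

def category_score(descriptors):
    # Average the point values of the descriptors marked for a category.
    return sum(DESCRIPTOR_POINTS[d] for d in descriptors) / len(descriptors)

# The Table 1 example: one criterion marked "E", the other "G".
print(round(category_score(["E", "G"]), 2))  # 1.83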

Test Procedure

Participants

There were 19 student participants, all with at least mid-intermediate proficiency, since they had tested into first-year composition or passed prerequisite classes, such as an intensive English program (IEP) or additional intensive composition classes. Students were between the ages of 18 and 22. Students

came from a variety of countries, speaking a variety of languages, including: 1 student from

South Korea (L1 Korean), 1 Ethiopian student (L1 Amharic), 1 student from Kuwait (L1

Arabic), 2 Omani students (L1 Arabic), 1 Saudi Arabian student (L1 Arabic), and 12 Chinese

students (L1 Chinese). There were 7 male students and 12 female students. Seventeen students came to the U.S. specifically for college within the last two years; 2 of these completed intensive English programs before or concurrently with their CSU studies, while others completed at least one year of college in their home countries before transferring to the U.S. One student moved to the U.S. and completed high school in an American setting, and 1 student was born in the U.S., lived there until age 3, moved back to her home country, and then returned to the U.S. for college.

Administration

Hard copies of the assessment were distributed on a Wednesday at the end of class and collected the next class period, on Friday afternoon. The instructor informed students of the purpose and procedure of the test orally in class. Students were also asked to read the directions carefully. Students were allowed three days (72 hours) in total to complete the test. This timeframe allowed students time to produce, review, and revise, which is emphasized in the course objectives. Students submitted hard copies of the test at the beginning of class on Friday.

Scoring Procedures

We then scored the tests, with each of us scoring six, using the same answer key and rubric for the extended response item (see Appendix C). After the test had been piloted, the raters met together; the instructor of the CO 150-I class had prepared graded samples of the test representing high, medium, and low scores. We discussed the grading method, asked questions about interpretations of the rubric, and graded a couple more papers as a group. Then we graded the rest of the tests separately. The instructor collected all copies of the test with score reporting forms attached (see Appendix C) and returned them to students.

Test Results

Students were assigned three tasks for the take-home assessment, which came to a total of 20

points. Table 2 below outlines the task statistics for all three tasks, organized by student and

arranged according to task, with each student's individual final score. The numbers in parentheses next to the task descriptions indicate the total points possible for each task. Eighteen student scores were reported. Scores for the definition ranged from 0 to 2 (M = 1.28, SD = 0.65), the T-charts ranged from 3.33 to 6 (M = 5.57, SD = 0.71), and the extended response question ranged from

8.41 to 12 (M = 9.98, SD = 1.12). Final scores ranged from 14.07 to 19.51 (M = 16.87, SD =

1.45).

Table 2

Individual Task Score Report


Student Definition (2) T-Chart (6) Response (12) Final (20)
1 1 6 9.33 16.33
2 2 5.66 11.16 19.51

3 1.5 4.66 10.08 16.24


4 1.5 6 9.08 16.58
5 1.5 6 11 18.5
6 0.5 3.33 11.75 15.58
7 2 6 11.49 19.49
8 1.5 5 10.32 16.82
9 1 6 8.41 15.41
10 1.5 5.33 8.82 15.65
11 1.5 6 9.65 17.15
12 2 6 8.56 16.56
13 1.5 5.33 9.76 16.56
14 0 6 12 18
15 2 6 10.49 18.49
16 1.5 6 9.32 16.82
17 0 5 9.07 14.07
18 0.5 6 9.32 15.82
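These descriptive statistics can be reproduced with standard tools; the sketch below uses Python's statistics module (rather than SPSS, which the study used) to recompute the definition task's mean and sample standard deviation from the Table 2 column.

import statistics

definition = [1, 2, 1.5, 1.5, 1.5, 0.5, 2, 1.5, 1,
              1.5, 1.5, 2, 1.5, 0, 2, 1.5, 0, 0.5]  # Table 2, definition column
print(round(statistics.mean(definition), 2))   # 1.28
print(round(statistics.stdev(definition), 2))  # 0.65 (sample SD)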

Each assessment was reviewed by a single rater due to time constraints and students' need for feedback before the next assignment. Each rater scored six assessments, with Rater 1 (who is also the instructor of the course) completing the first six and distributing them as samples of a high score (around 19), a medium score (around 17-18), and a low score (16 or below). Scores are

presented in Table 3 below and divided by rater with mean and standard deviation provided for

each rater and task. Rater 1 consistently scored the highest on the extended response and final scores, while Rater 3 consistently scored the lowest on the same measures. Rater 2 averaged the highest score only for the definition and was otherwise the middle scorer in each category. However, nearly all raters' averages fell within one standard deviation of the other raters' scores, the only exception being the extended response scores of Rater 1 and Rater 3. Additionally, Rater 1's final scores do not fall within one standard deviation of Rater 3's scores.

Table 3

Rater-Based Individual Scoring


Student Definition T-charts Response Final
Rater 1 2 2 5.66 11.16 19.51
7 2 6 11.49 19.49
14 0 6 12 18
8 1.5 5 10.32 16.82
13 1.5 5.33 9.76 16.56
17 0 5 9.07 14.07
M 1.17 5.5 10.63 17.41
SD 0.93 0.46 1.11 2.06
Rater 2 5 1.5 6 11 18.5
15 2 6 10.49 18.49
4 1.5 6 9.08 16.58
3 1.5 4.66 10.08 16.24
10 1.5 5.33 8.82 15.65
6 0.5 3.33 11.75 15.58
M 1.42 5.22 10.2 16.84
SD 0.50 1.07 1.12 1.33
Rater 3 11 1.5 6 9.65 17.15
16 1.5 6 9.32 16.82
12 2 6 8.56 16.56
1 1 6 9.33 16.33
18 0.5 6 9.32 15.82
9 1 6 8.41 15.41
M 1.25 6 9.1 16.35
SD 0.52 0 0.49 0.64

Pearson correlations were analyzed across raters. While the individual tasks, specifically the definition and the response, showed little correlation, the overall final scores indicated very high correlation, with a Pearson correlation coefficient of 0.92. The T-chart task also showed relatively high correlation, at 0.75. Table 4 below outlines all coefficient scores.

Table 4

Pearson Correlation Coefficients by Task


Task Pearson Correlation Coefficient
Definition 0.26
T-Chart 0.75
Response -0.08
Final 0.92
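How score vectors were paired across raters is not fully specified above, so the sketch below only illustrates the computation itself: it correlates the final scores of Rater 1 and Rater 2 from Table 3, paired in their listed (rank-ordered) sequence. This pairing is an assumption for illustration, not a reconstruction of the reported coefficients.

from scipy.stats import pearsonr

rater_1_final = [19.51, 19.49, 18.0, 16.82, 16.56, 14.07]  # Table 3, Rater 1
rater_2_final = [18.5, 18.49, 16.58, 16.24, 15.65, 15.58]  # Table 3, Rater 2
r, p = pearsonr(rater_1_final, rater_2_final)
print(f"r = {r:.2f}, p = {p:.3f}")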

With so few students, no overlap between raters, and so few items, the standard error of measurement could not be calculated from the limited data. However, the extended response question elicited comparable data in several categories; Table 5 presents the statistics within each category of the rubric. The numbers alongside each scoring category indicate the total possible score for that area, which corresponds with the Table of Specifications. When compared, these scores demonstrated a Cronbach's alpha of 0.804, indicating high reliability for the extended response.

Table 5

Extended Response: Item Statistics

Scoring Category Mean Standard Deviation

Audience and Rhetoric (6) 4.98 0.58



Organization (4) 3.27 0.40

Style and Conventions (2) 1.73 0.29
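For readers who wish to replicate the reliability estimate, the sketch below implements Cronbach's alpha over a (students x categories) score matrix and then derives the standard error of measurement from the classical formula SEM = SD * sqrt(1 - reliability); the example matrix is hypothetical, not the study's raw category scores.

import numpy as np

def cronbach_alpha(scores):
    # scores: 2-D array, rows = students, columns = rubric categories.
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each category
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

example = [[5.0, 3.5, 2.0], [4.5, 3.0, 1.5], [5.5, 3.5, 2.0], [4.0, 2.5, 1.5]]
print(round(cronbach_alpha(example), 2))  # illustrative data only

sem = 1.12 * (1 - 0.804) ** 0.5  # SD of the response task (Table 2) and the reported alpha
print(round(sem, 2))             # about 0.5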

Overall, students performed well, as none scored below the cut score (60%, or 12 points),

which was determined based on the university grading policy. These results may be expected for

a formative assessment in which students were allowed to use their notes. This would indicate

that each student achieved mastery of the content.

Discussion

The results from the pilot assessment reveal many important implications, which can be

discussed in terms of various characteristics of the assessment procedures.

Critique of Task Performance

Ultimately, the limited number of participants and data reduces the generalizability of

these results, but these tasks still produced relevant and useful data that informed the formative

purposes of this assessment and will inform the redesigning of the assessment. The first task, the

definition, showed a very wide range of scores within one SD of the mean, indicating that

students scored across the entire possible range. The T-chart task demonstrated a greater range, but also had a larger possible score and was the overall highest-scoring task in terms of percentage (0.93), which may be because students drew this information not from the class lecture or notes but from their own previous summative assessment. This task also required only knowledge and understanding (Bloom, 1956), compared to the definition, which depended on students' notes or on finding the information in the course lectures or PowerPoints, and the extended response, which

required synthesis and evaluation skills, which are more difficult and sophisticated (Bloom,

1956). Still, the scores demonstrated a range that seemed appropriate and is largely consistent

with the range of overall scores for students in the course.

When scores were divided by rater, there were clear differences between the scores that each rater assigned; a formal check of interrater reliability would not only have yielded more data but would also have indicated which scores were inflated or graded too harshly. Due to time constraints and practicality, this step was not taken, which may affect

the range of data represented in this pilot study. For future implementations, this would be a

useful step for both student scores and for instructors to be able to ensure consistency.

Evaluation of Test Usefulness

Overall, the test was evaluated to be useful, primarily because of its adherence to assessment qualities, including reliability, validity, and impact. Together with the results and performance across each task, the assessment also successfully achieved its purpose.

Reliability. The reliability of the extended response portion of the assessment, in terms of Cronbach's alpha, was .804, which is reasonably high. Our largest goal in making our assessment reliable was achieving good interrater reliability. First of all, the three of us collectively designed the rubric for the assessment, which increased the likelihood of our grading being similar. Additionally, the scoring procedures mentioned above also supported interrater reliability. While a more analytical rubric with distinct categories for each rating within the extended response would likely have resulted in higher consistency among raters, constructing such a rubric for a formative assessment is not practical. While we weren't able to evaluate interrater reliability statistically by scoring the same assessments, we were able to compare overall scores across raters, obtaining a Pearson correlation coefficient of 0.92,

despite varying correlation coefficients for the individual tasks. More communication between

raters, along with interrater reliability checks if raters are able to score the same assessments for comparison, may be the most practical path to higher reliability. Additionally, if other similar bridge assignments connecting two summative assessments were added as additional tasks to score, or if more students were available for the pilot, a better understanding of reliability could be achieved.

Validity. Constructs for this assessment are derived from the following course objectives from the source syllabus: "developing critical reading practices to support research and writing; understanding writing as a rhetorical practice, i.e., choosing effective strategies for addressing purpose, audience, and contexts; learning important elements of academic discourse...[to write] effective...arguments; [and] developing effective research and writing processes." Constructs

assessed were writing skills, particularly in the realms of organizational and pragmatic

knowledge. Since the assessment modeled past and future assignments, the measure of students'

abilities within these constructs was consistent between performances on similar assignments,

which could support and indicate validity. The assessment had a high level of interactiveness

because the construct involved organizational and pragmatic linguistic knowledge and

knowledge of a certain topic of research and stakeholders. The TLU domain for this assessment

was the course itself, and future assignments therein. The assessment mirrored past assignments

which had taken place, and was intended to mirror future assignments and writing assessments,

particularly the A4 Academic Argument, which was the largest summative assessment of the

course.

The instructions of the test tasks resembled instructions of other assignments and utilized

prior knowledge and skills gained in class. No problems were detected. Regarding input, one

area for review may have been the vocabulary utilized for tasks one and two. For task one, since

a definition was required, some students simply Googled the definition, rather than relying on the

class-specific definition provided, so several incomplete responses were given. This potential

confusion was supported in that the definition had the lowest average percentage score (0.64) of

any task. Similar definition tasks have been used before, increasing the validity of the

assessment, and students were referred to the notes in the instructions, so further investigation

may be needed to determine the reason for difficulty with this task. On the T-chart task, a small number of students showed ongoing confusion about what stakeholders' gains and losses were. For formative purposes, the assessment did what it needed to do by revealing this problem of understanding vocabulary. Considering the summative purposes of the larger A3 and A4 assessments that this bridge assessment links, however, the small number of students who struggled to differentiate between gains and losses may not have performed optimally because of this confusion. The relationship between input and response proved to be very smooth, as predicted, because of the amount of in-class conditioning and because the class itself was the TLU domain.

One feature of the assessment that was potentially damaging to its validity

was the generous scoring system. The first two tasks regarded class knowledge and skills which

were predicted to be reasonable for the students to recall or find in the class notes. The extended

response portion was the largest indicator of success on the assessment overall, which may be in

part because it was also the most heavily weighted part of the assessment. The rubric allotted six

points for "Audience and Rhetorical Knowledge," four points for "Organization and Development," and two points for "Style and Convention." Further analysis of these categories

may reveal patterns in scoring and performance that would be beneficial to students and

developing appropriate instruction accordingly. With the lowest student score on this portion at

8.41 out of 12 possible points, this task would be worth investigating. With all students receiving

between 6 and 8 points on the first two sections, the lowest total score was a 14.07 out of 20,

which is 70%, a C-. The cut score was 12 out of 20, so all students proved proficient. It's possible

that these scores were skewed to the high end because of the generous rubric. The time given for

tasks mirrored past and future tasks, and there were no detected difficulties regarding length of

time allotted. All in all, because we had an actual CO 150-I class at our disposal, our

assessment was highly authentic.

Impact. Assessments were returned to students at the end of class approximately 10 days

after they submitted the assignment, which is a longer timeframe than most electronically

submitted assignments with a single rater. This may have been frustrating to students who

place high priority on their schoolwork and are accustomed to receiving grades for similar

formative assessments within the week. However, no negative impacts were directly relayed to

the instructor. Since the assignments included considerable feedback and a clear rubric, students

were able to see exactly what points were missed and areas for improvement, which should have

had a positive effect on this highly motivated group of students. Students had proven their ability to receive and incorporate feedback and commentary on previous assignments, so it's likely that this

feedback and the relatively detailed rubric provided areas for students to improve and continue to

learn.

In terms of instruction, this formative assessment provided valuable positive impacts on

teaching, and the instructor was able to incorporate more instruction regarding gains and losses as

they relate to stakeholders and argumentative, persuasive writing techniques. The instructor was

able to remind students to rely on the notes provided in class and reviewed where to find the

information, since students performed most poorly on the item that required their notes or

resourcefulness to find the class PowerPoint lectures. Since 77.77% of students did not score full points on the definition portion, the instructor also reviewed synthesis as a rhetorical concept and

how to apply and demonstrate synthesis in writing and using resources. The style section of the

writing rubric was also an area for improvement for many of the students, so the instructor reviewed MLA format and basic assignment requirements, such as typing when required, as this was also a factor that caused students to lose points. This deficiency also related to the instructions and may indicate that students were simply not reading the instructions carefully, which may

be an additional aspect to review. With the exception of the possible negative affective impact

regarding the length of time to return the assessment, the impact is believed to be largely

positive. Students were provided with additional learning opportunities through the use of a

rubric and the commentary received, and the instructor adapted instruction specifically to address

the shortcomings revealed by this assessment.

Test Purpose

Our assessment was designed to be an achievement assessment used for formative

purposes. The summative purpose of the assessment was to measure linguistic skills and

knowledge from the A3 stakeholder analysis assignment before moving on to the A4 research

paper assignment. The formative purpose of the assessment was to reveal what areas need more

instruction between the A3 and A4 assignments. This assessment revealed that all students were

above the cut score, which was 12 out of 20 possible points, with the lowest score being 14.07

out of 20 possible points. This means that all students showed they had gained the linguistic

skills and knowledge from the A3 assignment through performance on this assessment. We were

also able to pinpoint some areas that needed further instruction. Our estimation is that our

assessment achieved its formative purpose.



Reflection

My colleagues and I developed this assessment as required by the class E 638 Assessment

of English Language Learners that we took for a TEFL/TESL graduate program in 2017. I have

learned a lot, in many respects, from the process of assessment development through the steps of test proposal, development, implementation, and results analysis. First of all, at the beginning of the semester I barely knew what the important concepts in assessment were or how to create an assessment. It has been very helpful for me to go through all the steps needed to develop a valid

and reliable test, especially since this type of curriculum-based achievement test is very

commonly seen in the EFL language classrooms in China that I expect to work in. Secondly, I

benefited a lot from the test development process by putting theory into action: I learned to use

TLU task tables to make sure my assessment is authentic; I learned to match the test constructs

to course goals and objectives; I learned to use the Table of Specifications to ensure tasks

operationalize the constructs well. On the other hand, it was very challenging and time consuming to consider all of these elements in the test at the same time so that each was valid in its own right and all worked together effectively. For example, when we tried to create the rubric for the extended response item, we spent a lot of time together making sure that all the criteria in the rubric, which show what the task intends to measure, matched the test constructs. We also had to make sure the points assigned to the criteria matched the points allocated in the Table of Specifications.

I also learned many things from the process of piloting the test and scoring. I never

thought much about test instructions before, and now I realize instructions are extremely important for supporting students' performance and guiding them to study effectively and efficiently. However, even though I thought we had created good written instructions for the test, it still surprised me when I graded students' tests and realized how many of them tended to overlook

them. It encouraged me to think about the ways to make students realize the significance of

understanding instructions. As for scoring, since I had little prior teaching and grading experience compared to my group members, I learned a lot from them. For example, I practiced how to give students corrective and constructive feedback without discouraging them. In addition, I learned that while a quantitative description of the test can help to measure students' achievement, qualitative evaluation, such as feedback and comments in the margins, is crucial to achieving the formative purposes of a test and making a test a true assessment for learning.

Last but not least, it was very helpful for me to learn not only how to analyze data using

technology tools such as SPSS (Statistical Package for the Social Sciences), but more

significantly, how to interpret the results to make them truly useful and meaningful for gauging the effectiveness of a test and informing instructional decisions and learning.



References

Becker, A. (2016). Student-generated scoring rubrics: Examining their formative value for

improving ESL students' writing performance. Assessing Writing, 29, 15-24.

Becker, T. (2017, February 15). E 638 Assessment of English Language Learners [Class

handout]. Fort Collins, CO: Author.

Bloom, B. S. (1956). Taxonomy of educational objectives, Handbook I: The cognitive domain.

New York, NY: David McKay Co, Inc.

Burner, T. (2016). Formative assessment of writing in English as a foreign language.

Scandinavian Journal of Educational Research, 60(6), 626-648.

Cumming, A. (2002). Assessing L2 writing: Alternative constructs and ethical dilemmas.

Assessing Writing, 8(2), 73-83.

Edens, K., & Shields, C. (2015). A Vygotskian approach to promote and formatively assess

academic concept learning. Assessment & Evaluation in Higher Education, 40(7), 928-

942.

Gebril, A., & Plakans, L. (2014). Assembling validity evidence for assessing academic writing:

Rater reactions to integrated tasks. Assessing Writing, 21, 56-73.

Hyland, K. (2003). Second language writing. New York, NY: Cambridge University Press.

Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching. Upper Saddle River, NJ: Pearson.

Jamieson, J. (2011). Handbook of second language teaching and research (E. Hinkel, Ed.). New York, NY: Routledge.

Reising, B. (1997). The formative assessment of writing. The Clearing House: A Journal of Educational Strategies, 71(2), 71-72.



Jensen, J., McDaniel, M., Woodard, S., & Kummer, T. (2014). Teaching to the test...or testing to

teach: Exams requiring higher order thinking skills encourage greater conceptual

understanding. Educational Psychology Review, 26(2), 307-329. doi:10.1007/s10648-013-9248-9

Usova, G. M. (1997). Effective test item discrimination using Bloom's taxonomy. Education,

118(1), 100-100.

Weigle, S. C. (2012). Assessing writing. In C. Coombe, B. O'Sullivan, P. Davidson, & S. Stoynoff (Eds.), The Cambridge guide to second language assessment. Cambridge: Cambridge

University Press.

Appendix A

TLU Task Characteristics

Characteristics of the setting

physical characteristics Take home - student's choice of environment

participants CO 150.404

time of task Friday after class to Monday class-time (72 hours)

Characteristics of the test rubric

instructions

language English (target language)

channel Written, visual with brief oral introduction

specification of procedures and tasks Selected response, written, brief and extended
response

structure Typical writing tasks

time allotment Days depending on the length of writing

scoring method

criteria for correctness Selected response: 0 = wrong, 1 = right


Constructed response: rubric

procedures for scoring the response 0-3 scale

explicitness of criteria and procedures Given, explicit rubric provided

Characteristics of the input

format

channel Written, oral lecture, visual



form Language

language English

length 50-minute class (MWF, during week)

type Verbal and non-verbal lecture, written, visual

degree of speededness Moderate, varied based on lesson and medium (lecture content could serve as input)

vehicle Reproduced

language of the input English

language characteristics Academic

organizational characteristics Verbal with written and visual support depending on lesson needs; written homework and reading to supplement

grammatical Written instructions

textual Verbal and written instructions, Socratic questioning, textbook readings, group verbal exchanges, note-taking

pragmatic characteristics

functional Heuristic, ideational, manipulative

sociolinguistic Formal, colloquial, natural, polite, academic

topical characteristics Controversial issues in higher education such as tuition, rhetoric and college composition

Characteristics of the expected response

format

channel Written

form Language, selected response

language English, target

length 72 hours (1-1.5 hours suggested)



type Selected response, short and extended response

degree of speededness Moderate

language of the expected response English

language characteristics Academic, formal, natural, polite, written

organizational characteristics Topic sentence and supported evidence

grammatical College-level vocabulary, standard English, some specialized vocabulary

textual Paragraph organization, complete sentences

pragmatic characteristics

functional Ideational, heuristic, manipulative

sociolinguistic Formal, natural, polite, academic

topical characteristics Controversial issues in higher education, rhetoric and college composition

Relationship between input and response

reactivity Non-reciprocal, written, prompted, synthesis of information

scope of relationship Moderate

directness of the relationship Direct


Appendix B

Table of Specifications

                     Knowledge and Understanding   Application and Analysis      Synthesis and Evaluation
Tasks                Rhetorical   Style and    Stakeholder   Applied       Organization   Development of   # Points   % Points
                     Concepts     Convention   Values        Knowledge of  (cohesion,     Evidence and
                                                             Audience*     coherence)     Explanations

Definition               2            -             -             -             -               -              2         10
Research Answer          2            -             2             -             -               2              6         30
and T-Charts
Extended Response        -            2             2             4             2               2             12         60

# Points                 4            2             4             4             2               4             20
% Points                20           10            20            20            10              20                       100

*manipulative functions

Appendix C

A3-A4 Bridge Take-Home Assessment

A3-A4 Bridge Take-Home Assessment (20 points)

INSTRUCTIONS: For the majority of the semester, you have been working closely with an issue in

higher education, reading background information in the A2 Annotated Bibliography then analyzing

relevant stakeholders who are affected by the issue in the A3 Stakeholder Analysis. This bridge

assessment is intended to help you identify important pieces from A3 and begin to assemble the A4

Academic Argument. These items are meant to test your knowledge and understanding of the previous

material, as well as help you further analyze your stakeholder options to help you evaluate and make your

choice of stakeholder for A4.

You are encouraged to use your class notes, my PowerPoint lectures, and your previous assignments to

complete this assessment. You have until class-time on Friday, April 7 to complete and submit a hard

copy of this assignment. You should write your responses for the first two items in the booklet provided.

1. Provide a definition of synthesis. Your response should not be a direct copy of course materials. (2

points)

2. List the answer to your research question from A3. In the T-charts provided for you below, name the

three stakeholders you analyzed from A3 and compare their stakes in the issue. On the left side of the

T-chart, list at least 2 positive aspects (gains), and on the right side list at least 2 negative aspects

(losses) for the stakeholder. Fill out the chart with 5 reasons total for each stakeholder. (6 points)

Answer to the Research Question (1 point):

_____________________________________________________________________________________

_________________________________________________________________

T-Charts (stakeholder: .33 point each; reasons: 1.33 points per T-chart):

Example:

Stakeholder: Aliens who come to Earth               Stakeholder:

GAINS                   LOSSES                      GAINS               LOSSES

- Find new allies       - Potential war
- Explore new land      - Risk lives if humans
- Become famous           are violent

Stakeholder: Stakeholder:

GAINS LOSSES GAINS LOSSES



3. Based on the T-charts, choose the most appropriate stakeholder and the least appropriate
stakeholder for your issue. Compose one paragraph for each stakeholder (2 paragraphs,
which should be around one page total) explaining your choices. You may consider the
power the stakeholders have over the issue, how the stakeholder will be influenced by the
issue, how resistant they may be to your potential arguments, and/or the evidence you already
have that the stakeholder would find convincing. Refer to class notes, PowerPoints, and
handouts as necessary using MLA format for any in-text citations. A Works Cited page is not
needed. Type this response and submit with the previous two steps of this assignment. (12
points)



Key for A3-A4 Bridge Take-Home Assessment

1. Provide a definition of synthesis. (2 points)

Partial credit possible


Supporting a similar claim using multiple sources
Demonstrating a connection between the information provided

2. List your research question and the stakeholders you analyzed from A3. Construct a T-chart (3

total) for each stakeholder, comparing their stakes in the issue. On the left side of the T-chart, list positive

aspects (gains), and on the right side list negative aspects (losses) for the stakeholder. Fill out the chart

with 5 reasons total for each stakeholder. (6 points)

Answer to the Research Question (1 point):

Answers will vary, based on A3 topics and issues. Partial Credit Possible for:
Full sentence required
Clarity and appropriateness of answer

T-Charts (stakeholder: .33 point x 3 = 1, reasons: 1.33 points/T-chart x 3 = 4 points):

Partial credit possible


Stakeholder: .33 point
Reasons (5 total): .27 points each, distribution between gains/losses does not matter
though the table distribution will likely lead to a maximum of 4 gains or 4 losses, forcing
at least one in each column

3. Based on your T-charts, choose the most appropriate stakeholder and the least appropriate stakeholder

for your issue. Compose one paragraph for each stakeholder (around one page total) explaining your

choices. You may consider the power the stakeholders have over the issue, how resistant they may be to

your potential arguments, and/or the evidence you already have that the stakeholder would find

convincing. Refer to class notes, PowerPoints, and handouts as necessary using MLA format for any in-

text citations. A Works Cited page is not needed. Type and print this response. (12 points)

Partial credit possible according to rubric.



Rubric for task 3, Extended Response

Question 3 Grading Criteria (12 points)

Your instructor will ask the following questions when evaluating your work.
Scale: Excellent / Good / Satisfactory / Unsatisfactory.

Audience and Rhetorical Knowledge (6)
  Are stakeholder values stated clearly?                            3 / 2.5 / 2 / <1.5
  Are the values for both stakeholders described logically?
  Are the justifications for both stakeholders relevant to
  the issue?

  Are the stakeholders appropriate in terms of power and            3 / 2.5 / 2 / <1.5
  capability to affect change regarding the issue?
  Are the explanations compelling and related to the issue?
  Are rhetoric and audience appeals applied appropriately to
  justify the choice of stakeholders?

Organization and Development (4)
  Are ideas logically organized within the paragraph?               2 / 1.66 / 1.33 / <1
  Is appropriate transitional language used within sentences?

  Do paragraphs have clear topic sentences, evidence, and           2 / 1.66 / 1.33 / <1
  explanations?
  Are references used appropriately to help develop the
  paragraphs?

Style and Convention (2)
  How well does the essay follow MLA conventions?                   1 / .83 / .66 / <0.5

  How well has the writer proofread and edited for grammar,         1 / .83 / .66 / <0.5
  sentence-structure, and punctuation errors to make the essay
  clear and easily readable?

*See the back of this page and the margins for additional commentary.

Final Score: ____/ 12

Score Reporting Form

1. Definition ___/2 points

2. T-charts

Answer to research question: __/1 point

Stakeholders: __/1 point

T-chart reasons: __/4 points

3. Extended Response ___/12 points

Question 3 Grading Criteria (12 points)

Your instructor will ask the following questions when evaluating your work.
Scale: Excellent / Good / Satisfactory / Unsatisfactory.

Audience and Rhetorical Knowledge (6)
  Are stakeholder values stated clearly?                            3 / 2.5 / 2 / <1.5
  Are the values for both stakeholders described logically?
  Are the justifications for both stakeholders relevant to
  the issue?

  Are the stakeholders appropriate in terms of power and            3 / 2.5 / 2 / <1.5
  capability to affect change regarding the issue?
  Are the explanations compelling and related to the issue?
  Are rhetoric and audience appeals applied appropriately to
  justify the choice of stakeholders?

Organization and Development (4)
  Are ideas logically organized within the paragraph?               2 / 1.66 / 1.33 / <1
  Is appropriate transitional language used within sentences?

  Do paragraphs have clear topic sentences, evidence, and           2 / 1.66 / 1.33 / <1
  explanations?
  Are references used appropriately to help develop the
  paragraphs?

Style and Convention (2)
  How well does the essay follow MLA conventions?                   1 / .83 / .66 / <0.5

  How well has the writer proofread and edited for grammar,         1 / .83 / .66 / <0.5
  sentence-structure, and punctuation errors to make the essay
  clear and easily readable?

*See the back of this page and the margins for additional commentary.

4. Final Score: ____/ 20


(Cut Score = 60%, 12/20)
