John E. Kruse
University of Maryland
THE EFFECTIVENESS OF L2 WRITTEN GRAMMAR CORRECTION
Table of Contents
Abstract
Introduction
Synthesis of Research
State of the Field in the 1990s
Truscott Ignites Controversy
Ferris’s Response
Truscott Strikes Back
Ferris - It’s 2004 and We’re Still at Square One
Growing Body of Empirical Research
What Can We Say For Certain?
Suggestion for Further Research and Conclusions
References
Abstract
Truscott’s (1996) article, “The Case Against Grammar Correction in L2 Writing Classes,”
was nothing short of a bombshell, calling into question revered and long-held views of both
students and teachers on the big question: whether or not written error feedback helps students
to improve written accuracy over time. It took three years for Ferris (1999) to come up with
research-based reasons why correction should not be abandoned. The polemical debate that
ensued has become a reference point for studies up until today. Recent research has looked at the
such as direct and indirect feedback. There has been evidence to support these practices in
circumscribed settings but the broader application still lacks empirical support. Recommended
areas for future research include computer-assisted language learning, peer review written
feedback
Introduction
Problem Statement
Corrective feedback (CF) has been called “the most contentious issue in second language
(L2) writing research” (Liu & Brown, 2015). The purpose of this paper is to provide a review
and analysis of the research relating to the effectiveness of written corrective feedback (WCF) in
improving English language learners’ (ELLs’) mastery of written English in a formal
classroom setting.
During an especially prominent period of debate on the subject (Ferris, 1999, 2004;
Truscott, 1996, 1999), gaps in the research were proposed by Ferris (1999) and validated by
Truscott (1999). These were: the value of teacher training and practice, whether certain types of
errors are more amenable to correction than others, the role of individual student variables, and
the types of error correction that might lead to long-term improvement versus more easily-
studies, Liu and Brown recorded “a dramatic increase in WCF studies starting after about 2004,”
attributing it to Ferris’s (2004) critique of the research base as incomplete and inconsistent.
The problem, then, is whether there is enough evidence from the last 15 years to allow formulating
conclusions beyond simply suggesting directions for future research. Recent empirical studies,
literature reviews, meta-analyses, and edited book chapters will be examined. The analytic
framework for this review asks whether this evidence meets Truscott’s (1996) two criteria for
effectiveness (p. 329). First, can specific cases be found where grammar correction
improves L2 writing? And second, do students who receive error correction improve in accuracy
over time? If neither of these research questions has been answered, then educators still need
valid and reliable research, and well-designed case studies to guide their practice.
For the review of the post-debate articles, resources were initially collected using the
online Education Resources Information Center (ERIC) database and limited to peer-reviewed
dissertations in the ProQuest database. The process of searching the ERIC database for germane
studies involved inputting the following Boolean-phrased search terms: L2 writing OR second
language writing AND written corrective feedback (OR error correction) AND efficacy (OR
effectiveness) AND grammar. The search excluded works published before 1996 in order to capture the state
of the field beginning with Truscott (1996). No cut-off date was set for more recent works.
The most relevant results were then cross-checked for the number of citations as
measured by the PlumX Metrics analytic tool embedded within ERIC. This was a proxy for the
impact of the article on the field. Of the articles that appeared in results, most were uncited or
lacked even abstract views. Uncited or lightly-cited articles were eliminated as were those that
As there appear to have been continued efforts toward at least some incremental
progress in clarifying the question of the effectiveness of written error correction, research is
Accordingly, the literature review starts with three representative works on teaching grammar in
an ESL setting that help set the stage for the Truscott-Ferris debate: Long (1991), Silva (1993), and
Fotos (1994). It then proceeds to examine four articles representing a bellicose exchange
between Truscott (1996, 1999) and Ferris (1999, 2004) on the subject of CF in L2 writing.
Subsequently, five widely cited works are analyzed in this review: Bitchener, Young, and Cameron (2005);
Ellis, Sheen, Murakami, and Takashima (2008); Sheen (2010); Liu and Brown (2015); and Benson and DeKeyser (2018).
Admittedly, this delimiting may have omitted some
pertinent works, but those chosen should still be representative of studies addressing the two
Terminology
bilinguals— have not been universally applied and will be further described here. Other terms,
on which there is greater agreement, such as focused and unfocused feedback, direct and indirect
Corrective Feedback.
Nassaji and Kartchava (2017) began their review of recent research on corrective feedback
by accepting Chaudron’s definition as “any teacher behavior following an error that minimally
attempts to inform the learner of the fact of the error” (p. ix). They differentiated among oral
corrective feedback, computer-mediated feedback, and written corrective feedback. This paper
A distinction is sometimes made in the field between English as a foreign language (EFL)
and English as a second language (ESL). Citing Gass and Selinker (2001), Solano-Flores (2016)
defined EFL as learned (emphasis by the author) “formal instructional content” usually in an
environment where the student’s L1 is spoken, and ESL as acquired (emphasis by the author)
through “social interaction” in an environment where the L2 is predominant (p. 67). This
categorization does not fully take into account other conditions such as studying a foreign
language in a structured setting in a location where the L2 is the native language. ESL will be
used in the more general sense in this paper to include a number of environments.
Mahoney (2017) considered emergent bilinguals as “the preferred term for students who
are in the process of learning English as a new language” (p. 190). She conceded, however, the
prevalence of the term English language learners (p. 4). The term of art, emergent bilinguals, appears to
Synthesis of Research
At this point, it is useful to briefly review some articles that characterize prevalent
thinking in the field during the period immediately leading up to the Truscott-Ferris debate. Long
(1991) represented the school of thought that was moving back toward some teaching of
grammar after it had been discarded in favor of an approach in which students learned the linguistic
features of an L2 incidentally, in much the same way that children acquire the features of their
L1. Silva (1993) built on the idea of L2 learning as a distinct process and made
recommendations for accommodating and scaffolding L2 writing tasks. Fotos (1994) advocated
Within the framework of the pedagogical theories of task-based language teaching and
grammar teaching. He explicitly rejected the premise that the L1 and L2 acquisition processes
are identical and that grammar was best absorbed in an immersion-like process with no role for
CF. On the other hand, he advocated for a return to some sort of formal grammar instruction,
teaching grammar in meaning-based lessons. He saw this as coming back to center after “drastic
In setting his approach apart from traditional teacher-fronted grammar lessons of the past,
he differentiated between focus on forms and focus on form. He used the labels to make a
distinction between intentionally teaching grammar according to a structured syllabus and addressing it
incidentally in communicative activities or tasks. Based in part on what Schmidt (1990) called
noticing, he advocated for a focus on form, which he considered to promote a more learner-centric
environment where grammatical forms were not targeted but addressed as they arose in a
One potential problem with Long’s initial recommendations is that teachers might not
necessarily be able to anticipate and prepare in advance to explain grammatical points where the
rules are absent or very complex such as the adjectival order before a noun, usually given as:
opinion, size, age, shape, color, origin, material, purpose, as in “a lovely little old
rectangular green French silver whittling knife” highlighted in BBC Trending (2016).
In his 1993 meta-analysis, Silva made a case for the distinct nature of L2 writing.
Looking at 72 reports of empirical research that compared English as a second language (ESL)
a number of factors to include research design, sample size, English proficiency, and writing
proficiency. Most of the writing tasks called for expository essays, and a much smaller number
Silva found “untenable” the assumption that L1 and L2 writing are the same at a
theoretical level (pp. 688-689). Citing studies that compared accuracy with verbs, prepositions,
articles, and nouns, he found strong evidence that L2 writers, even those at advanced proficiency,
made more grammatical errors than their NES counterparts (p. 663). As a result of this and other
areas, such as lexical and semantic, where L2 writers struggled, he went as far as to suggest that
practitioners need a different evaluation criterion for L2 writing” (p. 670). He also recommended
they implement sequential revisions in stages focusing exclusively on either content or grammar.
Consciousness-Raising Tasks
promote proficiency gains in L2 acquisition. Like Long (1991), she sought to move accepted
practice within ESL pedagogy toward a middle ground between teacher-fronted explanations of
grammar and incidental acquisition of grammar. These activities focus on a form of grammar
within an interactive communicative activity. Unlike a pure meaning-focused activity, the task
has a grammatical component and aims at raising learner noticing of the grammatical feature in
subsequent interaction in the target language. Mastery of the grammatical structure is gained
Fotos (1994) selected syntactic features that Japanese learners had difficulty with. These
involved word order for three tasks: adverb placement, indirect object placement, and relative
clause usage. The researcher administered tests to three groups: a grammar (teacher-fronted)
lesson group, a grammar task group, and a communicative task group (lacking grammatical content).
The tasks required interactions (negotiations) resulting in agreement among the members
of the groups.
The study design comprised three 90-minute classes, held once a week over a three-week
period. Proficiency was measured with a cloze test that Fotos (1994) judged to be “valid and
reliable” (p. 329). The study also compared the negotiations (interactions) transacted in the
groups to reach consensus in performing the task. While the length and quantities of the
negotiations varied among tasks, Fotos (1994) found that there were significant gains in accuracy
across the three grammatical structures (p. 339). She cautiously concluded that the study
supported the use of grammar consciousness-raising tasks as one possible method for the
Truscott (1996) took a bold and unconventional position that teachers of second language
writing classes should discontinue error correction of students’ written assignments. In support
of his argument, he made the sweeping claim that no research had shown it to be helpful. He
further posited that there should be no theoretical or practical expectation that it would be of any
value, and went as far as to say that correcting grammar actually has detrimental effects. Truscott
acknowledged that his views were contrary to conventional wisdom and common practice in the
field. He was careful, however, not to discount the need for grammatical accuracy in writing.
Nevertheless, he rejected arguments that written corrective feedback could contribute to its
development.
Truscott began his “evidence against grammar correction” of L2 writing by citing “a
great deal of evidence” (p. 329) from two studies on first language (L1) writing. This literature
review will examine both of them in depth as they were not later scrutinized in the Ferris (2004)
rebuttal. The first pair of researchers cited, Knoblauch and Brannon (1981), examined what they
numbered as “better than two dozen studies” published in the 25-year span from 1955 to 1980 on
the writing of students of elementary school-, secondary school-, and college-age (p. 1). In their
non-peer-reviewed article in Texas Christian University’s Freshman English News, they noted
that the majority of research at the time evaluated the relative efficacy of different forms of
teacher intervention, and they expressed skepticism, observing the “implausibility of attempting
to determine degrees of effectiveness amidst such gross uncertainty about the value of any kind
of commenting” (p. 2). They further argued that “we scarcely have a shred of empirical evidence
to show that students typically even comprehend our responses to their writing” (p. 1). Of the
studies they examined, they concluded that “We have only found one study that includes, at least
embryonically, all of the features of effective instruction that might enable researchers to show
Based on this unpublished Ph.D. dissertation by Buxton (1958), they advocated for research
into the effectiveness of a multi-stage revision process with active teacher engagement in
“guided rewriting” (p. 3). Looking at the shortcomings of the Knoblauch and Brannon article, it
is possible that Truscott put too much weight on a non-refereed study that was 15 years old at the
time and that dealt with written composition in general and not specifically with correcting
grammatical errors.
Hillocks’ meta-analysis.
The second source of support cited by Truscott was Hillocks’ Research on Written
Composition: New Directions for Teaching (as reviewed by Bennett, 1986; Witte & Larson,
1987). Bennett described it as an exhaustive reference work examining over 2,000 composition
studies from between 1963 and 1982. In the review, Bennett summarized that “Hillocks
substantiates that traditional grammar instruction may have an intrinsic value, but the skills do
accBoth Witte and Larson consider Hillocks’ work to be inaccessible for most English
writing teachers, who they deem would be unfamiliar with meta-analysis as a research method.
Witte takes pains to explain Hillocks’ methodology and informs readers that “Meta-analysis is a
statistical procedure that permits a researcher to compare, along several dimensions, results of
several quantitative studies to determine whether those results are homogeneous” (p. 204).
According to Witte, of the more than 500 empirical studies on composition instruction examined
by Hillocks, only 60 of them met Hillocks' criteria for comparability (p. 204). Witte’s summary
of the author’s findings presages Truscott’s own conclusion nine years later. He comments and
then quotes Hillocks at the end, “For improving the quality of student writing, every other focus
of instruction Hillocks examined is better than studying traditional grammar, which may even
have ‘a deleterious effect on students’ writing’” (p. 205). Witte adds that “The meta-analysis
indicates that neither having students revise their written texts nor giving them peer or teacher
feedback about their writing has any clear relationship to increasing the quality of student
Larson tracks with Witte’s opinion of the inaccessibility of the analysis to practitioners
remarking that “Without advanced training in quantitative techniques, one is forced to take on
faith the conceptual strategy he adopts, the detailed procedures he employs, the sufficiency of the
studies he relies on, and the categories he creates” (p. 211). And Larson describes Hillocks’
generalized and not focused on one or two matters only, are largely useless (p. 211). If there
were one critique of Truscott’s reliance on reference to Hillocks’ work, it is that he draws
broadly examines the more general category of improving the quality of student writing.
Nevertheless, Knoblauch & Brannon and Hillocks make findings that appear to have permeated
Truscott’s thinking and influenced the fervent tone of his article (1996).
Truscott’s (1996) article continued with straightforward criteria for evaluating effective
error correction, which were later accepted by Ferris (2004) even though they require a control
group of students from whom feedback is withheld, a sensitive practice for many teachers (p.
329). He wrote:
The researchers compare the writing of students who have received grammar correction
over a period of time with that of students who have not. If correction is important for
learning, then the former students should be better writers, on average, than the latter. If
the abilities of the two groups do not differ, then correction is not helpful. The third
possibility, of course, is that the uncorrected students will write better than the corrected
Truscott (1996) went on to review studies of L2 written error correction. Some of the
works he subsequently used to bolster his position hardly seem definitive. Although research by
Cohen and Robbins (1976), twenty years old at the time, found that the corrections did not seem
to have any significant effect on the students’ writing ability, their sample consisted of only three
English as a second language (ESL) students. Regarding Robb et al.’s (1986) study of four types of
grammar feedback, Truscott objected to the lack of a control group that would have received
no feedback and then, for comparison, used a control group from a completely different study
by Frantzen and Rissel (1987). In the conclusion of this part of the article, Truscott (1996)
generalized that “Veteran teachers know there is little direct connection between correction and
He dismissed what he called the “intuitive” view that learning grammatical structures is a
“sudden discovery” (1996, p. 342) process by countering with the observation made by Long
(1997) that it is a much more gradual, and sometimes unpredictable, endeavor. Truscott (1996)
accepted the existence of an order of acquisition and the implication that if students are corrected
on grammatical structures for which they are not ready, the correction will be ineffective. In his view,
however, developmental sequences were too poorly understood to allow them to serve as a reliable
As mentioned, Truscott (1996) went as far as to say that grammar correction actually has
harmful effects and is counterproductive. He did not specifically mention the need for
maintaining a low affective filter in the classroom but viewed the likely effect of correction on
students’ attitudes as demoralizing. He also cited the findings by Semke (1984) that the use
of the “red pen” creates stress and negatively affects students’ motivation. Additionally,
Truscott emphasized that grammar correction can cause students to avoid mastering complex
grammatical forms. This “avoidance strategy” was noted by Sheppard (1992) in his examination
of the frequency of usage of relative clauses by groups of students who did and did not
receive correction. Finally, Truscott (1996) questioned the opportunity costs of grammar
correction in terms of the time that could have been better spent on correction of content.
that it should be “abandoned” (1996, p. 360). As an answer to the question of what teachers
should do in writing classes, he gave the arguably unsatisfying advice, “anything except
grammar correction” (1996, p. 360). Teachers, according to Truscott (1996), could help by
“doing nothing” (pp. 360-361) and thereby avoiding any detrimental effects or negative
consequences. He did hint at the potential efficacy of “comprehensible input” in stating
that the only solution to students improving grammar in writing is “extensive experience with the
Ferris (1999) critiqued Truscott’s (1996) article and identified what she called “serious
flaws” (1999, p. 4). She summarized Truscott’s (1996) argument and highlighted his contention
that teachers mindlessly accept grammar correction as necessary and constructive but without
any real critical examination of its effectiveness. Recognizing that for teachers, grammar
correction is one of the most “time consuming and exhausting aspects of their job” (p. 2), she
admitted at one point to secretly hoping that Truscott (1996) was right.
Ferris (1999) opened her rebuttal by taking issue with Truscott’s (1996) lack of precision
in defining error correction. She continued with a review of the most recent research that made
the case that “selective, prioritized, and clear” (1999, p. 4) feedback can be helpful to student
writers. Ferris (1999) next disputed the accuracy of Truscott’s (1996) characterization of the
sources he cited, even accusing him of selectively interpreting and misconstruing the evidence.
She went as far as to say that Truscott “overstates research findings that support his thesis and
dismisses out of hand the studies which contradict him” (1999, p. 5). Nevertheless, Ferris (1999)
conceded the scant extant evidence supporting the effectiveness of error correction, but
Ferris (1999) granted that Truscott (1996) did, in fact, make some compelling points. She
accepted his position that semantic, syntactic, lexical, and morphological errors require different
treatments in their correction. Ferris (1999) was also in agreement with Truscott (1996) on the
ill-effects resulting from the general inconsistency of teachers’ grammar correction, but she did
not see this as insurmountable. She cited a study by Ferris, Harvey, and Nuthall (1998) that
after a ten-week tutorial. Although she did not make it explicit, one could reasonably assume
that the Ferris et al. (1998) study was planned and carried out with the refutation of Truscott’s assertions
in mind.
In a section of her paper titled “Why Continue with Error Correction in L2 Writing
Classes,” Ferris (1999) gave three reasons for continuing the practice pending more conclusive
evidence. First, she noted that grammar correction is highly valued by students and that
withholding it could actually produce more anxiety and frustration than providing it, however
imperfectly. Next, she pointed out that subject-matter instructors would regard it as gross
negligence and academic malpractice if their ESL instructor counterparts simply ignored
students’ linguistic difficulties. Lastly, Ferris (1999) saw grammar feedback as a viable method
After censuring Truscott (1996) for “potentially putting students at risk” (p. 9), Ferris
(1999) suggested four areas for future research: the necessity for and effectiveness
of teacher training and practice, whether certain types of errors are more amenable to correction
than others, the role of individual student variables, and longitudinal studies that might validate
techniques leading to long-term improvement versus more easily observed short-term gains (Ferris,
1999, p. 9). In the meantime, she called for restraint, reiterating her counsel that teachers should
retain faith in their intuition and rely on their experiences in the classroom and personal insights
In his defense, Truscott (1999) labeled Ferris’s (1999) criticisms as “unfounded and
highly selective” (1999, p. 111). He explained that his original decision to write his 1996 article was
based on his conviction that grammar correction is a “bad idea” (1999, p. 111) and the
unexamined state of the field at the time which he saw as stuck in a practice that lacked an
empirical basis. He further defended his motivation to present an alternative, albeit a negative
contended that Ferris (1999) had disapproved of his lack of a formal definition of error
correction but that she then subsequently employed the term in the same manner that he had. He
went on to clarify, however, that his call for abandonment “is valid for all forms of grammar
correction, not just for those that everyone rejects” (p. 112).
Truscott (1999) then reacted strongly to Ferris’s (1999) contention that the variations in
the types of subjects, instructional methods, and research designs that he had drawn upon
inherently invalidated their support for his thesis. He responded with the opposite view, that
“when similar results are obtained under a variety of conditions” (1999, p. 114), generalizations
should be more justifiable. Not unexpectedly, Truscott (1999) also took issue with Ferris’s
(1999) assertion that he had overstated the evidence from Kepner (1991) because it involved new
writing in journal entries and did not incorporate a revision process (p. 114). Truscott (1999) did
not see this as a serious flaw in the method and noted that the students were given other
In his look at the three reasons Ferris (1999) gave for continuing to correct, Truscott
(1999) ascribed “circular reasoning” to defending continuation based on students’ preference for
grammar correction, which he said was a “false faith” (p. 116). He stated that it was illogical to
justify what he viewed as a self-reinforcing practice. According to Truscott (1999), “By using
correction, teachers encourage students to believe in it; because students believe in it, teachers
must continue using it” (p. 116). In examining Ferris’s (1999) second argument that content
course teachers “are relatively unhappy about the grammar errors of non-native students,”
Truscott (1999) dismissed it as a “weak claim,” challenging the assumption that the error
corrections would have produced any improved accuracy (p. 117). Regarding Ferris’s (1999)
case for students’ self-editing with “strategy training,” Truscott (1999) viewed the practice as
“difficult to interpret” and acknowledged that he was unable to fully respond (p. 117).
Truscott (1999) called attention to the fact that Ferris (1999) left large portions of his case
against grammar correction unchallenged and that she accepted many of his arguments. From there he
returned to the question of which side had the burden of proof. In essence, he maintained that if a
group does something, they should be the ones who have to prove that it works, especially in
light of “the harmful effects of the practice” to be avoided “until a convincing case can be made
concluded “that the case against grammar correction remains valid” and that his case against it
was “stronger after Ferris’s discussion than it was before” (p. 118). He then asked, “So what is
left of the case for grammar correction in L2 writing?” and answered, “Little more than the
lingering pro-correction bias” (p. 119). But in qualifying this position, he hedged, “It would be
plainly absurd to claim that research has proven correction can never be beneficial under any
Perhaps surprisingly, in the end, Truscott (1999) conceded that the line of research Ferris
(1999) proposed would be a productive avenue to pursue, “Ferris is certainly right that many
interesting questions remain open” (p. 119). Furthermore, Truscott (1999) stated, “I support the
sort of research program Ferris outlines in her conclusion” and he allowed that “I may even
participate in it” (p. 121). Finally, Truscott called for researchers “to acknowledge that grammar
correction is, in general, a bad idea and then to see if specific cases can be found where it is not a
In a retrospective piece, Ferris (2004) lamented that even after the published debate
between her and Truscott (1996, 1999) and “decades of research activity in this area, we are
virtually at Square One” (p. 49). By this time she had arrived at the conclusion that Truscott was
correct in insisting that the burden of proof was on those who argued in favor of error correction
(p. 50). Furthermore, she was in agreement on the insufficiency of the research base surrounding
the question that supported the practice. Early in her article, Ferris (2004) stated, “I decided to
Ferris (2004) proceeded to offer three major observations on the “state of the art” (p. 50)
in error correction. First, she noted the ethical dilemma for teachers in allowing the separation of
a control group in their classes for whom no error correction would be given. She saw this as a
major factor contributing to the paucity of studies actually comparing students who
had received error correction with those who had not. Second, Ferris (2004) made a lengthy and
detailed effort to dissect four L2 writing studies that Truscott relied upon in order to demonstrate
that they neither reported similar findings nor constituted “replications in research in different
contexts” (p. 54). Third, while admitting that research was still largely inconclusive, Ferris
(2004) detected what she deemed enough positive indicators to justify error correction in the
In looking at prospects for further research, she identified a major gap in the lack of
longitudinal studies measuring students’ progress over time. She also called for studies that were
comparable in design and replicable. Additionally, Ferris (2004) saw the need for finely-tuned
studies addressing questions such as: “does the explicitness of teacher feedback . . . have an
impact on student uptake and long-term progress?” (p. 58). Noting that practitioners cannot wait
for researchers to offer definitive direction, she gave a number of suggestions including
exhorting teachers to better prepare themselves in both metalinguistic knowledge and the art of
differentiating error correction by student needs and the type of error in question.
Bitchener, Young, and Cameron (2005) began by noting the impact that Truscott’s (1996)
claims had had on the research community and by naming Ferris (1999) as “championing the
case against Truscott’s firmly held conviction” (p. 192). Their research question was, “[t]o what
extent does the type of corrective feedback on linguistic errors determine accuracy” (p. 195). The
study consequently looked at the relative effectiveness of three types of error correction: (a)
direct, explicit written feedback with student-researcher individual conferences lasting five
minutes; (b) direct, explicit written feedback only; and (c) no corrective feedback. In their
Direct or explicit feedback occurs when a teacher identifies an error and provides the
correct form, while indirect strategies refer to a situation when the teacher indicates that
an error has been made but does not provide a correction, thereby leaving the student to
They also looked at the effects of coded and uncoded feedback which they defined as:
Coded feedback points to the exact location of an error, and the type of error involved is
indicated with a code (for example, PS means an error in the use or form of the past
simple tense). Uncoded feedback refers to instances when the teacher underlines an error,
circles an error, or places an error tally in the margin, but, in each case, leaves the student
In the article’s review of the literature, they had found support for the view that students given
coded feedback did not significantly outperform those given uncoded feedback.
In the experiment, Bitchener et al. measured the performance of 53 adult ESL students,
mostly Chinese immigrants, on 4 occasions over 12 weeks. The English language proficiency
level of the participants was described as “post-intermediate,” a group for which there was a
research gap (p. 195). In each instance, students wrote a 250-word informal letter representing a
new piece of writing. After the first exercise, the top three error categories out of 27 were
identified: the use of prepositions, the past simple tense, and definite articles.
Subsequently, student progress was measured for these 3 errors on the succeeding 3
assignments. Bitchener et al. (2005) found that there was no improvement in the use of
prepositions, which in English are highly idiomatic, but there was statistically significant
progress on the more rules-based use of definite articles and the past simple tense (p. 202).
Acknowledging that research at the time favored indirect feedback, Bitchener et al. (2005) called
studies from 1992 to 2001 (p. 255). He sought to refute conclusions from a growing number of
studies presenting a favorable view of error correction in L2 writing classes and stated “Readers
could thus be forgiven for thinking that this matter has largely been settled and that the empirical case
against correction can now be safely dismissed. Nothing, I will argue, could be further from the
truth” (p. 255). His statistical analysis measured the effect that correction, his independent
The acceptable confidence level for correlation between the two variables was set high at
95% which he described as “standard practice” (p. 256). To a non-statistician, this seems a
reasonable doubt, often operationalized at 99% certainty as opposed to more likely than not,
The nine studies were selected from “a general look at published sources” (p. 257)
primarily the references in Ferris (1999, 2004). He examined six studies with control groups
receiving no correction and three “only looking at absolute gains” (p. 263). On the basis of the
empirical studies with control groups, Truscott (2007) concluded that “(a) the best estimate is
that correction has a small harmful effect on the students’ ability to write accurately, and (b) we
can be 95% confident that if it actually has any benefits, they are very small” (p. 270).
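The reasoning behind a statement of the form “we can be 95% confident” can be illustrated with a small calculation. The following is a minimal sketch, not Truscott’s (2007) procedure: it takes an unweighted mean of six effect sizes and builds a two-sided 95% confidence interval around it. The effect sizes are hypothetical placeholders rather than values from the studies he reviewed, and the function name `mean_with_ci` is an assumption introduced only for illustration.

```python
import math
import statistics


def mean_with_ci(effects, t_crit):
    """Return (mean, lower, upper) for a two-sided confidence interval."""
    n = len(effects)
    mean = statistics.mean(effects)
    # Standard error of the mean, using the sample standard deviation.
    se = statistics.stdev(effects) / math.sqrt(n)
    return mean, mean - t_crit * se, mean + t_crit * se


# Hypothetical effect sizes for six controlled studies.
# These are NOT the values reported in Truscott (2007).
hypothetical_effects = [-0.30, -0.10, 0.05, -0.25, 0.10, -0.20]

# 2.571 is the two-sided 95% critical value of Student's t
# with 5 degrees of freedom (six studies minus one).
mean, lower, upper = mean_with_ci(hypothetical_effects, t_crit=2.571)
print(f"mean = {mean:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With numbers like these, a negative mean whose interval only slightly exceeds zero would support a claim of the kind quoted above: the best estimate is a small harmful effect, and any benefit is probably very small.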
In his discussion of the studies without control groups, Truscott (2007) again maintained
that without such a reference point “one cannot determine whether observed gains resulted from
correction or from other factors” (p. 263). Consequently, he averaged the confidence measures
drawn from the data in the six previously analyzed control groups and used that as a comparison
point. Using this statistical method for treatment of the three remaining studies, Truscott (2007)
He ended, not with suggestions for future research, but rather with recommendations on
what not to research. He contended that studies solely looking at the effectiveness of revising a
previous piece of writing or those focusing on correcting a single grammatical structure were
theoretically uninteresting in that they did not involve writing for realistic communicative
purposes (p. 270). Looking at the nine articles in his self-described small meta-analysis, he found
the “performance of corrected groups is in fact so poor that the question ‘How effective is
Ellis, Sheen, Murakami and Takashima (2008) began by recognizing Truscott’s (1996)
point that a control group is necessary to support hard evidence that corrective feedback can
improve students’ writing (p. 353). They made a distinction between focused and unfocused
written corrective feedback which they described in these terms: “The focused group received
correction of just article errors on three written narratives while the unfocused group received
students who had completed 6 years of English study (p. 357). They were divided into three
groups: one receiving focused written correction, a second receiving unfocused written
correction, and a third receiving no correction but just a simple general comment such as “good”
(p. 359). The specific target structure was the use of the indefinite articles a or an in first use and
the definite article the in second and subsequent mentions (p. 356). The example given was
The writing tasks were narrative descriptions of pictures from stories or fables on 3
occasions over a 10-week period (p. 360). The research question was whether written corrective
feedback improved students’ accuracy and if there was a difference between the focused and
unfocused groups (p. 356). Ellis et al. (2008) found improved grammatical accuracy in both the
focused and unfocused groups receiving feedback with no progress in the control group.
However, the researchers also found that “There were no statistically significant differences
between the focused and unfocused CF (corrective feedback) groups” (p. 366).
In their discussion and conclusion, Ellis et al. granted that their results may have only
been evidence of metalinguistic understanding and not true acquisition (p. 366). Nevertheless,
they found evidence that the focused group had longer-term benefits and that “correction
directed repeatedly at a very specific grammatical problem may well have a greater effect” (p.
368). Accepting that their study only dealt with articles, they called for future research in a wider
Tailoring Feedback.
Sheen (2010) began her review with a broad survey of the divergence of theoretical
perspectives between first language and second language acquisition as to what to do with
learner errors. She then traced the pendulum swings from the behaviorists’ “need for immediate
eradication” (p. 169) to the nativists’ relegation of corrective feedback to a minor, if not
non-existent, role. In examining the influence of the prevailing interactionist school, Sheen held that
ability to self-revise their writing and the capacity to bring this grammatical knowledge forward
in showing improved grammatical accuracy in subsequent assignments, the latter she viewed as a
truer measure of progress. She recognized the contribution of sociocultural and psycholinguistic
theories and tailoring corrective feedback toward a student’s zone of proximal development (p.
176).
limitations of written corrective feedback research due to the ethical dilemma of withholding
correction. She contrasted the higher level of complexity of written correction as compared to its
oral counterpart, but held out hope for reaching a common methodology for both. She noted the
divide between SLA researchers and what she called “L2 writing researchers,” advising the latter to
adopt the “methodological practices of oral research in SLA” (p. 175). In this vein, Sheen saw
This division may reflect a gap between theoretical and practical approaches and the
diverging interests in finding out how something works versus whether something works. As a
way forward, Sheen (2010) saw value in future research taking greater account of student factors
Liu and Brown (2015) conducted a meta-analysis of 51 published journal articles and
doctoral dissertations which they considered “a sample of studies close to the entire population
of studies of interest” (p. 67). Their focus was exclusively on studies that investigated long-term
gains in accuracy, excluding those that only dealt with the draft revision process. But instead of
analyzing the aggregate effects of CF on increased accuracy as in Truscott (2007), they looked at
the characteristics of the studies themselves in four areas: sampling features, design features,
In discussing sampling features, one finding was that the emergent bilinguals studied
were primarily at the intermediate proficiency level and that beginner and advanced students
were under-investigated. It was also found that almost half of the study participants were in post-
secondary settings indicating a need for attention to adult learners outside of university
The breadth of writing genres was considered a strength in the design features across the
studies explored in the meta-analysis. Tasks included academic writing, picture description,
personal topics, and narratives (p. 74). Liu and Brown (2015) also found variety in the CF types.
The three prevailing forms were direct correction, error coding, and error locating. “Error
locating” was synonymous with unfocused feedback in which errors were identified by
“highlighting, underlining, or circling” (p. 75). They noted that this diversity did result in a
limited number of studies “investigating a single type of CF” (p. 74) and called for a “more
concerted effort to replicate studies involving each of the different treatment types” (p. 74).
(ANOVA) and t-tests. By “reporting practices,” Liu and Brown (2015) referred to the statistics
reported in the analyses, such as mean, standard deviation, and effect size (p. 77).
They found that only 16% of the studies reported effect size, an omission they considered “a problem
widespread in L2 research in general” (p. 77). Additionally, only 18% of the studies coded error
types as “grammatical, lexical, or mechanical” (p. 79) making it difficult to disaggregate purely
grammatical errors.
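To make concrete why unreported effect sizes are a problem, the following minimal sketch shows how an effect size is derived from the group means and standard deviations that most studies do report. It uses Cohen’s d with a pooled standard deviation, one common choice; the source does not say which effect size measure Liu and Brown (2015) had in mind. The function name `cohens_d` and all numbers are hypothetical.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d: standardized mean difference using a pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


# Hypothetical accuracy scores (percent correct) on a new piece of writing:
# a corrected group versus an uncorrected control group.
d = cohens_d(mean_t=78.0, sd_t=10.0, n_t=20, mean_c=73.0, sd_c=10.0, n_c=20)
print(f"Cohen's d = {d:.2f}")
```

Because d expresses the difference in standard deviation units, it allows studies with different scoring scales to be compared or pooled, which a significance test alone does not.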
Liu and Brown (2015) identified a tension between determining which specific types of
grammatical errors were most treatable with focused WCF and the need to more closely
duplicate a real-world classroom environment, where unfocused feedback treating multiple
error types would be the norm, a problem they referred to as “ecological validity” (p. 75).
Benson and DeKeyser (2018) conducted an empirical study with 151 students in low-
intermediate to advanced English for academic purposes classes at a state university. In their
design, they sought to overcome shortcomings pointed out in Truscott (2007) in examining only
the revision process and not the longer-term effects to be measured in completing new writing
assignments. In addition, their study advanced beyond what they considered the previously
limited scope of research to the use of articles in English (Benson & DeKeyser, 2018).
In looking at the accuracy of students’ use of the simple past tense and past perfect over
four different essays, they examined the effects of two different types of corrective feedback,
direct and metalinguistic. The direct feedback group had errors highlighted in their essays and
the correct form of the verb supplied in the margins using Microsoft Word track changes. An
example was “In 1992 he begins to play soccer” (p. 24) with begins highlighted as incorrect and
“began” appearing in the margin. The errors of the metalinguistic feedback group were also
highlighted in the text, but instead of the correct form, just the grammatical rule was given, also
with Microsoft Word track changes in the margin. An example was, “Use the simple past tense
since this action occurred in the past and is complete” (p. 24). There was also a control group
“that received general comments on content and organization” (p. 6) but no corrective feedback
on grammatical errors. Only the simple past and present tense verb errors were noted, thus
In their findings, Benson and DeKeyser (2018) observed that both direct and
metalinguistic corrective feedback produced gains in accuracy for both the simple past tense and
the past perfect, and that improvement from direct feedback was longer-lasting, but only with the
Another dimension they looked at was the potential effects of language analytic ability
(LAA) which they measured for each student before the study using a computer-based test which
quantified “the ability to induce rules of an unknown language” (p. 6). Positing that
metalinguistic feedback, with its provision of grammatical rules, would benefit higher-LAA
students, they expressed surprise that “the learners with a higher LAA in the metalinguistic
group did not have greater overall gains in accuracy” (p. 17).
In conclusion, they regarded the evidence as refuting Truscott’s (1996) claim that written
feedback “could potentially be counterproductive or harmful” (p. 19). They offered the qualified
finding that “the present study confirms a positive role for written corrective feedback in
instructed second language acquisition, at least for some learners and some structures” (p. 20).
They suggested that future studies “explore interactions with other influential variables such as
As summarized in Table 1, Bitchener et al. (2005); Ellis et al. (2008); Sheen (2010); Nassaji
and Kartchava (2017); and Benson and DeKeyser (2018) all found specific cases where WCF
improved L2 writing accuracy. Of these five, Bitchener et al. (2005); Ellis et al. (2008); Nassaji and
Kartchava (2017); and Benson and DeKeyser (2018) all identified instances in which the gains in
Table 1
Summary of research findings: What does recent research indicate about the effectiveness of
error correction in L2 writing classes?
Question: Do students who receive error correction improve in accuracy over time?
Study                          Finding
Bitchener et al. (2005)        Yes: in certain cases
Ellis et al. (2008)            Yes: in certain cases
Sheen (2010)                   Unclear
Liu & Brown (2015)             Not addressed
Nassaji & Kartchava (2017)     Yes: in certain cases
Benson & DeKeyser (2018)       Yes: in certain cases
Two of the edited chapters in Nassaji & Kartchava (2017) look at promising areas for
further WCF research. These are computer-assisted language learning (CALL) and peer review
corrective feedback. Although not necessarily new, Heift & Hegelheimer (2017) and Tigchelaar
& Polio (2017) each considered their respective areas of interest under-researched.
Heift and Hegelheimer (2017) traced the development of CALL since the 1960s. They
noted progress from the earlier tutorial programs that provided explicit direct CF and
metalinguistic feedback for grammar errors at the sentence level, to the present capability in
automated writing evaluation (AWE) systems for providing CF for longer essays.
They pointed out that these early tutorial programs were grounded in interactionist theory
and drew “the learners’ attention to a gap between their interlanguage and the target language” (p.
54). In their review of CALL research comparing the effect of direct CF versus metalinguistic
feedback, Heift and Hegelheimer (2017) found short-term benefits to both, but that the effects for
The authors reviewed a number of AWE systems, including Criterion, CyWrite, and
Research Writing Tutor. They noted that all three can provide explicit, direct feedback or what
they call implicit feedback. Their definition of explicit feedback is similar to that of Bitchener et al.
(2005), equating direct and explicit feedback: “Explicit or direct (emphasis is the authors’)
feedback pertains to a situation in which the computer specifies an error in student writing and
provides language learners with the correct form” (p. 57). In their terminology, implicit and
indirect feedback are also synonymous: “implicit or indirect (emphasis is the authors’) corrective
feedback also identifies and signals the error to the student; however, the AWE program does not
offer any corrections” (p. 58). While highlighting these capabilities, the authors reached no
conclusions about the relative effectiveness of the two types of feedback. Additionally, they observed
that “there is a scarcity of research evidence, especially with regard to whether automated AWE
feedback results in accuracy development and retention over time” (p. 60). Leaving the reader
with more uncertainty, Heift and Hegelheimer (2017) found that measures of reliability which
Perhaps with a view toward filling these gaps in knowledge, the authors ended by emphasizing
that the “key in the future development of computer-generated feedback is to equip the tools with
mechanisms that allow for research of vast and reliable user data” (p. 62).
Tigchelaar and Polio (2017) saw few studies that “have focused on how well peers can
provide language-focused feedback” (p. 98). In their review, they specifically looked at studies
that included peer review of mechanical (e.g., spelling and punctuation) and grammatical issues.
They noted a general consensus that in the revision process these “surface level aspects” (p. 97)
should be treated after global issues (content and organization) but found this conventional
wisdom lacked empirical evidence to determine whether it was justified (p. 98). In examining
studies where there was peer review training, their major finding was that without training
“students seem to focus on formal aspects of writing” (pp. 103, 105), but that “there was a
significantly greater number of comments on global issues after training” (p. 103).
The authors allowed for the possibility “that the training itself, as opposed to the feedback,
may result in improved writing” (p. 109). Unsatisfyingly, they concluded that the issue of
whether peer feedback improves student writing “has not been resolved” (p. 110). Tigchelaar and
Polio called for future research into whether restricting student feedback to certain grammatical
Teacher Training
While Tigchelaar & Polio (2017) looked at peer training in CF, there is a dearth of
research into teacher training. Table 2 summarizes and identifies empirical research in the four
areas identified by Ferris (1999) for further research. Of the four gaps in research identified by
Ferris (1999), teacher training has seen the least attention. There may be an ethical dilemma in
creating a control group of students whose teacher has not received the same training in
grammatical WCF as the teachers of other groups. This could be mitigated by the control group having teachers
alternatively trained to provide WCF in other areas such as content and structure.
Table 2
Conclusions
Nassaji & Kartchava (2017) conclude that “research has provided increasing evidence
that corrective feedback plays a crucial role in second language learning and teaching” (p. xi). Still, this
evidence applies mostly to specific grammatical features or single WCF techniques. It remains to
be seen if an “ecologically valid” approach more closely resembling actual classroom practice
Even fifteen years after Ferris’s (2004) self-admonishment to quit debating and do some
research, the state of the field has only advanced incrementally beyond “Square One.” In some
respects, the task of answering the big question--whether or not error feedback helps students to
call for many diverse but integrated studies. But the need remains pressing. As Ferris (2004)
reminded us, practitioners still need valid and reliable research, and well-designed case studies to
References
Amiryousefi, M. (2016). The differential effects of two types of task repetition on the complexity,
accuracy, and fluency in computer-mediated L2 written production: a focus on computer
anxiety. Computer Assisted Language Learning, 29(5), 1-17.
doi:10.1080/09588221.2016.1170040
BBC Trending (2016, September 6). Why the green great dragon can’t exist. Retrieved from
https://www.bbc.com/news/blogs-trending-37285796
Benson, S. & DeKeyser, R. (2018). Effects of written corrective feedback and language aptitude
on verb tense accuracy. Language Teaching Research, 1-25.
doi:10.1177/1362168818770921
Bennett, S., & Hillocks, G. (1986). Educational Horizons, 65(1), 4-5. Retrieved from
http://www.jstor.org/stable/42926843
Bitchener, J., Young, S., & Cameron, D. (2005). The effect of different types of feedback on ESL
student writing. Journal of Second Language Writing, 14, 191-205.
Buxton, E. (1958). An experiment to test the effects of writing frequency and guided practice
upon students’ skills in written expression. Unpublished Ph.D. dissertation, Stanford
University.
Cohen, A. & Robbins, M. (1976). Toward assessing interlanguage performance: The relationship
between selected errors, learners’ characteristics, and learners’ explanations. Language
Learning, 26, 45-66.
Ellis, R. (2006). Current issues in the teaching of grammar: An SLA perspective. TESOL
Quarterly, 40(1), 83-106.
Ellis, R., Sheen, Y., Murakami, M., & Takashima, H. (2008). The effects of focused and
unfocused written corrective feedback in an English as a foreign language context.
System, 36, 353-371. doi:10.1016/j.system.2008.02.001
Fotos, S. (1994). Integrating grammar instruction and communicative language use through
grammar consciousness‐raising tasks. TESOL Quarterly, 28, 323-351.
Ferris, D., Harvey, H., & Nuthall, G. (1998). Assessing a joint training project: Editing
strategies for ESL teachers and students. Paper presented at the American Association of
Applied Linguistics Conference. Seattle, WA.
Ferris, D. (1999). The case for grammar correction in L2 writing classes: A response to Truscott.
Journal of Second Language Writing, 8(1), 1-11.
Ferris, D. (2004). The ‘grammar correction’ debate in L2 writing: Where are we, and where do
we go from here? (and what do we do in the meantime?) Journal of Second Language
Writing, 13, 49-62.
Fotos, S. (2004). Integrating grammar instruction and communicative language through grammar
consciousness-raising tasks. TESOL Quarterly, 28(2), 323-351. Retrieved from
http://www.jstor.org/stable/3587436
Franzen, D. & Rissel, D. (1987). Learner self-correction of written compositions: What does it
show us? In B. VanPatten, T.R. Dvorak, & J. F. Lee (Eds), Foreign language learning: A
research perspective (pp. 92-107). Cambridge: Newbury House.
Gass, S. & Selinker. Second language acquisition; An introduction. Mahwah, NJ: Erlbaum
Associates, Publishers.
Johnson, E. (2019). Choosing and using interactional scaffolds: How teachers’ moment-to-moment
supports can generate and sustain emergent bilinguals’ engagement with
challenging English texts. Research in the Teaching of English, 53(3), 245-269.
Knoblauch, C., & Brannon, L. (1981). Student commentary on student writing: The state of the
art. Freshman English News, 10(2), 1-4.
Long, M. (1991). Focus on form: A design feature in language teaching and methodology. In K.
De Bot, R. B. Ginsberg, & C. Kramsch (Eds.), Foreign language research in a cross-cultural
perspective (pp. 39-52).
Mahoney, K. (2017). The assessment of emergent bilinguals. Blue Ridge Summit, PA: Edwards
Brothers Malloy, Inc.
Nassaji, H., & Kartchava, E. (Eds.). (2017). Corrective feedback in second language teaching
and learning: Research, theory, applications, implications (ESL & applied linguistics
professional series). New York, NY: Routledge.
Robb, T., Ross, S., & Shortreed, I. (1986). Salience of feedback on error and its effect on EFL
writing quality. TESOL Quarterly, 20, 83-95.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics,
11, 129-158.
Semke, H. (1979). Effects of the red pen. Foreign Language Annals, 17, 195-202.
Sheen, Y. (2010). The role of oral and written corrective feedback in SLA. Studies in Second
Language Acquisition, 32, 169-179.
Sheppard, K. (1992). Two feedback types: Do they make a difference? RELC Journal, 23, 103-110.
Silva, T. (1993). Toward understanding the distinct nature of L2 writing: The ESL research and
implications. TESOL Quarterly, 27(4), 657-677.
Solano-Flores, G. (2016). Assessing English language learners: Theory and practice. New York,
NY: Taylor & Francis.
Truscott, J. (1996). The case against grammar correction in L2 writing classes. Language
Learning, 46, 327-369.
Truscott, J. (1999). The case for “The case against grammar correction in L2 writing classes”: A
response to Ferris. Journal of Second Language Writing, 8, 111-122.
Truscott, J. (1999). What’s wrong with oral grammar correction. Canadian Modern Language
Review, 55(4), 437-456.
Truscott, J. (2001). Selecting errors for selective error correction. Concentric: Studies in English
Literature and Linguistics, 27(2), 93-108.
Truscott, J., & Hsu, A. (2008). Error correction, revision, and learning. Journal of Second
Language Writing, 17(2), 292-305. doi:10.1016/j.jslw.2008.05.003
Truscott, J. (2007). The effect of error correction on learners’ ability to write accurately. Journal
of Second Language Writing, 16, 255-272. doi:10.1016/j.jslw.2007.06.003
Truscott, J. (2017). Modularity, working memory, and second language acquisition: A research
program. Second Language Research, 33(3). Retrieved from
https://journals.sagepub.com/doi/full/10.1177/0267658317696127
Witte, S., Larson, R., & Hillocks, G. (1987). College Composition and Communication, 38(2),
202-211. doi:10.2307/357721
Yeo, M. (2018). When less may be more: Rethinking teachers' written corrective feedback
practices--Interview with Icy Lee. RELC Journal: A Journal of Language Teaching and
Research, 49(2), 257-261.