Processing Perspectives on Task Performance
Task-Based Language Teaching: Issues, Research and Practice (TBLT)
Task-Based Language Teaching (TBLT) is an educational framework for the
theory and practice of teaching second or foreign languages. The TBLT book
series is devoted to the dissemination of TBLT issues and practices, and to
fostering improved understanding and communication across the various clines
of TBLT work.
For an overview of all books published in this series, please see
http://benjamins.com/catalog/tblt
Editors
Martin Bygate
University of Lancaster
John M. Norris
University of Hawaii at Manoa
Kris Van den Branden
KU Leuven
Volume 5
Processing Perspectives on Task Performance
Edited by Peter Skehan
Processing Perspectives on Task Performance
Edited by
Peter Skehan
St. Mary's University, Twickenham
John Benjamins Publishing Company

Amsterdam/Philadelphia
The paper used in this publication meets the minimum requirements of
the American National Standard for Information Sciences - Permanence
of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data

Processing perspectives on task performance / Edited by Peter Skehan.
p. cm. (Task-Based Language Teaching, issn 1877-346X ; v. 5)
Includes bibliographical references and index.
1. Language and languages--Study and teaching. 2. Task analysis in education. 3. Competence
and performance (Linguistics) 4. Second language acquisition. 5. Second language
acquisition--Methodology. 6. Task analysis in education. 7. Cognitive learning. 8. Psycholinguistics. I. Skehan, Peter.
P53.82.P84 2014
418.0071--dc23
2013050660
ISBN 978 90 272 0725 8 (Hb ; alk. paper)
ISBN 978 90 272 0726 5 (Pb ; alk. paper)
ISBN 978 90 272 7041 2 (Eb)
© 2014 John Benjamins B.V.

No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other
means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
For Daniel
Table of contents
Series editors' preface to Volume 5
Preface
chapter 1
The context for researching a processing perspective on task performance
Peter Skehan
chapter 2
On-line time pressure manipulations: L2 speaking performance under
five types of planning and repetition conditions
Zhan Wang
chapter 3
Task readiness: Theoretical framework and empirical evidence
from topic familiarity, strategic planning, and proficiency levels
Bui Hiu Yuet Gavin
chapter 4
Self-reported planning behaviour and second language performance
in narrative retelling
Francine Pang & Peter Skehan
chapter 5
Get it right in the end: The effects of post-task transcribing
on learners' oral performance
Li Qian
chapter 6
Structure, lexis, and time perspective: Influences on task performance
Zhan Wang & Peter Skehan
chapter 7
Structure and processing condition in video-based narrative retelling
Peter Skehan & Sabrina Shum
chapter 8
Limited attentional capacity, second language performance,
and task-based pedagogy
Peter Skehan
Author Biodata
Index
Series editors' preface to Volume 5

It is our pleasure to introduce the fifth volume in this series, a collection edited by Peter
Skehan and entitled Investigating a Processing Perspective on Task Performance. This
book is in many ways a culmination of work initiated by Skehan some two decades ago,
as it builds upon the theoretical perspectives of his Tradeoff Hypothesis and extends
from the considerable associated research into task types, characteristics, and implementation conditions. Of primary interest in this volume is the relationship between
task design variables and their effect on how language learners produce speech for
communicative purposes. Tasks here are generally brief spoken narratives of the sort
that have grown in popularity as primary pedagogic tools of task-based instruction that
seeks to provide a focus-on-form and meaning simultaneously. Beyond their apparent
face value as opportunities for practicing L2 speech and developing fluency, such tasks
offer the intriguing possibility of drawing learners' attention to form-meaning connections, initiating learner analysis and restructuring of their interlanguage, improving their control of the language, and ultimately pushing the development of language
knowledge and proficiency. The main goal here is for learners to be able to produce
complex, accurate, and fluent L2 speech, with tasks being employed to integrate the
various learning processes; and the key question then is how?
Beginning in the early 1990s, and presented first in an influential article, "A framework for the implementation of task-based instruction", followed by the highly cited
book A Cognitive Approach to Language Learning, Peter Skehan proposed that learner
performance on these kinds of tasks was determined in part by the fundamentally
limited cognitive resources that a person has available during speech. Telling a story,
describing a picture, explaining a process: these tasks consume the attention that
learners have at their disposal, and which, as a consequence, needs to be divided
between fluency, accuracy and complexity of their performance. In this respect, certain tasks have been claimed to make greater or lesser demands on cognition; in a
similar vein the conditions under which learners are asked to perform tasks may influence what they focus on in their production. For example, providing learners with the
opportunity to plan prior to telling a story may free up attentional resources, resulting in spoken narratives that are lexically more diverse, syntactically more complex,
grammatically more accurate, and so on. Building from these observations into pedagogic implications, Skehan advocated for cycles of tasks that were selected, designed,
and sequenced to intentionally shift learners' attention between a focus on fluent and
efficient communication versus the opportunity to restructure and push language
production at the cusp of interlanguage development.
Skehan's groundbreaking ideas, along with the competing theoretical position of
Peter Robinson's Cognition Hypothesis (see volume 2 in the TBLT:IRP series), inspired
a generation of task-based research into the effects of task features (types, conditions,
characteristics) on the cognitive complexity of task demands and the resulting influences on L2 performance. It is no exaggeration to suggest that this line of work has
led to the majority of published empirical research related to TBLT during the 1990s
and 2000s. In the current volume, Skehan and his students at the Chinese University,
Hong Kong, make two major contributions to this accumulating domain of research.
First, the empirical studies compiled here reflect a relatively comprehensive agenda of
research into key dimensions of the Tradeoff Hypothesis: Taken together, they illuminate, strengthen, and extend patterns of findings previously attested in the literature.
In particular, they address the role of pre- and during-task planning, the effect of post-task demands on during-task performance, and the relationship between greater or
lesser task structure and performance. Second, in his introductory and concluding
chapters, Skehan offers an extended and updated explanation of the theory underlying his processing perspective on tasks, emphasizing the grounding of his ideas in
a Leveltian psycholinguistic model of speech production. He also provides a painstaking review and synthesis of findings from the studies in this volume in conjunction with previous research, thereby updating, clarifying, and expanding the Tradeoff
Hypothesis in critical ways.
The chapters collected in this volume, based on creative and insightful research
designs emanating from a core theoretical coherence, offer concrete findings that have
the potential to inform task design and implementation for language learning purposes in important ways. To be certain, the volume is about specific kinds of tasks
(relatively brief speaking tasks) that are controlled and manipulated according to a
handful of theoretically motivated principles, and the claims forwarded here cannot
be automatically generalized to all tasks that might shape what happens in task-based
instruction. In that regard, and in order to encourage future research that builds upon
the solid foundations laid in this volume, we would like to repeat here a suggestion that
we forwarded in our preface to volume 2 of this series:
It is, therefore, perhaps not overly ambitious to suggest that a next phase in this
particular research agenda would turn to the embedded investigation of cognitive task
complexity as one aspect of TBLT educational design and implementation. Certainly,
it will be only through such ecologically valid research that the ultimate contribution
of these important ideas in interaction with the variety of other factors at play in
long-term and otherwise complex language teaching and learning will be realized.
We look forward to featuring such work in a future volume of this series.
Along these lines, we hope very much to see work in the near term that takes important
theoretically motivated and empirically attested ideas, like those presented in the current volume, and explores their implementation in task-based educational practice:
such is the potential and intent of TBLT as a truly researched pedagogy.
John M. Norris, Martin Bygate, Kris Van den Branden
Preface
Producing a book is always nice, but this book is the source of particular pleasure.
In January 2004 I took up a post as Professor of Applied Linguistics in the English
Department at the Chinese University of Hong Kong, and spent six years there. Many
things characterised this period, but outstanding amongst these were the opportunities for doing empirical research. These came in two major guises: opportunities
to apply for research grants (both internally to the university, and more importantly,
externally to the Research Grants Council of Hong Kong), and the delight of supervising doctoral students. In the former category, I was lucky enough to obtain one internal grant, and two external grants, both targeting task-based performance, enabling
me to undertake a programme of research, rather than one individual study. In the
latter case, I was extremely fortunate to supervise a succession of talented doctoral
students who were a superb combination of being individually highly motivated, as
doctoral students must be, and also willing to work within a framework. The delight
was that they each pursued a personal research agenda, but that agenda fitted within
the wider programme which motivated my research.
The result of all this is the present volume, which is an edited collection, but with
the difference that, while it has a sustained focus on task-based performance, this is
studied within certain parameters:
- a view that attention is limited and that we must explore what the consequences
  of this are
- a concern to explore how task characteristics influence performance
- an interest in the conditions under which tasks are done, with a certain amount
  of focus on planning
The result is a collection which makes a cumulative contribution, rather than being
characterised by disparate chapters brought together through quality, but lacking a
shared focus. So, it is a source of pride to me that collectively we have been able to
produce such a volume, with the aim of contributing to the understanding of second
language spoken performance as well as the literature on second language tasks. In so
doing, we also hope to bring out the researchability of the area, as well as its practical relevance. A processing approach to second language performance has proved
fertile for us. We hope to convince other people of its utility.
There are many people to thank in relation to this volume. The Editors of the
series deserve considerable thanks, first for being encouraging about the idea of
the volume, and then for the considerable work each has put in with the chapters
of the volume, work which has strengthened the chapters considerably. I would
also like to thank other Ph.D. students from the Chinese University, Hong Kong,
who worked as research assistants on various of the specific research projects: Dai
Binbin, Cai Jing, and Ren Hongtao. I am also grateful in that regard to the English
Department at CUHK which funded this research assistance, especially Tracey
Liang, and also M.A. students who played a part, by examining and rating candidate tasks that we were considering.
I would also like to offer thanks to two individuals who have strongly influenced
the processing research reported here. First there is Willem Levelt, whose model of
first language speaking is the starting point for many of the studies which follow. The
structure and theoretical foundations that this model provides have been immensely
influential and had a strong guiding role. Second, there is Peter Robinson, whose
Cognition Hypothesis stands in clear opposition to my own Tradeoff Approach. I am
grateful to him for the clear opposing standpoint to my own and because we share the
frame of reference by which the different positions can be judged. The disagreement
has been amicable and stimulating, and I feel my work has benefited enormously as a
result.
Finally, to return to the institutional context in Hong Kong, I would like to thank
the Arts Faculty at the Chinese University for an internal grant that funded research
that underpins Chapter 4, and the Research Grants Council for two Earmarked Grants
which were the main basis for Chapters 4, 6, and 7, as well as, essentially, for much of the
material in Chapter 10. The RGC also deserves considerable thanks for the doctoral
studentships which supported all the remaining chapters.
chapter 1
The context for researching a processing perspective on task performance
Peter Skehan
Introduction
In this chapter there is a need to set the scene, and explore why the contributing
authors were interested in tasks, task conditions, and task measurement, and why we
thought these were important enough to occupy us all for several years. To do that we
have to go back a little, since growth in interest in tasks has been a major development
in itself within applied linguistics. Communicative language teaching in the 1970s and
1980s represented a major shift in the goals and methods of language instruction relative to the structure-domination of the years before. An approach which gave much
greater importance to meaning and to language use as an important basis for development brought about a shift in the entire profession, from the nature of coursebooks
and language teaching methodology, through assessment procedures, to the goals and
procedures of teacher education. All of these put the learner centre-stage, and saw
interaction as vital. One interpretation of a communicative approach has been to organise teaching around the use of language learning tasks, tasks which have meaning primacy, a focus on outcomes, and some connection with real-world language use.
In this respect, it has been fundamentally important that task-based approaches have
attracted considerable research and theoretical interest. Regarding the former I have
argued elsewhere (Skehan 2011) that the approach is distinguished most clearly by
the way claims are considered to be accountable to data so that tasks are judged not
simply by their appeal to the teacher, but also by their impact on performance, and,
ultimately, development. While there have been notable practical developments (Van
Den Branden 2006; Willis 1996) it is the research accountability that has been the
most distinctive contribution for many working in the task field, enabling a possible
move towards a researched pedagogy.
Tasks also have interesting theoretical linkages. Early approaches to researching
tasks were strongly influenced by the Interaction Hypothesis (Long 1996), and, making the assumption that certain sorts of interactional processes are most propitious for
second language development, the initial priority was largely one of exploring how far
tasks might be designed and used to maximise opportunities for interactional features,
such as recasting and negotiation of meaning, which were deemed to be particularly
helpful in driving acquisition. It was assumed that the goal was to understand how
an interlanguage system develops most effectively, and good task design, empirically
grounded, was seen as the basis for this since it would generate more conversational
feedback, the oil that supported change and progress. Dangers were soon identified
with task-based approaches, and in this regard, the concept of focus on form was developed (Long & Robinson 1998). This proposes that when communication proceeds,
there is a possibility that communicational needs will predominate, and that form will
lose focus (and with this, the chances of enhancing development and control will be
compromised). In fact, however, research interpreted processes such as recasting and
negotiation of meaning as promoting a focus on form while communication itself was
still the primary goal of the encounter. In other words, communicational naturalness is
not compromised, but at the same time, conditions are being created which mean that
form is not forgotten and indeed is being nurtured through the targeted, personalised
feedback that becomes available (Doughty & Williams 1998).
Since the late 1980s a somewhat contrasting approach to researching tasks has
emerged. This strand is less concerned with interaction processes (although these still
figure (Robinson & Gilabert 2007)) and is more concerned with task performance
and the processing influences upon it. This performance has been frequently conceived in terms of complexity, accuracy, and fluency. I have argued that it is possible
to conceive of an acquisitional dynamic implicit in these performance areas (Skehan
2009a), such that:
a. Complexity (or new, or emerging language) is associated with change, development, and risk-taking, but also possible error.
b. This possible error demonstrates a need for greater control, eventually leading to
greater Accuracy, as the new language is used with greater facility. But although
error may be avoided, performance may be halting and slow and probably reliant
on a rule-based system which has not yet been automatised.
c. The next stage is to acquire even greater control, to proceduralise, and to produce
correct language Fluently, without excessive interruptions to flow, and without
the need to apply rules with awareness.
The boldfaced terms are the three areas (complexity, accuracy, fluency) which are
more frequently measured in studies exploring task characteristics and task conditions
and have, recently, been the major focus for a book (Housen, Kuiken & Vedder 2012).
Making the assumption that these three dimensions (which have been shown to be
distinct from one another; Skehan & Foster 1997; Tavakoli & Skehan 2005) are a good
reflection of performance, the goal is to explore what makes it more likely that each
will increase, whether as a function of the task or of the way the task is done. However,
if, where attentional resources are limited, the natural priority in a
communicative context is to emphasise meaning rather than form (Van Patten 1990,
1996), the danger is that form can lose focus, and that advanced language, or
control over less advanced language, might be sacrificed to the primary goal of achieving fluency and meaning expression.
While this changed analysis of performance itself is important, what is of even
more significance is how influences upon that performance are researched, because
there has been a clear switch in general framework. This has involved a move towards
a more cognitive approach, one in which the functioning of attention and working
memory has become more central. While previous studies had considered factors
which were cognitive in nature (associated with the processes of successful negotiation for meaning), these were not often interpreted within wider cognitive theory.
Researchers did not explore how a focus-on-form which arose during communication
(and which would be prominent in working memory) would then make the transition to long-term memory, or how many such exposures would be necessary to assist
this transition (but see Doughty (2003) for discussion of this issue). In this light, two
frameworks for studying tasks can be usefully contrasted. My own is to see limitations
in attention as fundamental to second language speech performance, an assumption
which leads to the need to explore what attentional and working memory demands
are made by a task, and the consequences this may have for different performance
dimensions. More demanding tasks are assumed to be likely to lead to prioritization
of fluency over accuracy and complexity: what has become known as the tradeoff
hypothesis. Consistent with this view is the suggestion that tasks based on familiar or
concrete information favour a concern for accuracy, as do tasks which contain a post-task phase. Similarly, interactive tasks, or tasks requiring transformation or manipulation of material, or tasks which have had pre-task planning might lead to greater
linguistic complexity. The fundamental principle here then is that demanding tasks
can create problems for a learner/second language speaker because of processing limitations, and that one can explore methods of mitigating these difficulties, and even trying to nurture improved performance in all dimensions, through effective use of task
choice and task conditions which overcome attentional limitations.
In contrast to this is the position advocated by Peter Robinson, in the form of the
Cognition Hypothesis (Robinson 2001, 2011). He proposes that attention is not constrained in the same way as the Tradeoff position argues, and that it can expand to deal
with the demands placed upon it, under certain conditions. Further, he proposes that
task complexity (a construct he discusses at some length) is what drives performance,
and that greater task complexity simultaneously raises accuracy and complexity. The
Tradeoff position, in contrast, argues that the range of attentional demands involved
can cause tasks to become more difficult and that a very common manifestation of
this will be complexity and accuracy being in competition with one another for these
limited resources, with the result that capacity given to one of them will be at the
expense of the other. Apart from arguing that task complexity can increase accuracy
and complexity simultaneously, Robinson also proposes that fluency will be lowered
when there is greater task complexity, an influence on which Tradeoff is generally neutral, since task complexity is seen as influencing what is said rather than how it is said.
Interestingly, therefore, the field has a constructive dispute regarding the impact on
different performance dimensions of making tasks more complex (Robinson 2011)
or difficult (Skehan 1998). Much research has been generated by researchers attempting to demonstrate that one position or the other is better at accounting for results,
something the present volume will also attempt to do!
It is clear that the Cognition Hypothesis is more ambitious than the Tradeoff position. It is more wide-ranging and comprehensive in nature. It also has been the basis
for applications to pedagogy of a fairly extensive nature, especially as regards curriculum or syllabus design (Robinson 2011). It even has the virtue of making what, for me,
are counter-intuitive predictions, predictions of the sort that are likely, if sustained, to
make unanticipated contributions to the field. In contrast, Robinson (2007), in commenting on the Tradeoff position, has suggested that it is vacuous in comparison to the
Cognition Hypothesis, arguing that it does not lead to predictions, so much as to post-hoc rationalisations of results. Cognition, in principle, should not have this weakness,
since predictions should flow from it rather readily.
So it is useful at this point to clarify what the status of the Tradeoff Hypothesis is (in
an attempt to show that it is not vacuous), and the nature of the foundation it provides
for the chapters in this volume. The starting point here is that I do not think we are,
currently, able to put forward strong and wide-ranging models of task performance. It
seems to me that three points of reference are vital. First, we do need to use what we can
from neighbouring disciplines. In that respect, I have argued elsewhere (Skehan 2009a)
that a model of first language speaking, such as Levelt's (1989, 1999; Kormos 2006) has
to be the starting point for a credible analysis of the psycholinguistic processes involved
in second language speaking. This model (Levelt 1989) is impressive, but it targets a
first language speaker equipped with a first language mental lexicon. It is obviously
not immediately transferable to the second language case, but it does contain structures and processes which are bound to relate to whatever is done in second language
speaking. So it becomes, for me, an inevitable starting point, even if it has been shown
to have certain limitations for this different context (De Bot 1992). Second, against the
background that this model provides for organising our thinking about second language speaking, we then need to explore how attentional and working memory limitations are important in accounting for the differences between first and second language
speaking: this is where the Tradeoff perspective comes in, because it assumes that the
existence of a far less impressive mental lexicon will have strong influences on how
second language speaking will proceed. Analysing second language speaking through
the Levelt model and its component stages allows us to make sense of where and why
problems might occur. The model can bring out the pressures which cause what ideally
should be a parallel process (Conceptualiser, Formulator and Articulator functioning
simultaneously and in modular, automatic fashion) to become serial and effortful. In
this way, we have a sounder basis for making predictions about performance (as some
of the later chapters will show). Without such underpinning psycholinguistic theory,
it is difficult to see how we can make much progress. That is, in order to hypothesise
the effects of task complexity or difficulty on performance, it is essential to understand
the kinds of psycholinguistic processes that underlie the production of a linear stream
of speech. Third, as a foundational target, we need to establish a wide range of generalisations about patterns of performance and to gain understanding about important
variables, in other words to establish a database of findings, before we can go on to build
effective models for the second language case. Such models are desirable, obviously,
but it serves no one's interests if what is put forward is premature. The various research
studies in this volume are an attempt to gain understanding about major influences on
second language performance as a precursor to model building. Even so, on occasions
predictions are possible, even if they are quite limited in nature.
So, given the absence of a convincing model of second language speaking, one has
to have some framework for research to counter any tendency for individual studies to
be conducted in a piecemeal fashion, and to run the risk that they do not contribute to
any coherent picture. It is for that purpose that I have put forward a general framework
for the investigation of tasks, a framework which has the potential to organise findings
as they emerge. The outline of the framework is shown in Table 1:
Table 1. A framework for investigating second language task performance
Task
  - task types and characteristics
  - task difficulty
Task Conditions and Implementation Phases
  - pre-task: preparedness, such as planning, repetition, familiarity
  - during task: task processing
      - time pressure
      - support (e.g. visual)
      - information pressure
      - negotiability and mediation
  - post-task
      - post-task activities
      - post-task exploitation
The framework is not intended to provide any theoretical insights. Instead, it is
based on fairly direct features of tasks and their implementation which at least means
that application to real situations where tasks are used is facilitated. First of all, we have
tasks themselves, and here a distinction is made between task types and characteristics and task difficulty. The second of these, task difficulty, where difficulty is seen as
inherent in the task, rather than learner-dependent, has, in my view, seen surprisingly
little progress. We are little closer now to having a scale of difficulty which could be
used, for example, to locate any task within a more extensive syllabus. There may be
features which can be argued to have simpler or more difficult values, such as number
of elements, or concrete versus abstract (and some of these figure in Robinson's analysis of task complexity). So given two tasks, difficulty ranking might well be possible.
But there is the problem that any task is likely to subsume a bundle of features, and
not all of these features are likely to be jointly simpler or jointly more complex, such
as a small number of concrete elements versus a larger number of abstract elements.
Further, tasks, given their nature, are likely to be strongly influenced by context, and so
what is difficult in one (pedagogic) context or for one particular learner may not be so
difficult in another, through, for example, cultural knowledge or age differences. Then
we have the difficulty that many (good) tasks are capable of multiple interpretations, so
that one person could interpret a given task to make it more difficult than another (this
connects with the wider issue of the predictability of tasks). Foster and Skehan (1996)
illustrated this with different participants making radically different interpretations of
the depth at which they should address judgements about custodial sentences for some
crimes they were given to adjudicate on. Overall then, although some progress has
been made on establishing relative task difficulty, a lot of issues remain to be unraveled
before we can reach a position to make any sort of reliable and valid judgement about
a particular task and the identification of factors that impinge upon this.
In contrast, the study of task characteristics has seen greater progress. This
focusses, in a more micro way, on the relationship between specific task characteristics
and particular performance features. Within the constraints of variation through context differences and different task interpretations by different participants mentioned
above, which push for less predictability, the search here is for connections to performance from analysable task features, such as type of information (concrete-abstract;
familiar-unfamiliar material) or organisation of this information (e.g. structured-unstructured). There is a range of findings in this area already (Ellis 2003; Skehan
2001; Foster & Skehan 2012), and we hope to contribute more through this volume.
The Tradeoff approach, fundamentally, tries to uncover such generalisations regarding the link between task characteristics (concreteness, familiarity) and performance
dimensions (complexity, accuracy, fluency), and use them to gain a better understanding of how the Levelt model can illuminate the case of second language learners.
A range of generalisations can then provide a more robust basis for understanding
second language performance, as is discussed in the final chapter of this volume where
the impact of influences such as information familiarity, and task structure on different aspects of performance is discussed.
Task characteristics, as it happens, are analysed differently by Peter Robinson. The
Cognition Hypothesis makes a big thing of the distinction between resource-directing
and resource-dispersing factors. In essence, resource directing is very close to the sort
of selective influence discussed here: influences (independent variables) which have
predictable and close relationships with particular aspects of performance. These may
be specific within form, for example, promoting more feedback, or engaging specific
language features such as relativisation or article usage. Alternatively, they may be
more general in impact, as, for example, when (putative) task complexity simultaneously raises accuracy and complexity. Resource-dispersing factors, such as planning,
have a more general impact on attentional resources and functioning, and thereby
performance, and do not lead to predictions such as that of joint accuracy-complexity
increases (and so more closely resemble the Tradeoff approach).
This treatment of types of variable is a fundamental difference between Cognition
and Tradeoff. A Limited Attentional Capacity approach does not have any place for
a distinction between resource-directing and resource-dispersing variables. Rather it
pursues the goal of finding links between a whole range of variables and performance,
and sees no common quality which unites, for example, resource-directing variables. It
explores variables on a case-by-case basis, and tries to use findings from such research
as a basis for wider theoretical claims, such as adaptations of Levelt's model to second
language spoken performance (Kormos 2006; Skehan 2009a). Research designs within
the Tradeoff approach, therefore, do not probe the functioning of any such construct
as resource directedness. The two approaches do have in common the usefulness of
task as a unit for research. However, from that departure point, as we will see, they
diverge, and the various chapters to follow in this volume illustrate the richness of
being unconstrained by any need to look at categories of variable, or to use these to
motivate research designs. Even so, both positions take task as a unit for research very
seriously, and seek to establish whether we can understand how specific task characteristics influence performance in a useful way.
Within the framework in Table 1, we turn next to task conditions and implementation phases. The phases are essentially before, during, and after. Naturally, we start
with the pre-task phase. The label given here is preparedness, which is intended to
bring out the generality of this phase. Task researchers have mostly interpreted this
phase as pre-task planning (an interpretation important for some chapters in this volume). But there is more to preparedness than this. Pedagogically, Willis (1996) for
example outlines the range of activities that can be used to get learners ready to do
a task more effectively, through carefully constructed pre-tasks, or development of
splash diagrams, or input from native speakers. Linking more directly to the research
literature, there are other aspects of preparedness than planning. Having already done
something similar (or identical) to the task before would be a factor. A special form of
this would be repeating a task, since then one could look on the repetition as the more
prepared-for version of the task. Familiarity with the information in the task would
be another type of preparedness, as, more generally, would be the greater relatedness
of the task to one's real life encounters, experiences, or interests. In this case, the task
itself, in a particular form, might never have been done before by particular students,
but the ideas which feed into the task would be accessible and organized from other
prior experiences. So, to sum this up, the pre-task phase contains planning, but a lot
more besides, as we shall see later, and as Bui (this volume) discusses extensively in
his chapter.
The impact of what students do after a task has also been researched, and proved
interesting. Two aspects of this are shown in Table 1. First, we have post-task activities, whose role is to change the way the earlier task was done. Here it is the anticipation of the post-task, and the way this signals that doing the task is not the end of
things, that is important. As we will see, this is a growing area of research (Lynch
2001, 2007) and one which links nicely with concepts of focus-on-form. But second,
we have what is termed post-task exploitation. The focus here is on the subsequent use
for pedagogical purposes of language that has been made salient by a task. Language
whose importance emerges in this way can be timely for development and consolidation by teachers. This has not been an area associated with much research (though
see Samuda 2001), but the possibility beckons. So far we have only had discussions of
pedagogic possibilities (Skehan 2011, 2013), but there is considerable scope for more
systematic empirical study.
From the above framework, it remains to consider the during-task or task processing stage, referred to in the table as task processing. This area concerns options in
processing, and how a task will actually run, bringing in issues of time pressure (and
opportunities for on-line planning, or not), information pressure if a lot of input is
built into the task, and support while the task is being done, as well as the scope there
is for the person doing the task to shape what is happening, to influence its goals and
content and to slow things down or divert it if that is helpful. All these factors can
have a considerable impact on performance, as we have seen through Ellis's discussion
of on-line planning, and Robinson's views on time perspective: his Here-and-now
versus There-and-then conditions essentially pit input prominence against visual support, after all. So, whatever the basic task is, processing conditions can have an independent impact on performance.
All the factors shown in Table 1 may have an influence, but there are a lot of dimensions here, and research has only established a partial understanding of their various
impacts. The purpose of the framework is to try to summarise and organise what has
been done, and what remains to be done, and to do so in a way which does not prejudge theoretical issues. Taking the example of planning, we need to know more about
what happens when planning takes place before we can decide how it fits in theoretically. It is located as a resource-dispersing variable in the Cognition Hypothesis, that
is, relevant because of its impact on resources, but without any predictable influence on
performance. But it is possible that, depending on the nature of the planning activities,
it could equally well be a resource-directing variable (in Robinson's terms) and linked
to particular aspects of performance (such as accuracy). Without relevant research, we
cannot know. Similarly, staying with the assumption in the Cognition Hypothesis that
time perspective is seen as a task complexifying variable, a framework such as that in
Table 1 allows us to explore visual support and input dominance (which both operate in the Here-and-now condition) separately, and not assume that a construct such
as task complexity needs to be invoked. The framework also allows interactions to be
explored. It is a simple matter to hypothesise that different variables have a conjoint
effect (i.e. condition-seeking research; McLaughlin 1980), such that together they produce an effect that is more than the sum of the parts, such as, for example, planning being
explored in relation to more complex tasks; or structured tasks being studied in relation
to There-and-then processing (see Wang & Skehan, this volume).
So, a framework such as the one shown has to be judged by its utility, by its capacity to organise and generate interesting research results. The ultimate aim, though, has
to be model building, and a more theoretical account of performance (and possibly
development). In my view, the Cognition Hypothesis has rather jumped to this stage
before it has established an adequate empirical grounding. So I would prefer, as a theoretical position, something like a modified version of the Levelt model of first language
speaking, adapted for the second language case, and which takes account of the differences in these two cases, such as a less elaborate second language mental lexicon. This,
at least, gives us some theoretical credentials. But accountability to a range of findings
is fundamental. One has to remember the old saying, "logic is the art of going wrong,
with confidence", to moderate our natural tendency to over-theorise. In any case, as we
shall see, the studies reported in this volume fit into the above framework quite nicely,
with several studies of planning, one of the post-task phase, and a number looking at
task characteristics, principally task structure. As we will see in the final chapter, the
framework enables us to consider just how ambitious models in this area can be. But
for that, we need to look at the individual studies in the volume.
The structure of the book

There are six empirically based chapters in the book, all based on data collected during the time I worked in the English Department at Chinese University. They are based
either on Hong Kong Research Grants Council funded studies (three), all of which
were motivated by the framework from the last section and by my interest in the comparison of Tradeoff or Cognition accounts, or on individual Ph.D. dissertations that I
supervised (three), which came from students interested in similar issues, but based
on research problems that intrigued each of them.
Three of the chapters are concerned with planning, and we will start with these.
Chapter 2 is by Zhan Wang (Jan), and is based on her Ph.D. at Chinese University. Jan
started out interested in general planning, and so examined the literature for interesting possible research questions. She became intrigued by Ellis's notion of on-line
planning (Ellis 2005). Planning, in general, is benign in its influence, and is associated with raised task performance, in a fairly general way. But the generalisation that
emerges in the literature is that complexity and fluency are consistently increased,
with large effect sizes, whereas accuracy is not so dependably affected, and when
it is, the effect sizes are smaller. This set of findings has been around for some time
now, and Ellis's aim was to shed light on why accuracy seems to be more difficult to
influence through planning. He proposes that one can distinguish between pre-task
(or strategic) planning, and on-line planning. Pre-task planning is done, obviously,
before the task itself, with time allocations for this varying, but typically lasting ten
minutes. This, Ellis proposes, is good for complexity and fluency. On-line planning,
he proposes, is the sort of planning which takes place on the fly while speaking is
taking place and speakers reorganize their speech while continuing to talk. The Levelt
model fits well with these two types of planning. The model contains modular stages,
so that initial Conceptualisation is followed by different sub-stages within the macro-stage of Formulation, and then Articulation, the actual speech production stage,
completes the process. For any individual speaker all stages operate in parallel, since
different things are happening at the same time in each. Current Formulation, for
example, is the result of previous Conceptualisation, while current Conceptualisation
is yet to impact on the Formulator, but soon will. (See Wang, Chapter 2, this volume,
for a more extensive discussion.)
Ellis proposes that when processing conditions are demanding, the second language speaker has difficulty in sustaining this parallel processing, and as a result,
accuracy suffers. In contrast, when processing conditions are less demanding, it is
possible for the speaker, even the second language speaker, to give attention simultaneously to Conceptualisation (i.e. preparing plans to be ready for the next stage),
while at the same time devoting enough attention to current Formulation, thereby
achieving greater levels of accuracy. Jan followed this reasoning, but was unhappy
with Ellis's actual operationalisation of this distinction between planning types.
She felt there was scope to introduce modifications to distinguish between the two
planning types more clearly. So her basic aim was to introduce a methodological improvement. But she also wanted to explore more about Levelt's third general stage, Articulation, and she linked this with the use of a Repetition condition.
She reported a set of results which bring out the usefulness of pre-task planning,
and also that of on-line planning, and most interesting of all, of their synergistic
effects in combination. Repetition, too, proved to be a rewarding variable to have
worked with.
Bui Hiu Yuet Gavin (Chapter 3) also looked at planning, but in a different though
similarly creative way. As we have just seen, the typical research design in the planning literature is to give second language speakers ten minutes to plan before they
do a speaking task. But planning, more broadly, is an aspect of preparedness (see the
earlier section), and this is much wider in scope. Ten minutes are given, after all, to
plan something you had no idea about moments before. However, preparedness can
take many other forms, such as having told the same story before (and see Jan's use of a
repetition condition in her chapter). Or it could mean being more deeply familiar with
what is talked about and even talking about something which is important already
in one's life. Bui takes just such an approach in his study. In a clever research design,
he compares the effects of conventional planning (the approach that is typical in the
literature) with the effect of speaking about something familiar. Not only does he provide interesting results on this issue, he also puts forward a model to cover the various
senses of planning hinted at earlier in this paragraph and the previous section (i.e.
including on-line planning). He prefers the term readiness to capture this wider view
of planning. In this way, he contributes to enabling more sophisticated use of the construct of planning in future research.
The third chapter on planning (Chapter 4 in this volume) is based on a study
conducted by Francine Pang and myself as part of a Hong Kong RGC grant. Once
again the starting point is that the planning literature is extensive, that there are now
attested findings, but that there is more work to be done, particularly at the explanatory level. Specifically, there is the issue that although we have a range of findings,
we have tended to black box planning, and embed it within quantitative research
designs. Increasingly studies do interview participants after planning (Tavakoli &
Skehan 2005), but this tends to be only to ask if they thought the planning time was
worthwhile, and whether they thought they benefitted. What is missing is thorough
research on what participants say they do when they plan, although there is one major
exception to this: the work of Lourdes Ortega (1995, 1999, 2005). However, as Ortega
herself states, it is remarkable that so little qualitative research has been done to peer
inside planning processes. The study reported here is an attempt to redress this state
of affairs, a little at least. We gave participants a narrative to tell, and then Francine
engaged them in retrospective interviews (RIs), probing what they had done during
the planning time that was available. She then developed a coding scheme to categorise the various processes they reported engaging in, a coding scheme which started
from the transcribed interviews, but which, as it happens, can be related to elements
of Levelt's model of speaking. Next we did something we have not found anywhere in
the literature. We looked at the association between the planning behaviours which
were reported and the actual performances which were produced, generally with the
intention of exploring whether some planning behaviours are more effective than others. The chapter reports on how we made sense of the successful behaviours. But it also
gives an account of something which surprised us. It was just as important to discover
which reported planning behaviours were harmful to performance, or at least to certain aspects of it. So it isn't simply what you do during planning that counts. It's just as
important to know what you shouldn't do.
There is one more study which focusses on a task condition: Chapter 5 by Li Qian
is on what happens after a task, rather than before it. Li Qian (Christina) had become
interested in research that I had done with Pauline Foster (Skehan & Foster 1997;
Foster & Skehan 2013) which had explored the impact on task performance of anticipating what activity will follow the task. We had shown, tentatively at first (Skehan &
Foster 1997) but more robustly later (Foster & Skehan 2013), that a post-task activity
can have focused effects on performance. Our initial hypothesis had been that post-task effects are confined to raising accuracy, as was the case in Skehan and Foster
(1997). The second of our studies (Foster & Skehan 2013) had shown that anticipation of the need to transcribe some of one's performance post-task did raise accuracy,
with narrative and decision-making tasks, but also raised complexity with the second
of these tasks. We had hypothesised that a post-task activity would raise pedagogic
targets and induce participants to try to avoid error. In the event, the post-task activity
seemed to impact upon form in general, and also led participants to use more complex
language on one of the two tasks. Christina was interested in this, and liked the way
we had shown that self-transcription had desirable effects. But she felt that post-task
transcription was something of a blunt instrument, as we had used it (though she was
too polite to put it like that), and that there was scope to explore if different operationalisations of post-task transcription might have different effects on performance.
She complexifies what is possible with a post-task manipulation, using individual vs.
group-based transcription, and transcription with or without rewriting of an ideal
version. The chapter reports on her study, and shows that the impact of post-task transcription is indeed more complex than had been thought. It also enables another point
to be made, one that could be brought up elsewhere, too. All the studies reported in
this volume have L1 Mandarin or Cantonese speakers, and the data was collected in
Hong Kong, Macao, or in Guangzhou. There are obvious issues of generalisation here,
and one can ask whether the research is relevant for people outside this relatively narrow geographical context. Christina's research is interesting, however, because she is
partially replicating what had been done elsewhere (with a range of different L1s in the
Foster and Skehan research). The fact that she produced results which compare quite
well with those from the earlier studies does give us confidence that generalisation is
indeed possible, and that the findings reported in this volume do not apply only to
particular L1s.
We turn next to studies which have explored task characteristics. First of these is
the chapter I wrote with Zhan Wang (Chapter 6, this volume), which is the product of
an RGC grant on which she was a researcher. In the introductory section, I mentioned
the competing merits of the Cognition and Tradeoff approaches to understanding
task performance. The Cognition approach predicts that task complexity will raise
accuracy and complexity simultaneously. The Tradeoff approach proposes attentional
limitations as a constraint on performance, but does not at all preclude accuracy and
complexity both being raised simultaneously. Indeed people connected with Tradeoff
have published research to this effect (Skehan & Foster 1999; Foster & Skehan 2013;
Tavakoli & Skehan 2005). The Tradeoff interpretation of this kind of finding is that
separate influences, if they apply in combination, work together to produce the jointly
raised effect. The study reported on here builds on the (somewhat accidental) results
from these other studies, and puts the relevant variables into the research design. It
looks at task structure (a clear influence that has emerged in the literature) and time
perspective (a central variable in the Cognition Hypothesis, and one which has generated remarkably little supportive evidence for it). The study predicts that the conjoint
influence of these variables will produce a simultaneous accuracy-complexity effect,
but for totally different reasons than the Cognition Hypothesis would propose, that
is, nothing to do with task complexity. The study incorporates another variable, lexical difficulty, as an additional potential influence on task performance, on the basis of
other post-hoc analyses of previous studies (Skehan 2009b).
Structure also figures in the next study, co-written with Sabrina Shum (Chapter 7),
and also the product of a Hong Kong RGC grant. One of the issues in the literature is
that structure tends to have been investigated with cartoon picture series. This volume
has more than one study where the focus of a narrative is video-based retelling. That is
the case with the Shum-Skehan chapter. Mr. Bean, an unsung hero of applied linguistics research, is the basis for a series of videos which vary in their degree of structure.
The first part of the study looks at the effects on performance of levels of structure,
defined with reference to discourse analysis research, in the context of retellings while
the video is running (and therefore putting some considerable processing pressure
on the participants). The second dimension of the study is to look at processing conditions. One comparison here is the Robinsonian time perspective (Here-and-now
versus There-and-then). But the study also includes variations on the Here-and-now
condition, with the attempt made to provide some degree of mitigation to see if this
can help the narrators by easing processing pressure, even with a video rolling in real
time. The study adds considerably to our understanding of the importance of the variable of structure, and a little to our understanding of processing.
The final chapter, written by me, attempts to summarise and extend the findings
from the six empirical chapters, and to use them, where appropriate, in relation to
pedagogy. The theoretical-empirical first half of the chapter has two broad goals. First,
it takes some of the findings which cut across the separate studies, and relocates them
within the literature. There is an extensive analysis of planning and this is related to
existing models of speaking and task performance. The analysis incorporates the wider
view of preparedness, of the role of familiarity, of on-line planning, and of repetition.
There is also an analysis of structure, and how this relates to performance (and even to
planning). The findings from all three of the RGC projects demonstrate that structure
is a key aspect in assisting second language speaking. There is also some discussion
of the influence on performance of processing pressures, and the way these pressures
can be mitigated. Second, previous attempts to organise how task and condition influences impact upon performance (complexifying, easing, pressuring, and focussing)
are reviewed and extended. They become the basis for examining the conditions under
which second language learners are more likely to sustain parallel processing (and so
function consistently with the Levelt model) as opposed to being pushed down to a
serial mode of processing (Kormos 2006; Skehan 2009a).
The remainder of the final chapter is concerned with pedagogy. None of the
research studies was directly concerned with pedagogy, but they do have implications
for it. The chapter explores these implications, and extends them in two ways. First, it
updates a set of pedagogic principles I first put forward in 1998, and tries to link these
principles with task research findings. Second, the chapter presents the argument that
the post-task phase is the key phase if one is to make links between actual performance
and longer-term development and change.
Measurement issues
We have now set the scene for the chapters which follow, and we have introduced each
chapter separately. But there is one additional issue which cuts across all the studies, and that is how performance was measured. All studies reported in this book use
quantitative data (even the qualitative study of Pang & Skehan). Just as all studies
were conducted within the research framework outlined earlier in this chapter, they
were all based on (roughly) the same approach to measurement, and so it is useful here
to outline this approach. This saves repetition within the individual chapters. It also
offers suggestions for methodological progress in the field, since a fairly comprehensive set of measures is being proposed.
It is worth saying at the outset that there are many things which could be measured in task performance. However, since such measurements are time-consuming,
what is chosen is invariably chosen at the expense of something else. Interactional
moves (Long 1996), symmetry (Van Lier & Matsuo 2000), discourse markers, and conversational feedback (Lyster & Ranta 1997) are all possibilities and have been used by
others. But in the present case, and continuing to follow a rather cognitive approach,
the focus will be on complexity, accuracy, fluency, and lexis (Skehan 2009a; Housen &
Kuiken 2009). These dimensions have been used in a huge number of studies now.
They were introduced earlier, with the point that studies have shown independence
between these areas: higher proficiency does not simply mean that people score generally higher in each dimension. In fact, they can often function very distinctly from
one another, and so it is interesting in itself to explore when they correlate with one
another and when they do not.
Given that these areas are the focus of measurement, one still needs to discuss the
issue of specific versus general measures. Some researchers (e.g. Crookes 1989; Ortega
2005; Robinson et al. 2009) prefer to use specific measures, of, for example, article usage,
or verb concord, or pronoun use. Such a preference may have strong supportive arguments, with the idea that specific measures have greater construct validity, and that
provided one chooses appropriately, individual measures can be used to detect the influence of experimental conditions. For example, specific measures can reflect the nature
of the task chosen. The use of pronouns, for example, might be justified in this way
with a narrative retelling where clarity of reference is particularly important. The central problem with this approach, in my view, is not at all theoretical, but only practical.
When one is working with second language spoken performance, one has to work with
relatively brief performances, often less than five minutes in length. These, not infrequently, generate fewer than two hundred words. The difficulty, therefore, is that there may
not be enough tokens to work with. If one wants, for example, to work with pronouns,
it is quite possible that in two hundred words or so, there may not be enough examples
of pronouns (or appropriate pronoun contexts) to give the sensitivity that is required to
detect differences between experimental groups. This, obviously, is a great limitation. It
is for this reason that we have, in this volume, preferred to work with generalised indices. These do not focus on particular areas, but use measures which draw on as much
of the sample as possible in order to have a sufficiently rich sampling of data. One loses
in precision of hypothesis-making, but one gains in detecting influence. (Logically, one
might use a two-step strategy here: the first phase would try to detect influences which
are strongest, through general measures, and the second phase, in follow-up research,
could use this information with specific measures which would then be more likely to
be sufficiently sensitive).
Next, we need to explore how exactly we used these general measures of complexity, accuracy, fluency and lexis. Before that, though, some general discussion of analytic
procedures is necessary. The CHILDES system (MacWhinney 2000) exists to facilitate
data transcription (through the CHAT set of conventions), with subsequent analysis
of data possible through the associated CLAN suite of programs. The approach was
developed for first language acquisition data but is increasingly used for second language spoken performance also (Marsden et al. 2003). It can be very useful, and so
the data I have been associated with for some time has been transcribed in the CHAT
format. But the programs in CLAN, excellent though they are, do not really provide
clear indices relevant to the measurement profiles we want to work with. Accordingly,
in all the research which follows, data is coded in a modified CHAT format, in that
additional timing information is provided, capturing the beginning and end points of
each AS-unit, and then an extra line is added to the coding for each AS-unit (Foster
et al. 2000), and this line contains codes relevant for the calculation of a range of complexity, accuracy, fluency and lexis measures. These will be described more fully below.
The issue, then, is how to analyse the additional fourth line. This is done, for all the
chapters in this volume, through the use of TaskProfile, a computer program written in
Delphi, which outputs a wide range of measures. The disadvantage of this is that transcription and coding have to follow a set of conventions exactly. The advantage is that
once the time-consuming transcription and coding are done, the program generates
results virtually instantly. In addition, there is the advantage that if additional ideas
about performance emerge, new measures can be developed and incorporated into the
program fairly easily, and then new results obtained. Accordingly, the next section will
outline the measures that are available. Obviously the focus will be on the measures
which have actually been used in the studies that follow, but brief mention may also
be made about other measures which are output by TaskProfile, since, methodologically, they could have value for other researchers. The following sections will explore
measures of complexity, accuracy, fluency, and lexis.
The standard measure of complexity which has been used in many task-based
studies is that of degree of subordination. No-one believes that subordination, in itself,
is complexity, but it is taken as a good, general-purpose surrogate measure. CHAT
codings are in terms of AS units, a measure Foster et al. (2000) argue is more appropriate for spoken language than the T-unit, and then TaskProfile computes the ratio of
total clauses, that is, main clauses plus other clauses, finite and non-finite, to AS units.
The minimum value here would be 1, with no subordination at all, so that the number of total (matrix) clauses and the number of AS units are identical. Typically, for
second language speakers, one gets values above 1.2, generally but not always below 2,
with group means often in the range 1.4 to 1.6. This index has been used widely and
has been shown to be sensitive to experimental differences in a consistent way. However, it is far from the only method of assessing complexity. Measures of range of structural use have also been used (Foster & Skehan 1996), but their use is not widespread.
More recently, Norris and Ortega (2009) have proposed that the subordination measure is not so effective at higher levels of proficiency. Indeed, studies which have native
speaker baseline data provide some supportive evidence for this, since native and non-native speaker groups often do not differ very much on subordination scores (Skehan
2009b). Norris and Ortega (2009) propose instead that measures based on the number
of words in clauses capture a different dimension of complexity, and that this is more
sensitive to differences at higher proficiency levels. Accordingly, TaskProfile computes
the scores for number of words in AS units, in matrix clauses, in finite subordinate
clauses, and in non-finite subordinate clauses. Several of the chapters draw on this
possibility as appropriate and in so doing allow an investigation, in passing, of the
construct validity of the two types of complexity measures.
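The two kinds of complexity index can be illustrated with a minimal sketch. It assumes that each AS unit has already been coded as a list of clauses with their word counts; the data layout and the function names are illustrative assumptions and are not TaskProfile's own.

```python
from statistics import mean

def subordination_ratio(as_units):
    """Total clauses (main plus subordinate) divided by the number of AS units."""
    total_clauses = sum(len(unit) for unit in as_units)
    return total_clauses / len(as_units)

def mean_words_per_clause(as_units):
    """Average clause length in words, taken over every clause in the sample."""
    return mean(words for unit in as_units for words in unit)

# Hypothetical coded sample: each inner list is one AS unit,
# and each number is the word count of one clause in that unit.
sample = [[6], [5, 4], [7], [4, 3, 5]]

print(subordination_ratio(sample))    # 7 clauses / 4 AS units = 1.75
print(mean_words_per_clause(sample))  # 34 words / 7 clauses
```

A sample with no subordination at all would give a ratio of exactly 1, the minimum value mentioned above.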
The next performance area to consider is that of accuracy. Here the standard
index is to compute the proportion of all clauses that are error-free, so that clearly we
are dealing here with values between the minimum of 0 and a maximum of 1. This
index too has been very serviceable, and has been used in very many studies. On the
basis of this work, one could claim that the index is a good way of detecting a difference if there is one. But although it may be the most widely used method of measuring accuracy in task-based performance, it is not the only option. Mehnert (1998)
proposed that the measure is influenced by the clause structure of the language being
used, and that for L2 German, a more appropriate measure is errors per 100 words.
She proposes that this is a better general measure, since more or less clausality in a
language does not affect it, and so it has greater crosslinguistic comparative utility.
Skehan and Foster (2005) introduced yet another method of measuring accuracy.
They were worried that an error-free clauses measure was vulnerable to distortion
in cases where a speaker used a large number of very short clauses in their speech,
with these clauses being more likely to be correct. The resulting score, they proposed,
might be inflated, and not constitute a true index of accuracy. So they proposed
building into a measure of accuracy a safeguard through using clause length. Their
proposal is that one ranks all the clauses that have been produced for number of
words. Hence, one would bring together all two-word clauses, all three-word clauses,
and so on, up to whatever length of clause is produced. Then they propose calculating
the proportion of clauses that are correct for each length. Finally, one takes this information, establishes a criterion, and then establishes the cutoff point that a particular
participant has reached. So, if someone produced the following data: 2-word, 100%
correct; 3-word, 100% correct, 4-word, 90% correct, 5-word, 80% correct, 6-word,
70% correct, 7-word, 60% correct, 8-word, 50% correct, 9-word, 30% correct, and if
one had set the criterion of 70% correctness, then this speaker would be awarded a
score of 6, since 6 words is the highest length that is being produced at the requisite
accuracy.
Of course, not all speakers are considerate enough to produce scores which
lend themselves to scoring so neatly. Skehan and Foster (2005) therefore propose decision rules to handle more difficult cases. If a speaker, for example, fails a criterion at
a particular level, but then produces the next two clause lengths at criterion or better,
they are excused the blip of the one level where they failed. But this only applies if
there is one blip. If they fail at more than one level consecutively, the score is given as
the length of clause at the last level that reached criterion. This additional rule handles
difficult cases very well, and so this becomes a reliable measure to use. It is important
to say that this illustrates the advantages of using a computer program to score coded
spoken language performance. As far as TaskProfile is concerned, there is negligible
additional effort in scoring accuracy in this way. The decision as to the score to be
assigned is done manually, that is, by the researcher, but this is based on the data which
is laid out by the program in tabular form, and so is the work of an instant.
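The length accuracy procedure and its one-blip rule can be sketched as follows. This is one possible reading of the decision rules described above, not the procedure built into TaskProfile, where the final decision is made by the researcher. The function name and the dictionary layout (clause length mapped to proportion correct) are assumptions made for illustration.

```python
def length_accuracy(correct_by_length, criterion=0.70):
    """Return the longest clause length produced at or above the criterion.

    A single failed length is excused if the next two lengths both reach
    the criterion; any further failure ends the scoring.
    """
    lengths = sorted(correct_by_length)
    score = 0
    blip_used = False
    i = 0
    while i < len(lengths):
        if correct_by_length[lengths[i]] >= criterion:
            score = lengths[i]
            i += 1
            continue
        # This length failed: check whether it can be excused as the one blip.
        next_two = lengths[i + 1:i + 3]
        excusable = (
            not blip_used
            and len(next_two) == 2
            and all(correct_by_length[n] >= criterion for n in next_two)
        )
        if not excusable:
            break
        blip_used = True
        i += 1
    return score

# The worked example from the text: the speaker is awarded a score of 6.
neat = {2: 1.0, 3: 1.0, 4: 0.9, 5: 0.8, 6: 0.7, 7: 0.6, 8: 0.5, 9: 0.3}
print(length_accuracy(neat))  # 6

# A hypothetical speaker with one blip at three-word clauses.
blip = {2: 1.0, 3: 0.6, 4: 0.8, 5: 0.75, 6: 0.5, 7: 0.4}
print(length_accuracy(blip))  # 5
```

In the second case the failure at three-word clauses is excused because four- and five-word clauses both reach criterion, and scoring then stops at the consecutive failures from six words onwards.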
As a result, we now have three accuracy measures: error-free clauses, errors
per 100 words, and what has been termed length accuracy. There is, however, one
more potential problem that is relevant for one or two of the chapters in this volume.
Foster and Wigglesworth (2010) identify another threat to the validity of accuracy
measurements: that they treat all errors alike, however serious or trivial they might
be. They propose a scheme to capture error gravity, suggesting that three levels of
error should be distinguished, broadly through the extent to which they impair communication. At the lowest level, if an error does not impair communication or slow
down comprehension except minimally, then one should categorise this as a superficial error. More seriously, if the hearer can understand, but has to make some effort to
do so, perhaps with a little delay, this should be regarded as an intermediate level of
error. Finally, errors that compromise comprehension, leaving the listener uncertain
what the speaker was trying to communicate, are categorised as serious errors. Foster
and Wigglesworth (2010) provide guidelines and examples to assist in rater coding of
error gravity, to try to overcome problems of estimating what hearer difficulty would
be. They report high inter-rater agreement, something which needs to be established,
in any case, for each study which uses this approach.
In this way, for all of the methods of accuracy measurement outlined above, there
are two or three versions. We will take errors per 100 words as an example. Three versions exist if one scores separately for the three levels of error, for instance errors per
100 words for all errors; errors per 100 words only counting intermediate and serious
errors; errors per 100 words only counting serious errors. This approach could be justified on theoretical grounds: no one can really regard all errors as equally significant.
But there is also a pragmatic issue. Participants in research studies vary in general proficiency. It is easy to make small errors, so for lower intermediate students, to count all
errors could mean rather low accuracy scores, and more seriously, lack of discrimination. This attenuation in range might then have implications for statistical work, suggesting that a more lenient scoring procedure would be more effective. The example
was in terms of errors per 100 words, but the two other methods can be adapted similarly. Preliminary work with error gravity suggests that intermediate and serious errors
are best combined, generating two error scores for each method, for example errors per
100 words, all errors; errors per 100 words, only intermediate and serious errors. As
the reader will have guessed, TaskProfile happily outputs all these results in a twinkling,
provided, of course, that error gravity has been coded in the first place.
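A short sketch may make the different versions of each measure concrete. It assumes that every clause has been coded with its word count and with a gravity level for each error (1 = superficial, 2 = intermediate, 3 = serious); this numeric coding and the function names are hypothetical conveniences, not the CHAT codes actually used.

```python
def error_free_clause_ratio(clauses, min_level=1):
    """Proportion of clauses containing no error at or above min_level."""
    clean = sum(
        1 for _, errors in clauses
        if not any(level >= min_level for level in errors)
    )
    return clean / len(clauses)

def errors_per_100_words(clauses, min_level=1):
    """Number of errors at or above min_level, standardised per 100 words."""
    n_words = sum(words for words, _ in clauses)
    n_errors = sum(
        1 for _, errors in clauses for level in errors if level >= min_level
    )
    return 100 * n_errors / n_words

# Hypothetical sample: (words in clause, gravity level of each error).
sample = [(5, []), (4, [1]), (6, [2]), (3, []), (7, [1, 3])]

print(errors_per_100_words(sample))                  # all errors: 16.0
print(errors_per_100_words(sample, min_level=2))     # intermediate and serious: 8.0
print(error_free_clause_ratio(sample))               # all errors: 0.4
print(error_free_clause_ratio(sample, min_level=2))  # intermediate and serious: 0.6
```

Raising `min_level` gives the more lenient scoring discussed above, in which superficial errors are ignored.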
We turn next to the measurement of fluency. This area is far from straightforward, not least because of the complexity of fluency (or dysfluency) itself. Many different things contribute to fluency, which I have brought together in the diagram below
(Figure 1).
Flow
    Pausing
        Silent: clause boundary; mid-clause
        Non-silent: filled (e.g. um, ah); pseudo (e.g. like, actually)
    Repair: reformulation; replacement; repetition; false starts
Speed: pruned speech rate; unpruned speech rate
Figure 1. Components of fluency and dysfluency
In the figure, a distinction is made between disturbances to the flow of speech,
and then the speed of the speech which is produced. A major factor in flow is pausing.
There are a number of distinctions within this category. First there is the distinction
between silent and non-silent pauses. Regarding silent pauses, one first has to deal with
the problem of how long (or short) a pause needs to be to qualify as a pause or not. In
the present research, 0.4 seconds is used as the cutoff point. Other values have been
proposed to distinguish between simply taking breath and actually pausing, some as
low as 0.28 seconds. We use 0.4 in the present case as a sort of compromise measure,
brief enough to capture very small interruptions to the speech stream, but long enough
to make manual coding feasible. Within silent pauses (termed breakdown fluency,
Skehan 2009b) one can examine separately pauses which occur at clause boundaries
and those which occur mid-clause. Segalowitz (2010), in a major review of fluency, suggests that it is particularly useful to look at measures which distinguish most effectively
between native and non-native speakers. In that respect, since native-speaker speech
contains quite a bit of clause boundary pausing, but much less mid-clause pausing, a
measure of such mid-clause pausing might be particularly effective (Skehan 2009b).
In addition, one could compute derivative measures, such as the ratio of boundary to
mid-clause pauses as another way to detect level of disruption to speech. One could
also explore the average length of pause, either at clause boundaries or mid-clause.
However, one can also regard certain verbal behaviours as constituting filled
pauses, and these come in two main flavours. First we have classic filled pauses, such
as um, ah, etc., where some interjection is placed into the speech stream to buy time
as it were. There is no meaning, and possibly little difference between this and an
unfilled pause (save that it may be more effective at keeping the floor). But one does
need to consider such interjections as pauses. In addition, one can argue that certain
forms of actual speech are more properly regarded as pauses, in that they contribute
no meaning to the ongoing discourse, and serve mainly to ease the pressure of time,
and perhaps keep the floor also. Forms such as you know and like can be analysed in
this way, as perhaps can a word such as actually, although in coding data it is important
to distinguish between these words used meaningfully and their use as pseudo-filled
pauses. Actually, for example, is not always empty in meaning. Since all these pausing
measures, unfilled as well as filled, are affected by how much is said, and especially how
many words are used, they are standardised per 100 words.
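The pausing measures can be sketched in a few lines. The sketch assumes that each silence has been timed and coded for position, and that a silence of exactly 0.4 seconds counts as a pause; the tuple layout, the position labels and the function name are illustrative assumptions.

```python
PAUSE_THRESHOLD = 0.4  # seconds; the cutoff used in the present research

def pause_measures(silences, n_words, threshold=PAUSE_THRESHOLD):
    """Count pauses by position, standardised per 100 words, with mean length."""
    pauses = [(d, pos) for d, pos in silences if d >= threshold]
    result = {}
    for position in ("boundary", "mid"):
        durations = [d for d, pos in pauses if pos == position]
        result[position + "_per_100_words"] = 100 * len(durations) / n_words
        result[position + "_mean_length"] = (
            sum(durations) / len(durations) if durations else 0.0
        )
    return result

# Hypothetical silences: (duration in seconds, position in the clause).
silences = [
    (0.25, "mid"), (0.6, "boundary"), (0.5, "mid"),
    (1.0, "boundary"), (0.45, "boundary"), (0.3, "boundary"),
]
measures = pause_measures(silences, n_words=50)
print(measures["boundary_per_100_words"])  # 3 pauses in 50 words: 6.0
print(measures["mid_per_100_words"])       # 1 pause in 50 words: 2.0
```

Filled and pseudo-filled pauses could be standardised in the same way, by counting coded tokens rather than timed silences.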
Next to indices of a breakdown in the flow of speech, there are also occasions
where the speaker attempts to make changes to what is being said, rather than simply
having problems saying it. These attempts, which have been termed repair fluency (or
more properly repair dysfluency), can be realized in many different ways, as reported
in the second language speaking literature. Reformulations are occasions where the
speaker changes what has been said by modifying the syntax or morphology, either
by changing something or by inserting or deleting something. In contrast, Replacement, which also consists of change, is focused on lexical elements, so that syntax and
morphology remain unchanged, but something is done about the actual words which
are used. Repetition is self-evident: words or sequences of words are simply repeated,
without any intervening material. False Starts are occasions where something is abandoned, and some new form of expression is used. Of course, as with number of pauses,
there is the issue with all these measures that they occur more when speakers say
more. Accordingly, they are standardised per 100 words of discourse.
In contrast to measures of flow, one can also look at the speed with which language is produced. Logically, one can separate flow and speed, and imagine someone who paused a lot, but who, when they were speaking, spoke fast, and the reverse,
someone who speaks slowly but without interruption to the flow. Hence, it is useful
to have measures of each, distinct from one another (De Jong et al. 2012). Typically
measures of speech rate, expressed as words or syllables per minute, are either pruned
or unpruned. In the latter case, the raw number of words is used, including repetitions,
reformulations and so on. In the former case, the additional material is removed, and
the measure is of meaningful, contributing words or syllables per minute.
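The difference between the two speech rate measures is easy to show in code. The sketch below assumes that each transcribed word carries a flag saying whether it is repair material (a repetition, reformulation, replacement or false start); the flag and the function name are hypothetical.

```python
def speech_rate(tokens, seconds, pruned=False):
    """Words per minute; if pruned, repair material is left out of the count."""
    if pruned:
        n_words = sum(1 for _, is_repair in tokens if not is_repair)
    else:
        n_words = len(tokens)
    return 60 * n_words / seconds

# Hypothetical four-second stretch: "the the man goed went to the shop".
# The repeated "the" and the abandoned "goed" are flagged as repair material.
tokens = [
    ("the", True), ("the", False), ("man", False), ("goed", True),
    ("went", False), ("to", False), ("the", False), ("shop", False),
]
print(speech_rate(tokens, seconds=4))               # unpruned: 120.0
print(speech_rate(tokens, seconds=4, pruned=True))  # pruned: 90.0
```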
In addition to the measures we have now covered, there are two more to be considered. They are both, effectively, composite measures of aspects of dysfluency. Phonation time simply captures the proportion of the time speech is taking place, and
subtracts from total time the time spent pausing, so it reflects not simply number
of pauses but also length of pausing. Length of Run is a measure of the average span
in speech without any sort of interruption, whether a pause or a repair, and has been
proposed as a measure of automatisation in speech (Towell, Hawkins & Bazergui
1996). These two measures are therefore blends of several other measures but have the
potential to index level of general fluency more validly.
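Both composite measures can be computed from data already described. The following is a minimal sketch, assuming a list of pause lengths in seconds and a transcript stream in which interruptions are marked by placeholder tokens; the marker strings and function names are assumptions made for illustration.

```python
def phonation_time_ratio(total_seconds, pause_lengths):
    """Proportion of total time spent speaking rather than pausing."""
    return (total_seconds - sum(pause_lengths)) / total_seconds

def mean_length_of_run(stream, breaks=("<pause>", "<repair>")):
    """Average number of words between interruptions (pauses or repairs)."""
    runs, current = [], 0
    for item in stream:
        if item in breaks:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0

# Hypothetical data: 60 seconds of talk containing three pauses.
print(phonation_time_ratio(60, [0.5, 1.0, 1.5]))  # 57 of 60 seconds spent speaking

stream = ["so", "he", "went", "<pause>", "to", "the", "<repair>",
          "into", "the", "big", "shop"]
print(mean_length_of_run(stream))  # runs of 3, 2 and 4 words: 3.0
```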
TaskProfile outputs all the above measures (assuming of course that things like
pseudo-filled pauses and pausing and repair have been coded). Now all this is useful,
but only up to a point. We still await a convincing theory of second language fluency, so the existence of all these measures is a mixed blessing. There is much work
to be done in the area of fluency alone to understand what its major dimensions are,
and how the different measures might relate to different underlying psycholinguistic
processes. The chapters in this volume do not set themselves that task (although Bui
comes close). Accordingly, the different authors took a pragmatic stance, and made
slightly different decisions. Perhaps one can regard examining results for a measure of
end-of-clause pausing, mid-clause pausing, and a repair measure as a useful basic set
here, to avoid proliferating measures. The different chapters will reveal how the different authors wrestled with this problem of choice.
Finally, we come to the area of lexis in second language task-based performance,
an area that I argue (Skehan 2009a, 2009b) has not received adequate attention in the
task literature. There are three measures to consider here, the type-token ratio (as an
index of lexical diversity), lexical density, and lexical sophistication. The type-token
ratio is far-and-away the most used measure in the second language field. It is the
ratio of different words to total words. In other words, the more particular words are
repeated in a (spoken) text, the lower this value becomes, and of course, conversely,
the more a speaker keeps bringing in new words, the higher the value will be. The
major issue here is that there is a very strong inverse relationship between length of
text and magnitude of type-token ratio (Richards & Malvern 2007), with this being
at the level of -0.70 with second language speakers (Foster 2001). So, to cite a type-token ratio in itself is meaningless without information about the length of text.
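The effect of text length on the type-token ratio can be seen even in a toy example. The sentence below is invented for illustration, and the function name is an assumption.

```python
def type_token_ratio(tokens):
    """Number of different words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens)

text = "the man went to the shop and the man bought the bread".split()

print(type_token_ratio(text[:6]))  # first 6 tokens, 5 types
print(type_token_ratio(text))      # all 12 tokens, 8 types
```

As the text grows, words such as the and man recur, so the ratio falls even though the speaker's vocabulary has not changed.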
In practice what this means for researchers is that some method has to be found
which corrects for text length with type-token measures. Fortunately, a range of procedures exist. The choice made in the present research follows from the decision
to adopt CHAT conventions, and the consequent availability of CLAN programs.
These include VocD (Malvern & Richards 2002) which takes a CHAT-formatted file,
and then calculates a statistic, D, which captures lexical diversity independent of text
length. Given the relatively short texts typical with task-research (and so a great vulnerability to text length effects on type-token ratios), this is a very effective solution
to a difficult problem.
Another aspect of lexis in second language speech is lexical density. This, following
work by Halliday (1975), first distinguishes between structure and content words, and
then calculates a ratio of content (or lexical) words to total words (i.e. structure plus
content words). This gives an indication of the extent to which lexical words penetrate
a spoken text, and could be taken as an indicator of the propositional density of text.
(But see Skehan 2009b, for additional discussion of such measures.)
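Lexical density reduces to a simple ratio once structure words have been identified. The sketch below uses a deliberately tiny, made-up list of structure words; a real analysis would need a full list or part-of-speech coding.

```python
# Tiny illustrative list of structure (function) words; an assumption, not a standard list.
STRUCTURE_WORDS = {"the", "a", "to", "and", "of", "in", "he", "it", "was"}

def lexical_density(tokens, structure_words=STRUCTURE_WORDS):
    """Content (lexical) words divided by total words."""
    content = [t for t in tokens if t.lower() not in structure_words]
    return len(content) / len(tokens)

text = "the man went to the shop and the man bought the bread".split()
print(lexical_density(text))  # 6 content words out of 12: 0.5
```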
The final aspect of lexis to be considered is lexical sophistication. In ideal form,
this reflects the extent to which the speaker draws upon difficult words in what they
say. In practice, defining difficulty is not so easy, and the typical approach which is
taken is to use frequency (or rather low frequency) as a surrogate for difficulty. So,
the task becomes one of finding a measure to capture the extent to which less frequent words are used by a speaker doing a task. Laufer and Nation's (1999) Lexical
Frequency Profile is one means of doing this, but when the present research program
started, before my move to Hong Kong, there was no means of using this based on
any spoken language corpus. Accordingly, I adapted a computer program written by
Paul Meara, P_Lex (Meara & Bell 2001). This program divides a text up into ten-word
chunks, and then calculates how many words in each ten-word chunk are of a lower
frequency. It then uses a Poisson Distribution (developed to model low frequency
events) to estimate a parameter, Lambda, which captures the extent to which the text
draws upon lower frequency words. This approach has been shown to be effective
with quite short texts (Bell 2003), and so it is useful for texts such as we use in the
task-based field. I adapted Meara's original program in three ways. First, I was able
to use a different frequency corpus as the basis for the decision making with any
particular word. I based my version of his program on the spoken component of the
British National Corpus. Second, I built a BNC-based reference dictionary for the
program which was lemmatised, and so the program outputs both lemmatised and
non-lemmatised values of Lambda for any particular text (although typically it is
the lemmatised value which is used). Finally, the program allows the user to specify
the cutoff frequency to separate low and high frequency words. In all uses of the
program in chapters in this volume, this was set as 150 occurrences per million running words. Further details on this statistic, as well as a discussion of other aspects
of lexical measurement in second language spoken language are provided in Skehan
(2009b).
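The logic behind Lambda can be sketched under a simplifying assumption: that the parameter is estimated as the mean number of low frequency words per ten-word chunk, which is the maximum-likelihood estimate for a Poisson distribution. The fitting procedure in the actual program may differ. The frequency values below are invented, words missing from the list are treated as low frequency, and any incomplete final chunk is dropped; all of these are assumptions made for illustration.

```python
def lambda_estimate(tokens, frequency_per_million, cutoff=150, chunk_size=10):
    """Mean count of low frequency words per complete chunk of the text."""
    chunks = [
        tokens[i:i + chunk_size]
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    counts = [
        sum(1 for word in chunk if frequency_per_million.get(word, 0) < cutoff)
        for chunk in chunks
    ]
    return sum(counts) / len(counts)

# Invented frequencies: every listed word counts as high frequency.
HIGH = ["the", "cat", "sat", "on", "near", "old", "door",
        "a", "watched", "from", "basket"]
frequency = {word: 1000 for word in HIGH}

tokens = ("the cat sat on the mat near the old door "
          "a curious ferret watched the cat from a wicker basket").split()
print(lambda_estimate(tokens, frequency))  # chunk counts 1 and 3: 2.0
```

Lowering or raising `cutoff` changes which words count as low frequency, which is the user-specified setting described above.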
As a result, we have three measuring procedures, lexical diversity, lexical density,
and lexical sophistication (with these terms taken from Read 2000). The first question
to consider is what each of them measures, and second, how they interrelate. One can
propose the following:
Lexical diversity captures the extent to which a speaker draws upon a wide range of
words in what they say, compared to a speaker who recycles a smaller set of words.
The measure is neutral as to whether high or low frequency words are used. What
is important is how words relate to other words within that text. For this reason
lexical diversity is referred to as a text-internal measure (Daller et al. 2003).
Lexical density reflects the penetration within a text of content words, as opposed
to reliance on structure words. It is thought to reflect the density of propositions
in the text, and is also considered to be likely to be different in spoken and written
language.
Lexical sophistication is an index of the speaker's capacity or preference for using
less frequent words, which presupposes knowledge of such words (implying a
larger second language lexicon) as well as a capacity to mobilise them on-line
(lexical accessibility).
In a sense, these characterisations are tantalising. They hint at differences, but there seems
to be a considerable degree of overlap as well. Wouldn't high lexical diversity tend to go
with lexical sophistication, for example? However, Skehan (2009b) demonstrated quite
a bit of independence between measures in each of these areas. The truth is that at the
moment we are equipped with measures but are not sure exactly what they are getting at.
Typically, in second language studies (Skehan 2009b), measures of lexical sophistication
are more likely to show differences between groups or conditions than do the other measures. But there is a case for including all of them: lexical diversity, because it has been the
measure of choice more often than any other; lexical density, because of Halliday's (1975)
theoretical justification of this construct; and lexical sophistication, not only because
it may be the best bet for detecting differences, but also because size of mental lexicon
may be an issue with second language learners, and so such a measure may reveal how
different tasks and different task conditions enable second language speakers to draw on
that lexicon more or less effectively. In any case, we will see in some of the chapters in
this volume that characterising second language task performance without incorporating measures of lexical involvement is a hazardous undertaking.
In all, we can conclude that many measures are available. It is to be hoped, as the
reader goes through the chapters in this volume, that this range of measurement possibilities is itself put to the test, and we may gain some insights as to which measures
are most effective in such contexts.
References
Bell, H. (2003). Using frequency lists to assess L2 texts. Unpublished Ph.D. thesis, University of Swansea.
Crookes, G. (1989). Planning and interlanguage variation. Studies in Second Language Acquisition,
11, 367–383.
Daller, H., Van Hout, R., & Treffers-Daller, J. (2003). Lexical richness in the spontaneous speech of
bilinguals. Applied Linguistics, 24, 197–222.
De Bot, K. (1992). A bilingual production model: Levelt's Speaking model adapted. Applied Linguistics, 13, 1–24.
De Jong, N., Steinel, M.P., Florijn, A., Schoonen, R., & Hulstijn, J. (2012). The effect of task complexity
on functional adequacy, fluency and lexical diversity in speaking performances of native and
non-native speakers. In A. Housen, F. Kuiken, & I. Vedder (Eds.), Dimensions of L2 performance and proficiency: Complexity, accuracy, and fluency in SLA (pp. 121–142). Amsterdam:
John Benjamins.
Doughty, C. (2003). Instructed SLA: Constraints, compensation, enhancement. In C. Doughty &
M.H. Long (Eds.), The handbook of second language acquisition (pp. 256–310). Oxford: Blackwell.
Doughty, C. & Williams, J. (1998). Pedagogical choices in focus on form. In C. Doughty & J. Williams
(Eds.), Focus on form in classroom second language acquisition (pp. 197–262). Cambridge: CUP.
Ellis, R. (2003). Task-based language learning and teaching. Oxford: OUP.
Ellis, R. (2005). Planning and task-based performance: Theory and research. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 3–34). Amsterdam: John Benjamins.
Foster, P. (2001). Lexical measures in task-based performance. Paper presented at the AAAL Conference, Vancouver, Canada.
Foster, P. & Skehan, P. (1996). The influence of planning on performance in task-based learning.
Studies in Second Language Acquisition, 18, 3, 299–324.
Foster, P., & Skehan, P. (2012). Complexity, accuracy, fluency and lexis in task-based performance: a
synthesis of the Ealing research. In A. Housen, F. Kuiken, & I. Vedder (Eds.), Dimensions of L2
performance and proficiency: Complexity, accuracy and fluency in SLA (pp. 199–220). Amsterdam: John Benjamins.
Foster, P., & Skehan, P. (2013). Anticipating a post-task activity: The effects on accuracy, complexity
and fluency of L2 language performance. Canadian Modern Language Review 69, 3, 249–273.
Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons.
Applied Linguistics, 21, 354–75.
Foster, P., & Wigglesworth, G. (2010). Towards a new measure of accuracy in task-based second
language performance. English Department, St. Mary's University, Twickenham.
Halliday, M.A.K. (1975). Spoken and written language. Geelong: Deakin University Press.
Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition.
Housen, A., Kuiken, F., & Vedder, I. (2012). Dimensions of L2 performance and proficiency: Complexity, accuracy, and fluency in SLA. Amsterdam: John Benjamins.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Lawrence
Erlbaum Associates.
Laufer, B., & Nation, P. (1999). A vocabulary-size test of controlled productive ability. Language
Testing, 16, 33–51.
Levelt, W.J. (1989). Speaking: From intention to articulation. Cambridge: CUP.
Levelt, W.J. (1999). Language production: a blueprint of the speaker. In C. Brown & P. Hagoort (Eds.),
Neurocognition of Language (pp. 83–122). Oxford: OUP.
Long, M.H. (1996). The role of the linguistic environment in second language acquisition. In W.
Ritchie & T. Bhatia (Eds.), Handbook of Research on Second Language Acquisition (pp. 413–468).
New York, NY: Academic Press.
Lynch, T. (2001). Seeing what they meant: transcribing as a route to noticing. English Language
Teaching Journal, 55, 124–132.
Lynch, T. (2007). Learning from the transcript of an oral communication task. English Language
Teaching Journal, 61, 311–319.
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake: Negotiation of form in communicative classrooms. Studies in Second Language Acquisition, 19, 37–66.
Long, M., & Robinson, P. (1998). Focus on form: Theory, research, and practice. In C. Doughty &
J. Williams (Eds.), Focus on form in classroom SLA (pp. 15–41). Cambridge: CUP.
MacWhinney, B. (2000). The CHILDES Project: Tools for analysing talk, Volume 1: Transcription format and programs (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Malvern, D., & Richards, B. (2002). Investigating accommodation in language proficiency interviews
using a new measure of lexical diversity. Language Testing, 19, 85–104.
Marsden, E., Myles, F., Rule, S., & Mitchell, R. (2003). Using CHILDES tools for researching second
language acquisition. In S. Sarangi & T. van Leeuwen (Eds.), Applied Linguistics and Communities of Practice (Vol. 18, pp. 98–113). London: BAAL/Continuum.
McLaughlin, B. (1980). Theory and research in second language learning: An emerging paradigm.
Language Learning, 30, 331–350.
Meara, P., & Bell, H. (2001). P_Lex: A simple and effective way of describing the lexical characteristics of short L2 texts. Prospect, 16(3), 5–19.
Mehnert, U. (1998). The effects of different lengths of time for planning on second language performance. Studies in Second Language Acquisition, 20, 83–108.
Norris, J., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA:
The case of complexity. Applied Linguistics, 30, 555–578.
Ortega, L. (1995). The effect of planning on L2 Spanish narratives. Research Note 15, Honolulu, HI:
University of Hawaii Second Language Teaching and Curriculum Center.
Ortega, L. (1999). Planning and focus-on-form in L2 oral performance. Studies in Second Language
Acquisition, 21, 109–148.
Ortega, L. (2005). What do learners plan? Learner-driven attention to form during pre-task planning.
In R. Ellis (Ed.), Planning and task performance in a second language (pp. 77–109). Amsterdam:
John Benjamins.
Read, J. (2000). Assessing vocabulary. Cambridge: CUP.
Richards, B. & Malvern, D. (2007). Validity and threats to the validity of vocabulary measurement.
In H. Daller, J. Milton, & J. Treffers-Daller (Eds.), Modelling and assessing vocabulary knowledge
(pp. 79–92). Cambridge: CUP.
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in
a componential framework. Applied Linguistics, 22, 27–57.
Robinson, P. (2007). Rethinking-for-speaking and L2 task demands: The Cognition Hypothesis, task
classification, and sequencing. Paper presented at the 2nd International Conference on Task-based Language Teaching, University of Hawaii, September 2007.
Robinson, P., & Gilabert, R. (2007). Task complexity, the Cognition Hypothesis, and second language
learning and performance. International Review of Applied Linguistics, 45, 161–176.
Robinson, P., Cadierno, T., & Shirai, Y. (2009). Time and motion: Measuring the effects of the
conceptual demands of tasks on second language speech production. Applied Linguistics, 30,
533–544.
Robinson, P. (2011). Second language task complexity, the Cognition Hypothesis, language learning, and performance. In P. Robinson (Ed.), Second language task complexity: Researching
the Cognition Hypothesis of language learning and performance (pp. 3–38). Amsterdam: John
Benjamins.
Samuda, V. (2001). Guiding relationships between form and meaning during task performance: the
role of the teacher. In Bygate, M., Skehan, P., & Swain, M. (Eds.), Researching pedagogic tasks:
second language learning, teaching, and testing. London: Longman.
Segalowitz, N. (2010). Cognitive bases of second language fluency. London: Routledge.
Skehan, P. (1998). A cognitive approach to language learning. Oxford: OUP.
Skehan, P. (2001). Tasks and language performance. In M. Bygate, P. Skehan, & M. Swain (Eds.),
Researching pedagogic tasks: Second language learning, teaching, and testing (pp. 167–185).
London: Longman.
Skehan, P. (2009a). Modelling second language performance: Integrating complexity, accuracy, fluency and lexis. Applied Linguistics, 30, 510–532.
Skehan, P. (2009b). Models of speaking and the assessment of second language proficiency. In
A. Benati (Ed.), Issues in second language proficiency (pp. 202–215). London: Continuum.
Skehan, P. (2011). Researching tasks: Performance, assessment, pedagogy. Shanghai: Shanghai Foreign
Language Education Press.
Skehan, P. (2013). Nurturing noticing. In J.M. Bergsleithner, S.N. Frota, & J.K. Yoshioka (Eds.),
Noticing and second language acquisition: Studies in honor of Richard Schmidt (pp. 169–180).
Honolulu, HI: National Foreign Language Resource Center.
Skehan, P., & Foster, P. (1997). Task type and task processing conditions as influences on foreign
language performance. Language Teaching Research, 1, 185–211.
Skehan, P., & Foster, P. (1999). The influence of task structure and processing conditions on narrative
retellings. Language Learning, 49, 93–120.
Skehan, P., & Foster, P. (2005). Strategic and on-line planning: The influence of surprise information
and task time on second language performance. In R. Ellis (Ed.), Planning and task performance
in a second language (pp. 193–216). Amsterdam: John Benjamins.
Tavakoli, P., & Skehan, P. (2005). Planning, task structure, and performance testing. In R. Ellis
(Ed.), Planning and task performance in a second language (pp. 239–273). Amsterdam: John
Benjamins.
Towell, R., Hawkins, R., & Bazergui, N. (1996). The development of fluency in advanced learners of
French. Applied Linguistics, 17, 84–115.
Van Lier, L. & Matsuo, N. (2000). Varieties of conversational experience: Looking for learning opportunities. Applied Language Learning, 11, 265–288.
Van Patten, B. (1990). Attending to content and form in the input: an experiment in consciousness.
Studies in Second Language Acquisition, 12, 287–301.
Van Patten, B. (1996). Input processing and grammar instruction in second language acquisition.
Norwood, NJ: Ablex.
Van den Branden, K. (2006). Task-based language education: From theory to practice. Cambridge:
CUP.
Willis, J. (1996). A framework for task-based learning. London: Longman.
chapter 2
On-line time pressure manipulations

L2 speaking performance under five types of planning
and repetition conditions
Zhan Wang
University of Pittsburgh
This chapter is concerned with an investigation of the underlying mechanisms of
second language speaking. It reports on an experiment containing five different
types of planning and repetition conditions, each relevant to certain processes
and stages of speech production. The five conditions were two forms of strategic
planning, two forms of on-line planning, and task repetition. Data were collected
from 77 undergraduates (L1: Chinese and L2: English) doing a video narrative task in
English. Speech samples were transcribed and coded. The study found: (1) strategic
planning improved speech complexity and fluency, suggesting that support for
the conceptualization stage in speech production helps language complexity and
fluency; (2) on-line planning solely focusing on the formulation stage did not
enhance speech complexity and accuracy, whereas on-line planning supported by an
earlier opportunity to watch the video did, indicating a meaning priority principle
in speaking; (3) repetition enhanced speech complexity, fluency, and accuracy,
suggesting that repetition is a robust way to improve speaking quality. Based on the
results, an instructional model of L2 speech intervention is proposed. It argues that
speech monitoring is the key to accuracy. Interventions at the speech formulation
stage, which are often emphasized in pedagogy, work for accuracy only when speakers
are instructed to attend to monitoring.
Introduction
Processes of L1 speaking
Speaking in one's native language is an easy, fluent, and automatic process. According
to Levelt's model of first language (L1) speech production (Levelt 1989, 1993, 1999),
information processing in L1 speaking contains the stages of conceptualization,
formulation, and articulation as well as a speech comprehension system by which
the outputs of each stage can be monitored: the pre-verbal message, the inner speech
plan, and the overt speech plan. L1 speaking to a large extent is an incremental, parallel, and automatic process (Kormos 2006, p. 78).
To say that L1 speaking is incremental means that speech information needs to flow
from the Conceptualizer through the Formulator to the Articulator for speech to be produced. Through
these processors, information is delivered from a larger unit in the hierarchy first to
intermediate, and then to even smaller, subunits (Lashley 1951; Meyer & Gordon
1985), which is the opposite of the process in word recognition and reading (i.e. decoding
printed material into meaning, moving from smaller units such as phonemes, syllables,
and words to larger units such as the meaning of sentences and discourse). For example, when a large unit of speech intention has been formed, the brain searches for a
mapping between the pre-verbal message and specific target lexical nodes based on
the relevant semantic and syntactic information in the lexicon, that is, the process of lexical selection (Levelt 2001). The activated lexical nodes (e.g. play-verb) are then morphologically encoded (e.g. played-past tense verb) and passed to the phonological
encoder for phonological and phonetic encoding (e.g. /pleɪd/) (Levelt,
Roelofs & Meyer 1999; Schriefers, Meyer & Levelt 1990). Finally, through the motor
action of the Articulator, the small units of each phonetic word are articulated, producing
what we perceive as speech.
L1 speech production is also a parallel and highly automatic process, and these are two important aspects of fluent L1 speaking. In Levelt's model, for example, as soon as the Conceptualizer passes information to the Formulator, the Conceptualizer starts to work
on the next piece of information regardless of the fact that the last piece of information is still being processed by the Formulator (Kempen & Hoenkamp 1987; cited
in Kormos 2006, p. 8). This efficient parallel processing is based on the claim that
conceptualization in speech production is likely to be the only controlled process that
requires conscious awareness. Both Formulator and Articulator to a great extent work
automatically without much conscious awareness¹ (Levelt 1989, p. 20). Researchers
may not have reached consensus on exactly how morphological transformations are computed
when retrieving linguistic forms. For example, researchers who argue for statistical learning rules (Seidenberg 1997) posit that low frequency forms are less easily
retrieved from the mental lexicon or mental grammar, which is why speakers sometimes
hesitate about the precise phrasing to be used. However, for the high frequency L1
forms that speakers encounter, morpho-phonological formulation and
articulation are largely automatic. Considering the large proportion of high frequency
forms in our daily speech, the phases of formulation and articulation can be considered highly automatic, which makes L1 speech production fast and efficient.
1. Although articulation in L1 speaking is largely regarded as automatic motor processing
without conscious awareness, it may still involve conscious awareness around the time of
the articulatory motor movement. Levelt addressed this issue by assuming a monitoring process
along the time course of speech production. He assumed that because of the self-monitoring of
speech production, speakers can sometimes detect speech errors prior to articulation and repair
them (Levelt 1983).
In terms of the controlled and automatic processing of language, Ullman (2001a,
2001b, 2004, 2005, 2008) distinguishes between declarative memory and procedural
memory (both of them belonging to the long-term memory system). Declarative
memory concerns the learning, representation, and use of knowledge about facts and
events. This type of knowledge is partly explicitly learned and remembered; that is,
it requires conscious awareness. Procedural memory concerns the control of long-established motor and cognitive skills or habits (Ullman 2001b, p. 106). This type of
knowledge is implicitly learned and remembered, that is, without conscious awareness. In L1 speaking, the conceptualization of non-compositional forms from lexical
memory relies on declarative memory; the formulation of compositional forms and
rules (e.g. regular forms such as -ed in the past tense), which make up the mental grammar, relies on procedural memory. The articulation of speech relies on procedural
memory too because pronunciation can be regarded as essentially a motor skill.
A similar distinction can be found in other cognitive psychological theories such
as the distinction between declarative and procedural knowledge in ACT-R (Anderson
1983; Anderson et al. 2004; Anderson & Lebiere 1998; Squire 1987) and the distinction
between controlled and automatic processing (Schneider & Chein 2003; Shiffrin &
Schneider 1977). All of these help explain how different types of language mechanism
in the human brain can work together coherently to produce L1 speech.
Since the above three stages of L1 speaking are all relevant for understanding L2
speech processing, we present an illustration of parallel processing in L1 speaking
(Table 1). The phases involving automatic processing that makes L1 speaking quick
and efficient are highlighted.
Table 1. Illustration of parallel processing in L1 speaking
L1 speech processor   Tn      Tn+1    Tn+2    Tn+3    Memory access   Processing
Conceptualizer        ICn     ICn+1   ICn+2   ICn+3   Declarative     Controlled
Formulator            -       ICn     ICn+1   ICn+2   Procedural      Automatic
Articulator           -       -       ICn     ICn+1   Procedural      Automatic
T: Time
IC: Information Chunk
Table 1 clarifies how the three processors in speech production work in an
incremental, parallel, and automatic way, as discussed above. In the table, we
are observing information processing in the brain starting from Time n (Tn). If we
suppose our technology can distinguish brain activity from the three processors (i.e.
Conceptualizer, Formulator, and Articulator), at Tn, Information Chunk n (ICn) is
processed through the Conceptualizer first. When ICn completes processing at the
Conceptualizer stage, it enters into the Formulator for morpho-phonological processing at Tn+1. This demonstrates the feature of incremental processing. If we only observe
a time point at Tn+2, we find that none of the processors is at rest. Conceptualizer,
Formulator, and Articulator are working simultaneously in parallel. This remarkable
capacity for parallel processing is attributed to the largely automatic nature of
processing at the Formulator and the Articulator phases. These incremental, parallel,
and automatic features make L1 speaking continuous, fast, and efficient (Levelt 1989,
1993, 1999; Kormos 2006).
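The staggered layout of Table 1 can be reproduced with a short simulation. The following Python sketch is only an illustration under a simplifying assumption that is not part of Levelt's model: every stage takes exactly one time step per information chunk. The names `STAGES`, `pipeline_schedule`, `busy_stages`, and `label` are hypothetical.

```python
# Toy illustration of incremental, parallel processing (cf. Table 1).
# Assumption (not from the chapter): each stage needs exactly one time
# step per information chunk, so chunks advance in lockstep.

STAGES = ["Conceptualizer", "Formulator", "Articulator"]


def pipeline_schedule(n_steps):
    """Map each stage to the chunk offset it handles at Tn, Tn+1, ...

    Offset k stands for chunk ICn+k; None means the stage has not yet
    received a chunk within the observed window.
    """
    schedule = {}
    for lag, stage in enumerate(STAGES):
        # A stage receives a chunk only after `lag` earlier stages are done.
        schedule[stage] = [t - lag if t >= lag else None for t in range(n_steps)]
    return schedule


def busy_stages(schedule, t):
    """Return the stages that are processing some chunk at time step t."""
    return [stage for stage, row in schedule.items() if row[t] is not None]


def label(offset):
    """Render a chunk offset in the notation used in Table 1."""
    if offset is None:
        return "-"
    return "ICn" if offset == 0 else f"ICn+{offset}"


if __name__ == "__main__":
    schedule = pipeline_schedule(4)
    for stage, row in schedule.items():
        print(f"{stage:<15}", *(f"{label(o):<6}" for o in row))
```

Under this assumption, calling `busy_stages(schedule, 2)` returns all three stages, which corresponds to the observation that no processor is at rest at Tn+2.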
L2 speaking processing and time pressure

Unlike L1 speaking, which is largely regarded as based upon easy, fluent and automatic
processes, L2 speaking faces at least the following four sources of problems in communication, as argued by Dörnyei and Scott (1997) and cited in Kormos (2006):
1. resource deficits,
2. processing time pressure,
3. perceived deficiencies in the speaker's own language output, and
4. perceived deficiencies in decoding the interlocutor's message.
It seems that processing time pressure is one of the bottlenecks in L2 speaking given
that our working memory capacity is regarded as being limited (Baddeley 1986, 2003).
The bottleneck is likely to impede language learning and development. Table 2 illustrates why L2 speakers are experiencing real-time pressure during speaking. The
phases that have differences from L1 processing and make L2 speaking less quick and
efficient are highlighted.
Table 2. Illustration of the bottleneck of time pressure in L2 speaking
L2 speech processor   Tn      Tn+1    Tn+2    Tn+3    Memory access   Processing
Conceptualizer        ICn     ICn+1   ICn+2   ICn+3   Declarative     Controlled
Formulator            -       ICn     ICn+1   ICn+2   Declarative     Controlled
Articulator           -       -       ICn     ICn+1   Procedural      Automatic
T: Time
IC: Information Chunk
In comparison to L1 speakers, L2 speakers generally have more time pressure
at both the conceptualization and formulation stages (though L1 speakers to some
degree have time pressure at conceptualization too). Beginning L2 learners have more
time pressure than their L1 peers in the compilation of articulatory motor programs
too. However, for intermediate and advanced L2 learners, since they have acquired
the basic acoustic templates of the target language phonetics, the articulation of the
encoded phonetic plans is likely to implicate automatic motor programming, which
requires little conscious awareness.²
Researchers often regard conceptualization as being the least different between
L1 and L2 speakers. However, for processing L2 forms either learned (but not proceduralized) or missing in the L2 mental lexicon, L2 speakers are generally slower than
their native peers. First of all, Kroll and Stewart (1994) have argued in their Revised
Hierarchical Model (RHM) that there is no direct link between concepts and the L2
initially, but a link begins to be established gradually when L2 proficiency increases.
They also argue that for low proficiency L2 learners, the link between concept and
its L2 representation is often assisted by their native language. This implies that in
contrast to the direct link between meaning and L1 representations, a longer time is
required to access and retrieve target L2 lemmas from the mental lexicon, so that
lexical access is in itself less efficient in the L2 than in the L1. Therefore, L2 speakers are
likely to need more time to retrieve L2 words in real time speech, even for items that
have been learned.
Second, due to resource deficits in the L2 (Dörnyei & Scott 1997; Kormos 2006),
ideas which have been conceived so that they can be expressed in real-time speech
sometimes encounter gaps in the L2 mental lexicon. As a result, L2 speakers generally
are more tentative in the conceptualization of ideas. They have developed strategies
to make the best match between what they can conceive and are able to express and
what they must compromise on because of the lack of sufficient L2 resources at hand (the
issues of cognitive comparison and selective attention; Doughty 2001). Therefore,
conceptualization for L2 speakers is a bi-directional³ process involving revision and
re-conceptualization (i.e. finding alternative ways to express thoughts) in order to
match speech with the L2 resources available.
At the stage of formulation, L1 speakers rely on automatic processing in most
instances to encode morphological and phonological information, which makes
speech production easy and fast once ideas are conceptualized. However, with late L2
learners, grammatical rules whose computation depends upon procedural memory in
the L1 are posited to be more dependent upon declarative memory in the L2 due to the
lack of brain plasticity after the close of the critical period (Ullman 2008). L2 learners
may have to memorize or construct utterances through explicit rules from declarative
memory. Therefore, the formulation of L2 syntactic, morphological, and phonological
forms during speech production is less of an automatic and more of a controlled process that requires attentional resources. Because of this difference, L2 speakers need
more time to compose morpho-syntactic and phonological information from the
mental lexicon; however, the real-time nature of any speech production does not readily allow extra time for processing, so L2 speakers often feel short of time in speech
production.
2. Similar to L1 speech processing, though motor programming of articulation to a large extent
is an automatic process, it still involves conscious awareness around the time course of the motor
action.
3. L1 speaking at the stratum of conceptualization may have a bi-directional connection
between concepts and lemmas too, as argued in Roelofs' WEAVER network (Levelt et al. 1999;
Roelofs 1997); this is not due to the lack of linguistic resources but to a change of speech
plan.
L2 speaking intervention targeting the bottleneck of time pressure

In the previous section I clarified how time pressure affects L2 speech and impedes
learners' retrieval of newly learned but not yet automatized linguistic structures.
Therefore, effective interventions that focus on the bottleneck of time pressure during speech should be explored. This study aims to examine such interventions.
They should provide learners with opportunities to overcome time pressure by giving them practice in accessing relevant linguistic knowledge as effectively
as possible, using forms and functions, and proceduralizing them in long-term
memory.
The past two decades have witnessed a transition of both research interest and
classroom practice from communicative language teaching to task-based language
teaching approaches (Bygate & Samuda 2005; Robinson 2011; Skehan, Bei, Li & Wang
2012). Using tasks in language learning and teaching involves solving communication
problems as well as real-world language use in which meaning is primary (see the
definition of tasks in Ellis 2003; Samuda & Bygate 2008; Skehan 1998). For our study,
using a task-based approach as a medium to explore various interventions that focus
on time-pressure reduction has at least the following two advantages: (1) it provides
evidence from the discourse level, which resembles real-time speaking activities; (2) it
connects speech processing conditions with speech product, with language performance seen in terms of certain standardized measures such as complexity, accuracy
and fluency (CAF) (Ellis 2009; Housen & Kuiken 2009; Norris & Ortega 2009; Skehan
2009a), which will provide evidence for evaluating the potential interventions that
can be used in language learning classrooms.
Researchers have already found that giving time to plan when doing tasks may
provide opportunities for learners to notice the gap (Schmidt 1990) between task
demands and linguistic resources, and then strategically to allocate attentional
resources to focus on form (Long & Robinson 1998) so as to increase accuracy,
fluency, and complexity in speech production (Skehan 1998). Connecting with
psycholinguistic models of speech production, particularly the Levelt (1989) model,
Ellis (2005) proposed a framework of task-based planning, which primarily distinguishes two types of planning according to when the planning occurs: pre-task
planning and within-task planning. Pre-task planning is conducted before learners
perform a task. Within-task planning is conducted on-line while learners are performing a task. Pre-task planning can be further divided into two forms, rehearsal
and strategic planning, the difference being whether learners have the opportunity
to actually perform the complete task as a preparation (rehearsal) or are
only allowed time to consider, in their working memory, the content and expressions to be encoded before
speaking on the task (strategic planning). Within-task planning has two (maybe extreme) forms too: one is pressured (i.e. having a
limited time for on-line planning) and the other is unpressured (i.e. having unlimited time to perform a task). These distinctions in the Ellis framework provide the
inspiration for the current study, which explores the links between different stages
of speech production and effective interventions regarding how time pressure is
handled.
Based on Ellis's model, this study consists of a wide-ranging experiment comparing
five time pressure reduction conditions (i.e. experimental conditions) with a control
condition. The control condition is a baseline designed to involve the least possible
planning opportunity either before or within the task. In contrast, the five experimental conditions provide certain degrees of planning opportunities to reduce time
pressure at a specific phase of speech production: conceptualization, formulation, and
articulation, or a combination of the phases. The comparison between each of the
experimental conditions and the control condition can help us examine the effect of
time pressure reductions for pedagogical purposes. The conditions and the proposed
locations of time pressure reduction with reference to the Levelt speech production
model are outlined in Table 3. The table also presents their hypothesised effects on L2
speaking performance as well as some of the evidence in the literature that lends support to these hypotheses.
Condition one is the control condition, and gives no planning and no pre-watching
opportunities for speakers. In its operationalization, a speaker sat in front of
a computer. The researcher asked the speaker to narrate while watching a normally
played video, and the speaker had no opportunity to pre-watch the video or learn the story beforehand. The speaker had to narrate in response to what he/she was watching, similar to a sports commentator broadcasting a live sporting event. It is a challenging
condition because the speaker, without knowing the story content before speaking, was
telling the story guided merely by the on-going frames of the video. From the
linguistic perspective, borrowing the terms from Bygate and Samuda (2005), it is hard
for speakers either to take perspective (that of the speaker's or the presumed
listener's attitude) or to preview, that is, to consider the background and foreground of
what is happening and what is about to happen. These are two important linguistic
resources that help frame a narration toward discourse coherence (Bygate &
Samuda 2005, p. 48). Having the least possible planning time (before and within the
task), this condition is hypothesised to be the hardest of all the conditions,
involving what Ellis (2005) described as pressured on-line planning.
Table 3. Control and experimental conditions
Condition                     Operationalization                Targeted location#                                        Hypotheses^  Evidence from the literature
1 Control                     watch+tell
2 Watched                     watch, watch+tell                 Conceptualizer                                            C, F         Skehan & Foster 1999
3 On-line Planning            watch+tell (slowed video)         Formulator                                                A, F (-)
4 Watched On-line Planning    watch, watch+tell (slowed video)  Conceptualizer + Formulator (on-line)                     C, A, F (-)  Ahmadian & Tavakoli 2011; Ellis 1987; Hulstijn & Hulstijn 1984; Yuan & Ellis 2003
5 Watched Strategic Planning  watch, planning, watch+tell       Conceptualizer + Formulator (strategic problem-solving)   C, A, F      Guará-Tavares 2008; Mehnert 1998; Ortega 1999; Skehan & Foster 1997; Tajima 2003
6 Repetition                  watch+tell, watch+tell            Conceptualizer + Formulator + Articulator                 C, A, F      Bygate 1996, 1999, 2001; De Jong & Perfetti 2011; Gass et al. 1999; Lynch & McLean 2000, 2001
# Location: The speech production stage(s) that the time pressure reduction intervention targets
^ Hypotheses:
C: increasing complexity
F: increasing fluency
F (-): decreasing fluency
A: increasing accuracy
The Watched condition (Condition 2) allows speakers to watch the video before
narrating the story. Watching is a type of pre-task activity (Skehan & Foster 1999),
which provides exposure to the task material before the real task performance.
This manipulation is intended to reduce time pressure at speech conceptualization
by exposing speakers to the story content first. In this way, a greater proportion of
attention becomes available for formulation (and for a focus on form) while
the speaker is performing the task, and his/her speech performance can therefore be
enhanced. Few studies have explored the effect of pre-watching. Skehan and Foster
(1999), as an exception, compared the effects of having and not having a pre-watching
opportunity. However, contrary to theoretical expectations, their study did not find
significant results on the CAF measures. A similar line of research that involves easing
speech conceptualization may be strategic planning. For example, Pang and Skehan
(this volume) in a qualitative study found that during strategic planning, learners
reported that they selectively prepared for speech content or language forms. The literature on strategic planning (see Ellis 2003, 2008, 2009 for reviews) generally shows
that having the chance to plan before speaking helps raise speech fluency and syntactic complexity⁴ (Crookes 1989; Foster 1996; Foster & Skehan 1996, 1999; Skehan &
Foster 2005; Tavakoli & Skehan 2005; Wendel 1997; Yuan & Ellis 2003). Therefore, this
study hypothesizes that watching has a similar effect to strategic planning and can help
learners produce more complex and more fluent speech.
The On-line Planning condition (Condition 3) is the same as the control condition in offering no pre-watching opportunity, but differs from it in providing extra time for speakers to conduct on-line planning. The video
used for narration has the same content in all conditions, except that the video
for the two on-line planning conditions (i.e. On-line Planning and Watched On-line
Planning) was edited to play slowly so that extra time was created for conducting on-line planning (see Methodology). By comparing Condition 3 with the control
condition, it is hypothesised that the extra time for conducting on-line planning can
help learners attend to morpho-syntactic formulation so as to increase speech accuracy, but that speech fluency, especially speech rate (Yuan & Ellis 2003), suffers
from the detrimental effect of on-line planning. This condition leads to the question:
because the On-line Planning condition does not allow pre-watching, when provided
with extra time for on-line planning, can learners successfully take the opportunity to
focus on form? That is, can they successfully divide their attention between linguistic formulation and story conceptualization when both pressures are
high? A possible answer can be offered by comparing the result of this condition with
that of the Watched On-line Planning condition (Condition 4).
The Watched Online Planning condition (Condition 4) provides the opportunity
for pre-watching as well as the extra time for speakers to conduct on-line planning while
narrating. It is likely that having the chance to watch a video before speaking and then
benefitting from additional time when speaking (so that on-line planning is facilitated)
will reduce time pressure at both the conceptualization and the formulation phases so
that learners can speak with higher accuracy and complexity (Yuan & Ellis 2003). This
prediction is based on the on-line planning literature. Most of the studies in the literature used narrative tasks in which the content of the narration was already known to the
speakers (Ahmadian & Tavakoli 2011; Ellis 1987; Yuan & Ellis 2003), which is similar
to having watched the video in this task condition. These studies found that providing
on-line planning time to speakers who already knew the story content had positive
effects on speech accuracy and complexity. Therefore, it is hypothesized that Watched
On-line Planning can help increase complexity and accuracy in learners' speech performance. Meanwhile, due to the time used for careful formulation, it will also have a
detrimental effect on speech fluency, especially on speech rate.
4. There have also been studies which have found positive effects of pre-task planning on
accuracy, but these studies were either in a language testing context (Tavakoli & Skehan 2005;
Wigglesworth 1997) or provided planning time as long as 10 minutes and allowed note-taking
(Guará-Tavares 2008; Ortega 1999; Skehan & Foster 1997; Tajima 2003),
which are very different from the watching condition in this study.
The Watched Strategic Planning condition (Condition 5) provides an opportunity for pre-watching as well as extra time (3 minutes) for speakers to conduct strategic
planning before speaking. It is a reinforcement of the Watched condition (Condition
2). Through the provision of the pre-watching opportunity, conceptualization pressure is likely to be reduced. Then having extra time before narrating, learners can use
the strategic planning time either to prepare for expressing the content of the story or
to search for solutions to certain anticipated linguistic problems (see Pang & Skehan,
this volume). This condition is different from the two on-line planning conditions, in
which the extra time was inserted into the video so that speakers are more likely to
use the time resource on-line to deal with immediate linguistic problems. However,
if the speakers know that the retrieval of unfamiliar lexis may take longer
than the real-time speaking situation can tolerate, they are likely to stop searching
for it on-line and switch to their familiar ways of expression; that might be a constraint of the two on-line planning conditions. In contrast, strategic planning affords
speakers time to search for unfamiliar lexis and expressions before speaking. Therefore, in this condition, having both a pre-watching opportunity and strategic planning
time, Watched Strategic Planning could have a similar effect to those task conditions
involving adequate pre-task planning time such as 10 minutes (Guará-Tavares 2008;
Mehnert 1998; Ortega 1999; Skehan & Foster 1997; Tajima 2003). Based on the results
of these studies, it is hypothesized that Watched Strategic Planning will result in
enhanced speech performance in complexity, accuracy, and fluency.
The Repetition condition (Condition 6) uses immediate task repetition (i.e. a
kind of rehearsal) as an intervention to investigate time pressure reduction in the
complete process of speech production (i.e. conceptualization, formulation, and
articulation). Speakers were not told that they would carry out the same task again
until they had finished speaking on the task for the first time (so they themselves did
not think of their first encounter with the task as a rehearsal). Researchers have studied task repetition in various forms, but most of the studies have examined task repetition after an interval of several days or weeks. For example, Bygate (1996) found fluency
and accuracy effects in a repetition task after a 3-day interval. Bygate (2001) found
increased speech fluency and complexity in a repetition task after 10 weeks. Gass,
Mackey, Álvarez-Torres and Fernández-García (1999) found that the group which
repeated the same task after a 2–3 day interval on two occasions outperformed the
non-repetition group regarding a general proficiency rating, partial accuracy of the
Spanish structure "to be", morphosyntax, lexical density (i.e. type token ratio) and lexical sophistication (i.e. the number of difficult words used). The study by Lynch and
McLean (2000), however, involves a speaking condition similar to immediate task
repetition. They found that repeatedly presenting a poster to different interlocutors six times resulted in improved accuracy and fluency. More recently,
De Jong and Perfetti (2011), using a 4-3-2 repetition task (i.e. repeating a topic for
4, 3, and 2 minutes) as the training method, found that the group which repeated
the same topic three times had significantly higher fluency in a post-test one week
later than the group that spoke on three different topics each time. Although these
studies involve various forms of task repetition, generally they lead to the claim that
task repetition is likely to be a robust condition to enhance L2 speakers' speaking fluency, complexity and accuracy. The reason, as Bygate and Samuda (2005) explained,
is that during the re-run of the task, speakers are likely to build on the knowledge and
performance of the first enactment so that both speaking processing and language
product can be impacted by task repetition (p. 45). Therefore it is hypothesized that
immediate task repetition has the advantage of time pressure reduction in conceptualization, formulation, and articulation, and that L2 speakers' speech complexity, accuracy, and fluency will be increased.
To summarize, the five experimental planning conditions proposed in this study are
connected with stages in speech production and speaking performance. Their comparison with the non-intervention control condition may reveal the specific impact of
each time pressure reduction on L2 speaking performance. They may also reveal the
underlying mechanisms of L2 speech production with reference to processing stages.
Research Questions
This study is guided by the following five research questions:
1. Does the Watched condition (Condition 2), which targets time pressure reduction
at the Conceptualizer stage, result in significantly more complex and more fluent
speech in comparison to the control condition (Condition 1)?
2. Does the On-line Planning condition (Condition 3), which targets time pressure
reduction at the Formulator stage, result in significantly more accurate but less
fluent speech in comparison to the control condition (Condition 1)?
3. Does the Watched On-line Planning condition (Condition 4), which targets
time pressure reduction at both the Conceptualizer and the Formulator (through
on-line planning) stages, result in significantly more complex and more accurate
but less fluent speech in comparison to the control condition (Condition 1)?
4. Does the Watched Strategic Planning condition (Condition 5), which targets
time pressure reduction at both the Conceptualizer and the Formulator (through
strategic planning) stages, result in significantly more complex, more accurate,
and more fluent speech in comparison to the control condition (Condition 1)?
5. Does the Repetition condition (Condition 6), which targets time pressure reduction at the complete process of speech production (i.e. conceptualization, formulation, and articulation), result in significantly more complex, more accurate, and
more fluent speech in comparison to the control condition (Condition 1)?
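The design summarized in Table 3 and in the research questions above can also be written down compactly as data. The following Python sketch is purely illustrative: the names `Condition`, `CONDITIONS`, and `predicted_direction` are hypothetical, and the +1/-1 coding of the hypothesised effects on complexity (C), accuracy (A), and fluency (F) is an encoding choice made here, not part of the study.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    number: int
    name: str
    operationalization: str
    targets: tuple = ()  # speech production stage(s) targeted
    # Hypothesised change relative to the control condition:
    # +1 = increase, -1 = decrease; a missing key means no prediction.
    hypotheses: dict = field(default_factory=dict)


CONDITIONS = [
    Condition(1, "Control", "watch+tell"),
    Condition(2, "Watched", "watch, watch+tell",
              ("Conceptualizer",), {"C": 1, "F": 1}),
    Condition(3, "On-line Planning", "watch+tell (slowed video)",
              ("Formulator",), {"A": 1, "F": -1}),
    Condition(4, "Watched On-line Planning", "watch, watch+tell (slowed video)",
              ("Conceptualizer", "Formulator"), {"C": 1, "A": 1, "F": -1}),
    Condition(5, "Watched Strategic Planning", "watch, planning, watch+tell",
              ("Conceptualizer", "Formulator"), {"C": 1, "A": 1, "F": 1}),
    Condition(6, "Repetition", "watch+tell, watch+tell",
              ("Conceptualizer", "Formulator", "Articulator"),
              {"C": 1, "A": 1, "F": 1}),
]


def predicted_direction(number, measure):
    """Return +1, -1, or None for the hypothesised effect on a CAF measure."""
    condition = next(c for c in CONDITIONS if c.number == number)
    return condition.hypotheses.get(measure)
```

For instance, `predicted_direction(3, "F")` gives -1, mirroring the "less fluent" prediction in research question 2, while `predicted_direction(2, "A")` gives `None` because no accuracy effect is hypothesised for the Watched condition.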
Method
Participants
The 77 participants (50 females, 27 males) in this study were undergraduates in different majors at a university in Hong Kong, aged from 18 to 22. They were native Chinese
speakers who learned English as a second language. None of them had overseas experience of more than 3 months. They were recruited on a voluntary basis and a time
compensation fee was provided after they completed the tasks. Data were collected
through one-to-one meetings with participants. Once a student arrived for the data
collection, a pre-test was administered (see below) and according to the pre-test score,
the participants were assigned to one of the task conditions. The researcher made the
grouping decisions with the intention of balancing pre-test scores (primarily),
gender, year of study, and major of study across groups. The participants' English proficiency, as self-reported, ranged from TOEFL 540 to 630 and IELTS 6 to 7.5 (with the
speaking subset ranging from 5.5 to 7.5).
English proficiency pre-test

A pre-test was administered to the participants on the same day as the main study. The
pre-test used a version of the TOEFL Listening subtest (extracted from Hinkel 2004).
A listening test was used because TOEFL listening was reported to be a strong indicator of the general English proficiency measured by TOEFL exams (Sawaki et al. 2009).
Listening also involves a relatively similar process to speaking, especially regarding the
degree of on-line resource generation for problem solving (Yuan & Ellis 2003, p. 9). Participants were allocated to different speaking conditions to balance English proficiency.
Material
Drawing on Skehan and Foster (1999, 2005), we used two videos from the Mr. Bean
series to elicit speaking performance (the content of the two video stories is presented
in Appendix A). Using two videos can avoid some task-irrelevant variables, such as
learners being unfamiliar with the narrative task. The presentation sequence of the two
videos was counterbalanced, and the study reports the mean scores of the two video
performances as the results. The same videos were used in all conditions. The only difference across the conditions was that there were two playing rates: a slow speed for
conditions involving on-line planning, and a normal speed for all other conditions.
Details about task duration are given in Table 4. The Mr. Bean video series is
appropriate for narrative retelling because each episode is short, largely mimed, easy
to comprehend, and appealing (Skehan & Foster 1999). The videos were piloted for their
comprehensibility and cultural understanding.
Slowed video for on-line planning

The operationalization of the slowed video as the basis for on-line planning narratives
is inspired by tempo-naming tasks and response deadline tasks (Kello 2004; Kello &
Plaut 2000; Kello, Plaut & MacWhinney 2000; Kello & Plaut 2003 based on a computational model). These tasks investigated the relation between rate of processing and
control of processing, as well as the underlying mechanisms that are revealed in on-line
reading or speaking. With a visually and aurally manipulated tempo, it is claimed that
people have the natural ability to synchronise behavior to an external rhythm (Kello &
Plaut 2003, p. 210). The results showed that readers' and speakers' responses can be
well-timed with rhythm, and that the fastest tempo interval drove response latencies faster
and also induced a speed/accuracy trade-off (called premature processing). This rationale as to how external rhythm can shape the control of processing is in line with the
definition of on-line planning as moment-by-moment preparation during speech (Yuan &
Ellis 2003) and careful control of processing during speech production.
To obtain the slowed version of the video for the two conditions involving on-line
planning (i.e. Online Planning and Watched Online Planning), the author piloted
three different rates of playing with a group of participants with a similar background
and proficiency level to the main study participants. The three pilot versions were: 50%
of the normal speed (making a normal 5-minute video 10 minutes long), 60% of the
normal speed (8 minutes), and 75% of the normal speed (6.5 minutes). These slowed
versions were made by using the Adobe Premiere video editing software to
make the videos play consistently slower throughout (with no pauses
manually inserted). Based on the pilot participants' feedback, 60% of the normal speed
was selected as the experimental version of the on-line planning video because it was not
so slow that the manipulation was recognized. 50% of normal speed, in contrast, was
recognized as artificial (even though it would allow more on-line planning time).
Therefore the 60% speed was selected, which made a normal 5-minute video
8 minutes long. In other words, an extra 3 minutes was available to facilitate on-line
planning implicitly while the video was running.

Table 4. Description of speaking conditions (Independent variables)
Speaking condition | Pre-watching (5 min) | On-line planning (3 min#) | Strategic planning (3 min) | Repetition | Task time (min)
1 Control | - | - | - | - | 5
2 Watched | yes | - | - | - | 10
3 Online Planning# | - | yes | - | - | 8
4 Watched Online Planning# | yes | yes | - | - | 13
5 Watched Strategic Planning | yes | - | yes | - | 13
6 Repetition | - | - | - | yes | 10
# Condition 3 Online Planning and Condition 4 Watched Online Planning used a slowed version of the video
that made a normal 5-minute video 8 minutes long, which implicitly allows 3 minutes of on-line
planning time.
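The relation between playback speed and running time described above can be checked with a short calculation. The sketch below is illustrative only: the function name `slowed_duration` is hypothetical, and it assumes the video is slowed uniformly, as the text states. Exact division gives roughly 8.3 and 6.7 minutes for the 60% and 75% versions, which the chapter reports in rounded form as 8 and 6.5 minutes.

```python
def slowed_duration(normal_minutes: float, speed_fraction: float) -> float:
    """Running time of a clip played at a constant fraction of normal speed.

    Hypothetical helper for illustration; not part of the study's materials.
    """
    if not 0 < speed_fraction <= 1:
        raise ValueError("speed_fraction must be in (0, 1]")
    return normal_minutes / speed_fraction


NORMAL_MINUTES = 5.0  # length of the normal-speed video, as stated in the text

for speed in (0.50, 0.60, 0.75):
    total = slowed_duration(NORMAL_MINUTES, speed)
    extra = total - NORMAL_MINUTES  # time implicitly available for on-line planning
    print(f"{speed:.0%} speed: {total:.2f} min total, {extra:.2f} min extra")
```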
Task conditions and instructions

All the speaking conditions were conducted in the Here-and-now mode: speakers narrate a story while watching the video simultaneously. As shown in Table 4, Control
(No Pre-watching, No Planning) is the baseline performance condition in which participants start to narrate the story without pre-watching and without planning. The Watched
condition allows speakers to watch the movie silently once before narration. The On-line
Planning condition is intended to provide participants with on-line planning time by having them speak
to the slowed version of the video, as explained earlier. The Watched On-line Planning condition allows participants to watch the normal speed video once silently before speaking
to the slowed on-line planning video. In Watched Strategic Planning, speakers watch the
video once silently, then have an additional 3 minutes for strategic planning,
and after that speak to the normal speed video. In the Repetition condition, learners watch
and narrate the story simultaneously once, and then watch and narrate the story simultaneously again. Table 4 lists the total task completion time for each condition.
To help participants become familiar with the speaking conditions, a page of task
instructions in both English and Chinese was provided (Appendix B). The participants'
comprehension of the instructions was checked through a few questions and sample
practice with the researcher. All the communication around the instructions was conducted in Chinese. After a participant was clear about the data collection procedure, the
researcher started to play the video. At the beginning of the video, the same instructions
appeared again on the screen. Shortly after that, the computer told the speaker to get ready
to speak in 10 seconds, and then the task started and task performance was recorded.
Measures of speaking performance

Since the overarching aim of this study is to examine whether each proposed time
pressure reduction has a positive effect on L2 speaking performance, the measures
of speaking performance are important. Researchers generally regard speaking

performance as multi-componential and consisting of at least the following dimensions: syntactic complexity, accuracy, fluency, and lexis (Ellis 2003; Ellis & Barkhuizen
2005; Norris & Ortega 2009; Skehan 1998, 2009a). Indicators of these performance
aspects have been widely used in the TBLT literature (see Housen & Kuiken 2009 for a
discussion). Table 5 lists the performance measures used in this study.
Table 5. Measures of speaking performance (Dependent variables)
Component | Measure | Description
Fluency (speed)⁵ | 1 Speech_Rate⁶ | The number of words per minute for a speech sample.
Fluency (breakdown) | 2 AS_End_Pause | The average length of pauses at the end of AS units.
Fluency (breakdown) | 3 AS_Mid_Pause | The average length of pauses in the middle of AS units.
Fluency (repair) | 4 Reformulation | The number of strings in a speech sample that are repeated with some modifications to syntax, morphology, or word order, etc.
Complexity | 5 Total_Words | Total number of words in a speech sample.
Complexity | 6 ML_AS | The average number of morphemes per AS unit.
Complexity | 7 Subordination | Total number of subordination clauses and verb infinitives divided by total AS units (Foster, Tonkyn & Wigglesworth 2000).
Accuracy | 8 EF_Clause | Total number of error-free clauses (clauses with no error in syntax, morphology, or word order, etc.) in a speech sample.
Accuracy | 9 EF_Clause_Rate | Total number of error-free clauses divided by total number of clauses.
Lexical Diversity⁷ | 10 D | Adjusted type token ratio⁸ (Malvern & Richards 2002)
5. Following Skehan (2009a), we divided fluency into three components: speed, breakdown, and
repair.
6. Following Yuan and Ellis (2003) we used pruned words: the words that were repeated, reformulated and reduced were excluded from the calculation.
7. Here we follow the CLAN manual in calling D lexical diversity (MacWhinney 2000).
8. The type token ratio is the total number of different words divided by total number of words.
Coding
The 90 speech samples collected (see Table 6 for the sample size of each condition)
were transcribed and coded following CHAT format (MacWhinney 2000) and Taskprofile conventions (Skehan, Chapter 1, this volume). The basic segmentation of
units for analysis was AS units (Foster, Tonkyn & Wigglesworth 2000). Measures
of language complexity (e.g. subordination) and language accuracy (e.g.
error-free clauses) were computed by Taskprofile, except for lexical diversity, which
was computed by the command VOCD in the CLAN software (MacWhinney 2000;
Malvern & Richards 2002; Richards & Malvern 1998).
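To make the ratio-based measures in Table 5 concrete, the following is a minimal sketch of how they could be computed from counts taken from a coded transcript. It is not the Taskprofile or CLAN implementation: the class `SampleCounts`, its field names, and the example numbers are all hypothetical, and the type token ratio shown is the raw ratio from the Table 5 footnote, not the adjusted measure D produced by VOCD.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    """Counts for one speech sample (hypothetical structure for illustration)."""
    pruned_words: int          # words left after removing repetitions and reformulations
    minutes: float             # speaking time in minutes
    as_units: int              # number of AS units
    subordinate_clauses: int   # subordinate clauses plus verb infinitives
    clauses: int               # total number of clauses
    error_free_clauses: int    # clauses with no error


def speech_rate(c: SampleCounts) -> float:
    """Pruned words per minute (Speech_Rate)."""
    return c.pruned_words / c.minutes


def subordination(c: SampleCounts) -> float:
    """Subordinate clauses and verb infinitives per AS unit, as defined in Table 5."""
    return c.subordinate_clauses / c.as_units


def ef_clause_rate(c: SampleCounts) -> float:
    """Proportion of clauses that are error-free (EF_Clause_Rate)."""
    return c.error_free_clauses / c.clauses


def type_token_ratio(tokens: list[str]) -> float:
    """Raw type token ratio: different words divided by total words."""
    return len(set(tokens)) / len(tokens)


# Invented counts, used only to show the calculations.
sample = SampleCounts(pruned_words=300, minutes=5.0, as_units=40,
                      subordinate_clauses=20, clauses=80, error_free_clauses=32)
print(speech_rate(sample), subordination(sample), ef_clause_rate(sample))
print(type_token_ratio("he opens the box and he closes the box".split()))
```

Because the raw type token ratio falls as samples get longer, it is not comparable across samples of different length, which is the motivation for an adjusted measure such as D.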
Analysis
In view of the large number of dependent variables (i.e. speech performance measures), five MANOVAs were conducted to analyze each experimental condition (i.e.
Watched, On-line Planning, Watched On-line Planning, Watched Strategic Planning, and Repetition) in comparison with the control condition. Statistical significance is assessed relative to the two-tailed a priori alpha level of 0.05 (p < .05) for all
the measures. In each MANOVA, the dependent variables were the 10 performance
measures listed in Table 5, and the independent variable comprised two speaking conditions: one of the experimental speaking conditions and the control
condition.
Before performing the MANOVAs, multivariate normality was examined. The
normal distribution of every dependent variable was examined and all variables that
deviated from normality (p < .001) were transformed towards a normal distribution using
a logarithmic transformation. Standardized skewness and kurtosis were set within the
range of (-2, 2). Cohen's d (Cohen 1988, 1992, 1994) was used in this study as the
effect size measure. Cohen's d is based on the concept of the standardized mean difference of a contrast (e.g. the difference between the mean scores of a control condition and
an experimental condition), which is easy to comprehend and consistent with Norris and Ortega's
(2000) meta-analysis of L2 instruction.
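As an illustration of the standardized mean difference, the sketch below computes Cohen's d with a pooled standard deviation. This is one common formulation and is an assumption here: the chapter does not state which variant of the formula was used, and the group statistics in the example are invented rather than taken from the study.

```python
import math


def cohens_d(mean_exp: float, sd_exp: float, n_exp: int,
             mean_ctrl: float, sd_ctrl: float, n_ctrl: int) -> float:
    """Standardized mean difference using the pooled standard deviation.

    Assumed formulation for illustration; other variants (e.g. dividing by
    the control group's SD) give somewhat different values.
    """
    pooled_var = ((n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / (
        n_exp + n_ctrl - 2
    )
    return (mean_exp - mean_ctrl) / math.sqrt(pooled_var)


# Invented example: an experimental group of 16 and a control group of 13.
d = cohens_d(mean_exp=70.0, sd_exp=10.0, n_exp=16,
             mean_ctrl=60.0, sd_ctrl=10.0, n_ctrl=13)
print(round(d, 2))
```

With equal standard deviations of 10 and a 10-point mean difference, the contrast amounts to one pooled standard deviation.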
In such a study containing 5 MANOVAs, where each MANOVA involves 2 speaking conditions as the independent variables and 10 performance measures as the
dependent variables, a mini meta-analysis based on effect size comparisons is ideal for
comparing the effect of each experimental condition. Confidence intervals, as mentioned by Norris and Ortega (2000), gauge the statistical trustworthiness of observed
effects (Rosenthal 1991). Therefore, 95% confidence intervals (CI95) around each
mean effect size were computed. A 95% confidence interval is constructed so that, over
repeated sampling, about 95% of such intervals would contain the population effect.
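A confidence interval around d can be approximated from the two group sizes alone. The sketch below uses a widely cited large-sample approximation for the standard error of d; whether the chapter used this formula or another is not stated, so the function and the example values are assumptions for illustration.

```python
import math


def d_confidence_interval(d: float, n1: int, n2: int, z: float = 1.96):
    """Approximate confidence interval for Cohen's d.

    Uses the large-sample standard error
    sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))),
    an assumed approximation, with z = 1.96 for a 95% interval.
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Invented example: d = 1.0 from groups of 16 and 13 participants.
lower, upper = d_confidence_interval(1.0, 16, 13)
print(f"[{lower:.2f}, {upper:.2f}]")
```

With groups of this size the interval is wide, which is why an effect can be large in magnitude and still have a lower bound close to zero.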
Results
Pre-test
Two participants who scored 49 and 50 (out of 50) were excluded from the data analysis since their scores were too close to the ceiling. Another participant who dropped
out after completing the first speaking task was excluded too. Table 6 below presents
the means and standard deviations (SD) for the pre-test. An ANOVA showed that
there was no significant difference among the groups in terms of pre-test scores (at
p < .05). The Control/Repetition group represents a within-subjects design, in which
the same group of participants produced speech samples twice, once for the baseline
performance as Control samples, and once as Repetition samples.
Table 6. Descriptive statistics for pre-test listening scores
Group | N | Mean | SD | Min | Max
Control/Repetition* | 13 | 35.8 | 5.3 | 28 | 45
Watched | 16 | 37.1 | 5.7 | 26 | 45
Online Planning | 16 | 37.1 | 5.6 | 26 | 46
Watched Online Planning | 16 | 37.4 | 6.1 | 26 | 47
Watched Strategic Planning | 16 | 36.8 | 6.3 | 24 | 48
All | 77 | 36.9 | 5.7 | 24 | 48
*All the conditions are a between-subjects design except for the Control/Repetition conditions.
Speaking conditions
The results of descriptive and inferential statistics are given in Table 7. Since tests of
statistical significance can be greatly affected by sample size, a very weak effect can be
statistically significant while a strong effect can fail to attain significance (Cortina &
Nouri 2000; Hunter & Schmidt 1990; Meehl 1990) if sample sizes are large or small
respectively. However, effect size, which is the standardized index of the magnitude of
an effect, makes it possible to compare the effects of different variables within a given
study or to compare the effects of the same variable across different studies (Cortina &
Nouri 2000, p. 8). It is also unbiased with regard to the scale of measurement the
researcher used as well as the standard errors of the dependent variables. Therefore,
the effect size of each contrast, whether significant or not, is given in Table 8. The results
in Table 8 serve as a meta-analysis which synthesizes the effects of all 5 types of
planning conditions relative to the control condition.
Table 7. Descriptive and inferential statistics for speaking conditions
Measures | Control group (N = 13) | Watched (N = 16) | Online planning (N = 16) | Watched online planning (N = 16) | Watched strategic planning (N = 16) | Repetition (N = 13)
1. Speech_Rate | 61.50 (15.17) | 76.81** (13.39) | 58.64 (18.41) | 61.75 (17.24) | 77.22* (18.78) | 75.36*** (15.06)
2. AS_End_Pause# | 3.04 (1.18) | 2.05* (1.20) | 4.01 (1.88) | 4.54 (2.87) | 2.24 (1.54) | 1.99*** (0.90)
3. AS_Mid_Pause# | 0.93 (0.52) | 0.82 (0.58) | 1.25 (0.62) | 1.01 (0.60) | 0.74 (0.49) | 0.81 (0.38)
4. Reformulation# | 5.89 (3.71) | 8.32 (5.73) | 8.10 (5.17) | 10.66* (6.76) | 8.04 (4.99) | 7.04 (3.52)
5. Total_Words | 307.50 (75.84) | 384.01** (66.92) | 469.10*** (147.23) | 494.00*** (137.95) | 386.10* (93.90) | 376.77*** (75.30)
6. ML_AS | 9.03 (1.87) | 10.45* (1.78) | 9.42 (1.84) | 10.73* (2.52) | 10.55* (1.92) | 9.75** (2.10)
7. Subordination | 1.41 (0.16) | 1.57* (0.22) | 1.48 (0.18) | 1.62** or 1.64** (0.24)‡ | 1.62** or 1.64** (0.25)‡ | 1.58*** (0.24)
8. EF_Clause | 22.81 (9.41) | 34.81* (14.46) | 39.38** (18.05) | 49.79*** (26.52) | 35.78* (17.11) | 31.39*** (13.14)
9. EF_Clause_Rate | 0.40 (0.16) | 0.53 (0.22) | 0.48 (0.17) | 0.56* (0.22) | 0.52 (0.19) | 0.45** (0.19)
10. Lexical_Diversity D | 42.93 (10.84) | 42.85 (10.74) | 48.31 (9.56) | 44.68 (7.21) | 42.90 (8.48) | 44.82 (8.42)
#Scores have been transformed in order to obtain a normal distribution.
‡The source layout does not preserve which of these two means belongs to which of the two columns.
Note: The table presents the mean scores of the dependent variables (i.e. performance measures).
Standard deviations are given in parentheses. The results of the five MANOVAs comparing each
experimental condition with the control condition (using Pillai's trace) are: Watched vs Control:
F(28) = 3.12, p < .05; Online Planning vs Control: F(28) = 4.53E6, p < .001; Watched Online Planning vs
Control: F(28) = 6.09E6, p < .001; Watched Strategic Planning vs Control: F(28) = 3, p < .05; Repetition
vs Control: F(25) = 37.14, p < .01. Because all the MANOVAs are significant, follow-up ANOVAs were conducted to
locate the significant contrasts (*p < .05; **p < .01; ***p < .001).
Table 8. A synthesis of all the experimental results
Measure | Watched | Online planning | Watched online planning | Watched strategic planning | Task repetition
Speech_Rate | 0.96ᵃ [0.17, 1.76]ᵇ | 0.25 ns [-0.51, 1.00] | 0.03 ns [-0.72, 0.79] | 0.76 [-0.02, 1.55] | 3.35 [2.11, 4.59]
AS_End_Pause | 0.85 [0.06, 1.64] | 0.64 ns [-0.13, 1.42] | 0.61 ns [-0.17, 1.38] | 0.58 ns [-0.19, 1.35] | 2.92 [1.77, 4.08]
AS_Mid_Pause | 0.24 ns [-0.51, 1.00] | 0.66 ns [-0.11, 1.44] | 0.16 ns [-0.60, 0.92] | 0.43 ns [-0.33, 1.19] | 0.63 ns [-0.19, 1.45]
Reformulation | 0.38 ns [-0.38, 1.14] | 0.52 ns [-0.25, 1.29] | 0.99 [0.19, 1.79] | 0.53 ns [-0.23, 1.30] | 0.50 ns [-0.31, 1.32]
Total_Words | 0.96 [0.17, 1.76] | 1.43 [0.58, 2.28] | 1.53 [0.67, 2.39] | 0.76 [-0.02, 1.55] | 3.34 [2.10, 4.58]
ML_AS | 0.74 [-0.04, 1.52] | 0.20 ns [-0.56, 0.96] | 0.69 [-0.09, 1.46] | 0.70 [-0.08, 1.48] | 0.89 [0.05, 1.73]
Subordination | 0.94 [0.15, 1.74] | 0.44 ns [-0.33, 1.20] | 1.16 [0.34, 1.98] | 1.14 [0.32, 1.95] | 1.89 [0.93, 2.86]
EF_Clauses | 0.93 [0.13, 1.73] | 1.25 [0.42, 2.07] | 1.41 [0.56, 2.25] | 0.99 [0.19, 1.79] | 3.47 [2.20, 4.74]
EF_Clause_Rate | 0.69 ns [-0.08, 1.47] | 0.49 ns [-0.28, 1.26] | 0.89 [0.10, 1.68] | 0.74 ns [-0.04, 1.52] | 1.26 [0.39, 2.14]
Lex.Diversity D | 0.01 ns [-0.74, 0.77] | 0.56 ns [-0.21, 1.33] | 0.26 ns [-0.50, 1.02] | 0.05 ns [-0.71, 0.80] | 0.73 ns [-0.10, 1.56]
Note: This analysis synthesizes the effect sizes of each experimental condition relative to the control condition.
Significant contrasts at p < .05; ns = non-significant.
a Effect size d.
b Lower and upper 95% Confidence Interval [CI95].

From Table 8 we can see that, first, Watched and Watched Strategic Planning have
similar effects on speaking performance. Both of them significantly increased speech
fluency (measured by Speech_Rate) and complexity (measured by Total_Words, ML_AS,
and Subordination), but did not affect speech accuracy measured by the rate of
error-free clauses (EF_Clause_Rate) or lexical diversity. It is worth noting here that the
Watched condition was designed to reduce time pressure at the stage of conceptualization while Watched Strategic Planning was targeted at both conceptualization and
formulation. In comparison to the Watched condition, participants in Watched Strategic Planning had 3 minutes of strategic planning time before speaking to solve anticipated linguistic formulation problems. However, it seems that having 3 more minutes
to prepare after watching the movie made no difference compared with the more limited condition of directly narrating the story after watching the movie.
Second, On-line Planning, which was designed to reduce time pressure at the linguistic formulation stage and hypothesized to increase speech accuracy, did not have
any significant effects on general measures of speech fluency, complexity, and accuracy. It did generate significant effects with speech length (measured by Total_Words)
and the number of error free clauses (measured by EF_Clause) that were produced.
However, these two effects were likely to be the result of the extended speaking time,
so that speech quantity was increased.
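The role of speaking time in the Total_Words result can be checked against the means reported in Table 7. The sketch below multiplies each condition's mean speech rate by its speaking time, assuming 5 minutes for the normal-speed video and 8 minutes for the slowed video as described in the Method section; the variable names are illustrative.

```python
# Mean speech rates (pruned words per minute) as reported in Table 7.
speech_rate = {
    "Control": 61.50,
    "Online Planning": 58.64,
    "Watched Online Planning": 61.75,
}

# Speaking time in minutes: normal-speed video (5) versus slowed video (8).
minutes = {
    "Control": 5,
    "Online Planning": 8,
    "Watched Online Planning": 8,
}

# Total_Words means as reported in Table 7, for comparison.
reported_total_words = {
    "Control": 307.50,
    "Online Planning": 469.10,
    "Watched Online Planning": 494.00,
}

expected_total_words = {c: speech_rate[c] * minutes[c] for c in speech_rate}

for condition, expected in expected_total_words.items():
    reported = reported_total_words[condition]
    print(f"{condition}: rate x time = {expected:.2f}, reported = {reported:.2f}")
```

The products match the reported Total_Words means almost exactly. For On-line Planning, speech rate is slightly below the control value while total words are much higher, so the gain in quantity comes from the additional 3 minutes of speaking time rather than from faster speech.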
Third, unlike the On-line Planning condition, Watched On-line Planning
significantly enhanced speech complexity (measured by Total_Words, ML_AS,
and Subordination) and accuracy (measured by EF_Clause and EF_Clause_Rate).
In addition, Watched On-line Planning was the only condition that significantly
increased the number of reformulations, an indicator of less fluency and more
self-repairs, which provides evidence for learners' greater engagement in
the on-line planning and monitoring activities compared to the control condition.
It seems that although both On-line Planning and Watched Online Planning are
supposed to influence on-line linguistic formulation, only the Watched On-line
Planning condition enhanced performance through speaking in a more complex
and more accurate way. We will discuss in detail later why these two conditions produced such contrasting results.
Fourth, the Repetition condition, which was supposed to reduce time pressure at
all three speaking stages, generated significant improvement in speech fluency, complexity, and accuracy. The results therefore lend support to the previous hypotheses.
Moreover, the magnitude of effects for the Repetition condition is the largest among
all the conditions used.⁹
Last but not least, none of the experimental conditions had a significant effect
on lexical diversity (measured by D), which is consistent with the general findings in
L2 planning studies (e.g. Ortega 1999; Yuan & Ellis 2003). Ortega used the type-token
ratio as a measure and did not find a significant planning effect on lexical variety.
Similarly, Yuan and Ellis (2003) did not find an effect of strategic planning or on-line
planning on lexical variety relative to No Planning. It seems that learners' lexical
performance is task dependent. As Skehan (2009b) and Skehan and Foster (2008) have
found, different tasks (Personal, Narrative and Decision-making tasks) resulted in significantly different lexical sophistication¹⁰ scores (Read, 2000) measured by Lambda.¹¹
Tavakoli and Foster (2008) also demonstrated that a task with storyline background
produced higher lexical diversity D than a task without background in unstructured
tasks (but not in structured tasks) for EFL learners. They argued that it may be the
number of events in tasks that determines the extent of lexical variety that L2 speakers
use: more events, more diversity of vocabulary (p. 462).

9. Repetition is the only condition that used a within-subject design. Such repeated measures may
slightly increase the effect sizes (Cortina & Nouri 2000).
10. This lexical measure computes how many difficult words are used in a text, defined by lower
frequencies on the basis of a certain frequency list from corpus analysis.
11. This uses a Poisson distribution to identify events which have low frequency levels, following
Meara and Bell (2001) and Bell (2003) (see Skehan 2009a, and Chapter 1, this volume, for more
details).
Table 9 further summarizes the above results. Four patterns can be identified from
the table (and in relation to Table 3):
1. All the experimental conditions that contained pre-watching had a positive effect
on speech complexity.
2. All the pre-watching conditions that did not involve on-line planning improved
speech fluency as well, indicating that reducing time pressure at conceptualization had positive effects on speech complexity and fluency.
3. The experimental conditions that had time pressure reduction at both conceptualization and on-line formulation (i.e. Watched On-line Planning) had a positive
effect on speech accuracy.
4. The only condition that had positive effects on all the three components of speaking performance (i.e. fluency, complexity and accuracy) is the Repetition condition, which targets time pressure reduction at conceptualization, formulation, and
articulation.
Table 9. Comparing each experimental condition with the control condition
Experimental conditions | Complexity | Accuracy | Fluency | Lexis
Watched | higherᵃ | nsᵇ | higher | ns
Watched Strategic Planning | higher | ns | higher | ns
Online Planning | ns | ns | ns | ns
Watched Online Planning | higher | higher | lower/ns | ns
Repetition | higher | higher | higher | ns
a higher/lower refers to a certain intervention that has significantly higher/lower complexity/accuracy/fluency than the control condition;
b ns refers to no significant difference that was found.
Discussion
This study started with a discussion of the stages of L2 speech production. It then
explored methods of achieving time pressure reduction for L2 speakers, drawing upon
Ellis's task-based planning framework. A study was designed to explore the comparative effectiveness of several different research conditions. The results obtained from
the study demonstrated that different interventions had different effects on speech
performance. In this section, we mainly discuss the following three issues:
1. Why did time pressure reduction at the conceptualization stage result in higher
language complexity and fluency (as in the Watched and Watched Strategic
conditions)?
2. Why did time pressure reduction solely at the on-line formulation stage fail to
improve the quality of speech (as in the On-line Planning condition) whereas
intervention targeting both the formulation and conceptualization stages resulted
in higher speech complexity and accuracy (as in the Watched On-line Planning
condition)?
3. Why did the conditions involving monitoring (as in Watched On-line Planning
that involves on-line monitoring and Repetition that involves speech perception
monitoring) result in higher speech accuracy?
Skehan et al. (2012) have argued that working with ideas, rehearsing, and monitoring are the three processes that influence how L2 speech is produced and what the
nature of the speech performance will be. This analysis may provide answers to the
above questions. Based on the results of this study, and trying to connect pedagogical
interventions with the speech production processes, I adopt their generalization and
revise the three processes into content conceptualization, linguistic formulation, and
speech monitoring (See Table 10).
Table 10. An instructional model of L2 speech interventions
Intervention focus (rows): Content conceptualization; Linguistic formulation; Speech monitoring
Influence (columns): Complexity (syntactic); Accuracy; Fluency (speech rate)
+ refers to a positive effect; − refers to a negative effect;
(+) refers to a conditional positive effect (e.g. the intervention targeting linguistic
formulation may have a positive effect if the conceptualization pressure has been dealt with).
Table 10 presents an instructional model of L2 speech interventions. First of all,
following L2 speech production theories from, among others, Bygate (2001), Bygate
and Samuda (2005), Ellis (2005), Kormos (2006), Skehan (2009a), and Skehan et al. (2012),
the model argues that the three processes (i.e. content conceptualization, linguistic
formulation, and speech monitoring) should be the focus of speech interventions. The
Levelt (1989) model proposed all the processing stages/phases of speaking. However,
not all the processes in the Levelt model should be the pedagogical focus. As argued by
Bygate and Samuda (2005), speech articulation is not thought to be a major drain on
attention capacity; it is unlikely to be the focus of explicit planning (p. 43); nor should
it be the focus of language interventions for adult learners who are already proficient
at L2 pronunciation. Instead, monitoring, which has often been neglected in many
L2 speaking curricula and methods, should be highlighted in language instruction.
This is because monitoring is involved in almost every process of speech production,
especially with conceptualization, formulation, and the links between the two (since
articulation is largely an automatic process) (Bygate 2001, p. 25; Bygate & Samuda
2005, pp. 44–45). The quality of monitoring therefore directly concerns the quality
of speaking performance, especially regarding speech accuracy, as indicated by this
study. Second, based on the results of this study, the model lists the effects of different interventions on speech performance, which connects psycholinguistic processing with potential language learning products. The results of this study indicate that
interventions at speech conceptualization enhance language complexity and fluency;
interventions at the formulation stage influence language fluency and may improve
language accuracy too (if the conceptualization pressure has been dealt with); and
more attention for monitoring speech production enhances language accuracy. More
detailed discussions of the evidence and pedagogical suggestions are provided in the
following sections.
Intervention targeting content conceptualization

Interventions at content conceptualization (Bygate & Samuda 2005) or working with
ideas (Skehan et al. 2012) can extend and organize the content of what is being said.
In this respect, the pre-watching condition actually allowed speakers to organize the
ideas to be expressed. Given that syntactic structure is strongly influenced by the output of the conceptualization stage, that is, the pre-verbal message (Levelt 1989, 1999),
the conditions allowing pre-watching increased the scale or range of ideas and clause
embedding so that when speech performance was assessed through complexity indices such as total words, mean length of AS Unit, and the number of subordinations per
AS Unit, significantly higher complexity was found for the pre-watching conditions.
Although the present study is based on L2 data, the raised complexity applies to both
L1 and L2 speech production because the speaking stage at conceptualization is largely
controlled processing for both L1 and L2. The pre-watching effect on language complexity found in this study is consistent with pre-task planning research in L1 speaking
(Foster 2001; Skehan 2009a, 2009b; Skehan & Foster 2007, 2008) as well as in L2 speaking (Kawauchi 2005; Ortega 1999; Tavakoli & Skehan 2005).
Intervention at conceptualization can also enhance speech fluency. This is
because the parallel processing assumption of L2 speaking (Bygate & Samuda 2005;
Kormos 2006; this study) suggests that language fluency is related to both the stages
of conceptualization and formulation, so that acceleration and smoothing in either
stage would help improve fluency and vice versa, a result pointed out by Bygate and
Samuda (2005) as speedier (and more accurate) speech performance (p. 45). It is
assumed that having pre-watching before speaking probably accelerates the processing of conceptualization during speaking. The results from this study, by and large,
lend support to these assumptions. Conceptualization is eased in the Watched and
Watched Strategic Planning conditions, and fluency increases. However,
Watched On-line Planning, though containing a pre-watching facilitation of content
conceptualization, resulted in significantly more reformulations (which means more
breakdowns and less fluency) than the control condition. It seems that in the Watched
On-line Planning condition, the on-line planning component mitigated the effect of
the pre-watching advantage on speech fluency. The effect of pre-watching on fluency
found in this study is consistent with pre-task planning research in both L1 (Foster
2001; Foster & Skehan 1996; Ortega 1999; Skehan & Foster 2007, 2008) and L2 speaking (Mehnert 1998; Ortega 1999; Sangarun 2005; Skehan & Foster 1997, 2005; Tajima
2003; Tavakoli & Skehan 2005; Gilabert 2007; Wendel 1997; Wigglesworth 1997).
Intervention targeting linguistic formulation

Interventions at linguistic formulation have been hypothesized to increase speech accuracy (Table 3). However, the pure On-line Planning condition in this study was not
found to improve the quality of speech performance significantly. In contrast, on-line
planning with pre-watching (the Watched On-line Planning condition) improved
speech accuracy (and speech complexity as well). This could be termed a conceptualized on-line planning effect. That conceptualized on-line planning helps language
complexity and accuracy can be explained by the incremental feature of L1 and L2
speech production. Indeed, sequentially, knowing what to say is a pre-requisite for
knowing how to say (Ellis 2003, p. 109). Knowing what to say might help speakers
use the spare capacity for monitoring linguistic formulation (Bygate & Samuda 2005).
This implies that on-line planning, if there is no prior conceptualization, might not
have much effect on syntactic formulation. It is also likely that a speaker will not direct
their attention to formulation (even when they have been given on-line planning time)
unless the conceptualization pressure is dealt with, a natural tendency similar to the
meaning-priority principle in listening (VanPatten 1990), a point which is also raised
by Skehan (1998) and Bygate and Samuda (2005) regarding speech production.
A similar operationalization of conceptualized on-line planning can be found in
studies by Ellis (1987) and Yuan and Ellis (2003). They operationalized on-line planning as providing learners unlimited time to produce writing or speaking based on
picture prompts, which means that speakers could have enough time (i.e. unpressured
on-line planning time, Ellis 2005) for both content conceptualization and linguistic formulations under the on-line planning condition. Yuan and Ellis (2003) have
reported similar results to the current study in that syntactic complexity and speech
accuracy were improved.
Intervention targeting speech monitoring

The only two conditions in the study that enhanced speech accuracy were Watched
On-line Planning and Repetition. Ellis (2003) has argued that the mixed results of
the planning effects on speech accuracy in the literature reflect whether learners were
able to, or chose to engage in monitoring while they performed the task (p. 131).
Based on the results of speech accuracy in Bui (this volume), Li (this volume), and
this study, I also argue that monitoring is the key to accuracy. Skehan et al. (2012)
have proposed that allocating attentional resources to monitor what is being said
just before it is said induces selective attention towards accuracy. Watched On-line
Planning freed up attentional resources for conceptualization and also released time
for on-line planning so as to commit more attention to the monitoring of linguistic
structures that are being produced. In Bui's study, pre-task planning did not help L2
learners improve accuracy whereas familiarity with the speaking topic (operationalized as the task topic being related to learners' major of study) did. The advantage of
familiarity probably allowed less pressure on conceptualization and easier retrieval of
topic-related lexis so as to have more attentional resources to monitor the speech; as
a result, accuracy was increased. Similarly, Li found that having post-task activities
(operationalized as transcribing their own speech products) induced L2 speakers to
produce more accurate language. As Skehan et al. (2012) proposed, learners who knew
they would transcribe their own errors seemed to (selectively) direct their attention to
accuracy during speech.
Following this analysis, it is claimed that the accuracy benefit demonstrated in this
study is not solely because of the interventions at both conceptualization and formulation stages (as in Watched On-line Planning), but more precisely it is because the
reduction in time pressure at the two stages provides greater opportunities to direct
attentional resources to the monitoring of speech production so that speech accuracy
was raised. The evidence of increased self-repair (measured by reformulations) in the
Watched On-line Planning condition is essentially evidence for more engaged speech
monitoring.
Two other studies which also provide supportive evidence for this claim are
worth mentioning. Hulstijn and Hulstijn (1984) hypothesized that two conditions in
their study would favour accuracy: (a) instructions to focus-on-form (in contrast to
focus-on-meaning), and (b) slowed-down processing (in contrast to normal processing speed). The results showed that only the focus-on-form condition helped produce
higher grammatical accuracy, while slowed-down processing did not. Interestingly, the
focus-on-form condition led to more time being taken in processing the speech, indicating that focus-on-form might involve a higher level of speech monitoring whereas
slowing-down speech to enable on-line planning might not necessarily. Mochizuki
and Ortega (2008) found similar results in terms of an instructional intervention targeting the monitoring of grammatical structures. They investigated three speaking
conditions in a narrative task with an auditory stimulus: no planning, unguided planning, and guided planning (operationalized as providing printed instructions on how
to make relative clauses). The results showed that while guided planners had a lower
speech rate than unguided planners and no planners, they had higher accuracy than
Zhan Wang
unguided planners in the relative clauses that they produced. All the studies discussed
here are consistent with the generalisation that allowing unlimited time for processing
or Watched On-line Planning does not necessarily enhance accuracy, but directing
learners' attention to speech monitoring often does. This claim can then explain why
mixed results have been found in the literature regarding whether different types of
planning benefit speech accuracy (Ellis 2009).
Finally, an effect for speech monitoring was also found in the Repetition condition. As proposed by Bygate and Samuda (2005), it is likely that speaking for
the first time has already involved the speech comprehension system so that speech
monitoring can operate at three places: the pre-verbal message, the inner speech plan
and overt speech plan (Levelt 1993). In other words, when L2 speakers are actually
speaking for the first time, they become the first (more than anyone else) to parse their
speech. As a result, the degree of monitoring in speech production in the Repetition
condition is higher than in the other conditions that do not involve overt articulation.
Repeating the same task would induce more attention for monitoring, from which the
whole speech performance would benefit. This accuracy benefit found from immediate task repetition in this study is consistent with the findings in Bygate (1996), and
Gass et al. (1999).
To sum up, based on the results of this study, an instructional model of L2 speech
intervention was proposed (in Table 10). It suggests that different instructions will
enhance different aspects of language. First, L2 complexity could be enhanced by
interventions focusing on conceptualization. Second, speech fluency can be improved
by the manipulation of either conceptualization or formulation: easing conceptualization or enabling rehearsal at the formulation stage could increase fluency, but expanding on-line formulation could decrease fluency to some extent. Third, interventions
focusing on speech monitoring are the key to improving language accuracy.
Conclusion
Researchers are interested in models of second language speaking because such models can uncover the subtle underlying mechanisms of speech production. Second
language pedagogy needs instructional models because they help teachers teach in
a systematic and effective way. The overarching goal of this study is to connect L2
speech production theories and instruction in order to develop an evidence-based
instructional model for L2 teaching. I have tried to accomplish this goal by establishing a relation between the points of interventions with different processes of speech
production, on the one hand, and with performances, on the other. The five planning
and repetition conditions designed (Watched, On-line
Planning, Watched On-line Planning, Watched Strategic Planning, and Repetition)
represent time pressure reduction at certain production stages when compared with
the control condition.
The results showed that Watched and Watched Strategic Planning both helped
produce more complex and more fluent language; On-line Planning did not help
complexity or accuracy, whereas Watched On-line Planning did help produce more
complex and more accurate language but with a trade-off of having more reformulations in speech; Repetition was a robust condition that promoted speech complexity,
accuracy, and fluency with large effect sizes.
These results lead to the question as to why On-line Planning, which is supposed
to direct attention to on-line formulation, did not help language accuracy but Watched
On-line Planning did; and further, why there are often mixed results with accuracy in
task-based planning research. What has been proposed here is that L2 speakers are not
likely to attend to form unless conceptualization pressure has been resolved, which amounts to a meaning-priority principle in L2 speaking. Because of this, Watched On-line Planning has
been described as conceptualized on-line planning. It is argued that conceptualized
on-line planning has a greater chance, in terms of time and attentional resources, to
focus on speech monitoring. The evidence from this study as well as a range of other
empirical studies is consistent with this claim, such as in Bui (this volume), Hulstijn
and Hulstijn (1984), Li (this volume), Mochizuki and Ortega (2008), and Yuan and Ellis
(2003). These studies, together with the findings from the Watched On-line Planning
and Repetition conditions of this study, all report an accuracy effect for second language speaking. It is concluded that speech monitoring is the key to L2 speech accuracy.
Author note
The author would like to express gratitude to Peter Skehan, Martin Bygate, John Norris,
Kris Van den Branden, and Rod Ellis for their helpful comments on earlier versions
of this article.
References
Ahmadian, M.J., & Tavakoli, M. (2011). The effects of simultaneous use of careful online planning
and task repetition on accuracy, fluency, and complexity of EFL learners' oral production. Language Teaching Research, 15, 35–59.
Anderson, J.R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Anderson, J.R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Lawrence
Erlbaum Associates.
Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated
theory of the mind. Psychological Review, 111(4), 1036–1060.
Baddeley, A.D. (1986). Working Memory. Oxford: OUP.
Baddeley, A.D. (2003). Working memory and language: An overview. Journal of Communication
Disorders, 36, 189–208.
Bell, H. (2003). Using frequency lists to assess L2 texts. Unpublished Ph.D. thesis, University of
Swansea.
Bygate, M. (1996). Effects of task repetition: Appraising the developing language of learners. In
D. Willis & J. Willis (Eds.), Challenge and change in language teaching (pp. 134–146). London:
Heinemann.
Bygate, M. (1999). Task as context for the framing, reframing and unframing of language. System,
27(1), 33–48.
Bygate, M. (2001). Effects of task repetition on the structure and control of oral language. In
M. Bygate et al., (Eds.), Researching pedagogic tasks: Second language learning, teaching, and
testing (pp. 23–48). Harlow: Longman.
Bygate, M., & Samuda, V. (2005). Integrative planning through the use of task-repetition. In R.Ellis
Benjamins.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cortina, J.M., & Nouri, H. (2000). Effect size for ANOVA Designs. State University Papers Series on
Quantitative Applications in the Social Sciences, 07-129. Thousand Oaks, CA: Sage.
11, 367–83.
De Jong, N. & Perfetti, C.A. (2011). Fluency training in the ESL classroom: An experimental study of
fluency development and proceduralization. Language Learning, 61, 533–568.
Dörnyei, Z., & Scott, M.L. (1997). Communication strategies in a second language: Definitions and
taxonomies. Language Learning, 47, 173–210.
Doughty, C. (2001). Cognitive underpinnings of focus on form. In P. Robinson (Ed.), Cognition and
second language instruction (pp. 206–257). Cambridge: CUP.
Ellis, R. (2009). The differential effects of three types of task planning on the fluency, complexity and
accuracy in L2 oral production. Applied Linguistics, 30(4), 474–507.
Ellis, R. (1987). Interlanguage variability in narrative discourse: Style shifting in the use of the past
tense. Studies in Second Language Acquisition, 9, 1–20.
Ellis, R. (2005). Integrative planning through the use of task-repetition. In R. Ellis (Ed.), Planning
and task performance in a second language (pp. 3–36). Amsterdam: John Benjamins.
Ellis, R. (2008). The study of second language acquisition. Oxford: OUP.
Ellis, R. & Barkhuizen, G. (2005). Analysing learner language. Oxford: OUP.
Foster, P. (1996). Doing the task better: How planning time influences students' performance. In
J. Willis & D. Willis (Eds.), Challenge and change in language teaching (pp. 126–135). Oxford:
Heinemann.
Foster, P. (2001). Rules and routines: A consideration of their role in the task-based language
production of native and non-native speakers. In M. Bygate, P. Skehan, & M. Swain (Eds.),
Harlow: Longman.
Foster, P., Tonkyn, A., & Wigglesworth, J. (2000). Measuring spoken language: A unit for all reasons.
Applied Linguistics, 21(3), 354–75.
Foster, P., & Skehan, P. (1996). The influence of planning on performance in task-based learning.
Foster, P., & Skehan, P. (1999). The influence of source of planning and focus of planning on task-based performance. Language Teaching Research, 3, 185–214.
Gass, S., Mackey, A., Álvarez-Torres, M.J., & Fernández-García, M. (1999). The effects of task repetition on linguistic output. Language Learning, 49, 549–581.
Gilabert, R. (2007). Effects of manipulating task complexity on self-repairs during L2 oral production. IRAL, 45, 215–240.
Guará-Tavares, M.G. (2008). Pre-task planning, working memory capacity and L2 speech performance.
Unpublished Ph.D. thesis. Universidade Federal de Santa Catarina, Brazil.
Hinkel, E. (2004). TOEFL test strategies with practice tests (3rd ed.). Hauppauge, NY: Barron's.
Hulstijn, J.H., & Hulstijn, W. (1984). Grammatical errors as a function of processing constraints and
explicit knowledge. Language Learning, 34, 23–43.
Hunter, J.E., & Schmidt, F.L. (1990). Methods of meta-analysis: Correcting error and bias in research
findings. Newbury Park, CA: Sage.
Kawauchi, C. (2005). The effects of strategic planning on the oral narratives of learners with low and
high intermediate L2 proficiency. In R. Ellis (Ed.), Planning and task performance in a second
language (pp. 37–76). Amsterdam: John Benjamins.
Kello, C.T., & Plaut, D. (2003). Strategic control over rate of processing in word reading: A computational investigation. Journal of Memory and Language, 48, 207–232.
Kello, C.T. (2004). Control over the time course of cognition in the tempo-naming task. Journal of
Experimental Psychology: Human Perception and Performance, 30(5), 942–955.
Kello, C.T., & Plaut, D.C. (2000). Strategic control in word reading: Evidence from speeded responding in the tempo-naming task. Journal of Experimental Psychology: Learning, Memory, & Cognition, 26, 719–750.
Kello, C., Plaut, D., & MacWhinney, B. (2000). The task-dependence of staged versus cascaded processing: An empirical and computational study of Stroop interference in speech production.
Journal of Experimental Psychology: General, 129(3), 340–360.
Kempen, G., & Hoenkamp, E. (1987). An incremental procedural grammar for sentence formulation. Cognitive Science, 11, 201–258.
Erlbaum Associates.
Kroll, J.F., & Stewart, E. (1994). Category interference in translation and picture naming: Evidence
for asymmetric connections between bilingual memory representations. Journal of Memory and
Language, 33, 149–174.
Lashley, K.S. (1951). The problem of serial order in behavior. In L.A. Jeffress (Ed.), Cerebral mechanisms in behavior (pp. 112–146). New York, NY: Wiley.
Levelt, W.J.M., Roelofs, A., & Meyer, A.S. (1999). A theory of lexical access in speech production.
Behavioral and Brain Sciences, 22, 1–75.
Levelt, W.J.M. (1983). Monitoring and self-repair in speech. Cognition, 33, 41–103.
Levelt, W.J.M. (1989). Speaking: From intention to articulation. Cambridge, MA: The MIT Press.
Levelt, W.J.M. (1993). Language use in normal speakers and its disorders. In G. Blanken, J. Dittmann,
H. Grimm, J.C. Marshall, & C.-W. Wallesch (Eds.), Linguistic disorders and pathologies
(pp. 1–15). Berlin: De Gruyter.
Levelt, W.J.M. (1999). Producing spoken language: A blueprint of the speaker. In C. Brown &
P. Hagoort (Eds.), Neurocognition of language (pp. 83–122). Oxford: OUP.
Levelt, W.J.M. (2001). Spoken word production: A theory of lexical access. Proceedings of the National
Academy of Sciences USA, 98(23), 13464–13471.
Long, M., & Robinson, P. (1998). Focus on form: Theory, research, and practice. In C. Doughty &
J. Williams (Eds.), Focus on form in classroom SLA (pp. 15–41). Cambridge: CUP.
Lynch, T., & Maclean, J. (2000). Exploring the benefits of task repetition and recycling for classroom
language learning. Language Teaching Research, 4, 221–50.
Lynch, T., & Maclean, J. (2001). Effects of immediate task repetition on learners' performance. In
M. Bygate, P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks, second language learning,
teaching and testing (pp. 99–118). Harlow: Longman.
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Meara, P., & Bell, H. (2001). P_Lex: A simple and effective way of describing the lexical characteristics of short L2 texts. Prospect, 16(3), 5–19.
Meehl, P.E. (1990). Why summaries of research on psychological theories are often uninterpretable.
Psychological Reports, 66, 195–244.
Meyer, D.E., & Gordon, P.C. (1985). Speech production: Motor programming of phonetic features.
Journal of Memory and Language, 24, 3–26.
Mochizuki, N., & Ortega, L. (2008). Balancing communication and grammar in beginning level foreign language classrooms: A study of guided planning and relativization. Language Teaching
Research, 12, 11–37.
Norris, J.M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in SLA: The case
of complexity. Applied Linguistics, 30(4), 555–578.
Norris, J.M., & Ortega, L. (2000). Effectiveness of L2 Instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50(3), 417–528.
Ortega, L. (1999). Planning and focus on form in L2 oral performance. Studies in Second Language
Richards, B.J., & Malvern, D.D. (1998). A new research tool: Mathematical modelling in the measurement of vocabulary diversity (Award reference no. R000221995). Final Report to the Economic
and Social Research Council, Swindon, UK.
Robinson, P. (1995). Task complexity and second language narrative discourse. Language Learning,
45, 99–140.
Robinson, P. (2011). Task-based language learning: A review of issues. Language Learning,
61(Suppl. 1), 1–36.
Roelofs, A. (1997). The WEAVER model of word-form encoding in speech production. Cognition,
64, 249–284.
Rosenthal, R. (1991). Meta-analytic procedures for social research. Newbury Park, CA: Sage.
Samuda, V., & Bygate, M. (2008). Tasks in second language learning. Basingstoke: Palgrave.
Sangarun, J. (2005). The effects of focusing on meaning and form in strategic planning. In R. Ellis (Ed.),
Planning and task performance in a second language (pp. 111–141). Amsterdam: John Benjamins.
Sawaki, Y., Stricker, L.J., & Oranje, A.H. (2009). Factor structure of the TOEFL internet-based test.
Language Testing, 26(1), 5–30.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11,
129–158.
Schneider, W., & Chein, J.M. (2003). Controlled & automatic processing: Behavior, theory, and biological mechanisms. Cognitive Science, 27, 525–559.
Schriefers, H., Meyer, A.S., & Levelt, W.J.M. (1990). Exploring the time course of lexical access in
language production: Picture-word interference studies. Journal of Memory and Language, 29,
86–102.
Seidenberg, M.S. (1997). Language acquisition and use: Learning and applying probabilistic constraints. Science, 275, 1599–1603.
Shiffrin, R.M., & Schneider, W. (1977). Controlled and automatic human information processing,
II: Perceptual learning, automatic attending and a general theory. Psychological Review, 84(2),
127–190.
Skehan, P. (2009a). Modelling second language performance: Integrating complexity, accuracy,
fluency and lexis. Applied Linguistics, 30(4), 510–532.
Skehan, P. (2009b). Lexical performance by native and non-native speakers on language-learning
tasks. In H. Daller, D. Malvern, P. Meara, J. Milton, B. Richards, & J. Treffers-Daller (Eds.),
Vocabulary studies in first and second language acquisition: The interface between theory and
application (pp. 107–124). London: Palgrave Macmillan.
Skehan, P., & Foster, P. (1997). The influence of planning and post-task activities on accuracy and
complexity in task-based learning. Language Teaching Research, 1(3), 16–33.
retellings. Language Learning, 49, 93–120.
Skehan, P., & Foster, P. (2005). Strategic and online planning: The influence of surprise information
Skehan, P., & Foster, P. (2008). Complexity, accuracy, fluency and lexis in task-based performance:
A meta-analysis of the Ealing research. In S. Van Daele, A. Housen, F. Kuiken, M. Pierrard, &
I. Vedder, (Eds.), Complexity, accuracy and fluency in second language use, learning and teaching
(pp. 263–284). Brussels: Royal Flemish Academy of Belgium for Sciences and Arts.
Skehan, P., Bei, X., Li, Q., & Wang, Z. (2012). The task is not enough: Processing approaches to task-based performance. Language Teaching Research, 16(3), 170–187.
Squire, L.R. (1987). Memory and brain. Oxford: OUP.
Tajima, M. (2003). The effects of planning on oral performance of Japanese as a foreign language.
Unpublished Ph.D. thesis, Purdue University.
Tavakoli, P., & Foster, P. (2008). Task design and second language performance: The effect of narrative type on learner output. Language Learning, 58, 439–473.
Tavakoli, P., & Skehan, P. (2005). Strategic planning, task structure, and performance testing. In
R. Ellis (Ed.), Planning and task performance in a second language (pp. 239–276). Amsterdam:
John Benjamins.
Ullman, M.T. (2001a). The declarative/procedural model of lexicon and grammar. Journal of Psycholinguistic Research, 30(1), 37–69.
Ullman, M.T. (2001b). The neural basis of lexicon and grammar in first and second language: The
declarative/procedural model. Bilingualism: Language and Cognition, 4(1), 105–122.
Ullman, M.T. (2004). Contributions of memory circuits to language: The declarative/procedural
model. Cognition, 92(1–2), 231–270.
Ullman, M.T. (2008). The role of memory systems in disorders of language. In B. Stemmer & H.A.
Whitaker (Eds.), Handbook of the neuroscience of language (pp. 189–198). Oxford: Elsevier.
Ullman, M.T., Pancheva, R., Love, T., Yee, E., Swinney, D., & Hickok, G. (2005). Neural correlates of
lexicon and grammar: Evidence from the production, reading, and judgment of inflection in
aphasia. Brain and Language, 93(2), 185–238.
VanPatten, B. (1990). Attending to content and form in the input: An experiment in consciousness.
Wendel, J. (1997). Planning and second language narrative production. Unpublished Ph.D. thesis,
Temple University, Japan.
Wigglesworth, G. (1997). An investigation of planning time and proficiency level on oral test discourse. Language Testing, 14, 85–106.
Yuan, F., & Ellis, R. (2003). The effects of pre-task planning and online planning on fluency, complexity, and accuracy in L2 monologic oral production. Applied Linguistics, 24, 12.
Appendix A The material content (story scenes)

Story 1 Mr. Bean Had a Sleepless Night
1. Cleaning teeth with an electronic toothbrush
2. Washing mouth with a toy water gun
3. Using the toothbrush to clean ears
4. Saying good night to the mirror
5. Asking Teddy to go to bed
6. Reading a book for Teddy and laughing foolishly
7. Putting a pair of glasses onto the Teddy
8. Waving Good night to Teddy
9. Putting Teddy onto the bed
10. Shooting the light by a gun to switch it off
11. The clock is making a loud noise
12. Another noise from outside of the window
13. Making himself up as a pig to drive away the cats outside
14. Going back to bed but unfortunately is asphyxiated by the pillow
15. Switching on a TV by a spear
16. Chess playing program is on TV
17. Being waken up suddenly by TV advertisement
18. Starting to count the sheep
19. Being confused by the number
20. Getting an answer from a calculator
Story 2 Mr. Bean went Shopping

1. Showing off his credit card in a Department store
2. Being disgusted by perfume
3. Crawling on the floor
4. Persuade a woman from moving on
5. Going to the tooth brush shelves
6. Unpacking and trying a toothbrush
7. Taking an intact toothbrush
8. Trying on a bathroom towel
9. Going to the hardware section
10. Paring a potato
11. Selecting the pans
12. Taking out a fish from pocket to try a pan
13. Being annoyed by the soundless telephones
14. Picking up the saleswoman's phone set
15. Checking out and showing off his card again
16. His card was swapped by an old man by mistake
17. Trying to steal back his card
18. Following the old man to a toilet
19. Helping the man find the toilet paper
20. Shocked the old man.
Appendix B Task instructions

Condition [1] The Control Condition
You will see a section of a video from the Mr. Bean series. The section will last for 5 minutes.
While you are watching the video for the first time, I would like you to tell the story of what is
happening.
Imagine you are telling the story to someone who has never seen this video, and cannot see the
screen at the same time as you. You should try to tell the story in the same time that the video
runs. As the events in the story take place, tell the story to this person; when the video stops you
should be finishing your story, and you should stop too.
Please be as detailed as you can when you are telling the story. Try to use "Mr. Bean" or "He"
as the subject of the story, and try to say what is happening at any point, probably using the
present tense. For example:

Mr. Bean is watching TV.

He feels a bit tired.
Condition [2] The Watched condition

You will see a section of a video from the Mr. Bean series. The section will be played for you
twice. While you are watching for the second time; I would like you to tell the story of what is
happening. You can use the first watch as a preparation for your narrative.


Condition [3] The Online Planning
You will see a section of a video from the Mr. Bean series. The section will last for 8 minutes.
While you are watching the video for the first time, I would like you to tell the story of what is
happening.


Condition [4] Watched Online Planning
You will see a section of a video from the Mr. Bean series. While you are watching for the
second time; I would like you to tell the story of what is happening. You can use the first watch
as a preparation for your narrative.


Condition [5] Watched Strategic Planning

You will see a section of a video from the Mr. Bean series. The section will be played for you
twice. There is a 3-minute interval between your first watch and the second watch. While you
are watching for the second time, I would like you to tell the story of what is happening. You
can use the first watch as a preparation and you can use 3 minute interval to plan your narrative.


Condition [6] Repetition
The task instructions for this condition are the same as the Control condition. However, the
participants did not know that they would be told to carry out the same task again until they
had finished speaking one time. In other words, the researcher tried to obtain independent
measures of the Control condition and the Repetition condition. There was a one-minute
interval between the two performances, as the researcher used the time to tell the speaker to continue with a next task (e.g., Repetition) and to play the same instructions again (then followed
by the video for narration).
chapter 3
Task readiness
Theoretical framework and empirical evidence
from topic familiarity, strategic planning,
and proficiency levels*
Bui Hiu Yuet Gavin
Hang Seng Management College, Hong Kong
The construct of planning, operationalized as strategic planning, rehearsal and on-line
planning (Ellis 2009), has been much studied in the task-based language teaching
literature. These forms of planning could be thought of as task-external readiness
in which extra preparatory time is provided for learners to focus their attention
on certain performance areas. This chapter proposes a theoretical framework of
task-readiness as an extension to planning so that task research in planning could
broaden its horizons from a task-external perspective to also include a task-internal
perspective, that is, familiarity with aspects of a task.
To examine the relationship between task-external and task-internal readiness,
this study explores the effects of topic familiarity (task-internal readiness), strategic
planning (task-external readiness), and proficiency levels (an individual difference)
in a 2 × 2 × 2 split-plot factorial design. The results show that both topic familiarity
and strategic planning promoted more fluent language, but strategic planning
was a stronger form of task-readiness as indicated by its effect sizes. In contrast,
topic familiarity induced more accurate performance from the participants, while
planning was associated with significantly higher complexity. Proficiency seemed to
be positively related to formal accuracy rather than to fluency as higher proficiency
participants always scored higher in accuracy and sometimes in complexity, but not
so much in fluency. These findings suggest that though task-internal readiness and
task-external readiness share common factors in rendering assistance to learners, they
differ in their influence on various performance areas as well as the magnitude of the
effects. All this lends support to the differentiation between task-external and task-internal readiness, and to the notion of task-readiness as a contextualizing framework
for relevant task research. Based on the theoretical discussion and empirical results,
pedagogical implications are also outlined in this chapter.
* I would like to thank the Editor of this volume, Peter Skehan, who was also my Ph.D. supervisor,
for his guidance through this research. Thanks also go to Martin Bygate for his valuable comments
on previous drafts of this chapter.
Introduction
One distinctive feature of second language (L2) speaking is that many learners put
a lot of effort into speaking but still fail to reach native-like proficiency. A tension
between the meaning to express and the appropriate forms to use becomes a major
challenge to L2 learners. While communicative language ability in general involves
the ability to express ideational, interpersonal and discoursal meanings through the
use of formal linguistic resources, L2 development in particular further requires helping learners to achieve the capacity to use resources already available to them (Bygate
2001). There exists a gap between L2 learners' potential competence and their actual
performance. Such a phenomenon may be attributed to L2 learners' underdeveloped
proficiency, but on top of this, they also fall prey to their processing capacity limitations (Baddeley 1997; Skehan 1998). Therefore, there is a need to explore pedagogical
tasks and task conditions which go beyond cultivating underlying structural abilities
and which also increase learners' readiness to handle various communicative needs
(Samuda 2001).
Much current research done to this end has focussed on planning (Ellis 2005,
2009), operationalized in a variety of forms such as pre-task planning and within-task
planning, as a means of maximizing learners' readiness for tasks (see Wang, Chapter 2,
this volume). Different types of planning have been shown to promote learner performance in different areas, that is, fluency, complexity, and accuracy. What seems
unfortunate is that, after all these studies, we still lack a comprehensive account of the
interrelationship between various types of planning from a more wide-ranging perspective. The term planning per se, if it is to be used as an umbrella term for concepts
such as rehearsal, strategic planning, and online planning (Ellis 2005), appears to fall
short of capturing the generic features which are shared, and is a bit too limited in its
scope as a means for preparing students to handle tasks more effectively. Based on an
empirical study, this chapter proposes task readiness as an alternative theoretical
framework to planning in order that task research can be better contextualized and
the different types of planning can be more clearly inter-connected to each other in
research as well as practice.
A theoretical framework of task readiness

Ellis (2005) distinguished between two types of planning: (1) pre-task planning which
can be further divided into rehearsal and strategic planning, and (2) within-task
planning that subsumes both pressured and unpressured situations. Rehearsal, simply
put, is to allow learners to practise a task before its actual performance, as exemplified
Task readiness
by Bei (2013). Rehearsal usually involves explicit signalling to the learner that the previous performance may serve as preparation for the next. This makes an interesting
contrast with task repetition (Bygate 2001), in which students receive no briefing about
future performance, so that drawing on prior knowledge of the same task in
later performances becomes implicit planning (see Table 1 below for the comparison). In general, both rehearsal and task repetition show very strong positive effects
on fluency, complexity, and/or accuracy (see also Wang 2009; Chapter 2, this volume).
Strategic planning is the most widely studied form of planning in the literature. It is
generally operationalized as offering planning time (Crookes 1989; Foster & Skehan
1996; Skehan & Foster 2005) prior to a task. Strategic planning is in most cases found
to push learners towards more fluent speech which involves more complex clausal
structure, whereas its effects on accuracy are not consistent in the literature (Ellis 2009;
Skehan, Bei, Li & Wang 2012). One possible reason for the mixed results with accuracy is that studies have differed as to whether the task conditions allowed time for or
encouraged careful on-line planning (i.e. formulation and monitoring of speech plans
during performance) (Yuan & Ellis 2003). Within-task planning, or on-line planning,
is assumed to occur when sufficient time is available for planning during speaking. An
example of unpressured within-task planning is to have learners describe an edited
video which is played at a lower speed (Wang, Chapter 2, this volume). In contrast,
pressured within-task planning does not allow any leeway in gaining extra time for
planning while speaking. Speakers have to engage in real time planning for the ongoing communicative task.
Ellis (2009) slightly revised this system of categorization and distinguished three
types of planning: rehearsal, pre-task planning and within-task planning. Even so,
the two categorizations (2005 and 2009) are essentially the same, dwelling on task-external manipulations of the degree of preparedness for a task. Rehearsal and pre-task
(strategic) planning without doubt prepare learners prior to a task, but within-task
planning can also be viewed as something that can be increased or decreased so
as to vary the readiness for performance, as a series of consecutive segments of strategic planning carried out ad hoc during a task.
If we adopt a broader perspective on this issue, the notion of planning as preparation or readiness in order to increase one's capacity to do a task should extend its horizons beyond the task-external means outlined in Ellis (2005, 2009). Prior knowledge
about, and hence familiarity with, the content of a task or the schemata of a task
(the knowledge or preparedness a speaker brings to any task, whether or not pre- or
within-task planning time is provided) should also be incorporated into this broader
sense of planning. This study will provide evidence that familiarity with a certain topic
facilitates learner performance in a variety of ways similar to other types of planning,
although it differs from them in some areas. Therefore topic familiarity is, one could
Bui Hiu Yuet Gavin
argue, a kind of task-internal readiness, or implicit planning, as contrasted with task-external readiness, or explicit planning. The following table displays this extension of
the construct of planning.
Table 1. A framework of task-readiness

Macro-dimension | Micro-dimension | Sample studies
Task-internal readiness (implicit planning) | Topic familiarity (prior subject knowledge) | This chapter
Task-internal readiness (implicit planning) | Schematic familiarity (structural or procedural knowledge) | Skehan & Foster (1999)
Task-internal readiness (implicit planning) | Task familiarity (task types) | Bygate (2001)
Task-internal readiness (implicit planning) | Task repetition (content repetition without awareness of future performance) | Bygate (2001)
Task-external readiness (explicit planning) | Rehearsal (repetition with awareness of future performance) | Bei (2013)
Task-external readiness (explicit planning) | Strategic planning (pre-task preparation) | Foster & Skehan (1996)
Task-external readiness (explicit planning) | Within-task planning (online preparation) | Yuan & Ellis (2003)
As shown in Table 1, task readiness consists of two macro-dimensions. The second macro-dimension, task-external readiness, comprises the three common planning
types discussed in Ellis (2005, 2009). The novel part here is the first macro-dimension, task-internal readiness, which subsumes four different aspects.
The first kind of task-internal readiness is topic familiarity, which derives from
prior knowledge about a certain domain, such as a medicine major's medical knowledge of a natural
virus, or a computer major's technical knowledge of a computer virus, as exemplified in the present study. The second kind concerns schematic
familiarity. An example can be found in Skehan and Foster (1999), in which a 'going
to a restaurant' episode in the Mr. Bean video stood out as a fairly predictable story because
nearly everyone has a schema of 'come in, order the dishes, eat the meal, pay the
bill, leave the restaurant'. Compared with the more predictable storyline in a restaurant, what happened when Mr. Bean played golf was hard to foresee due to the
lack of a relevant schema. Further examples of such schematic familiarity are Skehan
and Shun (this volume) and Wang and Skehan (this volume). The third type of task-internal readiness is task familiarity, which deals with whether there will be a practice
effect transferred from one task to another of the same type (but with a different topic).
Bygate (2001) is a sample task-type familiarity study.
The last case of task-internal readiness, task repetition, makes an intriguing comparison with the first type of task-external readiness, namely rehearsal, with the major
difference lying in whether learners know that they are going to do the task again. Task repetition involves no briefing about future performance, so the previous task constitutes an
implicit planning opportunity which brings potential topic familiarity and task familiarity to the next round of performance. In contrast, rehearsal, as a form of task-external readiness, offers an opportunity, known to the learner, to prepare by practising the
task prior to the actual performance. Rehearsal thus becomes explicit planning which
also characterizes the other two kinds of task-external readiness: strategic planning
and within-task planning.
The major difference between task-internal and task-external readiness is the
degree of naturalness, or rather the degree of ad hoc manipulation, of the task preparation. Task-internal readiness, especially topic familiarity and schematic familiarity,
could be thought of as a more inherent and natural type of readiness, albeit perhaps
not so much a conscious process. Task-external readiness, by contrast, has a more
artificial element in that extra manipulations of the task are imposed upon learners. A question then arises from this comparison: which has a stronger influence on
the improvement of task performance? The literature on task research has little to offer
in this regard, so we will turn next to other areas for relevant insights.
Evidence for the influence of topic familiarity exists mainly in studies of speech
comprehension. The effects of prior knowledge, or schemas, in Piagetian terms, provide
good explanations for speech comprehension from a top-down perspective. In reading
comprehension research, being familiar with a certain content area has been generally found to be facilitative (Barry & Lazarte 1995; Bügel & Buunk 1996; Chang 2006;
Chen & Donin 1997; Johnson 1982; Lee 1986; Shimoda 1993). More recently, Lee (2007)
and Leeser (2007) also found that familiar texts greatly aided reading comprehension and improved content recall among L2 English learners
and L2 Spanish learners respectively. In listening comprehension, the mechanism of
topic familiarity bears some resemblance to that in reading, but the time constraints of
listening in real time impose additional difficulty on listeners. The time allowed in listening for the construction process (Kintsch 1988, 1998) before an appropriate schema
can be activated is much shorter, so while L2 readers have the opportunity of going back
to the textual data when first-inferencing fails, L2 listeners might encounter trouble at
this stage, before any helpful schema is able to take effect. In general, schemata might
be more important in L2 listening than L2 reading in that unlike readers who might,
given less time pressure, be able to rely more on bottom-up linguistic cues for meaning
construction, listeners probably have no such resource and a schema is crucial for prediction and inference in a top-down manner. Not surprisingly, familiarity with content
knowledge in general aids learners in understanding audio input (Chiang & Dunkel
1992; Long 1990; Markham & Latham 1987; Schmidt-Rinehart 1994).
Not much research on topic familiarity has been conducted to investigate speech
production, and the existing literature mostly concerns L1 speaking. Prior knowledge
about a certain subject was found to raise temporal fluency (Good & Butterworth 1980),
reduce repeats and restarts but increase fillers (Bortfeld et al. 2001), but there are also
studies like Merlo and Mansur (2004) reporting unaffected fluency with a more familiar topic. They instead found more propositions in the more familiar task, indicating
that the speech on a more familiar topic contains a higher density of information.
In addition, topic familiarity does not seem to help improve structural complexity
or coherence in narrative discourse (Banks 2004). These studies appear to show that
topic familiarity is more concerned with meaning expression (fluency and information load) but less with structural ability (complexity and coherence) in first language
speaking.
Research on the influence of topic familiarity in L2 oral production is scarcer.
Familiarity with a topic seemed to enhance performance regarding fluency (words per
error-free T-unit and words per minute) but not with accuracy (error rate per T-unit)
in a monologic task (Chang 1999, though with only six participants). Familiarity with
the structure of a story, that is, a clearer schema in going to a restaurant versus a less
predictable storyline in playing golf, led to greater fluency (Skehan & Foster 1999).
Having a schema of a familiar area (a University map) also promoted fluency, but the
unfamiliar task (with an unfamiliar street map) generated higher lexical complexity
(Robinson 2001). More familiar tasks in Bei (2011) induced more formal language
features in discourse with a higher density of nouns and noun-associated word classes
such as articles and adjectives.
A few research gaps can be identified in the literature. It appears firstly from a
macro perspective that past research has generated quite extensive coverage on the
effects of task-external readiness, and even the interplay between its three types of
planning, namely rehearsal, strategic planning, and online planning. In contrast,
task-internal readiness has been much less touched upon, let alone the relationship
between task-external and task-internal readiness. The present study argues that
task-internal readiness stands out as an inherent characteristic of a task and could render more natural assistance to learner performance than task-external readiness does.
Secondly, at a micro level, topic familiarity has been shown quite unequivocally to help
L2 participants with greater fluency, but its influence on other performance areas like
complexity and accuracy is much less researched. A deeper consideration of this might
alert us to the possibility of its impact on test fairness as well, which warrants closer
scrutiny. At the same time, and no less significantly, the extension of planning
to task-readiness may have implications for the debate between the Processing Approach
(Skehan 1998) and the Cognition Hypothesis (Robinson 2001), which could
help to shed light on tasks and task behaviour from a wider perspective.
In pursuit of these goals, the current study employs a combination of three factors,
namely topic familiarity (task-internal), strategic planning (task-external), and proficiency (learner characteristic) in a 2 × 2 × 2 split-plot design. The following section
details the implementation of the study.
Methods
Participants
The participants in this study were eighty university students aged between eighteen
and twenty-four in Hong Kong. They were selected from 102 volunteers based on their
proficiency levels (see 3.3 Proficiency criteria below) and their academic major (see
below). They were all native Cantonese speakers but with reasonable L2 English proficiency. They had received twelve to sixteen years of education in English as their second
language at the time of the study. Among them, forty students were medicine majors
and another forty were computer science students (see 3.4 Study Design below for
the rationale).
Speaking tasks
Participants were given the following general scenario:
You are a specialist in the field giving a presentation to a group of university students
who are neither medicine nor computer majors but are interested in the topics.
Each participant was invited to talk about the following two descriptive topics.
Topic 1: Please describe in detail the general process of the infection of virus in a
human body, the possible consequences, and the general procedure for dealing with a
virus-infected person.
Topic 2: Please describe in detail the general process of the infection of virus in
a computer, the possible consequences, and the general procedure for dealing with a
virus-infected computer.
Proficiency criteria
The 80 participants were equally divided into a high and an intermediate group
according to their proficiency levels. The grouping criteria combined
their previous Use of English (UE) examination results in their Hong Kong Advanced-Level (HKALE) public exams with a pre-test (a C-test adapted from Dörnyei & Katona
1992) administered immediately before the tasks. According to the entrance requirements of the participants' university, their UE results were approximately pitched
at Bands 6-8 of the IELTS system. Therefore they were termed intermediate to low
advanced L2 learners.
Study design: Independent variables

There are two between-participant independent variables (strategic planning and proficiency) and one within-participant independent variable (topic familiarity) in this
study, each with two levels, as shown in Table 2 below. Specifically,
the 80 candidates were evenly divided into a ten-minute pre-task planning group and
a non-planning (control) group. Within each group, there were two subgroups of 20 high and 20 intermediate proficiency learners. Each subgroup of 20 candidates consisted of 10 computer majors and 10 medicine majors to counterbalance any topic
effect. That is, when there is a topic familiarity effect, we can be more confident that it
is not simply because one topic is easier than the other, since each cell performs exactly
the same topics. The order of familiar and unfamiliar tasks was also counterbalanced:
half of the participants presented on the familiar topic first and then the unfamiliar one, with the other half in the reverse order. Because the disciplines
were not regarded as an independent variable in this study, the sample size in each
cell reached 20. Because every participant performed two tasks, the 80 candidates produced 160 data points in total. Based on Gardner (2001, pp. 127-153), the
present study constitutes a 2 × 2 × 2 split-plot factorial design.
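The allocation described above can be sketched in a few lines of Python. This is only an illustrative reconstruction under stated assumptions: the variable names, the participant records, and the alternating assignment of presentation order are hypothetical and are not taken from the study materials.

```python
from collections import Counter
from itertools import product

# Labels below are illustrative; only the counts come from the design description.
PLANNING = ("planner", "non-planner")        # between-participant
PROFICIENCY = ("high", "intermediate")       # between-participant
MAJORS = ("medicine", "computer")            # counterbalancing factor, not an IV
ORDERS = ("familiar-first", "unfamiliar-first")
FAMILIAR_TOPIC = {"medicine": "natural virus", "computer": "computer virus"}


def build_design(per_major=10):
    """Return one record per participant for the 2 x 2 x 2 split-plot layout."""
    participants = []
    for planning, proficiency, major in product(PLANNING, PROFICIENCY, MAJORS):
        for i in range(per_major):
            participants.append({
                "id": len(participants) + 1,
                "planning": planning,
                "proficiency": proficiency,
                "major": major,
                "familiar_topic": FAMILIAR_TOPIC[major],
                # Assumed scheme: alternate the order within each subgroup.
                "order": ORDERS[i % 2],
            })
    return participants


def data_points(participants):
    """Expand each participant into two observations (within-participant factor)."""
    return [dict(p, familiarity=level)
            for p in participants
            for level in ("familiar", "unfamiliar")]


participants = build_design()
cells = Counter((p["planning"], p["proficiency"]) for p in participants)
for cell, n in sorted(cells.items()):
    print(cell, n)
print("participants:", len(participants), "data points:", len(data_points(participants)))
```

Collapsing over the two majors gives 20 participants in each planning-by-proficiency cell, and the within-participant factor doubles the 80 participants into 160 observations.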
Table 2. Number of participants in each group

Planning (between-participant) | Proficiency (between-participant) | Topic familiarity (within-participant): Familiar | Topic familiarity (within-participant): Unfamiliar
Planners | High | 20 | 20
Planners | Intermediate | 20 | 20
Non-planners | High | 20 | 20
Non-planners | Intermediate | 20 | 20
When the medicine majors talk about a natural virus, the topic is regarded as
familiar to them, while the computer virus becomes the unfamiliar one. The opposite is
then true for the computer majors. After doing the two tasks, participants were asked
to rate their familiarity with the topics. A medicine major who indicated an equal or
higher degree of familiarity with the computer virus topic than their own natural virus
topic would be excluded from the study. The same familiarity screening procedure
applied to the computer majors as well.
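The screening rule can be expressed as a simple filter. The sketch below is illustrative only: the chapter does not describe the rating scale, so the numeric ratings, field names, and sample records are all hypothetical.

```python
# Hypothetical field names; the chapter does not specify the rating instrument.
OWN_TOPIC = {"medicine": "natural_virus", "computer": "computer_virus"}
OTHER_TOPIC = {"medicine": "computer_virus", "computer": "natural_virus"}


def is_retained(record):
    """Keep a participant only if their own field's topic is rated strictly
    more familiar than the other topic (equal ratings lead to exclusion)."""
    own = record[OWN_TOPIC[record["major"]]]
    other = record[OTHER_TOPIC[record["major"]]]
    return own > other


# Invented ratings for illustration.
sample = [
    {"id": 1, "major": "medicine", "natural_virus": 5, "computer_virus": 2},
    {"id": 2, "major": "medicine", "natural_virus": 3, "computer_virus": 3},
    {"id": 3, "major": "computer", "natural_virus": 1, "computer_virus": 4},
    {"id": 4, "major": "computer", "natural_virus": 4, "computer_virus": 2},
]
retained_ids = [r["id"] for r in sample if is_retained(r)]
print(retained_ids)
```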
Performance measures: Dependent variables

The effects of topic familiarity, planning, and proficiency on speech performance
were examined by measuring three conventional areas, namely fluency, complexity,
and accuracy to allow cross-study comparisons. Individual measures are described in
detail in Table 3 (and see additional description in Chapter 1, this volume).
Table 3. Dependent variables

Area | Variable | Description | Studies
Breakdown fluency | Speech rate | A pruned speech rate operationalized as total words per minute after deletion of filled pauses, reformulations, replacements, false starts, and repetitions. | Tavakoli & Skehan (2005)
Breakdown fluency | Mean length of run | Number of words uttered before any breakdown or repair. | Skehan & Foster (2005)
Breakdown fluency | Mid-clause pause no. | Number of pauses in the middle of a clause per one hundred words. A pause is operationalized as any break of 0.4 second or longer. | Foster & Skehan (1996)
Breakdown fluency | Clause-end pause no. | Number of pauses at the end of a clause per one hundred words. | Skehan & Foster (2005)
Breakdown fluency | Mid-clause silence total | The total length of pauses in the middle of a clause per one hundred words. | Bui (In review)
Breakdown fluency | Clause-end silence total | The total length of pauses at the end of a clause per one hundred words. | Bui (In review)
Breakdown fluency | Mid-clause pause length | The average length of pauses in the middle of a clause. | Bui (In review)
Breakdown fluency | Clause-end pause length | The average length of pauses at the end of a clause. | Bui (In review)
Repair (dys)fluency | Reformulations | Phrases or clauses repeated with some modification to syntax, morphology, or word order. | Foster & Skehan (1999)
Repair (dys)fluency | False starts | Utterances abandoned before completion, with or without a following reformulation. | Foster & Skehan (1999)
Repair (dys)fluency | Repetitions | Words, phrases or clauses repeated with no modification to syntax, morphology, or word order. | Foster & Skehan (1999)
Repair (dys)fluency | Replacements | Lexical items immediately substituted for another. | Foster & Skehan (1999)
Accuracy | Error-free ratio | Ratio of error-free clauses to all clauses. | Foster & Skehan (1996)
Accuracy | Errors per 100 words | Number of errors per 100 pruned words. | Mehnert (1998)
Accuracy | 70% accuracy clause length¹ | Length of clause at which 70% of all clauses are correct. E.g., if 70% of all 5-word clauses and only 60% of all 6-word clauses are error-free, a score of 5 is awarded. | Skehan & Foster (2005)
Complexity | Clauses per AS unit | Ratio of subordinate clauses per AS unit. | Foster, Tonkyn, & Wigglesworth (2000)
Complexity | Words per AS unit | Average number of words in all AS units. | Ortega, L., Iwashita, N., Norris, J., & Rabie, S. (in prep)
Complexity | Words per clause | Average number of words in all clauses. |
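Several of the breakdown fluency measures in Table 3 are simple ratios, and a short Python sketch may make the operationalizations concrete. It is a hedged illustration, not the coding procedure used in the study: the `Pause` record, the function names, and the sample values are hypothetical, and it assumes that pauses have already been timed and tagged by clause position.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD = 0.4  # seconds; a pause is any break of 0.4 second or longer


@dataclass
class Pause:
    duration: float    # length of the silence in seconds
    mid_clause: bool   # True inside a clause, False at a clause boundary


def pruned_speech_rate(total_words, repair_words, minutes):
    """Words per minute after removing filled pauses and repair material."""
    return (total_words - repair_words) / minutes


def pause_measures(pauses, word_count, mid_clause):
    """Return (pauses per 100 words, silence per 100 words, mean pause length)
    for either mid-clause or clause-end pauses."""
    kept = [p.duration for p in pauses
            if p.mid_clause == mid_clause and p.duration >= PAUSE_THRESHOLD]
    if not kept:
        return 0.0, 0.0, 0.0
    per_100 = 100.0 / word_count
    return len(kept) * per_100, sum(kept) * per_100, sum(kept) / len(kept)


# Invented sample: one break is shorter than the threshold and is ignored.
sample_pauses = [Pause(0.5, True), Pause(0.3, True), Pause(1.0, True), Pause(1.5, False)]
print(pruned_speech_rate(total_words=360, repair_words=60, minutes=3.0))
print(pause_measures(sample_pauses, word_count=200, mid_clause=True))
print(pause_measures(sample_pauses, word_count=200, mid_clause=False))
```

Whether the per-100-word standardization uses raw or pruned words is not stated for the pause measures, so `word_count` is left to the caller.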
Data analysis
Speaking performance from each task was recorded and transcribed largely following
the CHILDES format, and then analyzed with the above measures. The results
are presented in the next section as both descriptive statistics (means and standard deviations) and inferential statistics (significance levels and effect sizes
in Cohen's d). Following Thalheimer and Cook (2002), Cohen's d was calculated to
indicate the size of an experimental effect. Other statistical results were obtained from
repeated measures analyses of variance (ANOVA) in SPSS 19, which is appropriate for a split-plot design like that of the current study. Results are organised
in terms of the sets of dependent measures: fluency, accuracy and then complexity.
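For readers unfamiliar with the statistic, the following Python sketch computes Cohen's d from two group means and standard deviations using a pooled standard deviation. This is one common formulation and is offered as an assumption: the chapter does not state which of the routes described by Thalheimer and Cook (2002) was used (d can also be derived from t or F statistics), so values computed this way need not match the reported ones exactly. The input numbers are invented.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_variance = (((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2)
                       / (n_1 + n_2 - 2))
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


# Invented example: two groups of 40 whose means differ by half a standard deviation.
d = cohens_d(mean_1=100.0, sd_1=20.0, n_1=40, mean_2=90.0, sd_2=20.0, n_2=40)
print(round(d, 2))
```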
Results
Fluency
The first result concerns the length of each performance, as measured by the number of words, under various conditions. Participants produced longer accounts on
the more familiar topics (360.36 raw words and 300.84 pruned words, as compared to
284.05 raw words and 229.61 pruned words with the unfamiliar topics; Cohen's d = .53
for raw words and Cohen's d = .61 for pruned words, p = .000 for both). The opportunity to have planning time seems to be a less powerful means of pushing learners to say
more, as a significant effect is reached only with total pruned words (297.8 and 233.60
for planners and non-planners respectively, p = .007, Cohen's d = .44), which is an indication
that participants reduced repair features such as hesitation, repetition, interjections
and fillers (e.g. err, hmm) after strategic planning. A comparison of effect sizes further
supports the argument that familiarity with a certain topic has a greater impact on the
number of words used than does planning. Proficiency, interestingly, does not have
any effect on the number of words.
1. Following Skehan and Foster (2005), for example, if 50% of all 5-word sentences but fewer than
50% of all 6-word sentences are correct, then with 50% as the threshold, the accuracy score is 5 in
that L2 speech. This study calculated 50%, 60%, and 70% as the thresholds, but only the 70% value is
reported because 70% appeared to be a better threshold for differentiating accuracy performance among learners of higher proficiency, such as those in the present study.
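The threshold-based accuracy clause length measure (Table 3 and its footnote) lends itself to a short worked sketch. The Python below is one possible reading, under stated assumptions: clauses are already coded for length and correctness, lengths are scanned from shortest to longest, and the score is the last length reached before the error-free proportion first falls below the threshold. The function name and sample data are hypothetical, and the original scoring procedure may differ in its details.

```python
from collections import defaultdict


def accuracy_clause_length(clauses, threshold=0.70):
    """clauses: iterable of (length_in_words, is_error_free) pairs.

    Assumed rule: walk through the attested clause lengths in ascending order
    and return the longest length reached before accuracy drops below the
    threshold; return 0 if even the shortest clauses fall below it.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for length, error_free in clauses:
        totals[length] += 1
        correct[length] += int(error_free)

    score = 0
    for length in sorted(totals):
        if correct[length] / totals[length] >= threshold:
            score = length
        else:
            break
    return score


# Invented data mirroring the example in Table 3: 70% of the 5-word clauses
# and 60% of the 6-word clauses are error-free.
sample = (
    [(4, True)] * 10
    + [(5, True)] * 7 + [(5, False)] * 3
    + [(6, True)] * 6 + [(6, False)] * 4
)
print(accuracy_clause_length(sample, threshold=0.70))
```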
Factor analyses in the task literature (e.g. Mehnert 1998; Skehan & Foster 1999;
Tavakoli & Skehan 2005) have generally confirmed two types of fluency: breakdown
fluency and repair fluency. Breakdown fluency concerns temporal aspects of speech
and is usually measured through speech rate and pausing. In contrast, repair fluency
is associated with modifying language. It is operationalized as false starts, reformulation, replacement, and repetition. Table 4 and Table 6 report on these two categories of
fluency variable respectively. The tables report on the effects of the three independent
variables, namely topic familiarity, strategic planning, and proficiency, on different
dependent variables such as the speech rate and phonation time in Table 4. In addition
to the means and standard deviations, the significance levels (p values) and the effect
sizes (Cohen's d) are also given.
Table 4 shows that topic familiarity has an overall effect on most (6 out
of 8) breakdown fluency measures. Being familiar with a certain subject matter
enables the participants to speak at a faster speech rate, with a longer stretch of
words before encountering any pauses, repairs or fillers (mean length of run).
Familiarity with a topic also helps to reduce the number as well as the average
length of pauses, and the total amount of silence in the middle of a clause. In addition, topic familiarity is able to shorten the total silence time between two clauses
(clause-end silence). However, the number and the length of pauses at the end of
clauses seem unaffected by topic familiarity. A notable point revealed in Table 4 is
the consistently small effect sizes in all measures, contrasted with the wider range of
significance values. None of the effect sizes (Cohen's d) reaches the medium level,²
which is an indication that while topic familiarity leads to higher fluency, the extent
of the influence is limited. It is intriguing that the largest effect sizes concern mid-clause difficulties: familiarity seems particularly supportive in this respect.
2. According to Cohen (1992), a Cohen's d of .20 is small, .50 is medium, and .80
is large.
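The benchmarks from Cohen (1992) can be applied mechanically, as in the minimal Python sketch below. The function name and the "negligible" label for values under .20 are assumptions added for illustration; the d values are the topic familiarity effect sizes reported in Table 4.

```python
def label_effect_size(d):
    """Map |d| onto Cohen's (1992) benchmarks; 'negligible' is an added label."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


# Topic familiarity effect sizes for the eight breakdown fluency measures (Table 4).
familiarity_d = {
    "speech rate": 0.26,
    "mean length of run": 0.17,
    "mid-clause pause no.": 0.38,
    "clause-end pause no.": 0.13,
    "mid-clause silence total": 0.38,
    "clause-end silence total": 0.19,
    "mid-clause pause length": 0.28,
    "clause-end pause length": 0.10,
}
labels = {measure: label_effect_size(d) for measure, d in familiarity_d.items()}
for measure, label in labels.items():
    print(f"{measure}: {label}")
```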
Table 4. The effects of topic familiarity, strategic planning and proficiency on breakdown fluency

Measure | Fam. | Unfam. | Topic familiarity effect | Planned | Unplanned | Planning effect | High | Interm. | Proficiency effect
Speech rate | 96.30 (23.02) | 90.47 (26.33) | p = .000, d = .26 | 102.52 (22.45) | 84.24 (23.51) | p = .000, d = .58 | 96.79 (25.26) | 89.97 (23.90) | p = .177 ns, d = .28
Mean length of run | 5.26 (1.57) | 4.99 (1.76) | p = .016, d = .17 | 5.48 (1.85) | 4.77 (1.38) | p = .046, d = .32 | 5.28 (1.51) | 4.98 (1.80) | p = .397 ns, d = .02
Mid-clause pause no. | 9.73 (6.20) | 12.13 (7.33) | p = .000, d = .38 | 8.61 (5.27) | 13.24 (7.31) | p = .001, d = .54 | 10.17 (5.96) | 11.69 (7.47) | p = .265 ns, d = .22
Clause-end pause no. | 6.53 (2.09) | 6.82 (2.52) | p = .562 ns, d = .13 | 6.32 (2.22) | 7.05 (2.34) | p = .087 ns, d = .32 | 6.15 (1.97) | 7.22 (2.48) | p = .023, d = .48
Mid-clause silence total | 8.51 (7.86) | 12.35 (13.41) | p = .000, d = .38 | 6.46 (4.83) | 14.39 (13.09) | p = .000, d = .61 | 9.35 (9.25) | 11.51 (11.77) | p = .299 ns, d = .20
Clause-end silence total | 5.73 (2.91) | 6.41 (4.04) | p = .026, d = .19 | 5.10 (2.80) | 7.08 (3.83) | p = .001, d = .59 | 5.63 (3.12) | 6.54 (3.79) | p = .118 ns, d = .26
Mid-clause pause length | .79 (.31) | .87 (.35) | p = .019, d = .28 | .70 (.33) | .96 (.40) | p = .000, d = .71 | .80 (.29) | .86 (.36) | p = .288 ns, d = .18
Clause-end pause length | 1.71 (.61) | 1.78 (.81) | p = .437 ns, d = .10 | 1.53 (.42) | 1.96 (.86) | p = .001, d = .64 | 1.77 (.68) | 1.72 (.74) | p = .584 ns, d = .07

Notes: 1. Standard deviation in (). 2. All pause numbers and silence measures are standardized by calculating their occurrence per 100 words.
3. d = Cohen's d, which is a measure of effect size.
The effects of planning are quite similar to those of topic familiarity, except that
planning achieves a significant impact on more measures (7 out of 8) and with larger effect sizes. The opportunity to plan prior to speaking
raises the speech rate and the mean length of run. Planning reduces the number of
pauses in the middle of a clause, though not the number of pauses at the end of clauses.
There is also a reduction in the amount of silence and the average length of pauses, both in
the middle of a clause and at
end-of-clause positions.
Rather counter-intuitively, proficiency appears irrelevant to all but one measure,
the number of clause-end pauses. The intermediate proficiency participants produced
more pauses at the end of a clause than their high proficiency counterparts. This occurrence is intriguing in that the number of pauses at clause boundaries is one of few
measures that neither topic familiarity nor planning exerts any influence on, whereas
proficiency happens to fill this gap, with a medium Cohen's d value (d = .48) indicating a considerable effect. But other than that, the effect of proficiency seems to have
been overridden by topic familiarity and strategic planning.
In addition to the above main effects, there are also familiarity-by-planning
interaction effects in four breakdown fluency measures (see Table 5 below), that is,
speech rate (p = .001), number of mid-clause pauses (p = .005), mid-clause silence total
(p = .004) and clause-end silence total (p = .026). These four interaction effects consistently point to a general trend: participants were able to reach a similar fluency
level after strategic planning, regardless of the familiarity level of the topics. That is, the
significant difference in breakdown fluency between familiar and unfamiliar topics is
reduced almost to nothing when pre-task planning time is allowed. The results
suggest that, although planning helps to improve fluency performance in both familiar
and unfamiliar tasks, the unfamiliar tasks have benefited much more.
Table 5. Means of the four measures with interaction effects

Measure | Planning | Familiar | Unfamiliar | Significance
Speech rate | Planned | 103.47 (20.88) | 101.58 (24.01) | p = .001
Speech rate | Unplanned | 89.12 (23.06) | 79.35 (23.96) |
Mid-clause pause number | Planned | 8.08 (5.16) | 9.15 (5.38) | p = .005
Mid-clause pause number | Unplanned | 11.37 (6.76) | 15.11 (7.86) |
Mid-clause silence total | Planned | 6.04 (4.86) | 6.89 (4.81) | p = .004
Mid-clause silence total | Unplanned | 10.97 (9.44) | 17.81 (16.73) |
Clause-end silence total | Planned | 5.06 (2.40) | 5.13 (3.19) | p = .026
Clause-end silence total | Unplanned | 6.48 (3.24) | 7.72 (4.42) |

Notes: 1. Standard deviation in ( ). 2. Dependent variables in the first column. 3. N = 40 in each cell.
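The pattern behind these interactions can be read directly off the cell means. The short Python sketch below simply compares the familiar versus unfamiliar gap under each planning condition, using the means reported in Table 5; it is a descriptive illustration only and does not reproduce the ANOVA interaction tests. The dictionary and function names are hypothetical.

```python
# Cell means from Table 5: (familiar, unfamiliar) for each planning condition.
TABLE_5_MEANS = {
    "speech rate": {"planned": (103.47, 101.58), "unplanned": (89.12, 79.35)},
    "mid-clause pause number": {"planned": (8.08, 9.15), "unplanned": (11.37, 15.11)},
    "mid-clause silence total": {"planned": (6.04, 6.89), "unplanned": (10.97, 17.81)},
    "clause-end silence total": {"planned": (5.06, 5.13), "unplanned": (6.48, 7.72)},
}


def familiarity_gap(cell):
    """Absolute difference between the familiar and unfamiliar cell means."""
    familiar, unfamiliar = cell
    return abs(familiar - unfamiliar)


gaps = {
    measure: {condition: familiarity_gap(cell) for condition, cell in cells.items()}
    for measure, cells in TABLE_5_MEANS.items()
}
for measure, gap in gaps.items():
    print(f"{measure}: planned gap {gap['planned']:.2f}, unplanned gap {gap['unplanned']:.2f}")
```

On every measure the familiarity gap is smaller in the planned condition than in the unplanned one, which is the descriptive pattern the interaction effects capture.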
Regarding the different repair fluency measures, Table 6 shows that topic
familiarity helps to significantly reduce the number of repetitions, but only with
a small effect size (d = .31). Though not reaching significance, the means of the
other three variables are in the predicted direction. In comparison, ten-minute
pre-task planning has a significant effect more generally, with fewer false starts,
reformulations, and repetitions, but it also induced more replacements. Similar
to the findings with breakdown fluency, the effect sizes produced by planning for
repair fluency range from medium to large, much bigger than those for familiarity.
As with the results for breakdown fluency, proficiency seems to exert little effect
on repair fluency.
Table 6. The effects of topic familiarity, strategic planning, and proficiency on repair fluency

Measure | Fam. | Unfam. | Topic familiarity effect | Planned | Unplanned | Planning effect | High | Interm. | Proficiency effect
False starts | 1.38 (1.28) | 1.63 (1.35) | p = .125 ns, d = .19 | .85 (.80) | 2.15 (1.40) | p = .000, d = 1.02 | 1.50 (1.29) | 1.50 (1.35) | p = .999 ns, d = 0
Reformulations | 1.39 (1.00) | 1.62 (1.25) | p = .088 ns, d = .20 | 1.16 (.84) | 1.84 (1.26) | p = .001, d = .53 | 1.60 (1.28) | 1.40 (.95) | p = .321 ns, d = .18
Replacements | .95 (.79) | 1.15 (.97) | p = .077 ns, d = .23 | 1.26 (.90) | .84 (.81) | p = .008, d = .43 | 1.08 (.85) | 1.02 (.92) | p = .705 ns, d = .07
Repetitions | 3.94 (2.69) | 4.72 (3.36) | p = .001, d = .31 | 3.16 (2.03) | 5.40 (3.44) | p = .000, d = .60 | 4.29 (3.32) | 4.27 (2.74) | p = .979 ns, d = .01

Notes: 1. Standard deviation in (). 2. All repair counts are standardized by calculating their
occurrence per 100 words. 3. d = Cohen's d, which is a measure of effect size.
Accuracy
As shown in Table 7, topic familiarity appears to push participants to achieve a higher
ratio of error-free clauses, and thus fewer errors per 100 words, but only with small
effect sizes. Being familiar with a topic, however, does not help learners to produce
longer clauses where at least 70% of these clauses are correct (70% accuracy clause
length, p > .05). Strategic planning has even less of an impact here, showing no effect
on any of the measures. Proficiency, however, does show some significant effects. The
more advanced participants performed with longer 70% accuracy clauses, in addition
to having a significantly higher error-free ratio as well, and also a smaller number
of total errors per 100 words, when compared with their intermediate counterparts.
Proficiency is a strong driving force for accuracy as evidenced by the medium to large
effect sizes.
Table 7. The effects of topic familiarity, strategic planning, and proficiency on accuracy

Measure | Fam. | Unfam. | Topic familiarity effect | Planned | Unplanned | Planning effect | High | Interm. | Proficiency effect
Error-free clause ratio | .544 (.13) | .517 (.14) | p = .020, d = .22 | .537 (.14) | .524 (.13) | p = .618 ns, d = .10 | .586 (.13) | .475 (.11) | p = .000, d = .69
Errors per 100 words | 6.86 (2.46) | 7.71 (2.61) | p = .000, d = .38 | 6.92 (2.59) | 7.64 (2.45) | p = .121 ns, d = .29 | 6.16 (2.25) | 8.41 (2.30) | p = .000, d = .77
70% accuracy clause length | 3.73 (2.27) | 3.68 (2.17) | p = .850 ns, d = .02 | 3.79 (2.63) | 3.61 (1.96) | p = .656 ns, d = .08 | 4.4 (2.49) | 3.0 (1.66) | p = .001, d = .57

Notes: 1. Standard deviation in (). 2. d = Cohen's d, which is a measure of effect size.
Complexity
Table 8 gives the findings for the three different complexity measures. Topic familiarity
seems irrelevant to any of the complexity measures, but two measures, namely clauses
per AS unit and words per AS unit, are significantly influenced by planning. In these
cases planners outperformed non-planners, with small and large effect sizes respectively.
Participants of higher proficiency also spoke with significantly longer AS units than
those of lower proficiency. Though only approaching significance (p = .067), the p value
in clauses per AS unit shows a similar trend in that the advanced learners are probably
able to produce a higher subordination ratio than the intermediate ones. In comparison
to the effect size for proficiency, planning appears to be a stronger variable in promoting
complexity. Somewhat unexpectedly, the newly developed measure of words per clause does
not seem to be sensitive to the influence of familiarity, planning, or proficiency.
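The relationship among the three complexity measures can be made concrete with a short sketch. The nested-list representation of a transcript (AS units containing clauses containing words) and the sample data are assumptions made for illustration, not the segmentation procedure used in the study.

```python
def complexity_measures(as_units):
    """Compute three complexity ratios for one speech sample.

    as_units: list of AS units, each a list of clauses, each a list of words
    (a hypothetical representation of a segmented transcript).
    """
    n_units = len(as_units)
    n_clauses = sum(len(unit) for unit in as_units)
    n_words = sum(len(clause) for unit in as_units for clause in unit)
    if n_units == 0 or n_clauses == 0:
        raise ValueError("need at least one AS unit containing a clause")
    return {
        "clauses_per_as_unit": n_clauses / n_units,
        "words_per_as_unit": n_words / n_units,
        "words_per_clause": n_words / n_clauses,
    }


# Invented sample: two AS units, the first with a subordinate clause.
sample_units = [
    [["I", "like", "this", "topic"],
     ["because", "I", "studied", "it", "before"]],
    [["it", "is", "about", "computer", "viruses"]],
]

measures = complexity_measures(sample_units)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Within a single sample, words per AS unit equals clauses per AS unit multiplied by words per clause. AS units can therefore lengthen through more subordination while clause length stays flat. The identity need not hold exactly for group means such as those in Table 8.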
Table 8. The effects of topic familiarity, strategic planning, and proficiency on complexity

                       Topic familiarity           Planning                    Proficiency
                       Fam.          Unfam.        Planned       Unplanned     High          Interm.
Clauses per AS unit    1.74 (.32)    1.73 (.35)    1.81 (.31)    1.67 (.35)    1.79 (.34)    1.68 (.32)
                       p = .747 ns, d = .03        p = .018, d = .39           p = .067 ns, d = .33
Words per AS unit      12.93 (2.69)  12.43 (3.36)  13.96 (2.2)   11.39 (3.36)  13.49 (2.70)  11.85 (3.15)
                       p = .089 ns, d = .16        p = .000, d = .81           p = .002, d = .52
Words per clause       7.11 (.77)    6.97 (.85)    7.13 (.62)    6.95 (.75)    7.15 (.81)    6.92 (.82)
                       p = .195 ns, d = .17        p = .22 ns, d = .26         p = .112 ns, d = .28

Notes: 1. Standard deviations in parentheses. 2. d = Cohen's d, which is a measure of effect size.
Two interaction effects are also found between proficiency and planning for the
measure of clauses per AS unit (p = .002) and words per AS unit (p = .026). Table 9
suggests two noteworthy points. Firstly, though the more advanced participants always
scored higher than the intermediate ones in complexity, the gap between them is significantly narrowed after planning. Secondly, while planning raises the length of AS
units for all, it appears to help participants of lower proficiency more than those of higher proficiency.
Table 9. Interaction effects in clauses per AS unit and words per AS unit

                       Planning     Proficiency                    Significance
                                    High           Intermediate
Clauses per AS unit    Unplanned    1.82 (.40)     1.52 (.20)      p = .002
                       Planned      1.77 (.27)     1.85 (.34)
Words per AS unit      Unplanned    12.78 (3.18)   9.99 (2.90)     p = .026
                       Planned      14.21 (1.93)   13.71 (2.14)

Note: Standard deviations in parentheses.
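The interaction pattern in Table 9 can be read off as a difference of differences: the gain from planning for each proficiency group, and the gap between those gains. The sketch below does this arithmetic on the cell means only. The dictionary layout and function names are illustrative assumptions, and the calculation is descriptive; it does not reproduce the significance tests.

```python
# Cell means copied from Table 9: (planning condition, proficiency) -> mean.
table9 = {
    "clauses_per_as_unit": {
        ("unplanned", "high"): 1.82,
        ("unplanned", "intermediate"): 1.52,
        ("planned", "high"): 1.77,
        ("planned", "intermediate"): 1.85,
    },
    "words_per_as_unit": {
        ("unplanned", "high"): 12.78,
        ("unplanned", "intermediate"): 9.99,
        ("planned", "high"): 14.21,
        ("planned", "intermediate"): 13.71,
    },
}


def planning_gain(cells, proficiency):
    """Planned minus unplanned mean for one proficiency group."""
    return cells[("planned", proficiency)] - cells[("unplanned", proficiency)]


def interaction_contrast(cells):
    """Difference of differences: intermediate gain minus high gain."""
    return planning_gain(cells, "intermediate") - planning_gain(cells, "high")


for measure, cells in table9.items():
    print(
        measure,
        "high gain:", round(planning_gain(cells, "high"), 2),
        "intermediate gain:", round(planning_gain(cells, "intermediate"), 2),
        "contrast:", round(interaction_contrast(cells), 2),
    )
```

On both measures the intermediate group gains more from planning than the high group, which is the narrowing of the proficiency gap described in the text.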
Discussion
The Results section has provided a detailed description of the three major performance areas (fluency, complexity, and accuracy). This section will further synthesize
the results in terms of topic familiarity, strategic planning and L2 proficiency so that
the effects of these three independent variables can be explored more directly.
Topic familiarity
A recapitulation of the results shows that topic familiarity enables learners to produce
longer speech with greater fluency, fewer breakdowns, and slightly higher accuracy and repair fluency. Where topic familiarity was less effective was syntactic
complexity. Several aspects of these results have theoretical significance.
First, topic familiarity seems to affect both the Conceptualization and the Formulation stages in Levelt's (1989) speaking model. The Conceptualizer is responsible for
drawing information from memory and forming a pre-verbal message as input for the
Formulator. It seems to take less time to access more familiar information due to an
immediacy effect, since speakers are more primed in the relevant knowledge domain.
As a Conceptualizer effect, too, speakers have a more ready-made schematic structure at their disposal which could be accessed on a macro basis. The faster-accessible
message plus an existing framework into which the message can be structured helps
ease the workload at the Conceptualization stage. The longer account produced on the
familiar topics indicates that more familiar information can be retrieved from long
term memory in any given time period.
Topic familiarity also appears to exert an influence on Levelt's (1989) Formulation
stage. The Formulator receives the pre-verbal message from the Conceptualizer, then
draws on lemmas and lexemes from the mental lexicon and assembles them into a
linguistic plan waiting to be articulated at the next stage. In this process, lexis can be
retrieved not only at higher speed, as evidenced by the fewer breakdowns (Table 4),
but also in a larger quantity (more total words and more varied words; Bui, in preparation). In addition, the greater mean length of run and fewer mid-clause pauses on
the familiar topics all suggest that the more familiar topics facilitate the use of bigger
chunks in which more lexical items are packed into an uninterrupted stream, which
is an indication that topic familiarity helps learners with lexicalized language. This
would explain not only the superior temporal aspects of speaking but also the slightly,
though significantly, higher accuracy results: if some expressions are memorized as
wholes, the computational workload, and thus the probability of error, is reduced. To sum up,
learners are able to more efficiently access the exemplar-based system with faster word
searches, and reduce the analytic computation in their rule-based system (Skehan
1998) with more efficient on-line assembly of utterances when they are in possession
of relevant prior knowledge.
An additional explanation that might not be as general as those discussed above
is also relevant here. The medium of instruction for all the participants in their major
courses in both academic groups is primarily English, and all the textbooks and lecture notes are in English. According to the encoding specificity principle (Tulving &
Thomson 1973), knowledge is accessed faster in the language in which it was stored in long term
memory. Therefore, participants in the unfamiliar condition might have
to go through one more step at the Formulation stage, that of transforming their general knowledge about the unfamiliar topic from Chinese into English. This could then
hamper their performance in terms of fluency and lexis. Such an observation might
have some implications for content-based language teaching in that if knowledge of a certain domain
is learnt in one's L2, it appears that future retrieval of the knowledge and
production in the L2 will be enhanced at least as far as fluency and lexis are concerned.
Another perspective on these findings is to make the point that the unplanned
condition effectively triggered more pressured communication, in which there was only scope
for on-line planning (Ellis 2005). Their limited processing capacity (Skehan 1998) creates difficulties for L2 speakers whose target language system is not yet automatized
to do efficient parallel processing. Consequently, more attentional resources allocated
to the Conceptualization stage means there may be difficulties at the later Formulation and Articulation stages. Learners had to slow down their speech rate and pause
more often with a shorter average speaking time in order to cope with the unfamiliar
topics. This result for fluency is largely consistent with some studies in L1 (e.g. Good &
Butterworth 1980; Bortfeld et al. 2001) and L2 (Chang 1999; Skehan & Foster 1999;
Robinson 2001) research. However, the discovery from the present research is that
pre-task planning is able to attenuate the difference between unfamiliar and familiar
topics in many of the fluency measures.
The second point to consider concerns form-meaning connections in relation
to topic familiarity. The primary concern in a speaking task is obviously to get the
message across. Meaning expression is more likely to be attended to than the other
aspects of speaking. However, it appears that the familiar topics also raise accuracy,
enabling meaning and form to be handled at the same time. In addition to the theory
of better access to the exemplar system and chunking, two more possibilities from a
processing perspective are worth noting. First of all, the attentional resources released
from the Conceptualization and the Formulation stages can help learners with self-monitoring. With the more familiar topics, speakers may shift their attentional focus
partly from what to say to how to say it, and even how to say it well, whereas with unfamiliar topics they
have to struggle with the content to be expressed, which results
in greater working memory load for monitoring and correction (see also Bygate &
Samuda 2005). Secondly, on-line planning studies (e.g. Yuan & Ellis 2003) provide
evidence that unpressured within-task planning can contribute to more accurate performance. As a task-internal readiness construct, topic familiarity appears to achieve
a similar effect because it prepares learners not only prior to the task, but through
the whole process of speaking. This resemblance of on-line readiness to unpressured
on-line planning may partly explain the higher accuracy scores in the familiar tasks.
At the same time, the small effect sizes may also be justified simply because task-internal readiness is still time-pressured when compared to the unpressured task-external on-line planning.
Strategic planning
Before going on to the discussion on strategic planning, a recapitulation of its general effects is helpful. Briefly, strategic planning greatly helps learners to improve their
fluency (with both breakdowns and repairs) and syntactic complexity, though it was
not so effective in raising accuracy. The results seem generally consistent with the
bulk of the literature in planning, but the comparison and contrast between strategic
planning and topic familiarity would add more insight into the story. We can start
with fluency. First of all, planning works in a very similar way to topic familiarity in
terms of fluency, especially breakdown fluency. Such a pattern is reflected in the
measures on which both planning and familiarity have similar effects. Secondly, planning should at the same time be distinguished from topic familiarity in terms of the
strength and the range of their influence on fluency performance. Strategic planning is
likely a more powerful pedagogical means that constitutes a potentially higher level of
task-readiness than familiarity as a task-internal readiness, although this also depends
on how thoroughly the planners are able to anticipate the detail of the task, how much
they can cover during planning, and also how much they can retain and recall during task performance. The effectiveness of such pre-task planning is supported by the
effect sizes that planning produces with fluency measures. To account for these two
observations, one could argue that the ten-minute planning time allows learners to
formulate a conceptual plan for the relevant message to convey (Ellis 2005; Mehnert
1998), which greatly reduces the need for online macro-structure planning. Instead,
L2 learners can allocate scarce attentional resources for the Formulator, thus speaking
with fewer pauses and at a faster rate. While speaking on a familiar topic without planning is still a pressured process, planned speech is much less so, which may explain
why planning is able to cut down on the frequency of repairs but topic familiarity is
much weaker in this regard.
What further distinguishes topic familiarity from planning is that planning pushes
learners to higher structural complexity. The lexicalized language or chunks that are
more readily and speedily accessible due to topic familiarity would likely involve less
complex syntactic processing partly due to being lexicalised, as well as because limited
working memory capacity does not allow overly long utterances to be processed and
passed on for long-term memory storage. Therefore, a reasonable assumption here
would be that the prefabricated expressions which are available in long term memory
are usually relatively short expressions. A comparison with topic familiarity shows that
strategic planning helps learners not only to access formulaic language (Foster 2001)3
and hence achieve higher fluency, but also assemble the pre-fabricated chunks into
longer psychological units of planning (AS units), as shown in higher scores in the
two complexity measures (words per AS unit and clauses per AS unit). In addition,
strategic planning encourages learners to stretch their speech content, which results
in more adventurous attempts to produce more elaborated language. This result
is consistent with most studies, which find that planning drives learners to take risks in producing
more elaborated language. To some extent, this study, combined with Foster (2001),
helps to better explain why task-external readiness can, but task-internal readiness
cannot, promote greater complexity.
Rather disappointingly, strategic planning does not seem to affect the words per
clause measure of clause length even though Ortega, Iwashita, Norris, and Rabie (in
preparation) argue that it is a better measure for more advanced learners. Bei (2010)
conducted two factor analyses (one for a familiar task and the other for an unfamiliar task) that included most of the available task performance measures, and both
confirmed that words per clause appears to be very closely connected to the F-score
(Heylighen & Dewaele 1999), an index of formality, and less closely but significantly
with lexical sophistication. The F-score measures the extent to which nouns and
noun-associated word classes such as articles and adjectives are employed in speech,
while lexical sophistication is a yardstick for the frequency of rare word use. On
this basis, Bei (2010) argued that words per AS unit, together with the F-score and
lexical sophistication, belongs to a new construct, noun phrase complexity, which
should be treated as distinct from the syntactic or lexical complexity identified in
the literature. The relationship between strategic planning and noun phrase complexity warrants further study.
3. Foster (2001) found that, given planning time, native speakers tend to use less formulaic language and be more creative, whereas non-native speakers will use more formulaic language after planning.
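To make the F-score more concrete, the sketch below follows the formula as it is commonly stated after Heylighen & Dewaele (1999): the percentage frequencies of nouns, adjectives, prepositions and articles are summed, those of pronouns, verbs, adverbs and interjections are subtracted, 100 is added, and the result is halved. The tag names, the input format and the sample counts are assumptions made for illustration, and the original source should be consulted for the authoritative definition.

```python
# Word classes that raise the F-score (more "formal", noun-oriented) and
# those that lower it (more "deictic"), as the formula is commonly stated.
FORMAL = ("noun", "adjective", "preposition", "article")
DEICTIC = ("pronoun", "verb", "adverb", "interjection")


def f_score(tag_counts):
    """F-score from part-of-speech counts for one speech sample.

    tag_counts: dict mapping a word-class label to its count; every word in
    the sample should be counted under some label (hypothetical format).
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("need at least one tagged word")

    def pct(tag):
        # Frequency of one word class as a percentage of all words.
        return 100 * tag_counts.get(tag, 0) / total

    formal = sum(pct(tag) for tag in FORMAL)
    deictic = sum(pct(tag) for tag in DEICTIC)
    return (formal - deictic + 100) / 2


# Invented counts for a 100-word sample, for illustration only.
sample_counts = {
    "noun": 30, "adjective": 10, "preposition": 12, "article": 10,
    "pronoun": 8, "verb": 18, "adverb": 6, "interjection": 1, "other": 5,
}
score = f_score(sample_counts)
print(score)
```

Because the score rises with nouns and noun-associated word classes, a sample with denser noun phrases will tend to score higher, which is the link to noun phrase complexity suggested in the text.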
What remains opaque is the relationship between planning and accuracy. The
previous literature has been unclear in this respect (Ellis 2009), and the present
study did not find a significant accuracy effect from planning (but see discussion in
Pang & Skehan (this volume) and Wang (this volume)). A thorny question emerges
naturally at this point: if as mentioned above, planning enables L2 learners to better access their lexicalized language (formulaic chunks) as topic familiarity does, why
can topic familiarity raise accuracy but planning cannot? Possibly the puzzle can be
disentangled with the following three arguments. Firstly, planning drives learners to
embark on more complex language and in the process more pre-fabricated expressions need to be assembled into an AS unit. The more syntactic work there is, the more
errors there might be (Crookes 1989), especially when strategic planning is largely
concept-oriented with little attention to grammar. Secondly, from a limited processing
capacity point of view (Skehan 1998), there is likely to be a trade-off between accuracy and complexity (Skehan & Foster 1997). Learners' L2 systems are, by and large,
controlled but not automatized, and so attentional resources allocated to the overwhelming workload when complexity is prioritised mean a reduction of attentional
focus on accuracy. Thirdly, it is possible that pre-task planning cannot affect on-line
monitoring (Skehan 2009), as what learners bring to the task from strategic planning
would mostly focus on getting the message across, whereas topic familiarity as a form of
task-internal readiness prepares learners anytime they speak, acting as both pre-task
and on-line readiness, and reduces the on-line processing workload to enable more
within-task monitoring.
Proficiency
The previous task-based literature has not seen proficiency as an area of primary and systematic concern. The few exceptions (e.g. Kawauchi 2005; Ortega 2005; Wigglesworth
1997), however, suggest that task performance, as influenced by strategic planning,
differs according to learners' proficiency levels. The present study re-examines the
effects of planning at different proficiency levels, whilst adding to it a new dimension
of planning: topic familiarity. As mentioned in Section 3.3, measured through HKALE
results and the IELTS system, the current participants are at relatively high proficiency
levels. In contrast, participants in past studies were mostly at lower proficiency levels.
The following discussion will take this caveat into consideration.
In terms of the main effects, proficiency shows consistently strong effects on all
accuracy measures and some effects on complexity (p = .002, d = .52 for words per AS
unit; p = .067 for the conventional clauses per AS unit), with performances of learners at the higher proficiency level being more accurate and more syntactically complex. More advanced learners were also able to reduce the number of pauses between
clauses. However, proficiency seems to be, at least in this context, largely irrelevant to
fluency (either breakdown fluency or repair fluency), and even noun phrase complexity (Bei 2011). An emerging pattern from these results is that proficiency tends to have
much greater influence on syntactic than semantic aspects of performance.
Learners of higher proficiency consistently made fewer errors in performance
than their lower proficiency counterparts did, regardless of familiarity or planning
time. Furthermore, the 70% accuracy clause length measure indicates that the lower
error rate obtained by higher proficiency students was not achieved by the avoidance
strategy with which one might make fewer errors by resorting to shorter and simpler
utterances. Higher proficiency participants in fact spoke with longer error-free clauses
than the lower proficiency participants did. All this suggests that accuracy in performance is basically a by-product of one's underlying linguistic competence. The lack of
interaction effects between proficiency and the other two independent variables (planning and familiarity) further supports this claim, and might partly explain why accuracy in performance was less sensitive to task manipulations like strategic planning.
It could be argued that better performance in accuracy originates from two sources: a
well-developed linguistic system and a good ability to monitor speaking (see Li (this
volume)). A more advanced linguistic system plays the main role in error-free utterances, and it has almost become a cliché to say that actual performance is a reflection
of implicit competence. A more fully-fledged underlying system is usually a more
automatized one, which frees up more attentional resources for monitoring errors.
All this contributes to the significantly and consistently better accuracy performance
among the higher level learners in all three accuracy measures. The medium to large
effect sizes (Cohen's d values ranging from .57 to .77) suggest that the difference in
accuracy between the two proficiency levels is substantial.
Only one out of the three complexity measures, namely words per AS unit, was
significantly affected by proficiency. However, the effects of proficiency nearly reached
significance in the conventional clauses per AS unit measure (p = .067). These results
suggest that proficiency does show its influence on syntactic complexity, though its
effects are not as big as those for accuracy. Compared to strategic planning, proficiency
is much less a driving force for higher complexity; compared to topic familiarity, proficiency is a much more important indicator for higher accuracy. Therefore, we might
postulate that L2 learners tend to opt for a conservative stance in speaking and try to
avoid mistakes. Planning time encourages them to be more willing to take risks and
use more elaborated language. Higher proficiency itself can liberate L2 learners from
their timidity only to a limited extent.
Regarding fluency, proficiency only has an effect on the number of end-of-clause
pauses (p = .027). Higher proficiency learners do not pause as frequently as their intermediate counterparts do between clauses, but there is no difference between the two
proficiency levels in terms of the frequency of mid-clause pauses. Mid-clause pauses
have been shown to be a trait of L2 speaking (Skehan 2009), so both high and intermediate proficiency learners in this study remained by and large L2 speakers whose
oral performance was not very native-like, as far as fluency is concerned. However, the
higher proficiency level did appear to reduce the hesitations between clauses. This was
probably because a more automatized linguistic system can assemble information in a
more coherent manner, making it less likely that the utterances will be fragmented or
loosely connected to each other.
When it comes to fluency and nouny language use (or rather, noun phrase complexity), however, higher proficiency seems of no great relevance in most cases. Past
research has shown that fluency and complexity were more easily affected by task-external influences (e.g. planning and task repetition), but fluency and complexity
were the two areas in this study on which proficiency had no effect or only a weak effect.
Taken together, the possibility emerges that learner proficiency and task conditions
could stand in competition. That is, if a certain performance area, such as accuracy, is
merely a reflection of learners' underlying competence, it is more likely to be resistant
to task conditions or task characteristics, such as planning. On the other hand, areas
less closely connected to proficiency (e.g. fluency in this study) are more prone to task
manipulations. There also appears to be a trade-off between task conditions and proficiency levels. Such a claim needs to be verified in future studies as it might suggest
limits as to how far task manipulations can go from short-lived performance enhancement to genuine competence improvement in an L2. If the much researched area of
task-external readiness has little impact on L2 proficiency, it would then be time for us
to turn to new areas. A few studies (Skehan & Foster 1997; Li, Chapter 5, this volume)
have begun to show that post-task activities hold promise for significantly influencing accuracy
in performance. There is room for more research in how different
effects of task manipulation could be integrated.
Two interaction effects in the literature between proficiency and planning are
noteworthy, both concerning complexity. Wigglesworth (1997) found that the opportunity to plan allowed learners of higher proficiency, but not those at the lower level,
to produce more complex language. Similarly, Kawauchi's (2005) high proficiency participants benefited most in the case of complexity (and fluency), with the lower proficiency participants gaining less (but they gained the most in accuracy, with the most
advanced learners benefiting the least). On the contrary (at first sight), a general pattern from the interactions between planning and proficiency in both clauses per AS
unit and words per AS unit in this study is that the intermediate learners were much
better than their high proficiency counterparts in making the most out of planning
time to achieve higher complexity. For the AS length measure, the difference between
high and intermediate participants was narrowed to virtual non-existence after planning. More significantly, in terms of the conventional clauses per AS unit measure,
the intermediate proficiency planners even slightly surpassed the high proficiency
planners, though the high proficiency non-planners were much better than the intermediate proficiency non-planners. As for accuracy, though Kawauchi (2005) found
that learners at a lower level gained the most in accuracy after planning, Wigglesworth
(1997) and Ortega (1999) claimed that planning helped learners at an advanced level
to achieve better accuracy in performance. The evidence in the present study does not
support either side in this disagreement. No matter whether given planning time or
not, the higher proficiency learners were always better than the intermediate ones (cf.
the main effects of proficiency above).
Some inconsistency between the present study and the literature in terms of the
effects of planning on complexity and accuracy at different proficiency levels may
be attributable to the operationalization of the independent variable proficiency per se. As mentioned in Section 3.3, and the beginning of this Section (5.3),
the intermediate participants in this study were already quite proficient speakers
of English given the high entry requirement of their university. If the participants
of intermediate proficiency in this study are at a level similar to the "high" participants in Kawauchi (2005) and Wigglesworth (1997) (and if the "high" here is equal to
the "advanced" in Kawauchi), then, instead of contradicting, this study could in fact
support Kawauchi's results for complexity. That said, such a claim remains speculative until a commonly accepted way of equating different proficiency measures is
available.
The magnitude of task-internal and task-external readiness effects

Table 10 below sums up the effect sizes produced by topic familiarity and strategic planning. The reason for choosing effect sizes is obvious: effect sizes highlight the magnitude of the effects of the independent variables. Also, its existence per se indicates that
there is a significant effect. Bearing this in mind, we can conclude that familiarity and
planning displayed very similar patterns with the measures of fluency: total words,
breakdown fluency, repair fluency; but they differed in the formal or organizational
aspects of language (accuracy and complexity). Another interesting feature is that
topic familiarity showed small effects in most cases, whereas planning generally produced much higher values of Cohen's d. Take breakdown fluency as an example. Effect
sizes for planning were almost always nearly twice as high as those for topic familiarity. Furthermore, in the formal features of L2 speaking, the effect sizes in complexity
produced by planning were also much higher than those in accuracy produced by
topic familiarity. Therefore, an answer to the question as to which is a stronger variable
seems to be emerging: planning, or task-external manipulation, appears to be a more
powerful influence on task performance than topic familiarity, a task-internal variable.
Table 10. Effect sizes produced by topic familiarity and strategic planning

                      Topic familiarity   Strategic planning
Pruned total words    .61                 .44
Breakdown fluency     .17–.38             .32–.71
Repair fluency        .31–.40             .43–1.02
Complexity            .03–.17 (ns)        .39–.81
Accuracy              .22–.38             .08–.29 (ns)
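The verbal labels used for effect sizes in this chapter (small, medium, large) can be related to Cohen's conventional cut-offs of .2, .5 and .8. The sketch below applies those cut-offs to the upper bounds of the ranges in Table 10. Using only the upper bounds, and the label "below small" for values under .2, are illustrative choices rather than anything specified in the chapter.

```python
def label_effect_size(d):
    """Label |d| using Cohen's conventional cut-offs (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Upper bounds of the ranges in Table 10: (topic familiarity, planning).
# The complexity value for familiarity and the accuracy value for planning
# come from non-significant contrasts.
table10_upper = {
    "breakdown fluency": (0.38, 0.71),
    "repair fluency": (0.40, 1.02),
    "complexity": (0.17, 0.81),
    "accuracy": (0.38, 0.29),
}

labels = {
    measure: (label_effect_size(fam), label_effect_size(plan))
    for measure, (fam, plan) in table10_upper.items()
}
for measure, (fam, plan) in labels.items():
    print(f"{measure}: familiarity {fam}, planning {plan}")
```

On these cut-offs, familiarity stays in the small band or below throughout, while planning reaches the medium or large band for fluency and complexity, in line with the comparison drawn in the text.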
Some recapitulation on the general framework of task-readiness (Table 1) would
probably help to explain this pattern. Task-internal readiness (including topic familiarity, schematic familiarity, task familiarity, and task repetition) is a form of implicit or
unconscious preparation that a learner brings to a task which will function both before
and during the actual performance. An important characteristic of task-internal readiness is that learners are not necessarily aware of the advantage that they enjoy. In contrast, task-external readiness (i.e. rehearsal, strategic planning, and online planning)
provides an explicit and clear push to help learners to be prepared for the subsequent
tasks. It would therefore be fair to say that task-external readiness constitutes more
of a preparation and thus becomes more powerful in areas that it has influence on
than most types of task-internal readiness. That said, one exception in task-internal
readiness, task repetition, is worth noting. Though it has not been tested in this study,
Bygate (2001) and Wang (Chapter 2, this volume) both reported very strong effects of
repeating a task. Unlike task-external rehearsal, task repetition means that people redo the same task
a second or third time, but without the earlier encounters being explicitly signalled
as preparation for a later performance. It is therefore categorized as a form of task-internal
readiness (and implicit planning) (Bygate, personal communication, May 2013). On
this basis, task-internal readiness, in the form of task-repetition, can also have facilitative effects on fluency, accuracy and complexity comparable to task-external readiness
(see Wang, Chapter 2, this volume).
Task-internal readiness in form-meaning connection

Though strategic planning has a greater effect on fluency and complexity than does
topic familiarity, topic familiarity nevertheless had a greater influence on learners'
accuracy, suggesting that task-internal readiness functions in different areas
from task-external readiness. Topic familiarity seemed to enable participants, to some
extent, to strike a balance between meaning and form, which might signal an integration of their linguistic knowledge into genuine performance. Bygate and Samuda
(2005) pointed out that a common learning and teaching problem is to get learners
to integrate knowledge that is available to them into their active language use (p. 37).
In this sense, providing learners with familiar topics to practise may better encourage
them to achieve this pedagogical end.
Strategic planning promotes pre-task readiness while on-line planning results in
real time preparation. Though much less powerful in comparison to each of these
two task-external means taken separately, task-internal readiness appears to combine
features of pre- and during-task readiness, as it is inherent within each learner
and could take effect both prior to and during a task. As discussed earlier, the on-line
readiness nature of topic familiarity may contribute to the more accurate
performance. Therefore the integration of linguistic knowledge into communicative
use could be an important area in exploring task-internal readiness in future.
Compensation effects in fluency between task-internal and task-external readiness
Data from the present study do reveal compensation effects in breakdown fluency, but
the conclusion is that planning could compensate for the unfamiliar topics much better than familiar topics could do for the unplanned conditions. With five breakdown
measures, planners reached almost the same fluency level with both familiar and unfamiliar topics. The adverse condition in fluency induced by their lack of domain knowledge was clearly removed when planning time was provided. However, the significant
difference between planners and non-planners continued to exist even when familiar
topics were involved. This result echoes the above discussion that task-external manipulation is a stronger driving force for many areas, especially fluency.
Implications
The pedagogical implications regarding task-external readiness (e.g. strategic planning)
have been researched in many studies (see Ellis 2005, 2009, for a detailed discussion),
but the benefit of using task-internal readiness has rarely been touched upon in the literature. Evidence from this study, however, showed that task-internal readiness should
not be ignored in language education, and this for a number of different reasons.
First of all, as noted above, previous research has shown that receptive language
use, namely reading comprehension (Shimoda 1993; Chang 2006; Lee 2007; Leeser
2007; Barry & Lazarte 1995; Bügel & Buunk 1996; Chen & Donin 1997; Johnson
1982; Lee 1986) and listening comprehension (Markham & Latham 1987; Long 1990;
Chiang & Dunkel 1992; Schmidt-Rinehart 1994; Leeser 2004), are greatly influenced
by background knowledge. The present study further provides evidence for the effects
of familiarity in L2 speech production, as productive language use. Familiarity may
therefore become an inevitable issue in test fairness. It is highly likely that one performs well not because s/he is in fact more proficient but simply because s/he is more
familiar with the topic. Matches and mismatches between test content and learner
background have to be taken into serious consideration in either language comprehension or production tests.
Second, one of the important issues in task-based language instruction is to
encourage learners to participate actively in various task activities. This study shows
that providing learners with more familiar topics will reduce learner anxiety and elevate
their willingness to communicate, as evidenced by the significantly longer accounts
they give with familiar topics. On the one hand, longer performance produced by an
L2 learner is an indication of his/her willingness and readiness to communicate. On
the other hand, this certainly helps to enhance learner confidence, which may work
especially for low to intermediate learners.
Third, strategic planning was shown to help learners produce more fluent and
more complex language. Accordingly, it would appear to be a good idea to allow
learners some time prior to any actual performance. Planning encourages learners to
embark on more elaborated language, attempting more complex structures through
which they could experiment with newly acquired linguistic knowledge. Planning also
serves to narrow the gap between high and low proficiency, and between familiar and
unfamiliar tasks, in terms of fluency and complexity. In classrooms, then, teachers may
take advantage of planning when learners are facing adverse situations (such as low
proficiency and unfamiliar topics).
Fourth, the results suggest that learners should be provided with familiar topics in tasks if accuracy is the primary concern. Given the way familiarity seems to
function as mini-online-planning, it appears to help learners by providing them more
resources to attend to form. As mentioned above, this may increase their confidence
and reduce feelings of frustration.
Fifth, this study may also have implications for task sequencing. We have seen the
separate benefits for pedagogy from each individual variable, but it is far more important to examine how these different influences are organized to form a coherent and
organic whole. It is certainly too early to make any claims on the whole picture based
on the three variables in this study alone. Nonetheless, this study indicates that at the
pre-task stage planning is a useful tool, whilst at the during-task stages familiarity may
help. Then, beginners should receive the most familiar topics and planning time in
order that they could be fully supported in tasks. As their language ability develops, at
some points and to some extent it may be possible for either familiarity or planning to
be reduced so that they would face greater (but appropriate) challenges and be motivated to proceed further.
Last but not least, the present study supports content-based instruction (Mohan
1986) in language teaching. Topic familiarity proved to be a positive influence on fluency and accuracy, with indications that it helped to push learners to a more integrative approach to language learning. Compared to pure or intensive language teaching,
language seems more effectively taught when the domain knowledge (not linguistic
knowledge) is imparted to learners in their L2, leading to a genuine need to solve
real-world problems, and that domain knowledge then serves as a continual reference point
for the growing language curriculum. In a language classroom where general knowledge is not the focus, language can still be taught using tasks involving connections to
real life so that tasks become the medium between classroom and the real world.
Conclusion
This chapter proposes a theoretical framework of task-readiness as an alternative conceptual model to various types of conventional planning. It is argued with evidence in
this study that while allowing extra time as explicit planning opportunity prior to or
during a task yields facilitative effects in doing L2 tasks, the inherent preparedness that
learners bring to the task (in the form of familiarity) helps to improve learner performance in a comparable manner. Exploring the individual effects of topic familiarity
(task-internal readiness), strategic planning (task-external readiness) and proficiency
levels (individual difference), and their interaction effects, this chapter has the following findings:
1. The concept of planning is better regarded as a component of task-readiness which
involves two macro dimensions: task-internal readiness and task-external readiness, each with its own micro dimensions. Task-internal readiness further subsumes
topic familiarity, schematic familiarity, task type familiarity, and task repetition,
while task-external readiness includes rehearsal, strategic planning, and online
planning. This proposal for a general framework of task-readiness can potentially
serve as a theoretical platform to unify and synthesize research in various types of
planning, familiarity, and even other kinds of preparatory activities for a task.
2. Both planning and topic familiarity raise fluency, indicating that participants with
task-readiness prioritize meaning expression. When planning or topic familiarity is present, proficiency appears to be largely overridden in its effect on fluency.
However, planning produces bigger effect sizes than topic familiarity for fluency.
Planning is also able to greatly reduce the gap between familiar and unfamiliar
topics in fluency. This leads us to the conclusion that task-external readiness is
in general more powerful than task-internal readiness in improving meaning-
oriented performance.
3. Planning raises syntactic complexity, while topic familiarity increases accuracy. It
would then appear that task-internal readiness encourages learners to adopt a conservative stance (thus higher accuracy), but task-external readiness pushes learners to
take risks (hence higher complexity). Interestingly, higher proficiency produces
much higher accuracy and moderately higher complexity, confirming a close relation between syntactic performance and linguistic competence.
4. With the above points taken together, an intriguing pattern emerges: task
influence and proficiency influence do not always complement each other. The
proficiency-oriented variables (e.g. accuracy) are affected more by proficiency
levels and less by task manipulations, whereas task-oriented variables (e.g. fluency) behave in just the opposite way. There are also intermediate variables, such as
complexity.
This study was conducted in the context of TBLT research and established very close
connections to prior studies, thus enabling cross-study comparisons. It is then
hoped that this research will be a link between the literatures on planning and the
future studies on the extended concept of planning, that is task-readiness, to explore
task-based language learning from an even wider perspective.
References
Baddeley, A. (1997). Human memory: Theory and practice. New York, NY: Psychology Press.
Barry, S., & Lazarte, A. (1995). Embedded clause effects on recall: Does high prior knowledge of content domain overcome syntactic complexity in students of Spanish? Modern Language Journal, 79, 491–504.
Banks, J. (2004). The impact of event familiarity on the complexity and coherence of children's narratives of positive events. Unpublished MSc thesis. North Carolina State University.
Bei, X.G. (2010). Exploring task-internal and task-external readiness: The effects of topic familiarity and strategic planning in topic-based task performance at different proficiency levels. Unpublished Ph.D. thesis. The Chinese University of Hong Kong.
Bei, X.G. (2011). Formality in second language discourse: Measurement and performance. Interdisciplinary Humanities, 28(1), 22–31.
Bei, X.G. (2013). Effects of immediate repetition in L2 speaking task: A focused study. English Language Teaching, 6(1), 11–19.
Bui, H.Y.G. (In review). L2 fluency as influenced by content familiarity and planning performance and methodology. Submitted to Language Teaching Research.
Bui, H.Y.G. (In preparation). Lexical diversity, lexical sophistication and lexical density in L2 speaking tasks. Unpublished manuscript.
Bortfeld, H., Leon, S.D., Bloom, J.E., Schober, M.F., & Brennan, S.E. (2001). Disfluency rates in conversation: Effects of age, relationship, topic, role, and gender. Language and Speech, 44(2), 123–147.
Bügel, K., & Buunk, B. (1996). Sex differences in foreign language text comprehension: The role of interests and prior knowledge. Modern Language Journal, 80, 15–31.
Bygate, M. (1996). Effects of task repetition: Appraising the developing language of learners. In J. Willis & D. Willis (Eds.), Challenge and change in language teaching (pp. 136–146). Oxford: Heinemann.
M. Bygate, P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks: Second language learning, teaching and testing (pp. 23–48). Harlow: Longman.
Bygate, M., & Samuda, V. (2005). Integrative planning through the use of task repetition. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 37–74). Amsterdam: John Benjamins.
Chang, C. (2006). Effects of topic familiarity and linguistic difficulty on the reading strategies and mental representations of non-native readers of Chinese. Journal of Language and Learning, 4, 172–198.
Chang, Y.F. (1999). Discourse topics and interlanguage variation. In P. Robinson (Ed.), Representation and process: Proceedings of the 3rd Pacific Second Language Research Forum (Vol. 1, pp. 235–241). Tokyo: PacSLRF.
Chen, Q., & Donin, J. (1997). Discourse processing of first and second language biology texts: Effects of language proficiency and domain-specific knowledge. Modern Language Journal, 81, 209–227.
Chiang, C.S., & Dunkel, P. (1992). The effect of speech modification, prior knowledge, and listening proficiency on EFL lecture learning. TESOL Quarterly, 26, 345–373.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
11, 367–383.
Dörnyei, Z., & Katona, L. (1992). Validation of the C-test amongst Hungarian EFL learners. Language Testing, 9, 187–206.
Ellis, R. (2009). The differential effects of three types of task planning on the fluency, complexity, and accuracy in L2 oral production. Applied Linguistics, 30(4), 474–509.
Ellis, R., & Yuan, F.Y. (2005). The effects of careful within-task planning on oral and written task performance. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 167–192). Amsterdam: John Benjamins.
Foster, P. (2001). Rules and routines: A consideration of their role in task-based language production of native and non-native speakers. In M. Bygate, P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks: Teaching, learning and testing (pp. 75–97). London: Longman.
Foster, P., & Skehan, P. (1996). The influence of planning and task type on second language performance. Studies in Second Language Acquisition, 18, 299–323.
Applied Linguistics, 21(3), 354–375.
Gardner, R.C. (2001). Psychological statistics using SPSS for Windows. Upper Saddle River, NJ:
Prentice-Hall.
Good, D.A., & Butterworth, B. (1980). Hesitancy as a conversational resource: Some methodological implications. In H. Dechert & M. Raupach (Eds.), Temporal variables in speech production (pp. 145–152). The Hague: Mouton.
Heylighen, F., & Dewaele, J. (1999). Formality of language: Definition, measurement and behavioral determinants. Internal report, Center Leo Apostel, Free University of Brussels.
Johnson, P. (1982). Effects on reading comprehension of language complexity and cultural background of text. TESOL Quarterly, 16, 169–181.
Kawauchi, C. (2005). The effects of strategic planning on the oral narratives of learners with low and high intermediate proficiency. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 143–164). Amsterdam: John Benjamins.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163–182.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge: CUP.
Lee, J.F. (1986). Background knowledge and L2 reading. Modern Language Journal, 70, 350–354.
Lee, S.K. (2007). Effects of textual enhancement and topic familiarity on Korean EFL students' reading comprehension and learning of passive form. Language Learning, 57, 87–118.
Leeser, M.J. (2004). The effects of topic familiarity, mode, and pausing on second language learners' comprehension and focus on form. Studies in Second Language Acquisition, 26, 587–615.
Leeser, M.J. (2007). Learner-based factors in L2 reading comprehension and processing grammatical form: Topic familiarity and working memory. Language Learning, 57(2), 229–270.
Levelt, W.J.M. (1989). Speaking: From intention to articulation. Cambridge, MA: The MIT Press.
Long, D.R. (1990). What you don't know can't help you: An exploratory study of background knowledge and second language listening comprehension. Studies in Second Language Acquisition, 12, 65–80.
Markham, P., & Latham, M. (1987). The influence of religious-specific background knowledge on the listening comprehension of adult second-language students. Language Learning, 37, 157–170.
Merlo, S., & Mansur, L.L. (2004). Descriptive discourse: Topic familiarity and disfluencies. Journal of Communication Disorders, 37, 489–503.
Mohan, B.A. (1986). Language and Content. Cambridge, MA: Addison-Wesley.
Ortega, L. (1999). Planning and focus on form in L2 oral performance. Studies in Second Language
John Benjamins.
Ortega, L., Iwashita, N., Norris, J., & Rabie, S. (in preparation). A multi-language comparison of syntactic complexity measures and their relationships to foreign language proficiency. Manuscript in preparation.
Robinson, P. (2001). Task complexity, task difficulty and task production: Exploring interactions in a componential framework. Applied Linguistics, 22, 27–57.
Samuda, V. (2001). Guiding relationships between form and meaning during task performance: The role of the teacher. In M. Bygate, P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks: Second language learning, teaching and testing (pp. 119–140). London: Longman.
Schmidt-Rinehart, B.C. (1994). The effects of topic familiarity on second language listening comprehension. The Modern Language Journal, 78, 179–189.
Shimoda, T.A. (1993). The effects of interesting examples and topic familiarity on text comprehension, attention, and reading speed. Journal of Experimental Education, 61, 93–103.
Skehan, P. (2009). Lexical performance by native and non-native speakers on language learning tasks. In B. Richards, H.M. Daller, D. Malvern, P. Meara, J. Milton, & J. Treffers-Daller (Eds.), Vocabulary studies in first and second language acquisition: The interface between theory and application (pp. 107–124). Basingstoke: Palgrave Macmillan.
language performance. Language Teaching Research, 1(3), 185–211.
Skehan, P., & Foster, P. (1999). The influence of task structure and processing conditions in narrative retellings. Language Learning, 49(1), 93–120.
Skehan, P., & Foster, P. (2005). Strategic and online planning: The influence of surprise information and task time on second language performance. In R. Ellis (Ed.), Planning and task performance
R. Ellis (Ed.), Planning and task performance in a second language (pp. 239–273). Amsterdam: John Benjamins.
Thalheimer, W., & Cook, S. (2002). How to calculate effect sizes from published research articles: A simplified methodology. Retrieved April 18, 2009 from http://work-learning.com/effect_sizes.htm.
Tulving, E., & Thomson, D.M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80(5), 352–373.
Yuan, F., & Ellis, R. (2003). The effects of pre-task planning and on-line planning on fluency, complexity and accuracy in L2 monologic oral production. Applied Linguistics, 24(1), 1–27.
Wang, Z. (2009). Modeling L2 Speech Production and Performance: Evidence from Five Types of Planning and Two Task Structures. Unpublished Ph.D. thesis. The Chinese University of Hong Kong.
Wigglesworth, G. (1997). An investigation of planning time and proficiency level on oral test discourse. Language Testing, 14(1), 85–106.
Chapter 4
Self-reported planning behaviour and second language performance in narrative retelling
Francine Pang & Peter Skehan
Macao Polytechnic Institute / St. Mary's University, Twickenham
The second language planning literature has been mainly quantitative in nature,
with very few qualitative investigations of planning (but see Ortega 2005). This
chapter tries to redress that imbalance and reports on a study of what second
language learners say they do when they plan. Participants were from a university
in Macao, and completed a narrative task, followed by retrospective interviews. The
interview data was coded, and a coding scheme emerged from this work which had
some affinity to the Levelt (1989) model of speaking. As a result, this may be of use
in other contexts. In addition, relationships between the self-reported planning
behaviours and actual performance on the task were explored. This suggested
some generalizations as to what planning behaviours are associated with higher
performance, and, interestingly, which are associated with lower performance. The
former tend to implicate the Conceptualiser stage of speech production and are
specific and limited in range, whereas the latter are frequently concerned with
over-ambition during the planning stage, a concern for form, and participants
attempting to do too much.
Introduction
Early publications on planning
What we think of as contemporary second language planning research started with
two publications (Ellis 1987 and Crookes 1989), which demonstrated the importance
of the opportunity to plan before doing a second language task, and which disagreed in
ways that are still interesting and influential. In the quarter century since Ellis's publication, the field of second language planning research has grown enormously, and has
produced many research articles. A great deal has been found out regarding the effects
of different planning conditions (Mehnert 1998; Skehan & Foster 1997); the impact
of planning with different proficiency levels (Wigglesworth 1997; Tavakoli & Skehan
2005); the relevance of different task factors (Michel et al. 2007; Foster & Tavakoli 2008);
the time perspective under which tasks are completed (Robinson 2011); the time when
planning occurs, with a contrast between pre-task or strategic planning compared to
on-line planning which takes place when time is available while speaking is taking place
(Ellis 2005b) (see other chapters in this volume). As a result, we are now in a much better position to predict what the impact will be on performance of manipulating different task features and different task conditions. Indeed, an index of the success of these
developments is that we have competing accounts of what produces the different effects
which have been reported. Whereas Skehan (2009a) argues that limitations in attention frequently lead to trade-offs between different areas, and that careful task choice
and use can mitigate the impact of these tradeoffs, Robinson (2001, 2011), through the
Cognition Hypothesis, analyses attention differently, as less limited, and proposes that
task difficulty can push for different aspects of performance (complexity, accuracy) to
be simultaneously raised.
The nature of performance itself has been the focus for much research, reflecting
a wider interest in second language performance dimensions. Following Crookes
(1989), task researchers have usually focused on the dimensions of complexity, accuracy, and fluency. More recently measures of lexis have also started to become more
common in task research (Skehan 2009b). Essentially, these four aspects of performance are regarded as distinct, so that one or two might be raised, while the others are
not. This connects with one of the most interesting debates on the effects of planning:
whether it impacts on all areas of performance, or only on complexity and fluency.
Some researchers, including Crookes (1989), and Ortega (2005), argue that planning
raises these two areas only, whereas others, notably Foster and Skehan (1996), argue
that planning can also have an impact on accuracy.
Some experimental studies have tried to address some of the disagreements (see,
for example, Wang (Chapter 2, this volume) and Bui (Chapter 3, this volume)), as have
studies by Michel et al. (2007) and Révész (2009). Their hope has been to design studies in such a way as to tease out differences and predictions from theoretical positions
which will resolve disagreements. Such research has been useful, but so far, not definitive. In reflecting on this it is important to point out that the vast majority of studies on planning have been quantitative in nature. Hypotheses have been framed, data
collected, coded, and scored, and quantitative techniques have been used to explore
the match or mismatch between hypotheses and results. Next, interpretations of the
experimental conditions and their effects in relation to the results have been put forward, hence the claims about Cognition, Tradeoff, accuracy versus complexity effects,
and so on. What has been strikingly absent is any tradition of qualitative research
used to gain insights into what happens during the period when planning is undertaken. Researchers seem to have been keener to try to infer the mental processes of
participants from quantitative patterns, rather than to discover what the participants
themselves have to say.
The major exception to this comes from two studies by Ortega (1995, 1999), and
the broader account she published of this work (Ortega 2005), where she herself wonders why there had been so little qualitative research in this area. Several years later, it
is even more striking that this state of affairs has not really changed, particularly given
that qualitative studies have the potential to unlock a great deal in relation to debates
within the literature. They can provide insights into the aims learners have during
the planning period, the processes they try to engage in, and their satisfaction subsequently with the activities they have engaged in. The present research tries to redress
this situation and report qualitative data on planners' activities, an endeavour which,
if nothing else, will add to Ortega's initial work.
Ortega's research into planning
First, though, Ortega's research is worth reviewing, because it does make major contributions to our understanding of what learners say about what they do during planning. She
used narrative retellings of picture stories from Hill (1961) as the basis for her research.
The participants in her studies were learners of Spanish at the University of Hawaii. Her
first study involved learners she characterises as low intermediate, and the second study
focused on advanced learners. Participants first listened to a tape recording of the story
in their L1 (not a typical approach in the task literature). Then, following a planning
period, learners provided their own narrative account, after which they were engaged
in retrospective interviews, in which they were probed about what they had done while
planning. An important aspect of this research is the coding scheme that was developed
to work with the data. A strong influence here was the set of schemes developed to taxonomise
learner strategies, particularly O'Malley & Chamot (1990), but also Oxford (1990). The
categories in these two schemes were supplemented by examination of the transcripts of
the retrospective interviews, which enabled further categories and themes to emerge in
their own right. The coding system ultimately used was a mixture of the existing learner
strategies schemes (consisting of thirty-four categories), and the additional categories
which achieved salience in the transcripts, generating a further six new codes.
Ortega (2005) reports that retrieval and rehearsal strategies were particularly
important with her participants. She also reports that the advanced group had more
of a balance in their use of these two strategy types than the low intermediate participants, with the latter using more retrieval strategies. She also reported that a range
of monitoring strategies were used, such as production monitoring, although this was
less effectively done with the lower proficiency participants. In the main, her participants felt that planning was helpful. This was partly because it gave them more time
to organise, to formulate thoughts, to solve lexical problems, to practise and rehearse.
Writing notes was also regarded as helpful, enabling the formulation of thought,
helping lexical retrieval and lexical choice, and also helping to monitor grammar.
There were, however, some reports of planning not being helpful. One reason proposed was that the narrative tasks were rather easy, and did not need much planning.
There were also reports of lack of transfer from planning to actual performance. Many
of these points are prescient with respect to the results to be reported here.
Ortega (2005) makes two more general points which have importance. First, she
draws attention to individual differences. She contrasts communicatively-oriented
participants and those more focused on form, and argues that participants brought
this predisposition to the way planning time is exploited. Second, she draws attention
to issues deriving from the presence of a listener. Participants were put in pairs, with
one as speaker and the other as listener. Both participants were valued in the encounter, but in fact, it was only the data from the speakers which was used. However, the
speakers were clearly aware of the listener as an important factor in the encounter.
Ortega (2005) reports this had a major impact on performance, with speakers reporting thinking of how to use planning time to make content more accessible, of how
they could retrieve easier vocabulary, and how they could avoid more advanced grammar. There was also a reluctance to self-correct and even a willingness to slow down
delivery in order to make comprehension easier for the listener.
Motivation for the present research

An earlier pilot study carried out by the researchers (Pang & Skehan 2006) was their
first attempt to find out what learners planned before doing a speaking task. The planners, undergraduates at a university in Hong Kong, were asked what they did during
the ten-minute planning time they were allowed, what they prioritized, and whether
the planning activities helped them in performing the actual task. The retrospective
interviews (generally conducted in Cantonese) were translated, transcribed and
coded. A preliminary coding scheme was developed from the codes and categories
which emerged through a process of iteration. This preliminary coding scheme was
very revealing, demonstrating the viability of using a model of speaking (Levelt 1989)
as the basis for the categories of codes which were developed.
Against this background of previous research, we can now outline the reasons
which motivated the present research. First, it is desirable to gather qualitative data
in more settings with different participants to clarify the robustness and replicability
of extant findings and generalizations. Second, there is the coding scheme. Ortega
(2005) based hers on the work of O'Malley & Chamot (1990) and Oxford (1990).
There was a stage in which coding categories additional to those derived from these
two schemes could emerge. However, in Ortega's final coding scheme (Ortega 2005,
pp. 84–85) the number of emergent categories is not large in relation to the total
used (as noted above), so that most of the coding category headings are those from
O'Malley & Chamot (1990). A difficulty here is that these systems target learning
rather than speaking strategies, and so they may not be totally appropriate. In particular they are not based on any theory of speaking, either first or second language
(Levelt 1999; Kormos 2006). There would seem scope, therefore, to use an emergent
category approach from the beginning, but to have in mind while categories emerge
the possibility of relating them to a model of speaking, such as Levelt's, which has
been applied to the second language case (De Bot 1992; Kormos 2006; Skehan 2009a).
In this way, from retrospective interviews, one could look for comments which might
relate to the different stages in the Levelt model (Conceptualisation, Formulation,
and Articulation), as well as associated processes such as monitoring. At the same
time, one could also explore whether and how far Levelt's proposals on the functioning of a mental lexicon have relevance for the reports which are gathered.
A third motivation for the present research is that there is no research we are aware
of that explores the relation between participants' reports on what they did during
planning and their subsequent performance. Ortega's work gives us excellent insights
into what participants say they do during planning, and also what they say about how
effective they think their planning was, but we have no information about whether
the reports on planning are associated with success or failure. This is important to the
extent that the planning literature is centrally concerned with trying to establish linkages between planning and performance, with performance generally conceived of in
terms of complexity, accuracy, fluency and lexis. Indeed, the debates within the literature are less about the desirability of planning (because, in the main, researchers assume
its value) than about how different interpretations of planning, perhaps with different
tasks, can impact upon different aspects of performance. So, to gather retrospective data
on planning, and to link this to performance, would relate to wider debates within the
planning literature. For instance, such data might shed light on which planning behaviours are most effective, in such a way that one could make relevant suggestions for pedagogic intervention, and even offer suggestions for how more effective planning could
be trained. This, too, would go beyond planning as being a rather monolithic and crude
pedagogic option, and suggest how it could be fine-tuned and targeted more effectively.
Method
Research Questions
This background generates two fairly straightforward research questions. It is inappropriate, given the exploratory, qualitative nature of the proposed research, to put
forward research hypotheses and predictions. Instead the two general questions are:
1. In the context of planning ahead of a narrative task, is a coding scheme based on
categories which emerge from the data consistent with models of speaking?
2. Is frequency of use of the emergent categories related to different levels of performance on a narrative task, particularly in terms of complexity, accuracy, fluency,
and lexis?
Research methodology
Participants and setting
The participants were selected from a university in Macao. Five hundred and sixty-seven students who took English courses in the English Centre were invited to take a
listening comprehension test, which included a short conversation, a long conversation and a short talk, with 50 questions in total. Cronbach's alpha for this test was 0.82.
Students who got marks between 15 and 24 were considered to be low intermediate
EFL learners, and those who got marks higher than 34 as high intermediate EFL learners for this study. In total, 48 students were selected to participate in the study, including 24 high intermediate and 24 low intermediate EFL learners, with 24 male and
24 female learners, and 24 students majoring in Language and Communication and
24 in Business Administration (see Table 1).
Table 1. Profile of the 48 selected participants
Proficiency level | Major | Remarks (dyads; M = male, F = female) | Subtotal
High Intermediate | Language & Communication | Dyads: 2: M+F, 2: F+F, 2: M+M | 24
High Intermediate | Business Administration | Dyads: 2: M+F, 2: F+F, 2: M+M |
Low Intermediate | Language & Communication | Dyads: 2: M+F, 2: F+F, 2: M+M | 24
Low Intermediate | Business Administration | Dyads: 2: M+F, 2: F+F, 2: M+M |
The researcher met each dyad of participants in an individual room. She briefed the
two participants that they would have ten minutes to plan a picture story individually
and would take turns to tell the story in English for about 2 minutes to each other
(see the instructions in Appendix 1). The listener could not see the pictures and was
told to ask at least one question after the story was told. All storytellings were audio-recorded. When participants finished telling the story, the researcher interviewed each
student individually about what they did during the 10-minute planning time. The
retrospective interview questions, trialed in the pilot study mentioned earlier, were
intended to collect planning activities in a range of areas including words, grammar,
ideas, and story structure. The interview was conducted in Mandarin or Cantonese Chinese
and was audio recorded.
Tasks
Selecting interesting and appropriate story pictures for this study was a challenging
process. Given the success of the Shaun the Sheep video narrative retellings (Wang &
Skehan, this volume), it was decided to adapt the video stories as the basis for making a cartoon series. A wide range of Shaun the Sheep episodes were initially identified. These were then scrutinised by the researchers and reduced to seven. The seven
selected episodes were converted into picture story series, and these were each evaluated by around 10 MA TESOL students. Based on the ratings of the story series given
by the graduate students in terms of clarity, humor, and depth, four picture series were
selected to be used in the pilot study. The two aims of the pilot study were to trial
the retrospective interview questions, and to select two story pictures for the present
study. Fourteen students from the same university in Macao participated in the pilot
study. Two picture series were finally selected based on the following criteria: amount
of useful retrospection about planning, variety of self-reported planning behaviour,
and ratings by the participants (clarity, humor, depth, and difficulty in completion).
The pilot also enabled a decision to be made about the length of planning time to be
used. The two selected story pictures adopted in this study each had 19 pictures in total
and were printed in color for use in the research sessions.
Procedures
The Retrospective Interviews (RIs) of the 48 participants were audio-recorded. All
the 48 digitally recorded RIs were transferred to computer as MP3 files. They were
transcribed (and translated at the same time) using Soundscriber, to facilitate control
over the sound file during transcription. The prompts for these interviews are shown
in Appendix 2.
The data handling for the narrative performances followed the procedures used
in most of the chapters in this volume, and therefore does not need to be discussed
in great detail. A broad transcription was made of each sound file. Transcriptions
were segmented into AS units, and copied to contain two identical lines for each AS
unit. The first line was then coded following CHAT conventions (MacWhinney 2000),
and represented as the CHAT tier. The second line was coded following TaskProfile
conventions, containing clausal segmentation, error coding, measurement of pauses
longer than 0.40 seconds, and coding for a range of repair types (reformulation, repetition, etc.). In addition, the length of time taken, at millisecond level, for each AS
turn was recorded as a third line. These codings enabled TaskProfile to generate all the
measures required in the present study (for a fuller discussion of performance codings, see Chapter 1).
Performance measures
Five performance measures were used in this study. Complexity was measured using
the subordination measure described in Chapter 1: The total number of clauses was
divided by the total number of AS units, generating an index with a minimum value
of 1 (no AS unit contained anything other than a matrix clause, with no subordination), with most values falling between 1 and 2. Accuracy was measured as the
proportion of clauses that were error-free, and in this case, although the data was
coded for gravity of error, the measure which was actually used, on the basis of effective discrimination, was that based on all errors (rather than only serious errors, as
in Skehan and Shun (this volume)). Fluency was measured by two indices, both of
pausing: pauses per 100 words at AS boundary points, and pauses, again per 100
words, which occurred mid-clause. Skehan (2009b) has shown that pauses at these
two locations need to be considered separately. Finally, lexical sophistication was
measured as the Lambda score, the index which captured the extent to which the
contribution of the speaker contained less frequent words (see Chapter 1 for a more
detailed account). These five measures constitute a basic set of those described more
fully in Chapter 1.
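The arithmetic behind the first four measures can be sketched as follows. This is a minimal illustration only, not the TaskProfile implementation: the class, the field names and the example counts are all hypothetical, and Lambda is left out because it depends on word-frequency information that is not given here.

```python
from dataclasses import dataclass


@dataclass
class NarrativeCounts:
    """Hand-coded counts for one narrative (hypothetical field names)."""
    clauses: int              # total number of clauses
    as_units: int             # total number of AS units
    error_free_clauses: int   # clauses coded as containing no error
    words: int                # total number of words
    as_boundary_pauses: int   # pauses located at AS boundaries
    mid_clause_pauses: int    # pauses located inside a clause


def subordination(c: NarrativeCounts) -> float:
    """Clauses per AS unit; 1.0 means no subordination at all."""
    return c.clauses / c.as_units


def error_free_proportion(c: NarrativeCounts) -> float:
    """Proportion of clauses that contain no error (all errors counted)."""
    return c.error_free_clauses / c.clauses


def pauses_per_100_words(pauses: int, words: int) -> float:
    """Standardise a raw pause count to a rate per 100 words."""
    return 100 * pauses / words


# Invented counts for one imaginary speaker (not data from the study).
example = NarrativeCounts(
    clauses=40, as_units=25, error_free_clauses=16,
    words=250, as_boundary_pauses=9, mid_clause_pauses=10,
)

print("subordination:", subordination(example))
print("error-free clauses:", error_free_proportion(example))
print("AS boundary pauses / 100 words:",
      pauses_per_100_words(example.as_boundary_pauses, example.words))
print("mid-clause pauses / 100 words:",
      pauses_per_100_words(example.mid_clause_pauses, example.words))
```

The two pausing rates are computed separately because, as noted above, pauses at the two locations are treated as distinct indices.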
Segmenting the retrospective interviews

After the transcriptions of the retrospective interviews (RIs) were completed, the
segmentation of the 48 RIs was undertaken. There are different methods to segment
such qualitative data, such as grammatical-based, idea-based, and activity-based
approaches. The activity-based method was applied in this case, because the main
purpose in analyzing the RIs is to examine the types of planning activities ESL speakers would engage in before telling a picture story. Furthermore, using the activity-based approach to segment the RIs made it feasible to build an RI corpus
which might result in more sophisticated analysis of the segments both qualitatively
and quantitatively.
First coding: Constructing the coding scheme

The first task was to code each segment, that is, each RI Unit (the term used in this
study) as a CODE, to represent a planning activity. There were several influences on
this stage. These included an earlier pilot study (Pang & Skehan 2006) based on different narrative tasks and participants, but which had generated a provisional coding scheme. The Ortega (2005) scheme was also examined and was kept in mind as
relevant. Finally, the first author's initial recognition of emerging planning activities
when doing the earlier transcription and segmentation of the RIs had sensitised her to
the sorts of behaviours which had been reported.
The provisional Coding Scheme consisted of little more than a list of codes with
a provisional description of each of them. Although the codes and the categories
were preliminary, they created an initial framework for coding the data in this study
by taking advantage of the similar features of the participants, tertiary Chinese non-
English major students in the Pilot Study and in this study. The 48 RIs were coded
using the method of constant comparison (Strauss & Corbin 1990). That is, the
researcher attempted to closely examine and re-examine each RI Unit, compare for
similarities and differences, and ask questions about the planning activities reflected
in the retrospective interviews. In addition, every effort was made to ensure the
coding system was sensitive to the variety of planning activities the ESL speakers
might use. Detecting all the planning activities reported by each participant (either
reported directly by the participants or inferred from the interview) was the starting
point for this.
Having constructed the provisional Coding Scheme and having coded the 12
RIs in the previous pilot study, the first author became more sensitive throughout the
investigation to the possibility of any emergent planning activities. The view is that
the more exhaustive the analyses, the more realistic they would be with respect to the
planning activities used by both groups of speakers, the high intermediate and the low
intermediate groups. Therefore, in constructing the Coding Scheme, no quantitative
criterion was used; that is, regardless of the frequency with which a planning activity occurred, it was incorporated into the Coding Scheme. For example, the following three planning activities are included in the Coding Scheme although they only
occurred once: (A) 2. Macro Planning Plan sequence: Look at pictures one by one, then
describe; (D) 6. Lexical choice: Advanced words; (E) 8. Lexical: Planned but not used/
correctly used; (see Appendix 3).
This first pass through the 48 RIs to develop coding categories took a significantly
longer time than was anticipated. This is not only because of coding the data itself (as it
would have taken much less time if a fixed, established coding scheme had been used)
but also because the final Coding Scheme underwent an iterative process of multiple
revisions. The action of examining, re-examining, comparing and asking questions
continually challenged both the RI Units themselves and the method of coding them.
In other words, how to match an RI Unit to a code subtly and how to code all RI units
consistently created considerable complexity.
The first coding of all the 48 RIs resulted in the identification of 42 planning activities (i.e. 42 codes). This Coding Scheme included: the codes of the planning activities,
a description of each code, and an appropriate example, demonstrating the planning
activity. As a brief example of the scheme in operation, the code "Connect the pictures
to develop the story plot" (described as "Try to connect or structure the pictures to develop
the story plot") is the result of conceptualizing an RI Unit such as: "I emphasized on the plots.
I considered how to connect the pictures to make the story better."
The next stage was to explore whether any order could be brought to the codes.
It proved possible to organise the entire set of 42 planning activities (i.e. the codes)
into five groups reflecting their functions. These were (A) Macro planning; (B) Micro
planning; (C) Lexical and grammar planning; (D) Metacognitive planning, and (E)
Post-task perception and evaluation. The last category will not be pursued further in
this chapter, for reasons of space. This first Coding Scheme was not intended to be a
complete representation of all possible planning activities. Nevertheless, it does represent an exhaustive list of the planning activities the participants used to prepare for
telling a story from a series of pictures.
Second coding: Real coding

The 48 RIs were coded a second time based on the final Coding Scheme (shown in
Appendix 3). During the first coding, while the Coding Scheme was evolving, an
RI Unit that was examined after a code had been added or deleted might be coded
differently to an RI Unit coded earlier although both exemplified the same concept.
Therefore, the second coding was regarded as the real coding, which would be the
basis for the quantitative analysis and the associations with actual performance.
At this point, the first author's theoretical familiarity and sensitivity had grown
significantly from coding the 12 RIs in the previous Pilot Study (Pang & Skehan 2006)
and the coding of the 48 RIs during the first coding. This sensitivity helped in distinguishing the slightly different concepts between closely-related codes and therefore
made it easier to detect examples of codes in use, and also to make final decisions
with some RI Units which were close to falling between two codes or even three
codes. For example, the concept is very similar for the two codes: (D) 5. Metacognitive:
Rehearse: Memorization and (D) 8. Metacognitive: Memorization and so the actual RI
Unit needed to be closely reexamined with coding sensitivity to decide which one was
appropriate in any particular case.
Finally, all the coded retrospective interviews were incorporated within an Excel
file. The same codes were grouped together, and as a result the same RI units could
be reexamined and compared to see whether they fitted into each code so that final
revision of the coding system could be undertaken. In addition, at this stage, the
description of each code was modified to reach its final form.
Results
Basic quantitative analyses
Table 2, below, shows the raw values for each of the coding categories that emerged
from the retrospective interviews. Each line gives one of the coding categories (and
the fuller version is provided in Appendix 3), together with the raw frequencies for
the lower proficiency group and the higher proficiency group, as well as the entire
group. This shows the number of times a given coding was made. So, for the first
line (Macro Planning: plan sequence; scan then describe), in the low proficiency group,
no-one coded zero (i.e. no one failed to report this planning behaviour), so that the
table shows that all twenty-four lower proficiency participants reported using this
behaviour. This contrasts with the first of the micro planning behaviours: Understand
pictures in detail. Here, five participants (still with the low proficiency group) did not
report using this behaviour, ten participants reported it once, seven reported it twice,
two reported it three times, and no-one in this group reported using it four or more
times (although in the high proficiency group there was one participant who reported
this behaviour four times). These raw values have been reported, rather than averages
or percentages, because they communicate the situation more effectively.

Table 2. Frequency data for all codes
Code | Low prof. (number of mentions: 0 1 2 3 4) | High prof. (number of mentions: 0 1 2 3 4) | Total (number of mentions: 0 1 2 3 4)
Macro Planning
Plan sequence: Scan, then describe | 0 24 | 0 24 | 0 48
Plan sequence: Look at each picture, describe | 23 1 | 24 0 | 47 1
Micro Planning
Understand pictures in detail | 5 10 7 2 0 | 10 6 4 3 1 | 15 16 11 5 1
Plan general things | 16 8 | 12 12 | 28 20
Plan small details | 8 14 2 | 2 12 10 | 10 26 12
Plan how to tell the story | 8 11 4 1 | 12 9 3 0 | 20 20 7 1
Plan how to express oneself better | 13 7 4 | 13 7 4 | 26 14 8
Think of ideas beyond the pictures | 17 7 | 10 12 2 | 27 19 2
Organise the ideas developed from pictures | 17 6 1 | 14 8 1 | 31 14 2 1
Connect the pictures to develop the story | 14 9 1 | 6 12 4 2 | 20 21 5 2
Lexical and grammar planning
Lexical: general retrieval | 12 12 0 | 7 12 5 | 19 24 5
Lexical Choice: Appropriate words | 21 3 | 19 3 2 | 40 6 2
Lexical Choice: Simple words | 16 7 1 | 20 4 0 | 36 11 1
Lexical Choice: Connective words | 17 6 1 | 12 12 | 29 18 1
Lexical Choice: Personification words | 24 0 | 22 2 | 46 2
Lexical Choice: Advanced words | 24 0 | 23 1 | 47 1
Lexical Choice: Variety of words | 23 1 | 21 3 | 44 4
Lexical Choice: Accurate words | 24 0 | 19 5 | 43 5
Lexical Compensation: Approximation | 21 3 | 21 3 | 42 6
Lexical Compensation: Circumlocution | 15 6 2 1 | 15 7 2 0 | 30 13 4 1
Lexical: No or little concern | 19 5 | 23 1 | 42 6
Grammar: General use considered | 18 5 1 | 19 5 | 37 10 1
Grammar: Correct use of tense | 17 7 | 15 9 | 32 16
Grammar: No or little concern | 12 12 | 11 13 | 23 25
Metacognitive Planning
Rehearse: General | 9 15 | 16 8 | 25 23
Rehearse: To be accurate | 18 6 | 17 6 1 | 35 12 1
Rehearse: To be fluent | 15 9 | 13 11 | 28 20
Rehearse: To be logical or clear | 24 0 | 19 4 1 | 43 4 1
Rehearse: To help to memorise | 24 0 | 22 1 1 | 46 1 1
Take notes to help plan | 16 7 0 1 | 15 7 2 0 | 31 14 2 1
Take notes: No need | 11 13 | 13 11 | 24 24
Memorisation | 24 0 | 22 2 | 46 2
Try to be aware of the listener | 3 18 2 0 1 | 6 11 3 4 | 9 29 5 4 1
The table gives a general picture of the frequency with which different codings
were found for the participants. Not a great deal will be said about these figures at
this stage, since more discussion will be provided when we link coded behaviours to
actual performance. Even so, a number of points are worth making. First, there is the
issue of discrimination and whether there is variation between participants in the
use of a particular code. If the goal were simply to characterise self-reported planning behaviours, this would not be important. They are what they are, and it is the
patterns amongst them which would be of interest. But if we are to link such behaviours to performance, it is helpful if there is dispersion. On these grounds, then,
both of the macro behaviours (the two Plan sequence behaviours), as well as several
of the lexical codings (e.g. Advanced words), plus one or two of the metacognitive
behaviours, are problematic, since effectively only one value is reported, with this
often being zero or one, indicating that the behaviour is simply not reported by the
vast majority of the participants. A second point which stands out is the connection
with proficiency level. Several of the codes show interestingly different patterns for
the low and high proficiency students. For example, the micro planning behaviour
Connect the pictures to develop the story sees fourteen of the low proficiency participants code 0, while only six of the high proficiency participants code in this way.
One wonders, therefore, if this behaviour is facilitated by students' higher levels of
proficiency.
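The two points made above, lack of dispersion and contrasts between proficiency groups, can be illustrated with a few rows copied from Table 2. This is a sketch for illustration only: the function names and the 0.95 cut-off for flagging a code as showing little dispersion are assumptions introduced here, not criteria used in the study.

```python
# Rows copied from Table 2. Position in each list = number of mentions (0, 1, 2, ...);
# value = number of participants who reported the code that many times.
TABLE2_TOTAL = {
    "Plan sequence: Scan, then describe": [0, 48],
    "Lexical Choice: Advanced words": [47, 1],
    "Understand pictures in detail": [15, 16, 11, 5, 1],
    "Connect the pictures to develop the story": [20, 21, 5, 2],
}
CONNECT_LOW = [14, 9, 1]       # low proficiency group
CONNECT_HIGH = [6, 12, 4, 2]   # high proficiency group

# Hypothetical cut-off: flag a code when 95% or more of participants share one value.
LOW_DISPERSION_THRESHOLD = 0.95


def modal_share(counts):
    """Share of participants who fall in the most common frequency category."""
    return max(counts) / sum(counts)


def share_not_reporting(counts):
    """Share of participants who never reported the code (coded 0)."""
    return counts[0] / sum(counts)


def total_mentions(counts):
    """Total number of times the code was reported across participants."""
    return sum(mentions * people for mentions, people in enumerate(counts))


flagged = [name for name, counts in TABLE2_TOTAL.items()
           if modal_share(counts) >= LOW_DISPERSION_THRESHOLD]

print("little dispersion:", flagged)
print("Connect the pictures, share coding 0 (low): ", share_not_reporting(CONNECT_LOW))
print("Connect the pictures, share coding 0 (high):", share_not_reporting(CONNECT_HIGH))
```

With these rows, the two codes flagged are the ones singled out in the text as problematic, while the "Connect the pictures" comparison restates the fourteen-versus-six contrast as proportions of each group.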
The codes and the Levelt model

We can now discuss the different codes in relation to the Levelt model of speaking. The
broad categories are Macro Planning, Micro Planning, Lexical and Grammar Planning, and Metacognitive Planning. Both Macro and Micro Planning are concerned
with Levelt's Conceptualiser stage, and with the speaker's processing of the ideas that
are going to feed into the pre-verbal message. These codes vary in focus (pictures,
ideas, outside the ideas, manner of expression) and also in scale (local versus general)
but they are all concerned with thinking. Lexical and Grammar planning are much
more focused on performance, and are, fairly obviously, linked with the Formulator stage in speech production (though Formulation processes clearly take place in
relation to prior conceptualisations). Several of the codes are clearly concerned with
lexical retrieval, others are focused on grammar and the building of syntactic frames.
Yet others involve a mixture of the two, both lexical and grammatical, and are also
implicated in Formulation processes that are anticipated, such as handling difficulty
in communication. The one higher order coding category which straddles Leveltian
stages is the Metacognitive category. Here we have some codes which are concerned with
Conceptualisation, where the focus is on ideas, or on the communication as part of
a speech event with a listener present. There are a number of codes concerned with
rehearsal, and these seem directed at Formulator operations (although even here, one
is concerned with a form of cognitive monitoring). In any case, it is not difficult to
link the coding categories which emerged with aspects of the Leveltian analysis of
speaking.
Quantitative data
Table 3 shows the performance results for the participants. The data is shown for one
measure of accuracy (error free clauses), one of complexity (subordination), two pausing measures (number of pauses at AS boundaries per 100 words, number of pauses
mid-clause, per 100 words), length of run, and Lambda, so covering accuracy, complexity, fluency, and lexis. The measures are given for the entire group, and then also
for low and high proficiency participants. In this case, significance values are also
given, based on between subjects t-tests. The Length of Run measure is only given to
provide a limited degree of comparison with Ortega (2005) who used a very similar
index. For all values in this table the N is 48 overall, with 24 low proficiency participants and 24 high proficiency participants.
Table 3. Performance measures for participants
Performance measure | Total group mean (St.Dev.) | Low profic. group mean (St.Dev.) | High profic. group mean (St.Dev.) | Significance low vs. high proficiency
Error free clauses | .40 (.16) | .34 (.14) | .47 (.15) | 0.002
Subordination | 1.66 (.30) | 1.62 (.26) | 1.69 (.33) | 0.410
AS boundary pauses | 3.38 (1.26) | 3.57 (1.48) | 3.20 (.99) | 0.31
Mid-clause Pauses | 3.64 (1.98) | 4.07 (2.34) | 3.21 (1.48) | 0.14
Lambda | 1.85 (.26) | 1.80 (.27) | 1.91 (.26) | 0.15
Length of run | 6.12 (1.42) | 5.42 (1.0) | 6.82 (1.44) | 0.001
We will first consider the general values in this table. There are now a wide range
of studies (including in this volume) which use the set of measures shown in the table,
and so one can relate the values shown here to the wider literature. In this respect, the
values for accuracy are relatively low (although there may be issues here with Chinese
L1 EFL learners of English, they are somewhat lower than those reported by research
conducted in a U.K. context). The values for fluency are fairly normal for the sorts of
narrative retellings involved here, and are low, if anything, relative to other research
studies. The remaining values, though, are in fact, higher than is normally reported.
The subordination values for tasks of this sort are usually a little lower, possibly more
in the 1.4 to 1.5 range for planned performances. Similarly, the Lambda values, as an
index of lexical sophistication, are definitely higher than is typical, probably around
0.2 to 0.5 above what one normally finds (Skehan 2009b). Finally, the length of run
values are also clearly higher than the norm, since one often sees values here of 3 or 4
rather than the 5.42 and 6.82 shown for the two groups. Indeed, these values approach
those found with native speakers. If one makes a rather tentative comparison with
Ortega's (2005) "words within one intonation contour" index, one would estimate that
the participants in this study are between her two groups, and perhaps closer to her
advanced group. In all, one could characterise these performances as lacking in accuracy, but lexically rich and complex.
Turning to the comparisons of the low and high groups, it is surprising how few significant differences are reported. In fact, there are only two, for accuracy and for length
of run, although each of these comparisons reaches quite a high level of significance.
So the advanced group is more accurate (consistent with Bui (this volume)), and they
produce smoother language, producing units of speech which are longer. In contrast,
with fluency and with lexical and structural complexity, there are no significant differences. It is noteworthy that in all these cases, the high proficiency group produces
arithmetic means which indicate a higher level of performance, but none of these comparisons actually reach significance. But one important point does need to be made here
regarding the accuracy contrast. In the next section we are going to turn to the associations between self-reported planning behaviours and actual performance. For structural
complexity, for lexical sophistication and for fluency, we shall treat the two proficiency
groups as effectively one group, given the lack of significant differences found. But the
two groups have to be considered differently for accuracy. As we will see, this poses a
problem for interpretation. If certain coded behaviours are associated with higher levels
of performance and also discriminate between the two proficiency groups, we cannot
tell whether these behaviours are enabled by the higher proficiency, or whether they
contribute to that higher level of performance. We will return to this problem below.
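The group comparisons in Table 3 can be approximated from the reported means and standard deviations alone. The sketch below computes a pooled-variance t statistic; this is an assumption, since the chapter says only that between subjects t-tests were used, although with equal group sizes the pooled and unequal-variance versions give the same t value. Because the inputs are rounded summary values, the results are approximate, and no p-values are computed here.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-samples t statistic from summary statistics.

    Returns (t, degrees of freedom), with t positive when group b is higher.
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / standard_error, df


N_PER_GROUP = 24  # low and high proficiency groups, as stated in the text

# (low mean, low SD, high mean, high SD) copied from Table 3
TABLE3 = {
    "Error free clauses": (0.34, 0.14, 0.47, 0.15),
    "Subordination": (1.62, 0.26, 1.69, 0.33),
    "Length of run": (5.42, 1.0, 6.82, 1.44),
}

results = {}
for measure, (m_low, sd_low, m_high, sd_high) in TABLE3.items():
    t_value, df = pooled_t(m_low, sd_low, N_PER_GROUP, m_high, sd_high, N_PER_GROUP)
    results[measure] = (t_value, df)
    print(f"{measure}: t({df}) = {t_value:.2f}")
```

On these rounded inputs the t values for error-free clauses and length of run come out above 3, and the value for subordination below 1, which matches the pattern of significant and non-significant contrasts reported in the table.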
Frequency of reported planning behaviour and performance levels

We turn next to exploring the relationship between the different coded behaviours
and performance in the narrative retellings. To this end, every code (with its values
of 0, 1, 2, 3 for the use of that code) was related to performance in terms of accuracy,
complexity, fluency (both pausing measures), and lexical sophistication. For example,
Lexical Code 1, General retrieval tallied as 19 individuals with 0 occurrences, 24 with
1 occurrence, and 5 individuals with two occurrences. These three values (0, 1, 2) and
the associated participants were then examined for mean error-free clauses scores, as
shown in Table 4. These figures, based on Lexical Code 1, suggest that there is some
association between the use of this behaviour during planning and level of accuracy,
in that greater use of this code seems to be associated with greater accuracy. It should
be made clear that, given the exploratory nature of this study, and the strong reliance
on qualitative data, these figures are only presented descriptively, to gain some insight
into what is associated with higher performance.
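The descriptive procedure just outlined (group participants by how often they reported a code, then average a performance score within each group) can be sketched as follows. The function name and the six-participant toy data are invented for illustration and are not taken from the study; only the final consistency check uses values reported in Table 4.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_frequency(code_counts, scores):
    """Return {frequency: (number of participants, mean score)}.

    code_counts[i] is how often participant i reported the code;
    scores[i] is that participant's performance score.
    """
    groups = defaultdict(list)
    for count, score in zip(code_counts, scores):
        groups[count].append(score)
    return {freq: (len(vals), mean(vals)) for freq, vals in sorted(groups.items())}


# Invented toy data for six imaginary participants (NOT from the study).
toy_counts = [0, 0, 1, 1, 1, 2]
toy_efc = [0.30, 0.40, 0.35, 0.45, 0.40, 0.50]
summary = mean_score_by_frequency(toy_counts, toy_efc)
for freq, (n, avg) in summary.items():
    print(f"reported {freq} time(s): n = {n}, mean error-free clauses = {avg:.2f}")

# Consistency check using the values reported in Table 4 for Lexical Code 1:
# frequency -> (number of participants, mean error-free clauses score).
table4 = {0: (19, 0.37), 1: (24, 0.41), 2: (5, 0.45)}
n_total = sum(n for n, _ in table4.values())
weighted_mean = sum(n * m for n, m in table4.values()) / n_total
print(f"participants: {n_total}, weighted mean: {weighted_mean:.3f}")
```

The weighted mean of the three group means in Table 4 comes to roughly .40 across the 48 participants, in line with the total-group mean for error-free clauses in Table 3.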
Table 4. Frequency and Error-Free clause mean scores: Lexical Code 1 General retrieval
Code (number of occurrences) | Number of reports | Mean Error free clauses score
0 | 19 | .37
1 | 24 | .41
2 | 5 | .45

The same descriptive procedure was followed with all the codes used in this study.
For each code, number of uses of a code was the basis for calculating a mean score for
those individuals who used the code that often, and then frequency of use was related
to the mean performance score (accuracy, complexity, fluency, lexis) involved. Finally,
for the different performance areas, all codes which seemed to have some association
with performance were brought together (whereas codes which generated no such
relationship were not included). Table 5 presents the findings for the Error-free clauses
measure, both positive and negative. The first two columns give information about the
coding involved, and fuller information is provided in Appendix 3. The third column
shows whether there is a linkage to proficiency, and the fourth column shows the distribution of coding frequency (0, 1, 2, etc.) and also the mean error-free clauses score
associated with each frequency.
Table 5. Codes associated with greater accuracy
No. CODE | Description of CODE | Proficiency linkage | Performance influence
Positive
LE1: Lexical: General retrieval | Try to remember the words used in the task | Yes | Distrib: 19 - 24 - 5; EFC: .37/.41/.47
LE8: Lexical choice: Accurate words | Try to use accurate words | Yes | Distrib: 43 - 5; EFC: .40/.46
Meta2: Comment on notes: Structure clear story | The notes taken are helpful to structure a clear story | No | Distrib: 35 - 12 - 1; EFC: .38/.47/.54
Meta3: Rehearse: Fluent | Rehearse to be fluent | No | Distrib: 28 - 20; EFC: .36/.47
Meta4: Rehearse: Logical or clear | Rehearse to check whether what is planned is logical or clear | Yes | Distrib: 43 - 4 - 1; EFC: .39/.55/.50
Micro 8: Connect the pictures to develop the story plot | Try to connect or structure the pictures to develop the story plot | Yes | Distrib: 19 - 21 - 5 - 2; EFC: .35/.43/.43/.56
Negative
Le 10: Lexical compensation: Circumlocution | Use a few words or another way to replace a word which cannot be recalled | No | Distrib: 30 - 13 - 4; EFC: .43/.38/.27
Mic5: Plan how to express better | Plan how to describe the story more vivid, interesting, or clearer etc. | No | Distrib: 26 - 14 - 8; EFC: .43/.38/.37
Only the relations with error-free clauses have been shown here, and the other performance areas are covered in the tables which follow. It may be, therefore, that codes
appear in more than one table, if a particular code has a relationship with more than
one performance area. Intriguingly, one or two codes have positive associations with
one performance area and negative associations with another. For example, LE1, Lexical General Retrieval, the first coding shown in Table 5, and with a positive relationship,
also has a negative relation with mid-clause pausing (negative, that is, with greater fluency, in that LE1 is associated with more mid-clause pausing). In any case, and setting
aside proficiency linkages for the moment, it appears that higher accuracy scores are
obtained when more focused behaviour is involved, whether this is rehearsing specific
words or rehearsing in a more targeted way. Rehearsing (note-taking) linked to structure
(Meta2) is beneficial, consistent with focused behaviour within planning being effective. In contrast, lower accuracy is associated with planning leading to problems, either
through lexical difficulties or with over-extension in what is attempted. It is interesting
that these suggest the potential for planning to be two-edged: It can lead the speaker
into trouble, as well as be helpful, depending on how the planning time is used.
Table 6 below shows the comparable information for the Subordination measure.
Once again the linkage with proficiency level is shown, but it should be borne in mind
that there was no significant contrast here between the two proficiency levels.
An obvious first remark here has to be that the balance of positive and negative
associations is reversed relative to that found for accuracy. There, positive associations predominated. Here we find quite a few more negative connections than positive. Regarding the positive influences, it does come across that ideas lead the way
when subordination increases, and grammar takes a back seat. It also appears that
organising, and going from big to small, is a good strategy to raise subordination. The
literature on tasks suggests that task structure is generally a good thing, and that earlier
findings that accuracy was advanced by structure have now been extended to suggest
that complexity is also affected. The results in this section are that if learners can use
planning time to bring their own structure to the narrative retelling, this seems to
raise the subordination measure. The negative influences do seem to have a common
quality here. It is over-ambition. This could be over-ambition in lexical choices, or
over-ambition generally in what can be done during planning time. This seems to be
associated with performance being damaged to some degree, particularly with regard
to complexity. In addition, it seems as if those who focus on grammar do so at the
expense of complexity, at least as indexed by subordination.
Table 6. Codes associated with higher and lower subordination scores
Positive
Le14: Grammar: No/Little concern | No or very little concern about grammar use | No | Distrib: 23 - 25 | Subord: 1.62/1.69
Mi3: Plan small details | Plan small details, such as detailed description | Yes | Distrib: 10 - 26 - 12 | Subord: 1.6/1.62/1.78
Mi4: Plan how to tell the story | Plan how to tell the story in general | Yes | Distrib: 20 - 20 - 7 - 1 | Subord: 1.65/1.63/1.72/1.89
Mi7: Organize the ideas developed from the pictures | Try to organize the ideas developed from different pictures to have a clear story plot | No | Distrib: 31 - 14 - 2 - 1 | Subord: 1.63/1.7/1.72/1.63
Negative
Le2: Lexical choice: Appropriate words | Choose appropriate words for telling the story | No | Distrib: 40 - 6 - 2 | Negative, Subord: 1.68/1.58/1.56
Connective words | Choose to use some connective words | Yes | Distrib: 29 - 18 - 1 | Negative, Subord: 1.71/1.58/1.37
Various words | Try to use various words to talk about the same thing | No | Distrib: 44 - 4 | Negative, Subord: 1.67/1.46
Accurate words | Yes | Distrib: 43 - 5 | Subord: 1.67/1.58
Le9: Lexical compensation: Approximating | Use a word similar in meaning to replace the word which can't be recalled | No | Distrib: 42 - 6 | Subord: 1.67/1.53
Le12: Grammar: General use | Think of general use of grammar | No | Distrib: 37 - 10 | Subord: 1.68/1.62
Le13: Grammar: Tense | Think of the use of correct tense | No | Distrib: 32 - 16 | Subord: 1.7/1.57
Mi2: Plan general things | Plan general things, such as describing the general plot of the story | Yes | Distrib: 28 - 20 | Subord: 1.69/1.61
Mi5: Plan how to express better | Plan how to describe the story more vivid, interesting, or clearer etc. | No | Distrib: 26 - 14 - 8 | Subord: 1.69/1.63/1.59
Mi8: Connect the pictures to develop the story plot | Yes | Distrib: 19 - 21 - 5 - 2 | Subord: 1.71/1.64/1.56/1.5

Next, we turn to the two measures of fluency, AS pauses and mid-clause pauses.
They are treated separately, mainly because, while there is a certain amount of overlap
in the associations found, the two fluency measures are mostly influenced by different
things, and in one case a code is negatively associated with one and positively associated
with the other, as we shall see. The information for AS-pauses is given in Table 7.
A slight difference in this table compared to the others is that where there is a relevant
mid-clause pausing association, that too is shown. The same thing is done in Table 8,
where AS pause information is shown in a table focusing on mid-clause pauses. In this
way, the two tables provide separate information on these two aspects of dysfluency,
but also facilitate interpretations of shared influences, such as Lexis 14, in the Positive
section of Table 7, or discrepant influences, such as Lexis 8, in the Negative section of
the same table. Note also that greater fluency is indexed by lower pausing scores.
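As an aid to reading the table entries, the following Python sketch shows one way a "Distrib" entry and its scores might be compared. It is an illustration only, not the analysis used in the study. It assumes that each "Distrib" entry gives the number of participants in successive groups (the first group being those who did not report the behaviour) and that the accompanying scores are the group means in the same order. The function names `pooled_mean` and `mention_effect` are hypothetical.

```python
from typing import Sequence


def pooled_mean(counts: Sequence[int], means: Sequence[float]) -> float:
    """Participant-weighted mean of the group means."""
    if len(counts) != len(means) or len(counts) == 0:
        raise ValueError("counts and means must be non-empty and of equal length")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one group must contain participants")
    return sum(n * m for n, m in zip(counts, means)) / total


def mention_effect(counts: Sequence[int], means: Sequence[float]) -> float:
    """Pooled mean of the groups that reported the code, minus the mean of
    the first group (assumed here to be those who did not report it)."""
    if len(counts) < 2:
        raise ValueError("need a non-reporting group and at least one reporting group")
    return pooled_mean(counts[1:], means[1:]) - means[0]


# Le13 (think of the correct tense), AS pausing in Table 7:
# Distrib 32 - 16, scores 2.98/4.19
le13 = mention_effect([32, 16], [2.98, 4.19])

# Mi3 (plan small details), mid-clause pausing in Table 7:
# Distrib 10 - 26 - 12, scores 5.25/3.43/2.77
mi3 = mention_effect([10, 26, 12], [5.25, 3.43, 2.77])

print(f"Le13 AS pausing difference: {le13:+.2f}")
print(f"Mi3 mid-clause pausing difference: {mi3:+.2f}")
```

Under these assumptions, and because greater fluency is indexed by lower pausing scores, a difference above zero (as for Le13) corresponds to a negative influence on fluency, and a difference below zero (as for Mi3) to a positive one.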
Table 7. Codes associated with differences in AS clause boundary pausing per 100 words
Positive
Various words | Try to use various words to talk about the same thing | No | Distrib: 44 - 4 | AS pausing: 3.41/3.07
Le10: Lexical compensation: Circumlocution | Use a few words or another way to replace a word which cannot be recalled | No | Distrib: 30 - 13 - 4 | AS pausing: 3.62/2.96/3.06
Le11: Lexical: No/Little concern | No or little concern about vocabulary | Distrib: 42 - 6 | AS pausing: 3.42/3.16
Le14: Grammar: No/Little concern | No or very little concern about grammar use | No | Distrib: 23 - 25 | AS pausing: 3.88/2.93 | Positive, Mid-pause: 4.22/3.10
Me4: Rehearse: Logical or clear | Rehearse to check whether what is planned is logical or clear | Yes | Distrib: 43 - 4 - 1 | AS pausing: 3.44/2.88/3.05
Mi3: Plan small details | Plan small details, such as detailed description | Yes | Distrib: 10 - 26 - 12 | AS pausing: 3.68/3.27/3.37 | Positive, Mid-pause: 5.25/3.43/2.77
Mi7: Organise the ideas developed from the pictures | Try to organise the ideas developed from different pictures to have a clear story plot | No | Distrib: 31 - 14 - 2 | AS pausing: 3.53/3.16/2.16
Negative
Appropriate words | Choose appropriate words for telling the story | No | Distrib: 40 - 6 - 2 | AS pausing: 3.34/3.54/3.72
Connective words | Choose to use some connective words | Yes | Distrib: 29 - 18 | AS pausing: 3.15/3.65 | Negative, Mid-pause: 3.54/3.79
Accurate words | Yes | Distrib: 43 - 5 | AS pausing: 3.35/3.69 | Positive, Mid-pause: 3.8/2.26
Le12: Grammar: General use | Think of general use of grammar | No | Distrib: 37 - 10 | AS pausing: 3.18/4.0
Le13: Grammar: Tense | Think of the use of the correct tense | No | Distrib: 32 - 16 | AS pausing: 2.98/4.19 | Negative, Mid-clause: 3.18/4.56
Mi6: Think of ideas beyond pictures | Think of more ideas beyond the pictures | Yes | Distrib: 27 - 19 - 2 | AS pausing: 3.32/3.42/3.93 | Positive, Mid-pause: 3.82/3.49/2.6
Mi8: Connect the pictures to develop the story | Yes | Distrib: 19 - 21 - 5 - 2 | AS pausing: 3.17/3.48/3.58/4.08 | Negative, Mid-pause: 3.37/3.99/3.93/2.36
AS pauses, clearly, are at a natural pausing point in speech, as evidenced by native
speakers pausing at this point also (Skehan 2009b). But the issue with them, for second language speakers, is when pausing at this point is greater than would be typical of
native speakers. The data here are consistent with this interpretation, as indexed by the
native speaker data given in Skehan (2009b). Once again, we can consider the positive
influences, that is, those that lead to less pausing at this point, and negative influences,
that lead to more pausing, separately. Regarding the content of what is said, it appears
that AS-pausing is reduced when ideas are organised, and so speakers embark on more
extensive contributions. With lexis, what comes across here is that it is important to
be resourceful and flexible with words, and that if one is, pausing is reduced. Perhaps
going along with both these points, it seems to be advantageous for this aspect of fluency if one rehearses and works small (presumably so that what is prepared transfers
from planning to performance more easily). Finally, it seems beneficial for fluency if
the speaker avoids a focus on form. Most of the negative influences on AS boundary
fluency are the reverse of the positive influences. Being fixed and inflexible with words
is not good for fluency, as one might expect. Being very concerned with form, similarly, leads to more AS pausing. And finally, it seems harmful for fluency if the speaker
tries to be ambitious and fancy in what they say.
A final note on these results is the degree of correspondence between AS pausing and mid-clause pausing. Two positive codes are shared (Le14 (don't worry about
grammar) and Micro3 (plan small details)), as are three negative codes (Le4 (use connective words), Le13 (think of the correct tense), and Micro8 (structure the pictures)).
Surprisingly, two codes diverge. Le8 (try to use accurate words) and Micro6 (think of
ideas beyond pictures) are negative for AS pausing and positive for mid-clause pausing,
which is, to say the least, intriguing.
We turn next to Table 8, which shows the data for the influences on mid-clause
pausing. For the positive influences here, the general theme seems to be one of avoiding the areas which might cause difficulty.
These, of course, include avoiding preoccupation with form, but more generally the
strategy seems to be to find paths where difficulties are less likely. There is certainly a
sense of anticipating difficulty so as to be better prepared when it does occur (as perhaps it inevitably will), and this is allied to a sense of what to do when that difficulty
arises. There is also a (grudging?) willingness to be pushed to be more ambitious, but
only when this does not take the speaker into waters which are too uncharted. The
negative influences are also interesting. First of all, there is the possibility that one
can be pushed too far. Once again, being too fancy does not seem to be a good influence. Similarly (and this chimes with other performance areas), focusing on form,
and being general, seem to have some costs in terms of mid-clause pauses. Finally, and
perhaps surprisingly, taking notes can contain dangers, perhaps because this too leads
speakers to go beyond their abilities.
Table 8. Codes associated with differences in mid-clause pausing
Positive
Simple words | Choose to use simple words | No | Distrib: 36 - 11 - 1 | Mid.Cl.Pausing: 3.95/2.67/3.26
Accurate words | Yes | Distrib: 43 - 5 | Mid.Cl.Pausing: 3.8/3.26 | Negative, AS: 3.35/3.69
Le9: Lexical compensation | Use a similar word to that which can't be recalled | No | Distrib: 42 - 6
Le14: Grammar, Little concern | No or very little concern about grammar use | No | Distrib: 23 - 25 | Positive, AS: 3.88/2.93
Mi3: Plan small details | Plan small details, such as detailed description | Yes | Distrib: 10 - 26 - 12 | Positive, AS: 3.68/3.27/3.37
Mi6: Think of ideas beyond pictures | Think of more ideas beyond the pictures | Yes | Distrib: 27 - 19 - 2 | Negative, AS: 3.32/3.42/3.93
Negative
Le1: Lexical: General retrieval | Try to remember the words used in the task | Yes | Distrib: 19 - 24 - 5
Le4: Lexical: Connective words | Choose to use some connective words | Yes | Distrib: 29 - 18 | Negative, AS: 3.15/3.65
Le13: Grammar: Tense | Think of the use of correct tense | No | Distrib: 32 - 16 | Negative, AS: 2.98/4.19
Meta2: Rehearse, Accurate | Rehearse to be accurate | No | Distrib: 35 - 12
Meta7: No need to take notes | There is no need to take notes | No | Distrib: 24 - 24 | Mid.Cl.Pausing: 3.48/3.8
Mi8: Connect pictures to develop story | Try to connect or structure the pictures to develop the story plot | Yes | Distrib: 19 - 21 - 5 - 2 | Mid.Cl.Pausing: 3.37/3.99/3.93/2.36 | Negative, AS: 3.17/3.48/3.58/4.08
The final area to be considered is that of lexical sophistication, as indexed by the
value Lambda. The relevant data are shown in Table 9 below. Perhaps the first thing
to say here is how disappointingly small this table is. It would have been nice to have
a range of positive and negative associations between coded planning behaviours and
lexical performance, similar to other performance areas. In the event there are only
two positive influences and two negative, which seems remarkably slight. Even they
are not terribly revealing. Positively, we have a focus on words and the desirability of
being general with ideas. Negatively, there is the behaviour of avoiding lexical difficulty,
of being (too) unambitious, and of relatively unfocussed rehearsal. There may
be connections here with other areas: level of achievable ambition is relevant, as is
the issue of generality-specificity, although here the influence is positive for generality
for ideas, and negative for rehearsal. The most we can draw from this is that there does
seem to be a specific attraction to, or rejection of, words for different participants.
Table 9. Lexical sophistication linked to coding
Positive
Accurate words | Yes | Distrib: 43 - 5 | Lex.Sophist.: 1.83/2.02
Micro5: Plan how to tell the story in general | Yes | Distrib: 20 - 20 - 7 | Lex.Sophist.: 1.83/1.87/1.92
Negative
Simple words | Choose to use simple words | No | Distrib: 36 - 11 - 1 | Lex.Sophist.: 1.9/1.73/1.59
Meta1: Rehearse, General | Rehearse what is planned to say (and hopefully remember better) | Yes | Distrib: 25 - 23 | Lex.Sophist.: 1.91/1.68
Discussion
In this discussion section we will first explore convergence and contrast with Ortega
(2005). Then we will examine successes and failures within the present coding scheme
in terms of its broader categories, notably the macro, and lexico-grammar categories,
and then move to relate this data to dimensions of performance (complexity, accuracy,
fluency, lexis) as well as the Levelt model. We will finish by briefly discussing debates
in the planning literature, as illuminated by the present study, and then make some
suggestions for training and pedagogic applications.
The coding scheme developed in the present research has proved usable and illuminating. It contains the broad categories of Macro Cognitive Codes, Micro Cognitive
Codes, Lexical and Grammar Codes, and Metacognitive Codes. This packaging may
contain a number of similar elements to those in Ortega (2005), but their arrangement
is sometimes different. The major change is the introduction of macro and micro codes.
Some of the detailed codes here relate to Ortega's metacognitive strategies, since they
concern organisational planning and various forms of attentional focus (though they
do not emphasise monitoring and evaluation as much as she does in relation to the
metacognitive codes). Also, the present scheme has a number of lexical and grammar
planning codes, which in principle connect with cognitive strategies. However, here
the overlap with Ortega's scheme is not extensive. In fact, our system treats rehearsal
as part of metacognitive planning (on the basis that there is self-awareness involved),
whereas Ortega includes it as a form of cognitive strategy. Our rehearsal codes are also
more explicitly linked with aspects of performance, which can be closely associated
with different stages of Levelt's model.
We now shift focus a little and consider how the codes are associated with differences in performance. Here we have some interesting contrasts. The two Macro Codes
(Table 2) were of little help in the analysis, in that they totally failed to discriminate
between participants. (As a result, there was no scope for any association with performance to emerge.) In contrast, the Micro planning codes were far more successful, with only one (Micro1) having no connection with performance. The other seven
discriminated, and also had associations with different performance areas, with a tendency to link with complexity (Table 6) and pausing, both AS pausing (Table 7) and
mid-clause pausing (Table 8). Turning to the Lexical and Grammatical codes, their
contribution was more mixed, with two lexical codes yielding nothing. The remainder,
though, did make some contribution to the associations with performance. The lexical
codes had associations with all performance areas, both negative as well as positive,
although fluency (Tables 7 and 8) was the commonest area to crop up. Interestingly,
the three grammar codes, which all revealed associations with aspects of performance,
were consistent: they showed no relation to accuracy (Table 5), and showed negative
relationships with subordination (Table 6) and pausing (Tables 7 and 8)! Finally, the
metacognitive codes were a mixed bag. Around half either failed to discriminate or
had no connection with performance. The focus here is on rehearsal, and while this
was important for accuracy (the most consistent sub-area to have this connection:
Table 5), other metacognitive codes, such as involving notes, or memorisation, or
being aware of the listener, had little to provoke discussion. Clearly these results suggest that some areas of reported planning behaviour are more rewarding than others.
We turn next to exploring in greater depth why codes were associated with performance success. We will propose five wider principles which seem to subsume the more
detailed codes which have been discussed so far. The principles are:
Build your own structure
Avoid trouble, and be realistic
Handle trouble when it occurs
Plan small or specific (versus plan general)
Avoid grammar focus
The principles emerged from a close scrutiny of Tables 5 to 9, from which wider
generalizations seemed warranted. These generalizations were based on the specific
codes which were associated with stronger and weaker performance. The generalizations always subsumed the separate codes into the more general principle concerned.
Interestingly, four of the five principles cut across the performance areas, and so they
are proposed as general principles for good use of planning time, though they manifest
themselves in different performance areas in different ways. Only the fifth is directly
connected with any particular performance area. In fact, the first two code bundles
are associated with widespread improvements in performance, whereas as we go on,
the focus for improvement is narrowed somewhat.
Build your own structure

Several codes are relevant here: Plan how to tell the story (Micro4), Organise the ideas
(Micro7), and Rehearse to check whether what is planned is logical and clear (Meta4). They
have in common that the speaker imposes some organisation on what is to be said. This
is particularly interesting given the independent strand of task research (see Skehan &
Shun (this volume) and Wang & Skehan (this volume)) which argues that tasks
containing structure lead to higher performance (Skehan, this volume, Chapter 10).
Equally interesting is that, consistent with the task structure literature, many aspects
of performance are involved, including accuracy, complexity, and AS-boundary pausing, a powerful combination. One assumes that having a clearer structure makes it
more likely that the speaker can be ahead of the game, as it were, and create on-line
planning opportunities, enabling errors to be avoided, more ambitious language used,
and also, with favourable conditions building on favourable conditions, more effective
clause boundary pausing to be possible, which in turn enables easier marshalling of
resources during speech.
Avoid trouble, and be realistic

Again, a number of codes are relevant. Those with positive relationships with performance
are Choose to use simple words (LexGram3), Have no or little concern about
vocabulary (LexGram11), Have no or little concern about grammar use (LexGram14), and
Rehearse (to be accurate, Meta2; fluent, Meta3; logical or clear, Meta4). Then there is
the contrast between Plan general (Micro2), a negative relationship, and Plan small
details (Micro3), a positive relationship. Finally, there are Lexical choice: appropriate
words (LexGram2), a negative relationship; Use accurate words (LexGram8), which was
positive for accuracy but negative for complexity and fluency; and Plan how to express
better (Micro5), a negative relationship with accuracy and complexity. When one reflects
on the positive and negative influences here, the
common thread seems to be better performance on the occasions when speakers use
planning time to prepare simpler things and to work within their limited abilities. The
negative relationships reflect the ways a speaker can go wrong and overdo things, and
thereby create trouble for themselves. So speakers who give themselves reasonable challenges (plan small and simple, don't worry about grammar or vocabulary, rehearse),
and do not plan their way into trouble (plan general, plan how to be better, and to use
more appropriate words), do better. Most interesting of all, this strategy for using planning time seems to raise accuracy, complexity and both aspects of fluency.
Handle trouble when it occurs

Of course, for second language speakers, trouble is likely to be inevitable, and so an
important positive attribute is to be able to deal with that trouble when it arrives. One
aspect of this connects with the first construct proposed above, build your own structure. Skehan (Chapter 10, this volume) proposes that one of the benefits of structure is
that it enables the speaker, after there has been difficulty, to rejoin the discourse thread
that the structure makes possible. So, rather than fall behind further, the speaker can
find a new starting point, and in that way, and perhaps with a bit of luck, recover
a parallel mode of processing, as opposed to the serial mode that is provoked by a
mental lexicon giving the speaker retrieval problems (Kormos 2006). There are other
planning codes that are relevant here. LexGram9 (Lexical compensation) has a positive relationship with mid-clause pausing, while LexGram10 (Lexical compensation
and circumlocution) is positive in its effects on AS-boundary pausing. In each case,
though, there is also a negative relationship with some aspect of form, so dealing with
trouble has some costs, too.
Plan small or specific (versus Plan General)

The main evidence here comes from the Micro codings: Micro2, Plan general, which
has a negative relationship with subordination, compared to Micro3, Plan small details,
which is positive in its effects on subordination and pausing. There are also Rehearse General
(Meta1), which has a negative association with lexical sophistication, and Meta2 and
Meta3, Rehearse Accurate and Rehearse Fluency, respectively, which are positive. An
issue which perhaps underlies these factors is that of transfer from the planning phase
into actual performance. It appears here as though being focused and specific in what
one does (and this also means plan small, a further example of avoiding trouble),
enables the more limited aims to be converted from the earlier planning phase into
the focused performance that is intended. General planning processes, in contrast, do
not seem to endure, and do not have the same sort of impact. One way or another,
planning small, and avoiding more general planning (Micro2), can affect
pretty much all aspects of performance.
Avoid grammar focus

The key codings here are LexGram12 (General grammar use), LexGram13 (Grammar,
Tense), and LexGram14 (Grammar, no concern). None of these has any impact upon
accuracy or error. In contrast, the first two, implying a concern for grammar, are associated negatively with subordination and fluency, and the third, a negative portrayal of
grammar, is positively associated with subordination and fluency. So, the reverse effect
is found to what might be expected here, and it suggests that focusing on grammar
confers only disadvantages, and no advantages.
Reflecting on these five principles for effective use of planning time, the most
notable reaction one is likely to have is how negative they are, and that they seem to
convey that we have learned more about how to use planning badly than how to use
planning well. There are suggestions of some principles associated with a good use
of planning, such as to impose structure, to be led perhaps by ideas and not form,
and to plan small and detailed. But in general, it is what not to do that emerges more
strikingly.
Indeed, the preponderance of the codes where positive relationships are involved
are concerned with complexity and fluency. This is consistent with the generally
accepted finding in the literature (e.g. Ortega 2005) that planning has a beneficial effect
on complexity and fluency. So it seems as if our results are consistent with the broad
quantitative results that have been published (although of course, there are exceptions,
such as Ellis (2009) showing that accuracy effects are not at all uncommon). In other
words, the range of influences we have just covered are consistent with the findings
from the literature, and provide, therefore, some detailed support in terms of what
second language speakers say they do when they plan.
To put this another way, the reported behaviours which are associated with raised
accuracy and with raised lexical sophistication are not so numerous, which is something of a disappointment. There are some, however, such as Lexical Choice, Accurate Words, as well as General Lexical Retrieval, which make intuitive sense, as well
as the Micro code of Connect or Structure the pictures, implying the value of giving a
structure to the story for accuracy. But Rehearsal, Fluency, and Rehearsal to be logical
or clear are less obviously relevant. So what underlies the association between self-reported planning behaviours and greater accuracy remains a puzzle. In addition, the
Accurate Words coding, while good for accuracy, seems bad for Complexity and Fluency, suggesting some degree of trade-off in performance. Accurate Words is, though,
the main positive influence on lexical sophistication, while the other influences in this
performance area are the more generalised negative influences of focusing on simple
words, and engaging in general rehearsal. We are left, in other words, with few clear
and convincing influences on either accuracy or lexical sophistication, which may
have some relevance for the less consistent findings in this area. Behaviours generally
concerned with rehearsal of specific things seem good, but the avoidance of harmful
influences (in general) seems just as important.
The other issue arising from the literature which is worth commenting on is the
respective claims of the Trade-off approaches and the Cognition Hypothesis. Basically,
the present study has little to say directly on this. But perhaps two points are worth making. First, there is often competition between different performance areas when one
looks at the influence of individual coded behaviours, and so raised performance in
one area is often associated with lower performance in another. In general, it looks as
if complexity and fluency often go together, and do not compete. In contrast, accuracy
does seem to compete with other performance dimensions. The Cognition Hypothesis proposes that under certain circumstances complexity and accuracy can both be
raised. When one looks at the way influences on these two areas which arise out of
planning seem rarely to go together, one can conclude that the present study does not
seem particularly supportive of the Cognition Hypothesis. The second point relates to
the way task complexity is seen as the driver, in the Cognition Hypothesis, for raising
accuracy and complexity jointly. One can see from the present study the existence of
influences which cause ideas to predominate and therefore push the speaker to make
the task more complex. On these occasions, however, there is no sign that there is a
beneficial influence on accuracy as well.
Some of these issues are taken up in the final chapter of this volume. Two other
chapters (Bui (this volume) and Wang (this volume)) explore other facets of planning
quantitatively. In the final chapter these different findings will be brought together
to offer a more general account of the role of planning in second language task
performance.
Conclusion
In general, the planning research supports the idea that using planning in pedagogic
contexts where a communicative approach is prevalent is a good thing (Ellis 2005a).
The literature suggests that it is rare to find studies which suggest that planning is
harmful. There are studies which point to its limitations, but it is clear that using
planning is far better on most occasions than not using it. But at the same time, we
have not made that much progress over the last twenty-five years in making suggestions as to when planning is most effective, what alternatives are available to the
planner, and ultimately, whether effectiveness in planning can be trained. The present study provides no clear answers to any of these issues, but it is suggestive. The
differences in reported planning behaviours certainly suggest that not all planners
do the same things, and that some behaviours are likely to be more effective than
others. The discussion above suggested that planning choices can be most effectively
regarded as a set of operating principles (build your own structure, avoid trouble,
don't over-extend, handle trouble, plan small, don't be too focused on grammar). It
appears, then, that it would be worth exploring whether these principles are general;
whether speakers who do not at first think of using them could be trained to use
them (and whether this would then be more effective); and perhaps most ambitiously of all, whether speakers could be induced to use planning behaviours likely
to target particular performance dimensions. The research possibilities here are considerable, and they have the important quality that they would seem likely to have
considerable pedagogic payoff.
References
Bei, X. (2010). The effects of topic familiarity and strategic planning in topic-based task performance at different proficiency levels. Unpublished Ph.D. thesis. Chinese University of Hong Kong.
11, 367–383.
De Bot, K. (1992). A bilingual production model: Levelt's Speaking model adapted. Applied Linguistics, 13, 1–24.
Ellis, R. (Ed.). (2005a). Planning and task performance in a second language. Amsterdam: John Benjamins.
Ellis, R. (2005b). Planning and task-based performance: Theory and research. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 3–36). Amsterdam: John Benjamins.
Foster, P., & Skehan, P. (1996). The influence of planning on performance in task-based learning. Studies in Second Language Acquisition, 18(3), 299–324.
Hill, L.A. (1961). Picture composition book. London: Longman.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Lawrence Erlbaum Associates.
Levelt, W.J. (1999). Language production: A blueprint of the speaker. In C. Brown & P. Hagoort (Eds.), Neurocognition of language (pp. 83–122). Oxford: Oxford University Press.
MacWhinney, B. (2000). The CHILDES Project: Tools for analysing talk, Volume 1: Transcription format and programs (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Mehnert, U. (1998). The effects of different lengths of time for planning on second language discourse. Studies in Second Language Acquisition, 20, 52–83.
Michel, M.C., Kuiken, F., & Vedder, I. (2007). Effects of task complexity and task condition on Dutch L2. International Review of Applied Linguistics, 45(3), 241–259.
O'Malley, J.M., & Chamot, A.U. (1990). Learning strategies in second language acquisition. Cambridge: CUP.
Ortega, L. (1995). The effect of planning on L2 Spanish narratives. Research Note 15. Honolulu, HI: University of Hawaii Second Language Teaching and Curriculum Center.
Ortega, L. (1999). Planning and focus on form in L2 oral performance. Studies in Second Language
John Benjamins.
Oxford, R. (1990). Language learning strategies: What every teacher should know. Rowley, MA: Newbury House.
Pang, F., & Skehan, P. (2006). What do learners do when they plan: A qualitative study. Paper presented at St. Mary's University College, Twickenham.
Revesz, A. (2009). Task complexity, focus on form, and second language development. Studies in Second Language Acquisition, 31(3), 437–470.
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22, 27–57.
Robinson, P. (2011). Second language task complexity, the Cognition Hypothesis, language learning, and performance. In P. Robinson (Ed.), Second language task complexity: Researching the Cognition Hypothesis of language learning and performance (pp. 3–38). Amsterdam: John Benjamins.
Skehan, P. (2009a). Modeling second language performance: Integrating complexity, accuracy, fluency and lexis. Applied Linguistics, 30(4), 510–532.
tasks. In B. Richards, H. Daller, D. Malvern, & P. Meara (Eds.), Vocabulary studies in first and second language acquisition: The interface between theory and application (pp. 107–124). London: Palgrave Macmillan.
language performance. Language Teaching Research, 1(3), 185–211.
Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.
Tavakoli, P., & Foster, P. (2008). Task design and second language performance: The effect of narrative type on learner output. Language Learning, 58(2), 439–473.
Tavakoli, P., & Skehan, P. (2005). Planning, task structure, and performance testing. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 239–276). Amsterdam: John Benjamins.
Wang, Z. (2009). Modelling speech production and performance: Evidence from five types of planning and two task structures. Unpublished Ph.D. thesis. Chinese University of Hong Kong.
Wigglesworth, G. (1997). An investigation of planning time and proficiency level on oral test discourse. Language Testing, 14(1), 85–106.
Appendix 1
Instructions for the Picture Story Telling Task
In this task you will look at a series of pictures, and then you will need to tell the story
in these pictures to the other student. When you finish the story, the other student will
ask you a question or more than one question.
You will have ten minutes, by yourself, to plan what you are going to say during
the picture story telling. Use the ten minutes effectively to tell a good and clear story.
If you go first, then you will describe the story in your pictures to the other student. Do not show the pictures you are describing to the other student. The other
student does not have these pictures to look at.
If you go second, the other student will describe the story in their (different) pictures to you.
Try to talk for 2 minutes or more.
Instructions when you are a listener

Listen to the other student telling the story in their pictures. When they finish, try to
ask a question or more than one question.
Ask a question if you do not understand something
Or ask a question if you would like the other student to say more about the series
of pictures.
Appendix 2
Retrospective interview prompts
Name:
Major: Gender:
General | Prompts
Planning (Self report) | What did you do? How did you plan? Tell me what you did during the time you're planning?
Then (Notes) | Here are your notes. Do they help you remember more about what you did when you planned? Did making notes help in the actual task although you're not allowed to look at them?
General reactions to self report (more focus Qs) | Do you think the planning time was useful? Why or why not? Do you think the planning was useful for the general way or for small things? Why? What effect did the planning have on the way you did the task?
Emphasis (Self report) | What did you emphasise?
More specific questions on emphasis. (If nothing said on each of the expected area, ask one by one.)
1. WORDS
To remember words that might be useful?
To use words that might be diverse, accurate?
2. GRAMMAR
To remember grammar?
To avoid mistakes (correctness)?
3. IDEAS
To think of ideas?
To organize ideas?
4. STRUCTURE OF THE STORY (Picture Story Telling Task)
To structure the flow of pictures
To use language to show how the pictures are connected and
developed
5. REHEARSE
To be accurate
To be fluent
6. To THINK HOW TO SAY things to make it easier and clearer for
your partner?
Appendix 3
Coding Scheme: Planning activities of ESL speakers doing picture
story telling task
No. | CODE | Description of CODE | Example
(A) Macro planning
1 | Plan sequence: Scan pictures, then describe | Scan all the pictures, understand what the story is about, then think of how to describe the story | I first went through the pictures. After I got a general idea, I began to think how to tell the story in English.
2 | Plan sequence: Look at picture one by one, then describe | Look at the pictures one by one and at the same time think about how to describe each picture | I looked at the pictures one by one, and think about how to tell each picture at the same time.
(B) Micro planning
1 | Understand the pictures in details | Try to understand the pictures in details and know what the story is about | I first read the instruction, but I had no idea when I saw so many sheep in the grids. I went through the pictures first and then looked at the details.
2 | Plan general things | Plan general things, such as describing the general plot of the story | I planned general things. I needed to have a concept first, and the details could be added while I spoke.
3 | Plan small details | Plan small details, such as detailed description | I think it was useful for the subtle things. Maybe I could tell the story in a general way right away but I might make it a mess.
4 | Plan how to tell the story in general | | During the ten minutes, I was thinking how to tell the story.
5 | Plan how to express better | Plan how to describe the story more vivid, interesting, or clearer etc. | Based on the pictures, I considered how to describe them in a better way and I organized my words.
6 | Think of ideas beyond pictures | Think of more ideas beyond the pictures | With some time to plan, I could add my ideas if the links were strange or if I couldn't understand the pictures.
7 | Organize the ideas developed from the pictures | Try to organize the ideas developed from different pictures to have a clear story plot | Yes, I read it. Then, I turned to the details and figured out how to organize them. I sorted out how many times the sheep was weighted and how many different sports it did, and I also concluded that at last it succeeded to be the same size as the other sheep.
8 | Connect the pictures to develop the story plot | Try to connect or structure the pictures to develop the story plot | I emphasized on the plots. I considered how to connect the pictures to make the story better.
(C) Lexical and grammar planning
1 | Lexical: General retrieval | Try to remember the words used in the task | Looking at a picture, I needed to consider and recall the English words to be used for that picture.
2 | Lexical choice: Appropriate words | Choose appropriate words for telling the story | I wanted Yuni to know that the tree is very important. I would think about the language of every scene and the appropriate words to the scenes.
3 | Lexical choice: Simple words | Choose to use simple words | I considered using simpler words which could be understood most easily.
4 | Lexical choice: Connective words | Choose to use some connective words | Secondly, I added some connective words between sentences to connect all the pictures together.
5 | Lexical choice: Personification words | Choose to use some personification words | and added some personification words.
6 | Lexical choice: Advanced words | Try to use some advanced words | and also use some relatively advanced words.
7 | Lexical choice: Various words | Try to use various words to talk about the same thing | A little. For example, for the cake I thought of two words, tempt and attract.
8 | Lexical choice: Accurate words | Try to use accurate words | I tried to use some accurate words to describe the pictures.
9 | Lexical compensation: Approximating | Use a word similar in meaning to replace the word which can't be recalled | If I had any words that I didn't know how to express, I tried to replace them with other words.
10 | Lexical compensation: Circumlocution | Use a few words or another way to replace a word which cannot be recalled | Secondly, for those words I didn't know or forget how to say, I need time to find other ways to explain them.
11 | Lexical: No/Little concern about vocabulary | | No, not particularly think of the use of words.
12 | Grammar: General use | Think of general use of grammar | Yes, I considered the use of grammar as grammar plays an important role in story telling.
13 | Grammar: Tense | Think of the use of correct tense | I feel it's necessary to consider it. I thought if I should use present, past or continuous tense.
14 | Grammar: No/Little concern about grammar use | | For me, sometimes I didn't pay much attention on grammar. Instead, I rely more on my sense of the language and didn't think much about the correctness.
(D) Metacognitive planning
1 | Rehearse: General | Rehearse what is planned to say (and hopefully remember better) | I did not rehearse word by word. I just considered the sequences of what I would say.
2 | Rehearse: Accurate | Rehearse to be accurate | At the same time, I think the rehearsal could help me to be more accurate.
3 | Rehearse: Fluent | Rehearse to be fluent | I rehearsed as I want to tell a more fluent story.
4 | Rehearse: Logical or clear | Rehearse to check whether what is planned is logical or clear | I think the rehearsal could help to check the logic of the story, and to see whether my story is in accordance with the pictures as well as our common way of thinking.
5 | Rehearse: Memorization | Rehearse to help memorizing what is planned | The rehearsal could also help me to memorize some of the things I've planned in order to deliver my story more smoothly.
6 | Take notes | Take notes to plan the task (although cannot see it when doing the task) | I made some point-form notes on the draft paper. I usually make point-form. I noted down the flow.
7 | Take notes: No need | There is no need to take notes | I didn't since the story is short; if it's a long one, I would take notes.
8 | Memorization | Try to memorize what is planned before doing the task | Finally, I went through my notes, referring to the pictures; and memorize the points planned.
9 | Aware of listener's understanding | Try to make the listener understand the story, such as using simple grammar or words | Because the task has no words but just pictures, the planning time could help me think about how to say things to make my partner more easily understand.
8 | Lexical: Planned but not used/correctly used | During planning, lexical is considered, but it is not used or not correctly used when doing the task | Though I did think about the word, when I was telling the story, I forgot these things and just said whatever in my mind.
9 | Grammar: Planned but not used/correctly used | During planning, grammar is considered, but it is not used or not correctly used when doing the task | Yes, I've thought about grammar, but I didn't pay much attention on it when actually doing the task.
Chapter 5
Get it right in the end
The effects of post-task transcribing
on learners' oral performance
Li Qian
Guangdong University of Foreign Studies, China
Given the small body of existing research concerning focus on form at the post-task
stage in task-based language teaching, the present study uses a post-task transcribing
condition as a focus on form activity and explores the effects of transcribing under
various conditions. Eighty participants, divided into four experimental groups and
one control group, completed four tasks at one-week intervals.
Each experimental group was assigned a different post-task activity;
the control group did no post-task activity. Task performance was
measured in terms of complexity, accuracy and lexical performance. The findings
are multifaceted. First of all, the adoption of post-task transcribing, in general, was
found to be efficient for different formal aspects of task performance. In the second
place, pair-based transcribing led to more syntactically complex language, whereas
the individual-based transcribing at the post-task stage led to an improvement in
lexical sophistication. Thirdly, further revision after transcribing had mixed effects
on accuracy and complexity. The findings are discussed in light of the concepts of
noticing and attention, interaction theory and other related SLA theories. Based on
the theoretical discussion, pedagogical implications are proposed.
Introduction
In second language pedagogy, one of the major issues is what Stern (1983) called
the code-communication dilemma (Ellis 2008). This is reflected in the dichotomy
between instructed and naturalistic L2 learning. There are advocates of grammar teaching who view grammar instruction as the foundation for second language
acquisition (Bialystok 1990; McLaughlin 1990; Rutherford 1988). By contrast, there
are also advocates of the zero option (Ellis 2008), which proposes abandoning formal instruction and allowing learners to construct their interlanguage naturally
through communication rather than for communication (Krashen 1982, 1985;
Prabhu 1987).
In the past two decades or so, researchers have come to realize that adopting a one-sided approach, either communicative-based or grammar-based, leads us nowhere.
Pedagogical interventions need to be interwoven into primarily communicative activities so as to overcome the limitations of both traditional grammar instruction and
communicative language teaching (Doughty & Williams 1998c). Among the variety
of proposals regarding the incorporation of formal instruction in communicative settings (see the review in Norris & Ortega 2000), Focus on Form (FonF) has received
increasing interest in the last two decades. In his seminal work, Long (1991) initially
introduced the notion as follows: "focus on form ... overtly draws students' attention to
linguistic elements as they arise incidentally in lessons whose overriding focus is on
meaning or communication" (Long 1991, pp. 45–46). Based on this theoretical notion,
Long and Robinson (1998) later proposed a more pedagogically applicable definition:
Focus on form consists of an occasional shift in attention to linguistic code features
by the teacher and/or one or more students triggered by perceived problems with
comprehension or production.
(Long & Robinson 1998, p. 23)
A plethora of studies have been conducted either in L2 classrooms or in laboratory
settings (Doughty 2003; Doughty & Williams 1998c; Ellis 2001; Fotos & Nassaji 2007;
Norris & Ortega 2000; Spada 1997). There is now substantial evidence that when
learners' attention is directed at linguistic forms and the meanings that they encode
in the context of meaning-focused activities, learning takes place (see reviews in
Ellis 2001, 2008). Focus on form is claimed to be necessary to push learners beyond
communicatively effective language use toward target-like second language proficiency
(Doughty & Williams 1998b). It is, therefore, desirable that the focus on form option
should be embedded in a communicative context for the learner to achieve both communicative effectiveness and high levels of language performance. As a strong version
of communicative language teaching, task-based instruction provides such a setting
for the operationalization of focus on form.
Literature review
Based on such general agreement on the importance of a focus on form in a communicative setting, researchers have explored, in a variety of effective ways, how to
achieve form focus. Within a task-based framework (Ellis 2003; Skehan 1998), most
of the previous focus on form studies have concentrated on the pre-task or during-task stages. In these studies, focus on form options, such as pre-task planning, negotiation of meaning, feedback, and collaborative dialogue, tend to occur before or during
task performance (Ellis 2003, 2008; Leeser 2004; Lightbown & Spada 1990; Loewen
2005; Pica 2002).
In contrast, there is, rather surprisingly, a paucity of research with regard to the
effects of focus on form at the post-task stage. One of the reasons may be that in a
research context, some researchers have viewed task-based instruction as consisting
only of pre-task activity and task performance (or even task performance only in some
cases). Once the task is accomplished, the job is done. However, that should not necessarily be the case in L2 pedagogy. Willis (1996) pointed out that:
task-based learning is not just about getting learners to do one task and then another.
If that were the case, learners would probably become quite expert at doing tasks and
resourceful with their language, but they would almost certainly gain fluency at the
expense of accuracy. ... to promote constant learning and improvement, we should
see it (to do a task) as just one component in a larger framework. (p. 40)
Willis (1996) outlined a framework for task-based learning as consisting of three
phases: a pre-task phase, a task cycle and a post-task language-focus stage, and argued
that in addition to exposure to and use of language, it is necessary to provide certain
kinds of form-focused instruction, either teacher-led or student-centered, after the
task cycle. Similarly, task researchers (Ellis 2003; Skehan 1996, 1998) found it necessary to include a post-task stage and distinguish three major stages for the implementation of tasks (pre-, during- and post-task stages). In particular, Skehan (1996,
2007) has suggested that post-task activities cannot be neglected, and he proposes that
learners' foreknowledge of a post-task activity would influence how they allocate their
attention during an actual task.
Skehan and Foster (1997) examined the effect of the anticipation of a certain
post-task activity (to redo the task publicly) on three tasks (a personal, a narrative and a decision-making task). As they predicted, foreknowledge of public performance did not influence fluency and complexity, but had a significant effect on
accuracy in the decision-making task. Although only one task (out of three) showed
a significant accuracy effect, the research results were still encouraging in that the
study showed that first, a post-task activity can be functional when it is operationalised in a suitable task context; secondly, the clearest effect of a post-task activity is
on accuracy. This supports the claim that the post-task stage may be effective in promoting a focus on form. In an attempt to account for the lack of an effect of the public performance condition on two of the tasks, Foster and Skehan (2013) proposed
that the weakness of the effect in Skehan and Foster (1997) might have been due
to the nature of the activity: public performance might have had only a remote
influence on all the participants since only a small number (not all) of the students
would be selected to perform in public. With regard to individual differences, public
performance may not be viewed as a threat by certain participants (especially the
more confident learners). In such an activity, for learners with different orientations
in their performance (complexity-oriented, fluency-oriented or accuracy-oriented),
their attention may not necessarily be directed to form to achieve higher levels of
accuracy. In brief, public performance does not represent the whole picture of what
post-task activities are about.
In a subsequent study, Foster and Skehan (2013) used an alternative post-task
activity, transcribing one's own performance, in order to deal with the restrictions
of public performance. Transcribing is an individual activity in which every participant re-examines their own task performance. When pushed to transcribe their own
performance, all learners are forced to pay attention to the formal aspects of their own
language during the task (Foster & Skehan 2013). The researchers divided the participants into experimental and control groups. The experimental group transcribed
extracts of their own task performance as a post-task activity, while the control group
did no post-task activity. Both groups performed two types of tasks: narrative and
decision-making, in two counterbalanced task cycles. The results showed that (1) foreknowledge of transcribing as a post-task activity had a significant accuracy effect
on task performance in both narrative and decision-making tasks, (2) significantly
greater complexity was found for the experimental group in the decision-making task,
(3) with regard to fluency, length of run (i.e. the number of words uttered before any
breakdown or repair occurs) was significantly greater for the post-task condition in
the decision-making task. As compared with Skehan and Foster (1997), the results of
this study are clearly more positive and supportive not only for the accuracy effect,
but also with regard to the effects of a post-task transcribing phase on complexity and
one measure of fluency. As far as task type is concerned, the findings suggested that
the decision-making task was more sensitive to post-task effects, not only in terms
of accuracy, but also of complexity. In contrast, narrative performance appears to be
more difficult to influence. These results may push researchers to pay more attention
to differences in task type and how these interact with other task conditions (Skehan &
Foster 2001; Skehan 2007).
Skehan and Foster's post-task research is pioneering in the task-based research
field and shows some preliminary results which could inspire further studies. In addition, the post-task phase has also attracted some attention in the literature on second
language pedagogy. From a teacher's perspective, Lynch (2001, 2007) investigated the
process of transcribing as a post-task activity in an L2 classroom setting. Based on
observation of classroom learning, two limitations were identified by Lynch (2001).
First, learners received too much feedback on learner activity, and had too little time
for reflection on the forms of the language. They kept on doing activities with few
opportunities to reflect on their language gains and deficiencies. Second, given that
only high-proficiency learners were aware of the changes in their language during the
activity (Lynch & Maclean 2001), devising ways to help learners analyse their performance is necessary. In such a context, Lynch employed transcribing as a reflective
noticing activity for classroom learning. In contrast to employing transcribing as an
individual task, as in Foster and Skehan's (2013) study, Lynch (2001, 2007) designed
the activity to be conducted in pairs as a form of collaborative transcribing. In Lynch's
(2001) study, learners were asked to transcribe together, then discussed and revised
the transcripts, and submitted the revised transcripts to the teacher for further corrections and reformulation. The analyses of the process and product of these cycles suggested that collaborative transcribing and revising can encourage learners to focus on
form in a relatively natural way. Furthermore, the teacher can play an important role in
this post-task intervention, especially with regard to the improvement of vocabulary
use (Lynch 2001).
In a subsequent study, Lynch (2007) compared the effects of two different transcribing groups: student-initiated transcribing (students in pairs transcribed their
own performance and then revised the transcripts) and teacher-initiated transcribing
(the teacher transcribed problematic extracts of learners' performance, and the transcripts with errors were given back to the pairs for their revision). The analyses of the
subsequent performance showed that both procedures are manageable under normal
classroom conditions, and suggested that the student-initiated transcribing was more
effective in helping the learners to maintain higher accuracy levels in the highlighted
forms which were revised by students themselves.
The previous studies, either from a researcher's or from a teacher's perspective,
show encouraging results with regard to the effect of post-task activities. Comparing
the findings of two post-task activities (redoing the task publicly or transcribing task performance) in Skehan and Foster's two studies (1997, 2013), we find that transcribing
may be adopted as a more feasible, manageable and effective post-task activity in communicative language classrooms. Similarly, Willis and Willis (2007) also suggest that
transcribing as an effective post-task pedagogical choice is appealing to both teachers and learners. Some teachers have already begun to employ this activity in their
classrooms and found positive effects of transcribing on students' oral performance
(Clennell 1999; Mennim 2003, 2011; Stillwell et al. 2010).
In view of the relative infancy of the research base on post-task activities, it is
not surprising that the available studies have some serious limitations. For instance,
in Lynch's two studies and some L2 classroom-based studies (Mennim 2003, 2011;
Stillwell et al. 2010), the sample sizes were not large enough for statistical analyses,
and so no statistical results were reported, which causes problems in generalizing
to other classrooms. In Foster and Skehan (2013), post-task transcription was performed outside the actual classroom. So, some unexpected intervening variables may
have been at play with transcribing taking place beyond the supervision of the teachers. In addition, in the previous studies (Clennell 1999; Mennim 2003, 2011; Stillwell
et al. 2010; Foster & Skehan 2013), participants were engaged in the activity of post-task transcribing only twice, which may not be sufficient to demonstrate strong treatment effects.
Aims of the present study

Given these limitations with the previous studies, the present study explores the effects
of post-task transcribing under various conditions. First of all, transcribing is carried
out either individually or in pairs. In Foster and Skehan (2013), transcribing was performed individually. That is, each participant was asked to transcribe their task performance (including interactive task performance) by him/herself. In contrast, Lynch
(2001, 2007) adopted a pair work style which he called collaborative transcribing. The
present study aims to explore which of these two conditions (individual versus collaborative work in pairs) is more effective in terms of enhancing language performance.
Secondly, some of the experimental groups will take part in a revision condition.
In Lynch (2001, 2007), when they were involved in revising and reformulating, learners showed a clear performance improvement. Given the small sample size in Lynch's
study (N = 16), no generalizations could be made concerning the effects of revision.
Participants in the current study (N = 80) will be engaged either in reflective self-revision, interactive peer revision or no revision conditions to disentangle the distinctive
impacts of these alternative procedures.
In particular, the goals of the current study are (i) to investigate whether transcribing as a post-task activity has an effect on formal aspects of task performance; (ii) to
compare the effects of different transcribing conditions (individual work versus pair
work, revision versus no revision) on language performance.
In addition, as compared with previous research, participants in the present study
are engaged for a longer period, that is, four task sessions with a one week interval
between each session. Adopting multiple task sessions is expected to make the measurement of the treatment effect more revealing in exploring learners' performance
changes over time.
The research questions which guided this study are the following:
1. Does post-task transcription have an effect on second language learners' oral
performance?
2. Do individual transcription and pair transcription have differential effects on
second language learners' oral performance?
3. Do transcribing-only and transcribing-and-revising conditions have differential
effects on second language learners' oral performance?
Research methodology
Participants
Eighty participants were included in this study. All of them were second-year university students from a South China university aged between nineteen and twenty-one.
They were non-English majors, among whom forty-one were female and thirty-nine
were male. They had been studying English for 7–10 years, with low intermediate to
intermediate English proficiency. The participants were randomly divided into four
experimental groups and one control group, based only on the time slots in which
they were available to attend the study.
Prior to the experiment, to explore the comparability of the five groups regarding English proficiency, the participants were administered a cloze test. In language
testing, cloze tests have frequently been adopted as a valid and reliable instrument to
assess overall language proficiency (Brown & Rodgers 2002). In this study, the cloze
test was composed of three cloze passages, which were adapted from the nation-wide
standardized China College English Test Band 4 (CET-4) Database for non-English
majors at low intermediate to intermediate English proficiency level.
A one-way ANOVA showed no significant difference among the five groups on the
cloze test results, F(4, 78) = .628, p = .679. On this basis, the groups were found to be
comparable with regard to English proficiency.
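As a rough illustration of the comparability check described above, the F statistic of a one-way ANOVA can be computed from the between-group and within-group sums of squares. The sketch below is not the analysis script used in the study: the function name `one_way_anova` and the cloze scores are hypothetical, and only the F ratio and its degrees of freedom are computed (no p-value).

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical cloze scores for five groups (invented for illustration only).
cloze_scores = [
    [21, 25, 19, 23],
    [22, 24, 20, 26],
    [20, 23, 22, 25],
    [24, 21, 23, 22],
    [19, 26, 22, 24],
]
f_stat, df_b, df_w = one_way_anova(cloze_scores)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

For k groups and N scores in total, the degrees of freedom are k - 1 between groups and N - k within groups; a small F ratio indicates that the group means differ little relative to the spread of scores inside each group.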
Experimental tasks
The present study used six tasks in total. There were three narratives (one for practice
and two for actual data collection), and three interactive decision-making tasks (again,
one for practice and two for data collection). The four treatment tasks were carefully
arranged in order to tease out the intervening influence of task type and task order. The
two sub-groups under each treatment group were assigned the same tasks in reverse
order to counterbalance any intervening effect of task sequence. In addition, the different topics were arranged in a balanced order. Table 1 below shows the arrangement
of task type and task topics.
Table 1. Arrangement of task types and task topics
Cycle | Sub-group 1 | Sub-group 2
Cycle 1 | narrative task a (topic: puppy tale) | interactive task b (topic: unbalanced degree)
Cycle 2 | interactive task a (topic: cyber love) | narrative task b (topic: baby butch)
Cycle 3 | interactive task b (topic: unbalanced degree) | narrative task a (topic: puppy tale)
Cycle 4 | narrative task b (topic: baby butch) | interactive task a (topic: cyber love)

In the narrative tasks, the participants described stories from cartoon episodes
of Tom and Jerry. They watched the cartoons, and then retold the story. In the interactive decision-making tasks, the participants, in dyads, acted as the editors of the
problem column of a magazine (cf. Skehan & Foster's (1997) Agony Aunt task). In
each task, they discussed the problem in a letter written to the magazine and agreed
upon the best advice for the writer. Each letter described a certain tricky personal
situation that did not have a simple or obvious solution (Skehan & Foster 2001).
(See Appendix for task instructions and problem letters for the interactive decision-making tasks.)
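The counterbalanced arrangement in Table 1 can be sketched in a few lines. In this reading of the table, which is an interpretation and not a procedure stated by the author, sub-group 2 receives sub-group 1's task sequence with its first and second halves exchanged, so that in every cycle the two sub-groups work on different task types. The task codes (Na, Nb, Ia, Ib) follow the note to Table 2; the function names are hypothetical.

```python
# Task codes: N = narrative, I = interactive decision-making; a/b = topic.
SUBGROUP_1_ORDER = ["Na", "Ia", "Ib", "Nb"]


def swap_halves(order):
    """Return the sequence with its first and second halves exchanged."""
    half = len(order) // 2
    return order[half:] + order[:half]


def task_type(code):
    """Map a task code to its task type."""
    return "narrative" if code.startswith("N") else "interactive"


def build_schedule(first_order):
    """Build the per-cycle task schedule for both sub-groups."""
    return {
        "Sub-group 1": list(first_order),
        "Sub-group 2": swap_halves(first_order),
    }


schedule = build_schedule(SUBGROUP_1_ORDER)
for cycle in range(len(SUBGROUP_1_ORDER)):
    s1 = schedule["Sub-group 1"][cycle]
    s2 = schedule["Sub-group 2"][cycle]
    print(f"Cycle {cycle + 1}: sub-group 1 -> {s1}, sub-group 2 -> {s2}")
```

Each sub-group meets all four tasks exactly once, and neither task type is confined to the early or the late cycles, which is what allows task type and task order to be separated in the analysis.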
Procedures
The participants were seen five times at one-week intervals, the first time for orientation and the other four times for main study data collection. Prior to the data collection,
an orientation session was given to ensure that the participants were well informed
regarding the task procedure and the basic transcribing skills that were required (for
the experimental groups). In addition, task practice for both task types was expected
to reduce the participants' performance anxiety (Bygate 1996, 2000, 2001).
In the task session, each participant was provided with a tape-cassette recorder.
For the narrative task, a cartoon episode from Tom and Jerry was played to the participants. After two minutes of planning, every participant was asked to describe the episode
to the recorder as if they were telling the story to someone else who had not watched
the cartoon. The recordings were then used for post-task transcribing. With the interactive decision-making tasks, participants, working in dyads, were given the problem
letter and then discussed with each other to agree what good advice could be given.
The pairs were the same in each session. Their discussion performances were recorded
for transcribing as well.
At the post-task stage, participants transcribed part of their recordings from the
tapes with paper and pencil. As for the length of the transcribed performance, in a
narrative task, the participants who worked individually transcribed a 3-minute performance starting from a certain story point. For the pair-work groups, this was more
complicated. Each member of the dyad contributed a 1.5-minute performance in a
continuous storyline. For example, starting from a story point ("The door is too small
for the puppy to come in"), participant A's 1.5-minute performance was transcribed,
and the transcription ended with another story point ("Tom was angry") from which
participant B's 1.5-minute performance started. In this way, the story content contributed by the two members of a dyad was meant to differ, so as to avoid any competition between them concerning the quality of the oral performance.
Several story turning points in the cartoons were provided for free selection by each
pair. In the decision-making tasks, considering that between-interlocutors pauses
tended to occur more frequently than in the narrative task, a five-minute conversation
between a dyad was assigned to be transcribed either individually or in pairs. Following previous research (Lynch 2001, 2007), which suggests that five minutes are
needed and sufficient for learners to transcribe one minute of their own speech,
fifteen and twenty minutes were allocated in the present study for the transcription
of the narrative and interactive performances respectively.
The four experimental groups were assigned to carry out post-task transcribing
activities in different conditions as follows (see the research design in Table 2):
Group I (individual transcribing group): the participants worked individually to
transcribe the recorded performance.
Group II (individual transcribing and revising group): the participants individually
transcribed the recorded performances, and then revised the original transcripts.
Group III (pair transcribing group): the participants worked in pairs to transcribe
the recorded performance.
Group IV (pair transcribing and revising group): the participants worked in pairs to
transcribe the recorded performance, and to revise the original transcripts.
Group V (control group): the participants did the same tasks as the experimental groups.
However, no post-task activities were involved after the task performance.
The transcripts were then collected by the researcher for further qualitative study.
At the end of the last task cycle, an interview was carried out to gather feedback
on the study, specifically participants' reflections on their task performance and on the
post-task activities.
Table 2. Research design
Group | Sub-group | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4
Individual transcribing group (N = 16) | Sub-group 1 (n = 8) | Na, P1 | Ia, P1 | Ib, P1 | Nb, P1
Individual transcribing group (N = 16) | Sub-group 2 (n = 8) | Ib, P1 | Nb, P1 | Na, P1 | Ia, P1
Individual transcribing & revising group (N = 16) | Sub-group 1 (n = 8) | Na, P2 | Ia, P2 | Ib, P2 | Nb, P2
Individual transcribing & revising group (N = 16) | Sub-group 2 (n = 8) | Ib, P2 | Nb, P2 | Na, P2 | Ia, P2
Pair transcribing group (N = 16) | Sub-group 1 (n = 8) | Na, P3 | Ia, P3 | Ib, P3 | Nb, P3
Pair transcribing group (N = 16) | Sub-group 2 (n = 8) | Ib, P3 | Nb, P3 | Na, P3 | Ia, P3
Pair transcribing & revising group (N = 16) | Sub-group 1 (n = 8) | Na, P4 | Ia, P4 | Ib, P4 | Nb, P4
Pair transcribing & revising group (N = 16) | Sub-group 2 (n = 8) | Ib, P4 | Nb, P4 | Na, P4 | Ia, P4
Control group (N = 16) | Sub-group 1 (n = 8) | Na | Ia | Ib | Nb
Control group (N = 16) | Sub-group 2 (n = 8) | Ib | Nb | Na | Ia
Note: 1. Na: narrative task puppy tale; Nb: narrative task baby butch;
Ia: interactive decision-making task cyber love; Ib: decision-making task unbalanced status
2. P1: post-task individual transcribing; P2: post-task individual transcribing & revising;
P3: post-task pair transcribing; P4: post-task pair-transcribing & revising.
Research design
In this large-scale study, a 4 × 2 × 2 research design was employed (see Table 2).
The first independent variable, the post-task transcribing condition, is a between-subject factor with four experimental levels, (1) individual transcribing only, (2) individual transcribing and revising, (3) pair transcribing only, (4) pair transcribing and pair
revising, plus a control group. The second independent variable, task type, is a
within-subject factor with two levels: the narrative task and the decision-making
task. The third independent variable, task session, is a within-subject factor with
two levels: the first and the second sessions for either narrative or decision-making
tasks. The dependent variable is the oral performance, which was measured in terms
of complexity, accuracy, and lexical performance. The major focus in this chapter
is the post-task transcription condition, but with some consideration of task differences (narrative vs. interactive). Space does not permit extensive coverage of the
factors of time or task-session.
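The counterbalanced schedule summarized in Table 2 can also be written out programmatically. The following Python sketch is only an illustration of the design, not part of the study's procedure; the names `GROUPS`, `TASK_ORDER` and `schedule` are hypothetical, and the task orders are simply read off Table 2.

```python
# Illustrative sketch of the design in Table 2; all names are hypothetical.

# Post-task condition for each group (None = control, no post-task activity).
GROUPS = {
    "individual transcribing": "P1",
    "individual transcribing & revising": "P2",
    "pair transcribing": "P3",
    "pair transcribing & revising": "P4",
    "control": None,
}

# Task order over the four cycles, counterbalanced across the two subgroups.
# Na/Nb = narrative tasks, Ia/Ib = interactive decision-making tasks.
TASK_ORDER = {
    1: ["Na", "Ia", "Ib", "Nb"],
    2: ["Ib", "Nb", "Na", "Ia"],
}


def schedule():
    """Return one (group, subgroup, cycle, task, post_task) row per cell."""
    rows = []
    for group, post_task in GROUPS.items():
        for subgroup, tasks in TASK_ORDER.items():
            for cycle, task in enumerate(tasks, start=1):
                rows.append((group, subgroup, cycle, task, post_task))
    return rows


if __name__ == "__main__":
    for row in schedule():
        print(row)
```

Each subgroup meets both narrative and both interactive tasks once, so the first and second sessions of a task type can be compared within the same learners.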
Data transcription and coding

The data (i.e. participants' task performances) were first digitized and transcribed for
research analysis. Then, the transcribed performance data were further divided into
speech units (AS-units) (Foster, Tonkyn & Wigglesworth 2000). Subsequently, the
documents with AS-unit divisions were further arranged according to two formats:
the CHAT format to enable CLAN sub-programs to be used, particularly VOCD, and
the Task Profile (TP) format (Skehan, Chapter 1, this volume) which outputs a wide
range of complexity and accuracy measures, as well as the index Lambda of lexical
sophistication (Read 2000, and see discussion in Chapter 1, this volume). The different
measures of learners' performance are summarized in Table 3.
Table 3. Different measures of task performance

Performance             Index                          Description
Complexity              Clauses per AS unit            The number of clauses per AS unit
                        Mean length of AS unit         The number of words per AS unit
Accuracy                Ratio of error-free clauses    The total number of error-free clauses divided by the number of clauses
                        Errors per 100 words           The number of errors per 100 words
Lexical diversity       Corrected type/token ratio (Malvern & Richards 2002)
Lexical sophistication  Lambda                         Use of less frequent words
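The complexity and accuracy indices in Table 3 are simple ratios over coded counts, and a small worked example may make the definitions concrete. The Python sketch below is illustrative only: the `ASUnit` structure and the function `performance_measures` are hypothetical stand-ins for hand-coded transcripts, not the Task Profile format, and the lexical measures (which rely on VOCD and Lambda) are not covered.

```python
# Illustrative only: computes the complexity and accuracy indices of Table 3
# from hand-coded counts. ASUnit is a hypothetical stand-in for coded data.
from dataclasses import dataclass


@dataclass
class ASUnit:
    words: int               # number of words in the AS-unit
    clauses: int             # number of clauses in the AS-unit
    error_free_clauses: int  # clauses that contain no error
    errors: int              # total number of errors in the AS-unit


def performance_measures(units):
    """Return the four complexity/accuracy indices for one performance."""
    n_units = len(units)
    words = sum(u.words for u in units)
    clauses = sum(u.clauses for u in units)
    error_free = sum(u.error_free_clauses for u in units)
    errors = sum(u.errors for u in units)
    return {
        "clauses_per_as_unit": clauses / n_units,
        "mean_length_of_as_unit": words / n_units,
        "error_free_clause_ratio": error_free / clauses,
        "errors_per_100_words": 100 * errors / words,
    }


# Made-up counts for three AS-units, chosen only to exercise the function.
sample = [ASUnit(8, 1, 1, 0), ASUnit(12, 2, 1, 1), ASUnit(10, 1, 0, 2)]
measures = performance_measures(sample)
print(measures)
```

With these made-up counts there are 30 words, 4 clauses, 2 error-free clauses and 3 errors across 3 AS-units, so every index can be checked by hand.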
Data analysis
Data were analysed using SPSS 15.0. To address Research Question 1 concerning
the effects of post-task transcribing, a Multivariate Analysis of Variance (MANOVA)
was performed. For Research Questions 2 and 3 concerning the effects of pair/individual transcribing and the effects of further revision, two-way MANOVAs were performed to consider the two independent variables simultaneously. All the MANOVAs
were followed by post-hoc comparisons of all the examined conditions to identify
which groups were significantly different from the others. Furthermore, effect size
(Cohen's d; see footnote 1) calculations were conducted to demonstrate the magnitude of any significant effect.
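As an illustration of the effect size calculation, the following Python sketch computes Cohen's d from summary statistics. It rests on assumptions: the chapter does not state which formula was used, so the pooled-standard-deviation version is assumed here, and the function names `cohens_d` and `label` are hypothetical. The cut-offs in `label` follow the chapter's footnote on effect sizes. The example values are the second narrative performance of the individual transcribing group and the control group on the ratio of error-free clauses (Table 4); this single-group comparison is not expected to reproduce the values in Table 5, which compare the experimental groups as a whole with the control group.

```python
# Illustrative sketch: Cohen's d from summary statistics with a pooled SD.
# The chapter does not specify its formula, so this version is an assumption.
from math import sqrt


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_var)


def label(d):
    """Label an effect size using the cut-offs in the chapter's footnote."""
    size = abs(d)
    if size < 0.2:
        return "small"
    if size <= 0.8:
        return "medium"
    return "large"


# Ratio of error-free clauses, second narrative task (values from Table 4):
# individual transcribing group vs. control group, n = 16 each.
d = cohens_d(0.52, 0.13, 16, 0.38, 0.08, 16)
print(round(d, 2), label(d))
```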
Results
The presentation of the results is organized to address Research Questions 1 to 3 in
sequence. For Research Question 1, the focus is on the general effects of post-task
transcribing by comparing the four post-task groups with the no post-task control
group. To address Research Questions 2 and 3, the focus is on the effects of individual/pair
conditions and the effect of revision on learners' task performance, and the comparison was among the four experimental groups.
Results for Research Question 1

To explore the effect of post-task transcribing, a comparison was made between the
post-task groups and the no-post-task control group. As a preliminary step, performance on the first task performed by the experimental groups was compared with that
of the control group, to explore whether there were any significant differences between
the five groups at the beginning. The descriptive statistics for the two task performances of each task type are presented in Table 4. The rows show mean scores and
standard deviations for the different experimental conditions and the control group,
with scores for the narrative task and the decision-making task. Columns show the
scores for each dependent measure, for the first performance of the task (the baseline
condition) and then for the second performance. The latter represents the crucial
value for the present study, and the one used in statistical comparisons.
Footnote 1: (1) when d is smaller than 0.2, it can be regarded as a small effect; (2) when d is between 0.2
and 0.8, a medium effect; and (3) when d is larger than 0.8, a large effect (Cohen 1988).
Table 4. Descriptive statistics for the two task performances of each task type
(cells show Mean (SD); Task 1 = first performance, Task 2 = second performance)

Complexity
Group                     Task*  Clause ratio                Mean length of AS unit
                                 Task 1        Task 2        Task 1        Task 2
Individual transcribing   N      1.25 (.12)    1.38 (.13)    7.84 (.86)    9.20 (1.25)
(n = 16)                  I      1.77 (.14)    1.88 (.09)    5.52 (1.72)   6.22 (1.77)
Individual transcribing   N      1.33 (.12)    1.42 (.15)    8.49 (1.75)   9.44 (1.33)
& revising (n = 16)       I      1.70 (.25)    1.76 (.15)    6.07 (1.61)   6.87 (2.02)
Pair transcribing         N      1.29 (.16)    1.53 (.13)    7.95 (1.34)   8.79 (1.29)
(n = 16)                  I      1.66 (.13)    1.90 (.14)    6.58 (2.02)   9.71 (1.78)
Pair transcribing         N      1.30 (.12)    1.47 (.14)    7.79 (1.06)   8.16 (1.30)
& revising (n = 16)       I      1.63 (.10)    1.86 (.15)    6.16 (1.12)   9.47 (2.03)
Control                   N      1.32 (.14)    1.35 (.12)    7.60 (1.31)   7.80 (1.32)
(n = 16)                  I      1.67 (.14)    1.67 (.14)    5.70 (.52)    5.78 (.60)

Accuracy
Group                     Task*  Ratio of error-free clauses Errors/100 words
                                 Task 1        Task 2        Task 1        Task 2
Individual transcribing   N      .39 (.12)     .52 (.13)     12.46 (5.88)  9.61 (3.94)
(n = 16)                  I      .60 (.11)     .70 (.07)     8.73 (3.02)   7.51 (2.70)
Individual transcribing   N      .37 (.13)     .54 (.11)     13.36 (5.60)  8.87 (3.26)
& revising (n = 16)       I      .63 (.12)     .78 (.05)     7.37 (2.70)   6.03 (1.91)
Pair transcribing         N      .34 (.11)     .51 (.10)     14.71 (4.34)  10.35 (3.47)
(n = 16)                  I      .55 (.12)     .71 (.10)     9.62 (4.42)   5.50 (2.72)
Pair transcribing         N      .32 (.11)     .51 (.13)     12.63 (5.10)  8.85 (3.18)
& revising (n = 16)       I      .56 (.08)     .77 (.05)     9.17 (3.47)   6.21 (2.72)
Control                   N      .35 (.11)     .38 (.08)     13.14 (5.20)  12.06 (5.49)
(n = 16)                  I      .55 (.06)     .56 (.06)     10.30 (2.86)  9.84 (3.14)

Lexical performance
Group                     Task*  Lexical diversity           Lexical sophistication
                                 Task 1        Task 2        Task 1        Task 2
Individual transcribing   N      30.75 (5.33)  36.78 (6.28)  1.99 (.37)    2.12 (.33)
(n = 16)                  I      57.83 (12.82) 57.70 (11.46) 1.37 (.25)    1.64 (.25)
Individual transcribing   N      34.65 (13.69) 33.28 (5.97)  1.89 (.30)    2.43 (.11)
& revising (n = 16)       I      54.66 (12.36) 54.39 (11.53) 1.49 (.22)    1.72 (.14)
Pair transcribing         N      29.54 (6.96)  34.43 (5.49)  1.86 (.39)    2.17 (.28)
(n = 16)                  I      56.80 (8.25)  57.93 (8.20)  1.35 (.21)    1.54 (.13)
Pair transcribing         N      31.44 (5.63)  35.78 (5.09)  1.89 (.41)    2.11 (.30)
& revising (n = 16)       I      53.35 (10.96) 57.46 (9.97)  1.36 (.17)    1.45 (.15)
Control                   N      29.55 (6.58)  37.78 (7.69)  1.93 (.56)    2.04 (.48)
(n = 16)                  I      50.75 (9.28)  60.96 (11.13) 1.24 (.25)    1.33 (.23)

(*N: narrative; I: interactive)
A series of MANOVAs performed on the Task One (baseline condition) values in
terms of complexity, accuracy and lexical performance showed no significant differences between the control group and the experimental groups.
The second task performance was analyzed to see whether the scores of the experimental groups, as a result of the involvement of post-task transcribing, were significantly different from those of the control group. The detailed statistics show that in terms of
accuracy in the second narrative task, the four experimental groups produced more
error-free clauses (.52, .54, .51, .51 for the experimental groups and .38 for the control
group respectively) and fewer errors per 100 words (9.61, 8.87, 10.35, 8.85 for the
experimental groups) than the control group (12.06). In the second interactive task,
the experimental groups produced more error-free clauses (.70, .78, .71, .77 for the
four experimental groups) than the control group (.56). They also produced fewer
errors per 100 words (7.51, 6.03, 5.50, 6.21 for the experimental groups) than the control group (9.84). One-way MANOVA results (see Table 5) show that, in the second
narrative task, the experimental groups produced significantly more error-free clauses
than the control group did (p = .001, Cohen's d = 1.72); in the second interactive task,
the experimental groups produced significantly more error-free clauses than the control group did (p = .001, Cohen's d = 3.74), and significantly fewer errors per 100 words
than the control group (p = .001, Cohen's d = 1.81).
As for complexity, the detailed statistics show that in the second narrative task, for
the clauses per AS unit measure (1.38, 1.42, 1.53, 1.47) and the mean length of the AS
units in words (9.20, 9.44, 8.79, 8.16), the scores of the experimental groups were higher
than those of the control group (1.35 for clause ratio, 7.80 for mean length of AS unit).
In the second interactive task, the experimental groups produced more clauses per
AS unit (1.88, 1.76, 1.90, 1.86) than the control group (1.67), and also produced more
words per AS unit (6.22, 6.87, 9.71, 9.47) than the control group (5.78). MANOVA
results (see Table 5) show that in the second narrative task, the experimental groups
produced significantly more clauses per AS unit than the control group (p = .007,
Cohen's d = 1.43). In the second interactive task, the experimental groups were significantly superior to the control group in both complexity measures (clause ratio per AS
unit: p = .003, Cohen's d = 1.88; mean length of AS unit: p = .001, Cohen's d = 3.06).
Regarding lexical performance, the detailed statistics show that the control
group (37.78 for the narrative task; 60.96 for the interactive task) scored higher than
the four experimental groups (36.78, 33.28, 34.43, 35.78 in the narrative task; 57.70,
54.39, 57.93, 57.46 in the interactive task) in terms of lexical diversity. As for lexical
sophistication, the control group (2.04 for the narrative task; 1.33 for the interactive task) scored lower than the experimental groups (2.12, 2.43, 2.17, 2.11 for the
narrative task; 1.64, 1.72, 1.54, 1.45 for the interactive task). MANOVA results (see
Table 5) show that the experimental groups used significantly more sophisticated/infrequent
words than the control group (narrative task performance: p = .010,
Cohen's d = 1.16; interactive task performance: p = .000, Cohen's d = 2.12). There was
no significant difference between the experimental groups and the control group as
far as lexical diversity was concerned.
Table 5 provides summary information on the MANOVA results for Research
Question 1. The results concerning the performance on two different task types are
presented separately. Under each task type, probability values of the multiple indexes
of complexity, accuracy and lexical performance are provided, together with the effect
sizes.
Table 5. One-way MANOVA results for Research Question 1

Task type         Task performance     Dependent variables           F       Sig.    Effect size
Narrative task    Complexity           clauses per AS unit           3.43    .007*   1.43, large
                                       words per AS unit             2.25    .056    0.57, medium
                  Accuracy             ratio of error-free clauses   4.39    .001*   1.72, large
                                       errors per 100 words          1.45    .214    0.46, medium
                  Lexical performance  lexical diversity             1.13    .350    0.4, medium
                                       lexical sophistication        3.11    .010*   1.16, large
Interactive task  Complexity           clauses per AS unit           4.02    .003*   1.88, large
                                       words per AS unit             2.83    .000*   3.06, large
                  Accuracy             ratio of error-free clauses   26.63   .000*   3.74, large
                                       errors per 100 words          7.50    .000*   1.81, large
                  Lexical performance  lexical diversity             .68     .643    0.31, medium
                                       lexical sophistication        8.30    .000*   2.12, large

(*p < .01)
In brief, the experimental groups, as a result of post-task involvement, were
significantly superior to the control group in terms of accuracy, complexity and lexical
sophistication.

Research Question 2 considers whether there are any significant differences between
individual-based and pair-based post-task transcribing. MANOVA results show that
in the second narrative task, the pair transcribing groups produced significantly more
clauses per AS-unit (1.53, 1.47) than the individual-based groups (1.38, 1.42) (p = .007,
Cohen's d = 0.7). On the other hand, the latter used significantly more words per AS-unit
(9.20, 9.44) than the former (8.79, 8.16) (p = .05, Cohen's d = 0.51). It may, therefore, be
inferred that the pair transcribing groups produced more but shorter clauses or clause
elements than the individual groups, whereas the latter adopted more words but simpler syntax (with less subordination).
In the second interactive task, the pair groups used more words per AS-unit (9.71,
9.47) than the individual groups (6.22, 6.87) (p = .001, Cohen's d = 1.76), without any
significant differences noted for the ratio of clauses per AS unit (1.88, 1.76 for the individual groups; 1.90, 1.86 for the pair groups). Table 6 shows the results for both task
types in terms of the complexity measures.
With both task types, there were no significant effects of individual/pair transcribing for accuracy and lexical performance.
Table 6. Significant effects of individual/pair conditions in both task types: Complexity

Task              Groups               Dependent variables   F       Sig.    Effect size
Narrative Task    pair > individual    clause ratio          7.69    .007*   0.7, medium
                  individual > pair    words per AS-unit     3.96    .050    0.51, medium
Interactive Task  pair > individual    words per AS unit     6.28    .001*   1.76, large

(*p < .01)

Research Question 3 investigated whether there is any effect for revision after post-task transcribing as compared to a no-revision condition. The MANOVA results show
that in the second interactive task, as far as accuracy is concerned, the revision groups
produced a significantly larger proportion of error-free clauses (.78, .77) than the no-revision groups (.70, .71) (p = .001, Cohen's d = 1.09). No significance was found for
the other accuracy measure.
As for complexity, in the second interactive task, the no-revision groups used more
clauses per AS-unit (1.88, 1.90) than their counterparts (1.76, 1.86) (p = .014, Cohen's
d = 0.64), but they did not significantly differ in terms of words per AS-unit (6.22, 9.71
for the no-revision groups; 6.87, 9.47 for the revision groups). Table 7 shows the significant effects of revision/no-revision condition in the interactive task performance,
together with associated effect sizes.
Table 7. Significant effects of revision/no-revision condition: Accuracy & Complexity

Task              Groups                  Dependent variables          F       Sig.    Effect size
Interactive Task  revision > no revision  accuracy: error-free ratio   7.16    .001*   1.09, large
                  no revision > revision  complexity: clause ratio     3.56    .014    0.64, medium

(*p < .01)
The MANOVA results showed that in the second narrative task, the involvement
of revision brought about no significant differences in terms of the various aspects of
task performance. Nor was any significant effect found for lexical performance
in either task type. In brief, the involvement of revision led to a more accurate, but less
complex interactive task performance, although there was no such effect of revision on
narrative task performance.
In addition to the above main effects of both post-task conditions, there is an
interaction effect between the individual/pair condition and the revision/no-revision
condition, although this is only on the measure of lexical sophistication (p = .007,
Cohen's d = 0.7). Specifically, in the narrative task, the revision condition pushed
the individual transcribing groups to produce significantly more low-frequency
words than the pair groups. On the other hand, the no-revision condition was associated with the pair transcribing groups using more low-frequency words than their
counterparts.
Summary of the results

To sum up, in the present study, post-task transcribing, as a focus-on-form activity at
the post-task stage, was effective in producing more accurate and complex language
in task performance. Furthermore, given that post-task transcription was operationalised in more than one way, the distinctive role of individual/pair transcribing and
the effect of revision after transcribing were explored. In this study, pair-based transcribing was effective, although to a limited extent, for the improvement of syntactic
complexity. The involvement of revision after transcribing brought about a positive
effect on accuracy, but a negative effect on complexity.
Discussion
This discussion has four major sections. First, I will discuss the general
effects of post-task transcribing, followed by a discussion of the differential effects of
transcribing on the various aspects of performance. This is followed by a section on the
role of interaction, while the last section discusses the importance of revision as a step
which follows transcription itself.
Effects of post-task transcribing

The results reveal that the post-task groups are superior to the control group in terms
of accuracy, complexity and lexical sophistication. In other words, focus on form at
the post-task stage is beneficial for the improvement of L2 performance. This finding
is consistent with previous studies (Mennim 2003; Stillwell et al. 2010).
To account for the effects of post-task transcribing, two issues may be related: the
foreknowledge of post-task transcribing, and the operationalization of transcribing.
Regarding the first, prior to task performance, participants were informed that they
would transcribe their performance recordings afterwards. It appeared that the
foreknowledge of transcribing played a role in directing participants' attention to
formal aspects of performance because this may have reminded them that task performance
is not an end in itself, but instead is connected with wider pedagogic concerns
(Foster & Skehan 2013). This may have emphasized the importance of the quality of
task performance. Participants, therefore, were cautious during performance to keep
a balance between fluent communication and language accuracy. Accordingly, they
appeared not only to pay attention to meaning transmission for task accomplishment, but also to allocate some attention to the formal aspects of performance for a
more satisfactory transcribed performance. Even so, it was not clear whether they
shifted their attention unconsciously or intentionally.
The operationalization of post-task transcribing may also have had some
effects on language development, although these suggestions are rather speculative. One of the most evident advantages of post-task transcribing is that it affords
participants opportunities to attend to formal aspects of language performance.
During task performance, most of the learners' attentional focus was probably on
communication and meaning to get the task done. In contrast, at the post-task stage,
it was likely that more attention could be released to consider the formal aspects of
task performance, because meaning would not compete for major attention any longer
(Foster & Skehan 2013; Lynch 2001, 2007). Under these conditions, noticing, which is
a prerequisite for language change and acquisition (Schmidt 2001), may occur more
easily and naturally at the post-task stage.
However, if attention for formal aspects of language is available, this only offers
the possibility for noticing to take place, but does not guarantee its occurrence. In the
present study the performance transcripts pushed participants to notice, remember,
and reproduce the processed language forms (Lynch 2007). On the one hand, transcripts transformed the oral task performance into a written form which may have
further prompted the learners to attend to the formal aspects of their performance
(Doughty & Williams 1998a). On the other hand, the transcription reactivated the task
performance and, in this way, may have led to deeper processing, such as cognitive
comparison. As Doughty (2001) says:
If the verbatim format of recent speech remains activated in memory and available for
use in subsequent utterance formulation, this can be taken to be an important cognitive
underpinning for facilitating the opportunity to make cognitive comparisons (p. 253).
Cognitive comparisons may be made between the transcripts and the target language
(i.e. noticing the gap) or between the missing forms in the transcripts and the existing
counterparts in the target language (i.e. noticing the hole), both of which functioned
effectively for the improvement of formal aspects in the current research.
Different effects of post-task transcribing on various aspects of task performance
Post-task transcribing proved to be effective for language improvement. However, the
effects of focus on form were not uniform for different aspects of task performance,
with the strongest effect of transcribing on accuracy and a more limited effect on
complexity.
In line with previous post-task studies (Skehan & Foster 1997; Foster & Skehan
2013; Lynch 2001, 2007), the present study reveals that accuracy is the performance
area that most clearly shows the effects of post-task transcribing. Accuracy in language
use can arise from three interacting sources: the degree of accuracy of the language
representation itself; the strength of competing representations; and the degree of
automatization of language production (Wolfe-Quintero et al. 1998). The first source
of accuracy is dependent upon the learners' established interlanguage (IL) system and
long-term memory, and the other two pertain to on-line real-time language use and
rely on the allocation of attention. Given its cross-sectional nature, the present study
is mainly concerned with the factors related to on-line accurate language use, namely
the competition between different representations and the degree of automatization of
language production. It is reasonable to assume that between the two competing representations (i.e. accuracy and complexity), participants focused most of their attention on language conformity to the target forms and avoided errors in their attempts
to achieve such a goal. Further, the multiple involvements of task practice and post-task transcribing increased the likelihood for language use to be automatized to some
extent and consistently channelled learners' attention to the accuracy aspect of their
performance.
Even so, we found some effects of post-task transcribing on complexity, although
the effects were not as strong and consistent as those on accuracy. It was found that the
effects of post-task transcribing on complexity were shown in all the pair-transcribing
groups in the interactive tasks. Thus, it may be inferred that it was the interactivity
resulting from either the pair transcribing activity or the interactive task type, rather
than the simple involvement of post-task transcribing itself, that generated the positive effects on complexity. The role of interaction in the pair transcribing condition is
discussed in the next section.
The current study reveals that focus on form at the post-task stage may be
facilitative for the improvement of accuracy and complexity, although to a different extent. In the literature, many of the focus-on-form studies have only been
concerned with the effects on accuracy, using measures related to target-like forms
(Doughty & Williams 1998a). In fact, the emergence of many intermediate IL
forms that often represent increasing IL complexity rather than increasing accuracy
(Skehan 1996) may be facilitated by focus on form as well. Consistent with this,
Doughty and Williams (1998b) argued that:
focus on form does not always immediately lead to IL changes that are reflected in
increased accuracy. It may also lead to restructuring that reflects increased complexity,
an equally important aspect of IL development. In this respect, focus on form has the
advantage of affecting both IL development and IL accuracy (p. 254).
So far, the discussion has concerned the general effect of post-task transcribing, regardless of the differences between various transcribing conditions. However, in view of
the pedagogical applications of this study, post-task transcribing was operationalised
in various ways. The following sections present a discussion on these different operationalisations in greater detail.
The role of interaction in pair-based transcribing

In the present study, the post-task transcribing was carried out either individually or
in pairs. Despite the slightly contradictory result concerning the mean length of AS-unit in the narrative tasks, the main trend in the results indicates that pair-based transcribing in both tasks encouraged more complex syntax than the individual condition.
From the perspective of cognitive SLA, restructuring may have caused this growth in
complexity (Wolfe-Quintero et al. 1998). The process of restructuring increases the
chances that new forms will be incorporated into IL systems; promotes risk-taking
and requires attention being devoted to the new forms of language which are being
assembled (Skehan 1996, p. 50). In this sense, unlike accuracy, which reflects greater
control over internalized forms, complexity is more related to the internalization of
new forms (Swain & Lapkin 1995). There are at least two potential ways to promote
IL restructuring: production of output plus metatalk, and production of output plus
feedback (Swain 1995, 2005). Both metatalk and feedback may result from interactions between speakers. Skehan (1996) also proposed that interactive opportunities
are important to achieve IL restructuring and higher complexity. Self-evidently, the
pair-based transcribing in this study provided more interactive opportunities than the
individual transcribing condition.
From a psycholinguistic perspective, pair-based transcribing seemed to offer participants the opportunities to engage in the kind of moves which are facilitative of L2
learning (Long 1983, 1996), such as seeking and receiving confirmation, and providing
each other with explanations about the original task performance. Swain and Lapkin
(1998) pointed out that requests for confirmation about language form or language
choice direct learners' attention to a specific language item. As the participants indicated in the interview, during pair transcribing, there were cases when participants
could not fully understand exactly what their partner said in the recordings or why the
partner said what they said. Requests for confirmation and the corresponding explanations occurred in the interaction. An exchange that follows a confirmation move
forces the learner to clarify and organize their own knowledge and thus enhances
their own understanding (Storch 2007, p. 155). Such exchanges between the partners
provided an advantage which was missing when participants transcribed individually.
Joint responsibility over the creation of the transcripts means that students may be
more receptive to peer suggestions and feedback comments (Storch 2007).
The role of revision after transcribing

In pedagogical practice, mere transcribing, even if in different forms (e.g. individually
or in pairs), may have limitations and not be ideal because it stifles learners' further
interaction with, and development of, the transcripts. Once learners are asked to transcribe their task performance, they tend to revise the transcripts to produce a better version by means of error correction, text editing, etc. (Willis & Willis 2007).
The present study revealed that in the interactive tasks, the involvement of revision
after transcribing promoted learners' accuracy, but at the same time it had a negative
effect on complexity. These results are consistent with previous research in second
language writing (Ferris 2003; Hyland & Hyland 2006) which shows that the involvement of revision has positive effects on language accuracy. When participants were
asked to revise their transcripts, they were pushed to produce modified output. As
Swain (1993) put it, for IL development, learners need to reflect on their output and
consider ways of modifying it to enhance comprehensibility, appropriateness, and
accuracy (pp. 160–161). Swain (1995, 2005) claimed that modified output is likely
to promote language learning. Pushing learners to revise or modify their own output
may prompt not only attention to form, but also an underlying cognitive comparison
(Doughty 2001) and then reprocessing (cf. Swain's Output Hypothesis). In Lynch
(2001, 2007), students were involved in revising and reformulating. They showed performance improvement as a result of this process. Lynch's studies revealed that revision after post-task transcribing provides the opportunity for explicit feedback, both
positive and negative, and this is a central requirement for formal language learning (Ferris 2003). In teaching practice, some teachers reflect that during the post-task transcription phase, many trainees wanted to write what they wish they had said
rather than what they had actually said (Willis & Willis 2007: p. 173). In other words,
without communicative pressure, during the revision process, learners tend to search
their memory for better forms.
In addition, in the modifying and revising process, learners' primary concern was
the process of error correction (Ellis 2008; Kormos 1999). Prior to error correction, needing to judge whether something was an error or not (i.e. the identification
of errors) channelled participants' attention to reflect on their earlier performance,
making use of their language resources to notice the gaps between their own IL system
and the target language. The errors themselves, producing negative evidence in L2 language use, may have pushed learners to alter their performance priorities by assigning
greater importance to accuracy in task performance (Leeman 2007).
In contrast to the active role revision played with accuracy, a negative effect
of revision was noted as well: once the participants had been involved in revision
after transcribing, their language performance became less complex than that of the
no-revision groups. This finding is in line with previous research which found that
once learners paid strong attention to corrected errors, they tended to simplify their
writing to avoid errors in their output (Kepner 1991; Sheppard 1992). Avoidance
strategies were identified in early error analysis studies: The learners who found a
construction difficult tended to avoid it, using it only when they were confident that
they could get it right, or when they had no choice (Truscott 2007). It is possible
that in the present study, due to the involvement of revision, the participants were
more aware of the accuracy of the task performance. Reliance on an error avoidance
strategy may have given rise to the counterproductive risk-avoidance strategy in the
interactive tasks.
From a psycholinguistic perspective, the contrasting effects of revision on accuracy and complexity may be explained by the trade-off between these two performance areas (Skehan 1998). In the present study, the learners' primary concern during
revision was for accuracy. Given the limited attentional resources available for the
formal aspects of performance, complexity, as a competing area for attention, therefore received less attention during revision. The participants tended to use simpler
structures which they could control well without needing to allocate extra attentional
resources to the task performance. As such, more accurate but simpler performances
were produced by the revision groups.
Given that revision was not consistently supportive for different language aspects,
whether or not to adopt revision in task-based language pedagogy may depend on the
different goals of instruction. Where accuracy is the focus, it is clearly appropriate.
Where complexity and development are the goal, this may not be the case.
Last but not least, the only interactive effect between the revision condition and
the individual/pair condition is worthy of further explication. In this study, it was
found that further revision pushed the individual transcribing groups to use more
infrequent words in the narrative task than the pair transcribing groups. In brief, the
involvement of revision brought about superior narrative task performance in terms of
lexical sophistication for the individual transcribing groups. Since both the narrative
tasks and the individual transcribing activity were carried out individually, and the
individual groups used more advanced words, it may be assumed that individual work
may give rise to some improvements in certain aspects of task performance, such as
lexical complexity in this study. While the effect of pair work has been established in
previous research (see a review in Storch 2007), research work with an emphasis on the
benefits of the individual condition in L2 learning may be a fruitful field as well.
Pedagogical implications
The present study has interesting implications for second language instruction. First
and foremost, teachers in task-based settings are recommended to include post-task
activities in their teaching practice. The present study focussed on post-task transcription and showed a striking effect for improvements in formal aspects of language. The
procedure is perfectly feasible in regular classrooms in that only recorders and pens
are needed for transcribing and the average time for transcribing a 1-min extract is
around five minutes, both of which are manageable in L2 classrooms. In addition,
other types of post-task activities can also be examined in further research so as to
provide more focus-on-form options for pedagogical application.
In the second place, the findings highlighted the need to monitor the variations
in post-task transcribing carefully. Not all the transcribing conditions were beneficial
for overall language improvement. For example, only the pair transcribing condition
was favorable for syntactic complexity improvement. L2 learners at different proficiency
levels may be primarily concerned with different aspects of language performance.
Teachers should, therefore, carefully design transcribing conditions to allow students
with different needs and at different stages of IL development to focus on different
aspects of task performance achievement.
Thirdly, teachers need to understand the factors that impact in contrasting ways
on different performance aspects. For instance, the effect of revision is complex. It is
generally accepted among teachers that the involvement of revision is helpful for L2
learners (Willis & Willis 2007). However, the results of this study reveal that revision
in a general sense facilitates improvement in accuracy, but may hinder the use of complex language. Thus, we should be cautious when we adopt further revision in post-task transcribing unless raising accuracy is the current pedagogic goal. One strategy
that could be employed is for a teacher to make it clear to the students that the focus
of revision is on both error-correction and structural improvement prior to revision.
This may help learners direct their attention to both aspects. This might reduce the
potentially negative effect of revision on complexity to a certain extent.
Finally, it should be acknowledged that all the above pedagogical recommendations, based as they are on just one study, cannot really be warranted unless further
replication studies are carried out. It should be noted as well that transcribing, when
adopted as a type of post-task activity, may be beneficial to induce learners to focus
on form, but might not necessarily bring about an immediate improvement in L2
performance.
Conclusion
Second language acquisition is a complex phenomenon. So is focus-on-form research:
Researchers are torn between the desire to test theoretical claims about L2 acquisition,
which requires the investigation of precise and discrete instructional options, and the
desire to ensure that form-focused instruction is ecologically valid, which leads to
combining options into treatments that are pedagogically defensible.
(Ellis 2008, p. 900)
This research has been explorative in terms of both theoretical and pedagogical issues.
The findings have underscored the necessity for task-based research and pedagogy to
give as much weight to focus on form at the post-task stage as at the pre- and during-task stages. As
Skehan (2007) noted, a task-based approach has much to offer form-focused instruction in a variety of ways. Focus on form at the post-task stage is a promising area which
is worthy of future exploration.
References
Bialystok, E. (1990). Communication strategies: A psychological analysis of second language use.
Oxford: Basil Blackwell.
Brown, J.D., & Rodgers, T. (2002). Doing second language research. Oxford: Oxford University Press.
Bygate, M. (1996). Effects of task repetition: Appraising the developing languages of learners. In
J. Willis & D. Willis (Eds.), Challenge and change in language teaching (pp. 136–145). Oxford:
Macmillan Heinemann.
Bygate, M. (2000). Introduction. Language Teaching Research, 4, 185–192.
Bygate, M. (2001). Effects of task repetition on the structure and control of oral language. In
M. Bygate, P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks: Second language learning,
teaching and testing (pp. 23–48). Harlow: Longman.
Clennell, C. (1999). Promoting pragmatic awareness and spoken discourse skills with EAP classes.
ELT Journal, 53, 83–91.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Doughty, C. (2001). Cognitive underpinnings of focus on form. In P. Robinson (Ed.), Cognition and
second language instruction (pp. 206–257). Cambridge: CUP.
Doughty, C. (2003). Instructed SLA: Constraints, compensation, and enhancement. In C. Doughty &
M. Long (Eds.), Handbook of second language acquisition (pp. 256–310). New York, NY:
Blackwell.
Doughty, C., & Williams, J. (1998a). Issues and terminology. In C. Doughty & J. Williams (Eds.),
Focus on form in classroom language acquisition (pp. 1–11). Cambridge: CUP.
Doughty, C., & Williams, J. (1998b). Pedagogical choices in focus on form. In C. Doughty &
J. Williams (Eds.), Focus on form in classroom language acquisition (pp. 197–261). Cambridge:
CUP.
Doughty, C., & Williams, J. (Eds.). (1998c). Focus on form in classroom language acquisition.
Cambridge: CUP.
Ellis, R. (2001). Introduction: Investigating form-focused instruction. Language Learning, 51,
1–46.
Ellis, R. (2008). The study of second language acquisition (2nd ed.). Oxford: OUP.
Ferris, D.R. (2003). Response to student writing: Implications for second language students. Mahwah,
NJ: Lawrence Erlbaum Associates.
Foster, P., & Skehan, P. (2013). Anticipating a post-task activity: The effects on accuracy, complexity and fluency of L2 language performance. The Canadian Modern Language Review, 69(3),
249–273.
Fotos, S., & Nassaji, H. (Eds.). (2007). Form-focused instruction and teacher education: Studies in
honor of Rod Ellis. Oxford: OUP.
Hyland, K., & Hyland, F. (2006). Feedback in second language writing: Contexts and issues. Cambridge:
CUP.
Kepner, C.G. (1991). An experiment in the relationship of types of written feedback to the development of second-language writing skills. Modern Language Journal, 75, 305–313.
Kormos, J. (1999). Monitoring and self-repair. Language Learning, 49, 303–342.
Krashen, S. (1982). Principles and practice in second language acquisition. Oxford: Pergamon.
Krashen, S. (1985). The Input Hypothesis: Issues and implications. London: Longman.
Leeman, J. (2007). Feedback in L2 learning: Responding to errors during practice. In R. DeKeyser
(Ed.), Practice in a second language: Perspectives from applied linguistics and cognitive psychology
(pp. 111–137). Cambridge: CUP.
Leeser, M. (2004). Learner proficiency and focus on form during collaborative dialogue. Language
Teaching Research, 8, 55–81.
Lightbown, P., & Spada, N. (1990). Focus-on-form and corrective feedback in communicative language teaching: Effects on second language learning. Studies in Second Language Acquisition,
12, 429–448.
Loewen, S. (2005). Incidental focus on form and second language learning. Studies in Second Language Acquisition, 27, 361–386.
Long, M. (1983). Native speaker/non-native speaker conversation and the negotiation of comprehensible input. Applied Linguistics, 4, 126–141.
Long, M. (1991). Focus on form: A design feature in language teaching methodology. In K. de
Bot, R. Ginsberg, & C. Kramsch (Eds.), Foreign language research in cross-cultural perspective
(pp. 39–52). Amsterdam: John Benjamins.
Long, M. (1996). The role of the linguistic environment in second language acquisition. In W. Ritchie &
T. Bhatia (Eds.), Handbook of research on second language acquisition (pp. 413–468). New York:
Academic Press.
Long, M., & Robinson, P. (1998). Focus on form: Theory, research and practice. In C. Doughty &
J. Williams (Eds.), Focus on form in classroom language acquisition (pp. 15–41). Cambridge:
CUP.
Lynch, T. (2001). Seeing what they meant: Transcribing as a route to noticing. ELT Journal, 55,
124–132.
Lynch, T. (2007). Learning from the transcripts of an oral communication task. ELT Journal, 61,
311–320.
Lynch, T., & Maclean, J. (2001). A case of exercising: Effects of immediate task repetition on learners'
performance. In M. Bygate, P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks: Second
language learning, teaching and testing (pp. 141–162). Harlow: Longman.
McLaughlin, B. (1990). Restructuring. Applied Linguistics, 11, 113–128.
Mennim, P. (2003). Rehearsed oral L2 output and reactive focus on form. ELT Journal, 57, 130–138.
Mennim, P. (2011). Learner negotiation of L2 form in transcription exercises. ELT Journal, 66(1),
52–61.
Norris, J., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative
meta-analysis. Language Learning, 50, 417–528.
Pica, T. (2002). Subject-matter content: How does it assist the interactional and linguistic needs of
classroom language learners? Modern Language Journal, 86, 1–19.
Prabhu, N.S. (1987). Second language pedagogy. Oxford: OUP.
Rutherford, W. (1988). Second language grammar: Teaching and learning. London: Longman.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction
(pp. 3–28). Cambridge: Cambridge University Press.
Sheppard, K. (1992). Two feedback types: Do they make a difference? RELC Journal, 23, 103–110.
Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied Linguistics,
17, 38–62.
Skehan, P. (2007). Task research and language teaching: Reciprocal relationships. In S. Fotos &
H. Nassaji (Eds.), Form-focused instruction and teacher education: Studies in honor of Rod Ellis
(pp. 55–69). Oxford: OUP.
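Skehan, P., & Foster, P. (1997). Task type and task processing conditions as influences on foreign
language performance. Language Teaching Research, 1, 185–211.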
Skehan, P., & Foster, P. (2001). Cognition and tasks. In P. Robinson (Ed.), Cognition and second language instruction (pp. 183–205). Cambridge: CUP.
Spada, N. (1997). Form-focused instruction and second language acquisition: A review of classroom
and laboratory research. Language Teaching, 30, 73–87.
Stern, H.H. (1983). Fundamental concepts of language teaching. Oxford: Oxford University Press.
Stillwell, C., Curabba, B., Alexander, K., Kidd, A., Kim, E., Stone, P., & Wyle, C. (2010). Students
transcribing tasks: Noticing fluency, accuracy and complexity. ELT Journal, 64, 445–455.
Storch, N. (2007). Investigating the merits of pair work on a text editing task in ESL classes. Language
Teaching Research, 11, 143–159.
Swain, M. (1993). The output hypothesis: Just speaking and writing aren't enough. The Canadian
Modern Language Review, 50, 158–164.
Swain, M. (1995). Three functions of output in second language learning. In G. Cook & B. Seidlhofer
(Eds.), Principles and practice in applied linguistics: Studies in honor of H.G. Widdowson
(pp. 125–144). Oxford: OUP.
Swain, M. (2005). The output hypothesis: Theory and research. In E. Hinkel (Ed.), Handbook on
research in second language teaching and learning (pp. 471–483). Mahwah, NJ: Lawrence
Erlbaum Associates.
Swain, M., & Lapkin, S. (1995). Problems in output and the cognitive processes they generate: A step
towards second language learning. Applied Linguistics, 16, 371–391.
Swain, M., & Lapkin, S. (1998). Interaction and second language learning: Two adolescent French
immersion students working together. The Modern Language Journal, 82, 320–337.
Truscott, J. (2007). The effect of error correction on learners' ability to write accurately. Journal of
Second Language Writing, 16, 255–272.
Willis, D., & Willis, J. (2007). Doing task-based teaching. Oxford: OUP.
Willis, J. (1996). A framework for task-based learning. Harlow: Addison Wesley Longman.
Wolfe-Quintero, K., Inagaki, S., & Kim, H.Y. (1998). Second language development in writing:
Measures of fluency, accuracy and complexity (Technical Report #17). Honolulu, HI: University
of Hawaii, Second Language Teaching and Curriculum Center.
Appendix
Task instruction and letters for interactive decision-making tasks
Instruction: In this task, you have some letters which were sent to the Problem Page of
a magazine, to a problem aunt called Sue. She replied to each of them with advice to
help the writer solve each of the problems.
Imagine that the two of you together are Sue and that your task is to agree on the
advice to put in the letter you send to each of these people. In each case, to think of the
different sorts of advice which are possible, of why one bit of advice would work better,
or of why some bit of advice might contain difficulties or dangers. Work out what the
best advice is that you could put in your letter of reply.
Letter A: Cyber love

We have a daughter of 16 years old. When she was in junior middle school, she was
excellent.
Last year, she began to be addicted to internet exploration. She made some net-friends through the internet. Recently she disappeared for two weeks. Finally, we found
her in a boy's house. She told us that she was going to marry that boy whom she got to
know through the Internet. The boy is 12 years older than her, and now is jobless. We
locked her in the bedroom and didn't allow her to go out. However, she escaped from
the window to the boy's house. What shall we do now?
Letter B: Unbalanced degree

I have been in love with my girlfriend for one year. Now she is studying at a university
and I work as a clerk in a company after my graduation from a technology institute.
Being afraid of her parents' objection, she didn't tell her parents about our love until
recently. Several days ago, she told them about our love and quarreled with them. Her
mother came to my company and asked me to break up with her, because I have a
lower academic degree than her daughter. I am in despair. What shall I do now?
chapter 6
Structure, lexis, and time perspective

Influences on task performance
Zhan Wang & Peter Skehan
University of Pittsburgh / St. Mary's University, Twickenham
The Cognition and Tradeoff Hypotheses account for task performance in different
ways. The former sees task complexity as the driver for higher accuracy and structural
complexity whereas the latter, within the constraints of limited attentional capacities,
sees performance as being accounted for through the interaction of influences
from task characteristics and task conditions. This chapter reports on a study
which contrasts these two accounts, manipulating task structure (as an influence
on primarily accuracy, but secondarily complexity), vocabulary difficulty (as a
disruptor of smooth processing during performance), and time perspective (as a
method of operationalising task complexity). The results do simultaneously produce
raised accuracy and complexity, but this is best accounted for through the separate
contribution of task structure and a there-and-then perspective (analysed differently
to that within the Cognition Hypothesis), rather than through greater task complexity.
Vocabulary difficulty did not have the predicted impact. The results are discussed in
terms of the Tradeoff and Cognition Hypotheses.
Introduction
The last twenty-five years have seen great practical interest in task-based approaches to
instruction (Ellis 2003; Van den Branden 2006; Van den Branden et al. 2009). At the
same time, there has also been a parallel focus on research into task-based performance
(Skehan & Foster 2008), in an attempt to develop a research-based view on language
instruction. Indeed, research in this area has the dual attractions of connecting with
interesting theoretical issues in the domain of second language acquisition (see, e.g.
Housen & Kuiken 2009) and practical concerns within the field of pedagogy (Skehan
2011). The available research allows for a range of generalisations, which add to our
understanding of how differences in task features and task conditions can have systematic influences on performance, and so, perhaps, pedagogy. For example, tasks based
on familiar information or more concrete information are easier (Brown et al. 1984)
and also associated with greater accuracy (Foster & Skehan 1996). Similarly, tasks which
require information manipulation (i.e. those requiring the creation of a storyline to link a
series of pictures) and integration (i.e. those requiring the integration of foreground and
background information in a narrative task) are more difficult, but are associated with
greater language complexity (Skehan & Foster 1997; Tavakoli & Skehan 2005). These
findings are the result of research studies which have focussed on task types or characteristics (Skehan & Foster 2008), but there has also been research on the conditions under
which tasks are performed. Planning, for example, has been shown to be beneficial to
performance, whether strategic (that is, pre-task) or on-line (that is, conducted while the
task is running), with the former consistently producing more complex and more fluent performance (Foster & Skehan 1996; Ortega 1999), and the latter more consistently
producing greater accuracy (Yuan & Ellis 2003; Ellis 2009). Other task conditions, such
as repetition (Bygate 2001; Wang this volume), or post-task activities (Skehan & Foster
1997; Li this volume) have also proved to be beneficial.
Even though the research base on task performance has grown steadily, there is
still a heated debate on the theoretical underpinnings that can explain the results we
see. At present, there are two competing theoretical models aiming to account for the
impact of task type and task conditions on performance, the Trade-off Hypothesis
(Skehan 1998, 2009a) and the Cognition Hypothesis (Robinson 2001a, b, 2012). The
starting point for the Trade-off Hypothesis is the assumption that there are attentional
limitations on performance, associated with limited working memory size, and that
pressure on such limited resources will have implications for what a second language
speaker can produce. Task research has frequently portrayed performance in terms
of language complexity, accuracy, and fluency (and more recently lexis also, Skehan
2009b). So, applying attentional limitations to this view of performance, prioritising
one area of performance, complexity, say, might have significant effects on performance in other areas. In fact, the Tradeoff Hypothesis has proposed that there is a
particular tension between complexity and accuracy, such that it is difficult to produce
high levels of performance in both of these areas simultaneously. In contrast, high
accuracy and high fluency, or high complexity and high fluency are in conflict to a far
lesser extent (Skehan & Foster 1997).
This, though, is a rather basic account of the Trade-off Hypothesis. The fundamental assumption is that if tasks become more difficult, the significance of attentional and
memory limitations becomes greater. But this does not mean that the effects of Trade-off are unavoidable. Indeed, Trade-off is a fundamental constraint, and a major
contribution of task research is to explore how task characteristics and task conditions
can mitigate its effects. In other words, influences which singly might elevate performance in a single direction may, when operative together, raise performance in more
than one area. Indeed, the purpose of much Trade-off research is to see to what extent
such limitations can be overcome. For example, Tavakoli and Skehan (2005), and
Foster and Tavakoli (2009), each using picture story narrative retelling, have shown
how task structure, which ordinarily promotes accuracy and fluency, and information
integration, which normally promotes complexity, can have a conjoined positive influence and raise accuracy and complexity together. This particular interactive influence
may be difficult to achieve ordinarily, but the above studies show that it is possible.
The fundamental assumption is that attentional limitations have to be assumed, but
this can be taken as the necessary starting point to explore how pedagogic goals, even
if not easy to achieve, can be reached within such constraints.
A final aspect of the Trade-off Hypothesis is its dependence on the Levelt model
of first language speaking (Levelt 1989; Kormos 2006). This model proposes (amongst
other things) that there are three broad stages in speaking: Conceptualisation, Formulation, and Articulation. The first of these is concerned with the ideas to be expressed,
with an evaluation of the context of speaking, and with decisions about the stance
towards what is being said. The stage outputs the pre-verbal message. Formulation
then takes the pre-verbal message, accesses the mental lexicon, and engages in clothing
the propositions to be expressed with language, first through lexis, and then through
syntactic encoding. Skehan (2009a) takes this first language model and applies it to
the second language case, particularly in relation to task-based performance. He discusses
four types of influence on performance: complexifying, pressuring (both of which
make demands on the processing system), and easing and focussing (which reduce
processing pressure, or direct it). He then relates findings from the task performance
literature to these four influences, and links them to stages within the Levelt model. In
this way, an attempt is being made to construct a theoretical base for the claims which
are made, and also to link task performance to the psycholinguistic processes second
language speakers engage in when speaking. Such a theoretical base is also important
in combating the claim (Robinson 2007) that the Trade-off Hypothesis is vacuous, and
only makes post-hoc claims rather than predicting where trade-off effects will occur.
The contrasting approach is represented by Robinson's Cognition Hypothesis (Robinson 2001a; Robinson & Gilabert 2007). This hypothesis explicitly rejects
notions of attentional limitations, and proposes that tasks should be designed and
sequenced on the basis of gradual increases in cognitive complexity. A triadic componential framework is proposed, embracing task features, which impact on task complexity; task conditions, which draw on interactive factors, comprising participation
and participant variables; and learner factors, both affective and ability, which are
important for task difficulty (being defined in reference to the language learner rather
than task features). For our present purposes task complexity factors are most relevant;
the other two areas will not be pursued here. As for task complexity, a distinction is
made between resource-directing factors and resource-dispersing factors. The former
category contains sub-headings such as number of elements in the task, time perspective (Here-and-now vs. There-and-then), and reasoning demands. It is assumed that
more elements, there-and-then tasks, and more reasoning demands individually (and
presumably collectively) make a task more complex, and as a result raise the level of
accuracy and complexity. In other words, task complexity pushes up performance in
each of these general areas, which is in direct contrast with the default position of
the Trade-off Hypothesis. Indeed, Robinson (2005) makes the claim that increasing
complexity in each of these areas pushes speakers to produce particular aspects of
language, such as the use of articles, hence the resource-directing label. In contrast,
resource-dispersing influences, while increasing task complexity, do not direct learners to any particular aspects of language code which can be used to meet the additional
task demands (Robinson 2005: p. 7). This leads Robinson to note that resource-directing factors will affect fluency negatively but accuracy and complexity positively.
In contrast, increasing task complexity through resource-dispersing factors will influence fluency, and accuracy and complexity, negatively (e.g. through lack of provision
of planning time, or through multiple tasks, or through the need to use unfamiliar
information).
Broadly there are two types of critique that one can raise against the Cognition
Hypothesis. First, there is the issue of evidence. The hypothesis has been around for
some time now, and its proponents have published a wide range of research studies.
However, on the whole, this research has not been outstandingly supportive of the
predictions, especially regarding the above-mentioned joint influence on accuracy and
complexity. Frequently one aspect of performance is raised, but not the other (see
Iwashita et al. 2001; Michel et al. 2007; Kuiken & Vedder 2007, 2008; Rahimpour 1997;
Robinson 1995; Gilabert 2007 regarding the positive effect of more complex task on
language accuracy; and Foster & Skehan 1999; Foster & Tavakoli 2009; Robinson 2007;
Tavakoli & Foster 2008, 2011 on language complexity); it is rare to see the prediction
of joint raising fulfilled. Occasionally this is the case (as in Ishikawa 2006), but these
results were obtained with written performance, which renders comparison with the
spoken language studies difficult. There are a few studies (e.g. Foster & Skehan 1999;
Tavakoli & Skehan 2005; Foster & Skehan 2013) where both accuracy and complexity
are higher. In view of these studies, Skehan (2009a) argues that it is insufficient, given
the Cognition Hypothesis, to demonstrate that accuracy and complexity are raised;
one must also demonstrate that, at the individual level, the two variables are correlated.
Otherwise there is the possibility that some learners may raise complexity, some may
raise accuracy, and the outcome will be significant gains for each, but this, then, would
not apply at the individual level. In the above three favourable studies, for example,
the correlations between accuracy and complexity were not significant, which makes it
hard to defend that they provide any support for the Cognition Hypothesis.
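The point about individual-level correlation can be made concrete with a small numerical sketch. The Python fragment below uses invented gain scores for eight hypothetical learners (they are not data from any of the studies cited here): four learners raise accuracy only and four raise complexity only. The variable and function names are likewise illustrative assumptions.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical gain scores (more complex task minus simpler task) per learner.
# Invented for illustration only; not taken from any study.
accuracy_gain = [0.10, 0.09, 0.11, 0.10, 0.00, 0.01, -0.01, 0.00]
complexity_gain = [0.00, 0.01, -0.01, 0.00, 0.20, 0.19, 0.21, 0.20]


def summarise(acc, comp):
    """Return the group-level mean gains and the individual-level correlation."""
    return {
        "mean_accuracy_gain": mean(acc),
        "mean_complexity_gain": mean(comp),
        "r": pearson(acc, comp),
    }


result = summarise(accuracy_gain, complexity_gain)
print(result)
```

In this invented data set both group means are positive, yet the correlation between the two gains is strongly negative: no individual learner raises accuracy and complexity together. A comparison of group means alone would conceal this pattern.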
But another type of critique can be raised with regard to the extent to which each
of the components of task complexity actually gives rise to higher complexity levels.
The fundamental claim is that resource-directing influences, through greater cognitive
complexity, push learners to higher accuracy and structural complexity levels. Take
Here-and-now vs. There-and-then, for instance. The interpretation favoured by the
Cognition Hypothesis is that the second condition is more complex (hence, the differences in performance that are predicted). But one can ask why this would be the
case, and demand greater clarity on the exact meaning of cognitive complexity. Speaking about Here-and-now is certainly easier with respect to the availability of information to be communicated. But it is also less negotiable, and has more prominent
input which needs to be attended to unavoidably, factors which add to complexity.
Speaking about There-and-then is certainly more difficult regarding the lack of input
to be easily referred to, and it also makes memory demands. But it is potentially less
input-dominant and provides greater scope for negotiation, and the shaping of contributions on the part of the speaker, since the stimulus material, the input provided, can
be responded to selectively, sometimes ignored, and more easily repackaged to make
it easier to handle. With regard to resource-dispersing influences the assumption is
made that the different categories of influence are unproblematic and work in predictable ways. This, however, is doubtful. For instance, planning, which is interpreted in
the Cognition Hypothesis as resource-dispersing, is claimed to produce lower performance (if planning time is not available). But the research on planning raises arguments against this interpretation. The literature suggests that planning does not have
equal effects on all aspects of performance, leading to stronger effects on complexity
and fluency, and smaller and less dependable effects on accuracy. There is also the
issue of how strategic and on-line planning interact (see Wang, this volume). In addition, evidence from qualitative studies of planning (Ortega 2005; Pang & Skehan this
volume) shows that planning consists of many processes and many goals. These include
planning-for-task-interpretation (leading to complexity?), planning-for-organisation
(leading to accuracy?), and planning-as-rehearsal (again leading to accuracy but also
fluency?). Planning cannot be treated as monolithic: its effects are subtler. So, all in all,
the categories which make up the Cognition Hypothesis are not without problems,
and require clearer construct definition.
These considerations motivated the authors to design a study which systematically
investigated typical Trade-off variables (structure and lexical demands) and a typical Cognition variable (time perspective) to explore the various questions raised by
these two models. Before we describe the study itself, we need to review the evidence
relating to these specific variables a bit further. As a starting point, task structure was
inferred to be important by Skehan and Foster (1997), but this was only post-hoc from
the results of Foster and Skehan (1996) and Skehan and Foster (1997). Skehan and
Foster designed a subsequent study to explore this influence (Skehan & Foster 1999)
broadly confirming the original results. Tavakoli and Skehan (2005), and Tavakoli and
Foster (2008) published subsequent research consistent with the claim that tasks that
contain clear macrostructure (and so ease Levelt's Formulator stage) favour accuracy
and fluency. However, the later studies have complexified the picture somewhat, since
structure has emerged as a generally favourable influence on performance, and in
some cases was even associated with greater complexity (Tavakoli & Skehan 2005). It
was therefore decided to use Structure as an independent variable in the present study.
The broad motivation is that, following the Trade-off Hypothesis, researchers try to
account for raised performance not through any need to posit greater task complexity,
but instead through the use of specific targeted variables such as structure which have
been shown to enhance performance in particular areas.
As for the time perspective (Here-and-now versus There-and-then), this has
featured in a number of Cognition Hypothesis studies. It has, however, mainly been
studied through map tasks. The Here-and-now condition is implemented with a participant describing a route on a map that is available and visible. The There-and-then
condition requires the participant to describe the route without the map being visible.
Predictions are made that, since the non-present condition is more cognitively complex, it will push speakers to greater accuracy and complexity. This prediction is rarely
fulfilled, and the most typical result is that accuracy is raised, while complexity is not
(Gilabert 2007; Rahimpour 1997; Robinson 1995), a result which is inconsistent with
the Cognition Hypothesis.
As a matter of fact, the strength of the claims about time perspective would be
much greater if alternative operationalisations (other than through map tasks) of the
same construct were used, potentially generating consistent results. In the present study
we decided to create a condition that differed from the above-mentioned map task in
two ways. First, we used a video-based presentation. We felt a video, with actual characters, would constitute a more involving challenge for our participants. In addition,
with a video narrative, engagement might potentially be richer, since causal links could
be worth commenting on, as might the motives of the characters in the video. But the
essential contrast Here-and-now vs. There-and-then would be maintained. In this
way, the Here-and-now condition is very clear and makes demands only on working
memory within processing. But the flow of input is considerable, which puts the speaker's ability to plan and organise under considerable pressure. In contrast, the There-and-then condition does not have the same pressure of heavy input. On the other hand,
memory demands are much greater and no new stimulus material is involved, or can
be referred to. The speaker can, though, shape the narrative in whatever way is desired,
and so plan what is going to be said. Further details will follow in the Materials section.
Second, following up on previous studies (Skehan 2009b), we included lexical
demands as an additional variable, since this appears to have an impact on performance across a range of task-based studies. Two aspects of this work are relevant.
First, Skehan (2009b), following Meara and Bell (2001), discusses the use of a statistic,
Lambda, which measures the extent to which, in the small texts typical of second
language task work, speakers use less frequent lexis. The procedure divides a text up
into ten-word chunks and then establishes how many words, in each ten-word chunk,
are outside a certain frequency range. A statistic is then calculated, Lambda, which
uses a Poisson distribution (appropriate for infrequently occurring events) to capture
the extent to which less frequent words penetrate the text. Second, Skehan (2009b)
discusses the extent to which the use of less frequent lexis has an impact on other
aspects of performance. He argues that greater use of less frequent lexis, prompted
by the demands of a particular task, is associated with lower complexity and accuracy
scores. He suggests that the need for second language speakers to access less frequent
language from a less well developed mental lexicon leads to disruption at the Formulator stage in speech production. He offers the conclusion that tasks which do require
less frequent lexis can therefore have a damaging effect on task performance generally.
But this conclusion is based on post-hoc analyses of a range of studies. It was not the
result of research design. For that reason, the variable of lexical demands is included
in the present study in a more systematic way to explore its effects.
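The chunk-and-count procedure behind Lambda can be sketched in a few lines of Python. This is a minimal illustration only, not the P_Lex program itself: the frequent-word list, the sample text, the function names, and the use of the mean count per chunk (the maximum-likelihood estimate of a Poisson rate) are all assumptions made for the example, and the fitting procedure in Meara and Bell (2001) may differ.

```python
import math

def chunk_counts(tokens, frequent_words, chunk_size=10):
    """Count, for each full chunk, the words that fall outside the frequent-word list."""
    counts = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        counts.append(sum(1 for word in chunk if word.lower() not in frequent_words))
    return counts

def poisson_lambda(counts):
    """Assumed estimator: the mean count per chunk (Poisson maximum likelihood)."""
    return sum(counts) / len(counts)

def poisson_pmf(k, lam):
    """Probability of exactly k infrequent words in a chunk under a Poisson model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical frequent-word list and narrative, for illustration only.
frequent = {"the", "dog", "has", "a", "and", "sheep", "try", "to", "help",
            "him", "then", "he", "goes", "is", "tooth"}
text = ("the dog has a terrible toothache and the sheep try "
        "to help him then he goes to a vet and "
        "the tooth is extracted")

counts = chunk_counts(text.split(), frequent)
lam = poisson_lambda(counts)
print(counts, lam)
```

A higher value indicates that less frequent words penetrate more of the ten-word chunks; the trailing words that do not fill a complete chunk are ignored in this sketch.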
This background leads to four general research questions, and their associated
hypotheses. The research questions are:
Research Question 1: What will be the effect of time perspective (Here-and-now versus
There-and-then) on complexity and accuracy?
Research Question 2: How will task structure affect accuracy and fluency?
Research Question 3: How will the use of less frequent lexis affect accuracy and
complexity?
Research Question 4: How will time perspective, task structure, and frequency of lexis
interact?
The specific hypotheses are as follows (and see also Table 1):
Hypothesis One: There-and-then tasks will raise complexity, but not accuracy. This
follows from the analysis presented earlier regarding the differences between the
There-and-then and Here-and-now conditions, and their implications for the psycholinguistics of processing. Essentially, the There-and-then condition, since it allows
repackaging of content, will give speakers more opportunity to express ideas more
densely and to bring out connections between events and to clarify the motives of
the participants in the narratives. Interestingly here, the Cognition Hypothesis should
predict that both complexity and accuracy will be raised, even though Cognition
Hypothesis motivated research has tended to report an accuracy effect only. In fact, it
is further assumed that the There-and-then condition has no influence on accuracy,
since it is proposed here that There-and-then is not a more complex condition, merely
a different condition, characterised by different processing demands.
Hypothesis Two: Task structure will raise accuracy and fluency in performance. This
hypothesis follows from the review of previous studies which have used this variable
to explore task-based performance. The hypothesis is exploratory regarding complexity, since previous results have been inconsistent. Tentatively, we can hypothesize that
there will be an increase in this area.
Hypothesis Three: Tasks which lead to the use of less frequent lexis will show lower
scores for accuracy and complexity. The motivation here derives from Skehan (2009b),
which argues (on the basis of a post-hoc interpretation of findings) that such tasks lead
to lowered performance in these areas. The present study is more systematic with the
use of tasks intended to provoke different frequencies of lexical items.
Hypothesis Four: The variable of time perspective will show interactions with task
structure and lexical frequency, and produce stronger effects, specifically more complex and accurate performance for structured tasks performed under the There-and-then condition and tasks with easy lexis performed under the There-and-then
condition as well. It is argued that the There-and-then condition, despite posing
different memory demands, contains less demanding processing conditions, so that
there is more likelihood that additional variables such as structure and lexis can have
an impact. The impact of these variables is predicted to be more muted under the
Here-and-now condition because of the pressure of input that is involved. There is
also the point here that more structured tasks, because of their greater structure, will
enable memory pressures to be lessened because of the more organised nature of the
narratives to be told.
Table 1. Hypotheses and predictions

Hypotheses  Variables                              Predictions*          Reasons
                                                   C     A     F     L
Hypoth. 1   Time Perspective (Here-and-now         >                     Foster & Skehan 1999;
            versus There-and-then)                                       Trade-off Hypothesis
Hypoth. 2   Structured versus Unstructured task    >?    >     >         Tavakoli & Foster 2008;
                                                                         Tavakoli & Skehan 2005
Hypoth. 3   Task with less freq lexis versus       <     <           >   Skehan 2009b
            more freq lexis
Hypoth. 4   Interactions                           ++    ++

C: syntactic complexity; A: Accuracy; F: Fluency; L: lexis
* = the predicted directions for the hypotheses are from the 1st variable to the 2nd variable, as shown in the Variables column.
>? means the hypothesis is tentative; + indicates that interactions will occur.
Blank cell refers to no effect.
The various hypotheses, summarized in Table 1, and now looked at more generally, propose that different individual variables, motivated by psycholinguistic processing concerns, will account for the results that will be obtained. Main effects are
important, but so are interactive, or conjoint effects: the one prediction from Hypothesis Four, that there will be an effect on both complexity and accuracy, is derived from
such interactive effects on performance. In other words, it is not proposed that it is
necessary to hypothesise greater task complexity to achieve these results (as the Cognition Hypothesis, in contrast, would claim); instead they are claimed to be the result
of the interplay of processing factors.
Methods
To examine task complexity from the three dimensions reviewed above: time perspective (Here-and-now versus There-and-then), lexical difficulty (easy vocabulary versus
difficult vocabulary), task structure (structured task versus unstructured task), and
also allow comparisons within and across these factors, this study is based on a 2x2x2
research design, with lexical difficulty and task structure as within-subject factors, and
time perspective as a between-subject factor.
Participants
Participants were 72 Chinese L1 (Mandarin) speakers who were learning English
as a second language, with slightly more female than male participants. They were
recruited from a major university in Hong Kong. Most (71%) participants were year 1
students; 14% were year 2, 5% were year 3, and 3% were year 4 students. They voluntarily participated in the study and received time compensation fees for their participation. Student consent forms were collected during the study.
English proficiency pre-test

In order to assign the 72 participants into two groups of equivalent English speaking proficiency (the Here-and-now and There-and-then speaking conditions), a pretest was administered. A version of the TOEFL Listening subtest (extracted from
Hinkel 2004) was used as the pre-test because TOEFL listening was reported to be a
strong indicator of general English proficiency measured by TOEFL exams (Sawaki
et al. 2009). In addition, listening may involve relatively similar processes to speaking, especially regarding the degree of resources required on-line for problem solving
(Yuan & Ellis 2003, p. 9). The results showed that English proficiency was balanced
across the two groups. The Here-and-Now group had a mean score of 38.18 (.69) and
the There-and-Then group had a mean score of 37.72 (.69) in the pre-test. A between-subjects t-test showed no significant difference between the groups.
Material and tasks

Following Skehan and Foster (1999) and Wang (2009), we used video-based tasks
as the basis for narrative retellings. These previous studies used videos from the Mr.
Bean television series. For the present research, we switched to the use of Shaun the
Sheep cartoons. Trial runs with these cartoons suggested that they provoked a richer
and more varied type of narrative. The Shaun the Sheep video series was selected
because it was animation made for children which was easy to comprehend; it had no
speech or conversation in it so that speakers could concentrate on doing the speaking
tasks; the videos were fun to watch and speak about, and they were relatively new at
the time the research study was conducted; all the participants reported in after-task
interviews that they had not watched the series before. Each video was six to seven
minutes long.
The research design manipulated two variables relevant to the video-based
prompts: plus or minus structure, and plus or minus less frequent lexis. The structured tasks were selected by using the problem-solution structure used in Tavakoli
and Skehan (2005) and Tavakoli and Foster (2008), where a problem-theme organises
the story, and the narrative develops as attempts to solve a problem are tried out and
evaluated. In this respect two videos were chosen. In Tooth Fairy, the dog character
in this series experiences toothache, and so the major part of the story concerns the
flock of sheep failing to help the dog to extract the tooth, followed by the dog actually
going to a vet to get the tooth extracted professionally. In Bathtime, the farmer orders
the dog to bathe the (very dirty) sheep, but there is no hot water. The major part of
the story then relates how the sheep and dog collaborate to obtain hot water for their
collective bath, mainly by stealing the farmer's hot bath water. In each of these video
stories, the underlying problem connects the different events which take place, and
gives them a driving narrative force. The two other videos were unstructured. In Off
the Baal, the sheep obtain a cabbage, which is used as a football. They then engage
in a game of football, with various unconnected events, including the football being
stolen, retrieved, and then lost again. In Fetching, while the farmer is away and the sheepdog is in
charge, the sheepdog meets an attractive female dog and is so smitten by her that he doesn't
notice the mayhem that is occurring in the farmhouse when the flock of sheep invade
it. Desperate last minute measures are required when the farmer returns. Although
there are problems within these latter two stories, the problems do not really tie the
events of the stories together, and so the videos can be regarded as unstructured.
Both researchers, together with a group of experienced EFL teachers, agreed on these
characterisations of the videos.
Identifying the vocabulary demands of the tasks was more complex. Two procedures were followed, and the tasks which were used were the ones that survived these
procedures. First, groups of experienced teachers viewed candidate videos and rated
them for perceived vocabulary demands. On this basis two groups of five videos each
were identified which differed clearly in rated vocabulary demands. At a second stage,
these videos were described by several people, native as well as non-native speakers,
and these performances were assessed by means of a version of Meara's PLex computer
program (Meara & Bell 2001). The program outputs a statistic for each performance,
Lambda, which captures the use in the performance of less frequent vocabulary (see
Skehan 2009b for discussion). The videos which were chosen were identified through
their contrasting Lambda means, which differed markedly: Tooth Fairy and Fetching
provoked use of more frequent vocabulary (and therefore avoidance of less frequent
vocabulary), and Bathtime and Off the Baal were associated with higher Lambda
scores, and a reliance on less frequent vocabulary.
Operationalization of the Here-and-now and There-and-then conditions

For the Here-and-now group (n = 45), participants were asked to narrate the story in
the present tense while they were watching the Shaun the Sheep video. For the There-and-then group (n = 27), participants were asked to narrate the story in the past tense
after having watched the video once, and they were not allowed to watch the video
as they were performing the narrative task. This is in line with operationalizations of
the There-and-then condition in the literature (e.g. Robinson 1995a; Gilabert 2007),
except, obviously, that Robinson and Gilabert used map tasks rather than video
narrative tasks.
Procedure
One of the researchers collected the data in a language lab during a one-on-one meeting with each of the participants. Participants were asked to take an English proficiency pre-test first. Next, a training session was conducted to ensure that participants
understood what they had to do, and were familiar with the Shaun the Sheep series in
general terms and the characters it contains. There was instruction and explanation,
with examples (see Appendix 1). In addition there was a brief oral trial run to make
sure that participants were familiar with speaking in these circumstances. Then participants in each group performed narratives of the four Shaun the Sheep videos as a main
session. The four videos were selected according to the combination of task structure
and vocabulary difficulty. Table 2 presents the operationalization of the main session.
To avoid practice effects, the administration order of these four videos was arranged
according to a Latin Square design which was pre-programmed into the computer.
During the main session, both the Here-and-now and There-and-then groups narrated
stories facing the computer screen to make the two performance conditions as similar
as possible. Participants were also told that they were narrating to someone who had
not watched the movie so as to create imagined listeners for the task. Finally, the participants filled out a questionnaire and had a short interview with the researcher. The
whole data collection process took approximately 1.5 hours.
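The counterbalancing of video order can be illustrated with a short sketch. The chapter does not say which Latin Square was programmed into the computer, so the cyclic square below, the function names, and the rule for assigning participants to rows are assumptions made purely for illustration.

```python
def latin_square(items):
    """Cyclic Latin square: row i is the item list rotated left by i positions."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

videos = ["Tooth Fairy", "Bathtime", "Fetching", "Off the Baal"]
orders = latin_square(videos)

def order_for(participant_index):
    """Hypothetical assignment rule: cycle through the rows of the square."""
    return orders[participant_index % len(orders)]

for row in orders:
    print(row)
```

Each video appears exactly once in every row and once in every serial position, so across participants no video is systematically performed first or last.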
Table 2. The operationalization of the main task session

                           Structured                  Unstructured
                           Voc_easy      Voc_dif       Voc_easy     Voc_dif
Here-and-now (N = 45)      Tooth Fairy   Bathtime      Fetching     Off the Baal
There-and-then (N = 27)    Tooth Fairy   Bathtime      Fetching     Off the Baal
Data handling and coding

While completing the tasks, participants were recorded on MP3 players. The digitised
sound files generated by this procedure were transferred to computer and then
transcribed broadly using Soundscriber software. The transcription was then operated
upon in a series of stages (similar to the other empirical chapters in this volume).
First, each performance was coded into AS-Units (Foster, Tonkyn & Wigglesworth
2000) as the basic segmentation unit. Next, each AS line was copied to form, at that
stage, a second identical line. The first line, edited to begin with an asterisk and a
participant identifier, was a CHAT tier, which was analyzed by CLAN software. The
second line, edited to begin with a percentage sign and a participant identifier, was
a TaskProfile line, appropriate for the more specialised software developed to quantify second language task based data. The coding conventions for each line followed
CHAT (MacWhinney 2000) and TaskProfile (Skehan, Chapter 1, this volume) conventions respectively. CHAT conventions (see MacWhinney 2000 for details), which
enable the use of CLAN software, allowed the calculation of a range of sophisticated
language features such as part of speech morphosyntax and lexical D (MacWhinney
2000; Malvern & Richards 2002; Richards & Malvern 1998). TaskProfile allows computational analysis of features of second language speech data, especially the measures of
complexity, accuracy, fluency, and lexis directly. This line was coded for subordinate
clauses; for error free clauses, as well as type of error; repair fluency indices, such as
reformulation, repetition; and silent pause length (see below). In addition, one other
line (the %snd line) was included. It simply contained start and finish time information for each AS unit, given down to millisecond level. This was important for the
computation of speed fluency measures.
A sample of coding is provided in the following:

*LVY: One day &m [/] morning, Bitzer is helping her [///] his (0.836) farmer to
(0.674) number the &sh [///] flocks.
%snd: 00.04.41 00.14.14
%LVY: |One day {&m} *morning, Bitzer is helping {her} rpl his (0.836) farmer errfr:::
to (0.674) number the {&sh} rpl flocks. err_2_l:;;a |
This sample coding shows the three lines that have been described, as well as examples
of repair coding (* for repetition, rpl for replacement), some unfilled pauses, and
two clauses, one coded as error-free, the second coded for a medium gravity lexical
error (err_2_l). There is also a coding for a main clause ::: and coding for a nonfinite subordinate clause occurring after the main clause :;;a. Incomplete or modified
contributions are shown enclosed in curly brackets. Chapter 1 provides further information on this system of conventions.
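To show how codes of this kind can be read off mechanically, the sketch below pulls the silent pause durations, repair markers, and word fragments out of the CHAT tier of the sample above. It is not CLAN or TaskProfile; the regular expressions and function names are assumptions written only for this one example line.

```python
import re
from collections import Counter

# The CHAT tier from the sample coding, joined onto one line.
chat_line = ("*LVY: One day &m [/] morning, Bitzer is helping her [///] his (0.836) "
             "farmer to (0.674) number the &sh [///] flocks.")

PAUSE = re.compile(r"\((\d+\.\d+)\)")   # silent pause length in seconds, e.g. (0.836)
REPAIR = re.compile(r"\[/+\]")          # repair markers such as [/] and [///]
FRAGMENT = re.compile(r"&(\w+)")        # abandoned word fragments such as &sh

def silent_pauses(line):
    """Return the pause durations coded in the line, in order of occurrence."""
    return [float(p) for p in PAUSE.findall(line)]

def repair_markers(line):
    """Return a tally of each repair marker found in the line."""
    return dict(Counter(REPAIR.findall(line)))

def fragments(line):
    """Return the abandoned fragments, without the leading ampersand."""
    return FRAGMENT.findall(line)

pauses = silent_pauses(chat_line)
print(pauses, sum(pauses))
print(repair_markers(chat_line))
print(fragments(chat_line))
```

Counts of this kind, once divided by the number of words and multiplied by 100, give standardised indices of the sort used for the fluency measures.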
Four Ph.D. students of applied linguistics coded the data.1 The researchers
arranged training and regular meetings to discuss the coding issues with them every
two weeks while the coding was taking place. One of the researchers also sampled 10%
of the coded data from each coder for reliability checking before each meeting. Once
there was any disagreement, the researchers and coders tried to reach consensus during the meetings and keep consistency of coding.
Measures
The combination of CLAN and TaskProfile software meant that a wide range of measures were used in this study. These are shown in Table 3.
Research design and statistical analysis

The research design consisted of a three-way factorial design, and multiple dependent
measures. There were two within-subject factors, vocabulary difficulty and task
structure, and one between-subjects factor, time perspective. There were also a large
number of dependent variables, measuring structural complexity, language accuracy,
lexical content, repair fluency, breakdown fluency, and speed fluency. After an exploratory factor analysis to explore the data structure for the dependent variables, a
MANOVA was run in the first instance, followed by appropriate univariate tests, using
SPSS Version 17.
1. The authors would like to thank Cai Jing, Gavin Bui, Christina Li, and Ren Hongtao for this
work.
Table 3. Measures of speech performance explored in this study

Complexity              Total Words              Total number of words in a speech sample.
                        Mean Length AS           The average number of words per AS-unit.
                        Main Clauses             Total number of main clauses per AS-unit.
                        Subordination*           Amount of subordinate clauses and verb
                                                 infinitives per AS-unit (Foster et al. 2000).
Accuracy                Error Free Clauses       Total number of clauses which have no error in
                                                 syntax, morphology, or word order, etc.
                        Error Free Proportion*   Total number of error free clauses divided by
                                                 total number of clauses.
                        Error Degree             Degree of error (level 1, 2, 3 from large gravity,
                                                 medium to small gravity judged by coders).
                        Error Type               Type of error (a, b, n, l, m, s, p, x).
Lexical diversity       D                        Type token ratio (Malvern & Richards 2002).
Lexical sophistication  Lambda*                  An index of lexical sophistication
                                                 (Meara & Bell 2001; Skehan 2009c).
Fluency: speed          Speech Rate              Pruned words per minute.
Fluency: breakdown      AS Pausing_100*          Number of pauses at the end of AS-units per
                                                 100 words.
                        Mid Clause Pausing_100*  Number of pauses in the middle of AS-units.
Fluency: repair         Repetition_100           Number of repetitions per 100 words.
                        Replacement_100          Number of lexical and syntactic replacements
                                                 per 100 words.
                        Reformulation_100        Number of phrases or clauses per 100 words
                                                 repeated with some modification to syntax,
                                                 morphology, or word order.
                        False Starts_100*        Number of false starts per 100 words.

# Measures marked with an asterisk (*) are those reported in this chapter.
Results
This section will present the results of the study. After some preliminary discussion on
the measures to be included, first, descriptive information will be presented, followed
by MANOVA results, and then the subsequent univariate analyses.
The data collected in this study are part of an ongoing exploratory attempt
to establish the structure of second language performance, and to research which
measures best capture the different dimensions which have been identified. To that
end, the data were subjected to a series of factor analyses, for the four different
tasks which were used. In all analyses, clear factors emerged for accuracy and
complexity. The measures which showed highest typical loadings were the index
of error-free clauses and the measure of subordination per AS-unit respectively.
Accordingly, these will be included in the main statistical procedures. In addition,
lexical sophistication, indexed by the value for Lambda, will also be included since
it is fundamental to the research design. This leaves the major area of fluency, for
which a wide range of indices were available. Separate factor analyses of this area
suggested three sub-fluency dimensions: end of clause pausing (standardised to
100 words), mid-clause pausing (standardised similarly), and repair, best indexed
in the present case by number of false starts per 100 words. Interestingly, a speed
fluency factor did not emerge. Equally interestingly, the location of pausing generated separate factors. It appears that the influences on pausing at the end of a clause
are not quite the same as those which are concerned with pausing within a clause.
On the basis of the factor analysis, therefore, the most useful dependent variables
to include are:
Amount of subordination per AS-unit

Proportion of error-free clauses
Lambda as an index of lexical sophistication
AS Pauses per 100 words
Mid-clause pauses per 100 words
False starts per 100 words
Descriptive statistics for the data are presented in Table 4. The table shows mean
scores, standard deviations, and N sizes for the two groups from the between-subjects
condition, with scores for the four tasks on each variable used.
Given the presence of six dependent variables, as well as a complex three-way design with both between and within factors, the first step was to conduct a
MANOVA. The initial MANOVA showed clear significance (Pillai's trace: p < .001).
In addition, tests of sphericity were acceptable. Accordingly, it was permissible to
proceed to the univariate tests. Regarding the between-subject condition of time
perspective, the results for those measures which attained significance are shown
in Table 5.
The significant results are for complexity, with means of 1.28 (Here-and-now)
versus 1.69 (There-and-then); AS boundary pausing (9.09 pauses per 100 words
versus 4.38 pauses); and false starts, as an index of repair (with 0.87 false starts per
100 words for Here-and-now versus 1.33 for There-and-then). In other words, the
There-and-Then condition produced greater structural complexity, with a very strong
effect. There was also a significant difference with AS-boundary pausing, but not
Table 4. Descriptive statistics for condition, lexis, structure

                                    Here and now (n = 45)               There and then (n = 27)
                                    Freq. lexis    Less freq. lexis     Freq. lexis    Less freq. lexis
Complexity          Structured      1.31 (.15)#    1.33 (.17)           1.80 (.27)     1.81 (.28)
                    Unstructured    1.26 (.16)     1.21 (.10)           1.58 (.27)     1.58 (.19)
Accuracy            Structured      .50 (.13)      .51 (.15)            .57 (.12)      .54 (.13)
                    Unstructured    .49 (.15)      .51 (.15)            .50 (.14)      .48 (.13)
Lexis               Structured      1.49 (.36)     1.70 (.41)           1.69 (.37)     2.09 (.34)
                    Unstructured    1.57 (.36)     1.81 (.43)           1.48 (.34)     2.03 (.30)
AS Pausing          Structured      8.81 (3.10)    8.87 (2.83)          3.92 (1.76)    4.14 (1.35)
                    Unstructured    8.96 (2.90)    9.70 (2.89)          4.71 (1.72)    4.75 (1.78)
Mid-clause Pausing  Structured      10.27 (4.02)   11.63 (5.34)         10.94 (6.87)   11.50 (5.56)
                    Unstructured    9.92 (4.75)    12.03 (4.73)         10.56 (5.35)   12.72 (6.59)
False starts        Structured      0.84 (0.90)    1.00 (1.01)          1.33 (1.02)    1.17 (1.08)
                    Unstructured    0.69 (0.88)    0.96 (0.94)          1.02 (0.82)    1.78 (1.26)

# Standard deviations are given in parentheses
Table 5. Effects of time perspective

                Here-and-now       There-and-then
Measure         Mean     SD        Mean     SD        F        Sig
Complexity      1.28     0.15      1.69     0.25      94.56    .001
AS Pausing      9.09     2.93      4.38     1.65      85.14    .001
False Starts    0.87     0.93      1.33     1.05      9.16     .004
with mid-clause pausing, suggesting that the difference in pausing between these
two conditions is located at the point where pausing can be considered to be more
appropriate (Skehan 2009b). Finally, the There-and-then condition produced more
repair, although the absolute values here were not very great.
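The group means reported in Table 5 can be checked against the cell means in Table 4. The sketch below assumes, since the chapter does not say how the group means were derived, that each is the unweighted average of that group's four task means; the dictionary layout and function name are illustrative only.

```python
from statistics import mean

# Cell means copied from Table 4, in the order:
# structured/freq., structured/less freq., unstructured/freq., unstructured/less freq.
table4 = {
    "Complexity": {
        "Here-and-now": [1.31, 1.33, 1.26, 1.21],
        "There-and-then": [1.80, 1.81, 1.58, 1.58],
    },
    "AS Pausing": {
        "Here-and-now": [8.81, 8.87, 8.96, 9.70],
        "There-and-then": [3.92, 4.14, 4.71, 4.75],
    },
    "False Starts": {
        "Here-and-now": [0.84, 1.00, 0.69, 0.96],
        "There-and-then": [1.33, 1.17, 1.02, 1.78],
    },
}

def group_means(cells_by_group):
    """Assumed derivation: unweighted mean of the four task means in each group."""
    return {group: mean(cells) for group, cells in cells_by_group.items()}

marginals = {measure: group_means(groups) for measure, groups in table4.items()}
for measure, values in marginals.items():
    print(measure, values)
```

Under this assumption the averages agree with Table 5 to within rounding, which is consistent with time perspective being a between-subjects factor: every participant in a group contributes to all four task means.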
Next, we turn to the within-subject effects and those for interactions. Once again,
only the significant results are shown. (As it happens, there were no results close to
significance. Results were either clearly significant or clearly non-significant here.) The
relevant results are shown in Table 6.
Table 6. Effects of structure

                Structured tasks    Unstructured tasks
Measure         Mean     SD         Mean     SD          F        Sig
Complexity      1.51     0.21       1.37     0.17        44.01    .001
Accuracy        0.52     0.13       0.50     0.14        4.67     .04
AS Pausing      6.96     2.41       7.53     2.45        10.77    .002
Structure has a major impact on complexity, with means of 1.51 (Structured tasks)
versus 1.37 (Unstructured tasks), and end-of-AS unit pausing, with means of 6.96 for
Structured tasks versus 7.53 for Unstructured tasks, suggesting that the Structure condition produces much greater subordination, and also that the amount of pausing at
AS Unit boundaries is much reduced. Speakers in this condition appear to produce
much denser language, with complex organisation within propositions. They also
manage to organize this discourse effectively, with fewer pauses at the end of clauses.
In addition, there is an accuracy effect, although this is not so strong (with means of
0.52, the proportion of error free clauses for Structured tasks vs 0.50 for Unstructured
tasks). In other words, the structure condition does support less error (or to put this
another way, the unstructured condition provokes more error), suggesting that a processing condition is able to raise both aspects of form. We will return to this below.
However, the correlations between accuracy and complexity for the four tasks do not
provide strong evidence that this was sustained at the individual level, with two positive and two negative correlations between accuracy and complexity, none of which
reached significance.
The findings for Lambda, as a measure of lexical sophistication, are shown in
Table 7. As before, only variables with significant effects are shown. There was a significant difference between the putatively high vocabulary demand and low vocabulary
demand conditions (Table 7), confirming that this condition did have the impact on
performance that the experimental design was expected to produce. In other words,
the two hard vocabulary videos, Bathtime and Off the Baal, did lead to the use of
less frequent vocabulary (which produced means of 1.87 measured by Lambda), while
the other two videos, Tooth Fairy and Fetching, were associated with the use of more
frequent vocabulary (with means of 1.55 lambda). (In passing, it could be noted that
there was no difference in D, as an index of lexical diversity, confirming the results
reported in Skehan 2009b).
Table 7. Effects for lexis

                      Freq. lexis tasks    Less freq. lexis tasks
Measure               Mean     SD          Mean     SD              F         Sig
Lexis                 1.55     0.36        1.87     0.38            93.961    .001
Mid-clause pausing    10.35    5.06        11.94    5.44            11.014    .002
False Starts          0.93     0.90        1.17     1.05            12.296    .001
Vocabulary difficulty also had significant effects on other variables, specifically mid-clause pausing (with significantly more mid-clause pausing for hard vocabulary tasks) and false starts (with significantly more false starts for hard vocabulary
tasks too). In other words, the condition which provoked the use of less frequent
words was associated with more breakdown in the middle of a clause, and also the
need to use more repair. There was no significant result for end-of-AS unit pausing, only for pausing where specific lexical choices might be an issue. Equally interesting is the lack of any significant effect on complexity and accuracy. It does
appear that using less frequent vocabulary had an impact on performance, but this
was related to mid-clause pausing, rather than disrupting structure-building or influencing accuracy. This conflicts with the results reported in Skehan (2009b), although
this might be considered a favourable outcome from a pedagogic perspective, since
it implies that scrutinising teaching tasks for lexical demands may not be as vital as
Skehan (2009b) proposed.
Finally, we turn to interactions. The results are presented in Table 8, and values are given for Structure-by-Condition, Vocabulary Difficulty-by-Condition, and
Structure-by-Vocabulary Difficulty.
Table 8. Interaction effects for time perspective, structure, and lexical difficulty

Structure-by-time perspective
              Complexity                    Accuracy                      Lexis
              HnN            TnT            HnN            TnT            HnN            TnT
Struct.       1.32 (0.16)#   1.81 (0.28)    0.50 (0.14)    0.56 (0.13)    1.59 (0.39)    1.89 (0.36)
Unstruct.     1.24 (0.13)    1.58 (0.23)    0.50 (0.15)    0.49 (0.14)    1.69 (0.40)    1.75 (0.32)
              F = 8.96; p = .004            F = 5.40; p = .024            F = 10.10; p = .001

Vocabulary difficulty-by-time perspective
                         Lexical sophistication
                         HnN            TnT
Freq. Lexis Tasks        1.53 (0.36)    1.59 (0.37)
Less Freq. Lexis Task    1.76 (0.42)    2.06 (0.32)
                         F = 6.89; p = .012

Structure-by-vocabulary difficulty
              False starts
              Freq. lexis tasks    Less freq. lexis task
Structured    1.03 (0.95)          1.07 (1.04)
Unstruct.     0.82 (0.86)          1.28 (1.07)
              F = 4.44; p = .04

# Mean (SD)
There are three significant results for the Structure-by-Condition interaction.

The picture appears to be quite consistent: the There-and-then Structured condition
produces noticeably more elevated performance than any of the other combinations.
The combination of a structured narrative being done under the There-and-then
condition is particularly effective for each of the three performance areas. However,
there is some variation. The clearest effect is with complexity. Against the background
of a main effect for time perspective, the additional contribution of a structured narrative produces a much higher level of subordination. It appears that the full effects
of time perspective need a structured task to manifest themselves, while, equally, the
Here-and-now condition is so powerful in the other direction that it seems to prevent
the advantages of a structured task from manifesting themselves. In other words, while
structure contains the potential for the speaker to indicate connections and use subordination, this potential is more difficult to realise when input has a more dominating influence in the Here-and-now condition. Accuracy shows the same pattern, but
perhaps a little less strongly: the advantage of the joint condition here is not so great.
Interestingly, also, the joint condition here has no effect on any of the fluency indices:
such effects as were found with fluency were simple main effects. But in general, it is
clear that any aspect of form, accuracy or complexity, benefits considerably when the
major variables in this study are combined in joint conditions.
The results for the interaction between lexis and time perspective follow a similar
pattern, essentially. Against the background of a general effect for the manipulated
variable of vocabulary, the There-and-then condition leads to even greater use of
less frequent words, and so accentuates the effect of vocabulary demands. Finally, there
is one significant interaction result involving a fluency measure, that of false starts as
an index of repair; here the relevant variables at work are structure and vocabulary difficulty. Harder vocabulary and an unstructured task lead to a particularly great
amount of repair, while the easy vocabulary unstructured task leads to the least repair. The
strength of this effect, though, is not great.
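The interaction patterns in Table 8 can also be read descriptively as a difference of differences: the effect of one variable at one level of the other variable, minus the same effect at the other level. What follows is a minimal sketch of that arithmetic in Python, using only the cell means reported in Table 8. The function name `interaction_contrast`, the dictionary layout, and the level labels are illustrative assumptions rather than part of the original analysis, and the sketch does not reproduce the F tests, which depend on variances and sample sizes not used here.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (row level, column level)


def interaction_contrast(means: Dict[Cell, float],
                         rows: Tuple[str, str],
                         cols: Tuple[str, str]) -> float:
    """Return the difference of differences for a 2x2 table of cell means.

    The column effect (second column minus first) is computed within each
    row, and the second row's effect is subtracted from the first row's.
    """
    (r1, r2), (c1, c2) = rows, cols
    effect_in_r1 = means[(r1, c2)] - means[(r1, c1)]
    effect_in_r2 = means[(r2, c2)] - means[(r2, c1)]
    return effect_in_r1 - effect_in_r2


# Cell means copied from Table 8 (standard deviations are not needed here).
STRUCTURE_BY_TIME = {
    "complexity": {("Struct", "HnN"): 1.32, ("Struct", "TnT"): 1.81,
                   ("Unstruct", "HnN"): 1.24, ("Unstruct", "TnT"): 1.58},
    "accuracy": {("Struct", "HnN"): 0.50, ("Struct", "TnT"): 0.56,
                 ("Unstruct", "HnN"): 0.50, ("Unstruct", "TnT"): 0.49},
    "lexis": {("Struct", "HnN"): 1.59, ("Struct", "TnT"): 1.89,
              ("Unstruct", "HnN"): 1.69, ("Unstruct", "TnT"): 1.75},
}

# Structure-by-vocabulary difficulty, false starts.
FALSE_STARTS = {("Struct", "Freq"): 1.03, ("Struct", "LessFreq"): 1.07,
                ("Unstruct", "Freq"): 0.82, ("Unstruct", "LessFreq"): 1.28}

for measure, cell_means in STRUCTURE_BY_TIME.items():
    contrast = interaction_contrast(cell_means,
                                    rows=("Struct", "Unstruct"),
                                    cols=("HnN", "TnT"))
    print(f"{measure}: extra There-and-then gain with structure = {contrast:.2f}")

repair = interaction_contrast(FALSE_STARTS,
                              rows=("Struct", "Unstruct"),
                              cols=("Freq", "LessFreq"))
print(f"false starts: vocabulary effect, structured minus unstructured = {repair:.2f}")
```

A positive contrast for a structure-by-time measure indicates that the There-and-then gain is larger for structured than for unstructured tasks. A negative contrast for false starts indicates that harder vocabulary raises repair more in the unstructured tasks.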
Discussion
Given the length and complexity of the Results section, it may be worth restating the
main findings first:
The There-and-then condition produced greater language complexity, fewer
pauses at clause boundaries, and more repair.
The Structured tasks generated greater complexity, less pausing at AS boundaries,
and more accuracy.
The lexical difficulty conditions produced greater use of less frequent words, more
mid-clause pausing, and more repair.
The interaction of the There-and-then and the Structured Conditions generated
more complexity, and more accuracy.
The interaction of the There-and-then and lexical difficulty conditions led to particularly greater use of less frequent words.
The interaction of structure and vocabulary difficulty led to greater repair.
First, we will discuss the significant main effects. The There-and-then condition produced more complexity, less pausing at AS boundaries, and more repair; however,
there was no increase in accuracy. So once again, regarding the Cognition Hypothesis,
there are mixed findings. Complexity was higher, as predicted, as was fluency (opposite to the Cognition Hypothesis prediction) and accuracy was unaffected (again, not
consistent with Cognition Hypothesis predictions). In some ways, it is easier to start
with the Here-and-now condition, which did not produce elevated performance in
any way. As discussed earlier, the Here-and-now condition may have the advantage
of presence of material to be described (and so lack of memory burden), but it has
considerable disadvantages. Most important are the dominance of the input and the
way this input, if all attended to, is remorseless while the video is running. The major
consequence of this is that the speaker has little time to repackage ideas, or to be selective as to what will be said. They are forced to maintain an immediate, descriptive level
without an opportunity to shape their contribution. In the There-and-then condition,
in contrast, memory demands may be higher, but the speaker has the opportunity to
be selective, to organise what is being said, and even to indicate causality and character
intentions. This is not because the task (i.e. the video narrative) is more complex: it is
simply because it is crucially different. As a consequence, there is very clearly more
complex language, whether indexed by the AS-based measure of subordination or by
the number of words in clauses. We would argue, in other words, that the results for the
time perspective comparison do not provide support for the Cognition Hypothesis. In
contrast, they are consistent with pressures of psycholinguistic processing, and with
the Trade-off Hypothesis. The Here-and-now condition, in other words, deprives the
Conceptualiser of potential depth, while the There-and-then condition does enable it
to work, and to do better, despite memory limitations.
Lexical difficulty impacts upon three variables across the board: vocabulary itself,
mid-clause pausing, and false starts. This is an interesting effect. Fluency has many
sub-dimensions (breakdown fluency, repair fluency, speed, automatisation: Skehan
2009b). It seems the effect of using less frequent lexis is to disrupt processing mid-clause, when unexpected problems of lexical selection may be thrust upon the speaker.
Not surprisingly, perhaps, repair is also associated with the engagement of more challenging lexis. In other words, the need to access less frequent words from the second
language lexicon disrupts automatisation in performance. Retrieving such words, and
the important information contained in lemmas that enables syntax building, has a
price, and this is most clearly reflected in the extent to which language is produced
automatically. (Length of run and overall speed were also significantly affected.)
The effects of structure are also very clear. The two structured tasks lead to higher
complexity and lower end-of-clause pausing, as well as greater accuracy. Both aspects
of form are affected, in fact, together with greater fluency, specifically with regard to
normal pausing. This does suggest relatively smooth processing, and a capacity to
approach more parallel processing.
As interesting as these main effects may be, the interactions are even more so.
First, there is the interaction between Structure and Time Perspective. Tasks which
are both structured and There-and-then elicit language which is more complex and
more accurate. In other words, under the less demanding processing conditions of
There-and-Then production, and if the task eases Formulator operations by providing
a clear macrostructure, it appears that second language speakers have more attention
available for all aspects of form. Greater lexical and structural complexity, and error
avoidance are the outcome of supportive conditions. Many second language speakers
may want to avoid error, but many things may get in the way. It appears that Here-and-Now processing or unstructured narratives each got in the way in this particular
study. If neither is operative, the surface of language can be given more attention and
error is reduced. It is interesting that while There-and-then and the structured condition, as main effects, impact upon complexity, they also have a synergistic, interactive
effect. It is possible that, as in Wang's study (this volume) with her supported on-line
planning condition, the Conceptualiser and Formulator work in harmony for second
language speakers.
In addition, the correlations between structural complexity and accuracy (a relationship of some significance to the Cognition Hypothesis) are quite intriguing. The
correlations for the two unstructured tasks are close to zero, suggesting independence
of the two performance areas. For the structured tasks, under the There-and-then
condition, the correlations are 0.31 for Bathtime (difficult vocabulary) and 0.42 for
Tooth Fairy (easier vocabulary), the latter one being significant at the 0.05 level. The
correlation may not be particularly high, accounting for only 16% of the variance,
but this is the first time that a joint effect has been demonstrated plus a correlation at
the individual level. We still argue that these results are the consequence of separate
variables working together. We do not see, for example, why time perspective linked to
structure should lead to greater task complexity, even though we accept that Cognition
Hypothesis supporters might offer a different interpretation.
Equally interesting is the interaction effect between lexical difficulty and time perspective, even if this is quite limited. More words of lower frequency are used in the
There-and-then condition, suggesting that the lack of Here-and-now time pressure,
together with the ability to repackage ideas, creates sufficient attentional capacity for
wider lexical retrieval. The greater flexibility in this condition enables second language
speakers greater opportunity to search for less obvious lexical choices. There is also an
interaction between structure and vocabulary, but only for the dependent variable of
repair. The focus for this interaction is a slightly raised repair mean score for the difficult vocabulary, unstructured combination (i.e. the most challenging combination).
It appears that this combination pushes learners into a greater need to modify the language they have produced. This effect though does not impact on any other variables,
such as mid-clause pausing or accuracy.
We now need to look at the results more generally, to determine what these patterns
reveal about the nature of second language speaking. The alternative models we have
considered contrast a viewpoint that task complexity, free of attentional limitations,
drives performance (the Cognition Hypothesis), and a viewpoint that certain influences on psycholinguistic processes (e.g. through task features and task conditions),
and subject to attentional constraints, lead to systematic differences in performance
(the Trade-off account). In general, the results of the study do not sit well with the Cognition Hypothesis. The Cognition Hypothesis prediction of greater complexity with the
There-and-then condition is fulfilled, but the prediction of accuracy is not, and worse,
the results for fluency are the opposite of prediction. There-and-then conditions elicited
more false starts in learners' speech performance: in a sense, a less fluent accomplishment, and the opposite of the higher fluency prediction of the Cognition Hypothesis.
In addition, the analysis provided earlier suggests that it is by no means obvious that
the There-and-then condition produces a more complex task. It has been argued that it
is, simply, different, and the contrasts in performance are associated with the difference
rather than greater task complexity (as the next paragraph makes clear). The other variables in play are not so central to the Cognition Hypothesis. Task Structure has come
to be included in more recent accounts of the Cognition Hypothesis as a resource-
dispersing variable (Robinson 2011), but in a way which is not seen as integrally linked
with task complexity and with facilitating specific form-function mappings. This variable does not straightforwardly plug into clear predictions. Perhaps the role of less frequent lexis would be interpreted as increasing task complexity, in which case it should
raise language complexity and accuracy as the Cognition Hypothesis would predict. If
this is the case, the results of this study are not supportive, since the condition provoking use of less frequent lexis has no impact on complexity and accuracy scores.
A Trade-off interpretation of the differences between Here-and-now and There-and-then emphasizes different aspects. First, there is the impact of input pressure in
a Here-and-Now condition, since the amount of material which has to be handled,
understood, processed, and expressed is considerable. The input keeps coming, and
so momentary attempts on the speaker's part to encode things through language are
put under pressure by newly incoming input. Just as one set of propositions may be
assembled, a new set of pressures arrives. These factors have considerable potential for
disruption. In contrast, in the There-and-then condition, there is no immediate pressure from incoming input. The speaker can be selective and choose to encode whatever
they like. In this way, they can, possibly, orient the story to their own strengths and
their linguistic knowledge. They can also repackage the material from the video and
even make links between different sections. They can interpret motives and focus
more clearly on the point of what is happening. (In passing, it should be noted that
there was no difference in overall number of words between the two conditions, and
speakers did, in both conditions, generally try to do justice to the stories concerned.
Neither condition was clearly worse than the other in general narrative quality.) So,
the difference in burden of input processing was marked. The second issue is memory.
In a sense, the Here-and-Now condition makes fewer demands on memory in general
(although working memory operations are intense), since what is narrated reflects
what is immediately shown on screen. In contrast, the There-and-then condition is
demanding in a different way, because there is nothing to refer to; yet a six-minute
video has to be narrated. The story has to be kept in mind while the retelling proceeds.
This makes demands, but it is not clear how severe these demands are; in addition,
structured narratives by definition are organised, which may facilitate the retelling.
The Cognition Hypothesis sees the memory demands as the crucial aspect that pushes
for greater task complexity. From the perspective of the Trade-off Hypothesis this is
less obvious, because this hypothesis attaches greater importance to the (damaging)
processing demands of the Here-and-now condition.
A Trade-off interpretation is fairly clear when it comes to the other two variables,
structure and less frequent lexis. A general interpretation for structure effects (e.g.
Skehan & Foster 1999; Tavakoli & Skehan 2005) is that the speaker's capacity to rely
on knowledge of macrostructure removes the need to engage in broad-brush planning
during performance, since the overall shape of a story is known. As a result, attention is
available to focus on the surface features of language. In other words, a significant part of
what the Conceptualiser component has to do is clear, straightforward, and undemanding, so that Formulator concerns can be prioritized. Interestingly, in the present study,
the general results are modified in two ways. First, there is a main effect for complexity (and some measures of fluency). It appears that participants have responded to the
potential of structured tasks to indicate more complex relationships by using more complex subordination. Second, the accuracy effect only occurs in the There-and-then condition. It appears to be the case that Structure can also have an organising effect which
facilitates language complexity. But where accuracy is concerned, it seems that the input
dominance of the Here-and-now condition washes out any structure effect. For structure to enhance accuracy, minimum attentional conditions must be able to operate, and
the There-and-then condition provides these, because there is space for the speaker to
use organisation and planning. This is largely a Formulator-based explanation, but it is
clear that structure is not powerful enough a variable to work in all conditions.
The need to use less frequent lexis has a similar Formulator-based explanation.
The Formulator stage in speech production has lexical and then syntactic phases,
where the lemma retrieval from the mental lexicon drives the building of syntactic
frames, assuming rich information (beyond simple word meaning) is available in the
lemma. Consistent with Skehan's (2009b) post-hoc suggestions from previous work,
the need to access less organised, less robust, and less elaborate lemma information
derails performance, reducing mid-clause fluency, and leading to more repair. The
use of easier vocabulary, at least in the sense of more frequent vocabulary, enables
smoother processing, and perhaps a greater approximation, on the second language
speaker's part, to parallel processing as opposed to the need to engage in more serial,
effortful processing.
Skehan (2009a) offers an account of second language task-based spoken performance organised in terms of the Levelt model of first language speaking. The account
is based on the evidence which has accumulated through the range of task-based
research over the last twenty-five years. The Conceptualiser, Formulator-Lexis and
Formulator-Syntax stages are the spine of this account; the various influences are categorised as complexifying (in that they provoke the use of more complex language);
pressuring (in that they create more demanding processing conditions); easing (which
is essentially the reverse of pressuring); and focussing (in that accuracy is selectively
given greater priority in attention). We can pursue this approach here to try to integrate the findings from the present study into this account.
What is particularly interesting is that two of the variables, time perspective and
structure, each appear twice in Figure 1, where the variables in question are italicised. We will deal with these first in the discussion.
Complexifying, Pressuring    Levelt Stage          Easing, Focussing
There-and-Then, Structure    Conceptualiser
Less frequent lexis          Formulator: Lexis     More frequent lexis
Here-and-Now                 Formulator: Syntax    Structure, There-and-Then
Figure 1. The impact of time perspective, structure, and lexical frequency on second language
spoken performance
Regarding time perspective, it is helpful to discuss Here-and-now and There-and-then separately, and not as different points on a continuum, or as different polarities on the same dimension. There-and-then can impact upon the Conceptualiser
and enable the speaker, through the freedom in time and negotiability of the task that
are available, to produce denser, more organised syntax, with more subordination
and with more extensively developed clauses (as indexed by number of words). The
speaker is not forced into a lower level of immediate description and, in contrast, can
show links and causality and motives in a more satisfactory manner. Similarly, structure in the task enables the speaker to have a wider perspective on what is happening
and to do the story more justice, again to show links, since there is less domination by
immediate events which have to be recounted. Events form a pattern, and this pattern
can then be reflected in the complexity of the language which is used. So, both these
influences on the Conceptualiser are positive and lead to an interpretation of the task
in which more challenging propositions are expressed.
But both There-and-then and Structure also have an easing role. There-and-then
essentially achieves this by not being Here-and-now. In other words, the lack of time
pressure eases processing, and removes the need to continually respond to new input.
The speaker, as a result, has the freedom to choose what to say and to devote attention to
saying it. This is distinct from the capacity to respond to the latent complexity of the story.
With structure, easing derives from the way that a clearer macrostructure allows speakers
to know where they are in the wider discourse, so that they can devote more attention to
the details, to the surface of language, rather than wrestling with extensive conceptualisation. They are licensed, as it were, to get on with sub-sections of the story that they have
to relate, more confident that they can rejoin the main narrative drive without difficulty.
In contrast, there are two pressuring influences. The first is the need to use less
frequent lexis. This impacts upon the Formulator at the lemma access stage. Effective, automatised, parallel communication is disrupted when lexical items are needed
which are not instantly available. When these are encountered, the speaker has to do
something to solve a processing problem, and the need for attentional resources to
do this has an impact on the ongoing processing. This is where what would ideally be
parallel processing becomes serial in nature (Kormos 2006). Since communication
has to continue as smoothly as possible, this disruption has a major impact on the
cycles of Conceptualisation, Formulation, and Articulation which normally proceed
in parallel. Instead of a modular system working effectively, particular stages (in
this case the lemma retrieval stage) interfere with ongoing processing. The second
pressuring influence is the need to engage in Here-and-now processing. In this case,
at least in the context of a video-based narrative retelling, the quantity of incoming (non-verbal) input which has to be encoded is considerable, and the (limited)
second language speaker has difficulty in analysing this input, extracting what is
important, and then formulating a response quickly while more input is arriving.
The result is that performance is disrupted (cf. the large phonation time differences,
the large average pause time differences) as the pressures of communication become
too great.
Happily, at least, not all the influences in the present study are so demanding.
Structure has an easing role at the Formulator stage, for reasons given earlier. The
clarity provided by structure eases Formulator operations. The same applies to more
frequent lexis. This too provides accessible material during speaking, so that the second language speaker is more likely to be able to sustain parallel processing, at least
some of the time.
Conclusions
As an extension of the model provided in Skehan (2009a), which is developed further
in the final chapter of this volume, the present study helps to clarify the influences on
second language task performance which are most salient. The range of influences
whose operations are now beginning to be understood is growing. This enables more
precise predictions of task performance to be made in the future, because we
have improved analytic schemes to understand tasks as research units. In addition,
the way these influences can be related to the psycholinguistic processes described in
the Levelt model of first language speaking gives them even greater validity. Models of
second language speaking which are not grounded in such psycholinguistic processes
are inevitably weakened.
But there are also some pedagogic implications. In general, the results add to our
understanding of how task choice, based on, for example, degree of structure and nature
of lexical content, and task conditions, such as time perspective, have systematic relationships with different performance areas, such as language complexity, accuracy, and
fluency. Hence, if teachers wish to promote one of these areas particularly, then results
such as those reported here can make a contribution. There are, though, some specific
areas which are worth highlighting. The results of this study, while broadly supporting
previous research regarding the way structure promotes accuracy, also suggest that
structure may have some contribution to make to language complexity. This suggests
that structure can function similarly to information integration (Tavakoli & Skehan
2005; Tavakoli & Foster 2009) under certain conditions, making it a very useful tool
for teachers wishing to introduce a focus on form into their communicative activities.
Time perspective also has some interesting implications. It would appear that where
video-based material is concerned, there is little to be said for the Here-and-now condition. This seems generally to create conditions which pose problems and depress levels
of achievement. The There-and-then condition seems much more supportive of pedagogic work, at least where the intention is that learners should be supported in gaining
control over newer or less salient language. Correspondingly, greater understanding
of what constitutes pressure enables more effective pedagogic decision-making where
the intention is to help learners mobilise their abilities to cope with pressure.
Finally, the results for lexical frequency might have the greatest relevance for pedagogic decision-making. If we assume that less frequent lexis is more difficult lexis, then
tasks which draw upon such lexis create difficulties for learners. They seem to lead to
lower levels of automatisation. In interaction with the There-and-then condition, they
also create problems for accuracy. If it is possible to predict which tasks will draw upon
difficult lexis, there is the possibility of pre-teaching such lexis so that when the task is
done, the lexis will become more easily available and the task can run more smoothly. To
put this another way, with tasks in classrooms and with tasks in research, it seems very
important to investigate the lexical demands that will be made. In classrooms, overly difficult lexical demands might compromise the pedagogic usefulness of tasks. In research,
overly difficult lexis (which is unidentified as such) in a particular experimental condition
can introduce unwanted variance, compromising interpretations of any research results.
Acknowledgments
This work was supported in part by The Research Grants Council, Hong Kong (grant
number 450307). The authors would like to thank Martin Bygate, John Norris, and
Kris Van den Branden for their helpful comments on an early draft of this chapter.
References
Brown, G., Anderson, A., Shilcock, R., & Yule, G. (1984). Teaching talk: Strategies for production and
assessment. Cambridge: CUP.
M. Bygate, P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks: Second language learning,
teaching, and testing (pp. 23-48). Harlow: Longman.
Ellis, R. (2003). Task-based language learning and teaching. Oxford: OUP.
accuracy in L2 oral production. Applied Linguistics, 30, 474-509.
Foster, P., & Skehan, P. (1999). The influence of source of planning and focus of planning on task-based performance. Language Teaching Research, 3, 185-214.
Foster, P., & Skehan, P. (2013). Anticipating a post-task activity: The effects on accuracy, complexity
and fluency of L2 language performance. Canadian Modern Language Review, 69(3), 249-273.
Foster, P., & Tavakoli, P. (2009). Native speakers and task performance: Comparing effects on complexity, fluency and lexical diversity. Language Learning, 59(4), 886-896.
Foster, P., & Tavakoli, P. (2011). Task design and second language performance: The effect of narrative type on learner output. Language Learning, 61(suppl.1), 37-72.
Gilabert, R. (2007). Effects of manipulating task complexity on self-repairs during L2 oral production. International Review of Applied Linguistics, 45, 215-240.
Hinkel, E. (2004). TOEFL test strategies with practice tests (3rd ed.). Hauppauge, NY: Barron's.
Ishikawa, T. (2006). The effects of task complexity and language proficiency on task-based language
performance. The Journal of Asia TEFL, 3(4), 193-225.
Iwashita, N., McNamara, T., & Elder, C. (2001). Can we predict task difficulty in an oral proficiency
test? Exploring the potential of an information-processing approach to task design. Language
Learning, 51(3), 401-436.
Erlbaum Associates.
Kuiken, F., & Vedder, I. (2007). Task complexity and measures of linguistic performance in L2 writing. International Review of Applied Linguistics, 45(3), 261-284.
Kuiken, F., & Vedder, I. (2008). Cognitive task complexity and written output in Italian and French
as a foreign language. Journal of Second Language Writing, 17, 48-60.
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (3rd ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
Meara, P., & Bell, H. (2001). P_Lex: A simple and effective way of describing the lexical characteristics of short L2 texts. Prospect, 16, 5-19.
Michel, M., Kuiken, F., & Vedder, I. (2007). The influence of complexity in monologic versus dialogic
tasks in Dutch L2. International Review of Applied Linguistics, 45, 241-259.
Ortega, L. (1999). Planning and focus on form in L2 oral performance. Studies in Second Language
John Benjamins.
Rahimpour, M. (1997). Task complexity, task condition, and variation in L2 oral discourse. Unpublished Ph.D. thesis. University of Queensland, Australia.
Richards, B.J., & Malvern, D.D. (1998). A new research tool: Mathematical modelling in the measurement of vocabulary diversity (Award reference no. R000221995). Final Report to the Economic
and Social Research Council, Swindon, UK.
Robinson, P. (1995). Task complexity and second language narrative discourse. Language Learning,
45, 99-140.
Robinson, P. (2001a). Task complexity, task difficulty, and task production: Exploring interactions in
a componential framework. Applied Linguistics, 22(1), 27-57.
Robinson, P. (2001b). Task complexity, cognitive resources, and syllabus design: A triadic framework
for examining task influences on SLA. In P. Robinson (Ed.), Cognition and second language
instruction (pp. 287-318). Cambridge: CUP.
Robinson, P. (2007). Task complexity, theory of mind, and intentional reasoning: Effects on L2
speech production, interaction, uptake and perceptions of task difficulty. International Review
of Applied Linguistics, 45, 193-214.
Robinson, P. (2011). Task-based language learning: A review of issues. Language Learning, 61(Suppl.1,
June 2011), 1-36.
Robinson, P., & Gilabert, R. (2007). Task complexity, the cognition hypothesis and second language
learning and performance. IRAL, 45, 161-176.
Sawaki, Y., Stricker, L.J., & Oranje, A.H. (2009). Factor structure of an internet-based test. Language
Testing, 26(1), 5-30.
tasks. In H. Daller, D. Malvern, P. Meara, J. Milton, B. Richards, & J. Treffers-Daller (Eds.),
Vocabulary studies in first and second language acquisition: The interface between theory and
application (pp. 107-124). London: Palgrave Macmillan.
Skehan, P. (2009c). Models of speaking and the assessment of second language proficiency. In
A. Benati (Ed.), Issues in second language proficiency (pp. 202-215). London: Continuum.
Skehan, P. (2011). Researching tasks: Performance, assessment, pedagogy. Shanghai: Shanghai Foreign
Language Education Press.
Skehan, P. (manuscript). Conventions for coding complexity, accuracy, fluency and lexis: The use of
TaskProfile. The Chinese University of Hong Kong.
Skehan, P., & Foster, P. (1997). The influence of planning and post-task activities on accuracy and
complexity in task-based learning. Language Teaching Research, 1(3), 16-33.
Skehan, P., & Foster, P. (2008). Complexity, accuracy, fluency and lexis in task-based performance: A meta-analysis of the Ealing research. In S. Van Daele, A. Housen, F. Kuiken,
M. Pierrard, & I. Vedder (Eds.), Complexity, accuracy and fluency in second language use,
learning and teaching (pp. 263-284). Brussels: Royal Flemish Academy of Belgium for Sciences
and Arts.
Tavakoli, P., & Foster, P. (2008). Task design and second language performance: The effect of narrative type on learner output. Language Learning, 58(2), 439-473.
R. Ellis (Ed.), Planning and task performance in a second language (pp. 239-276). Amsterdam:
John Benjamins.
Van den Branden, K. (2006). Task-based language education: From theory to practice. Cambridge:
CUP.
Van den Branden, K., Bygate, M., & Norris, J. (Eds.). (2009). Task-based language teaching: A reader.
Amsterdam: John Benjamins.
Wang, Z. (2009). Modeling L2 speech production and performance: Evidence from five types of planning
and two task structure. Unpublished Ph.D. thesis. The Chinese University of Hong Kong.
Yuan, F., & Ellis, R. (2003). The effects of pre-task planning and online planning on fluency, complexity, and accuracy in L2 monologic oral production. Applied Linguistics, 24, 1-27.
Appendix A Task Instructions

Instructions for There-and-Then narration
You will see several short cartoon films. Each film lasts for about 5 minutes or so. The
films will be played only once. After watching each of them, you are required to tell
a story based on what you have watched to someone who has never seen it. You are
required to:
i. Recall what you have watched; and tell the story as clearly as possible and in the
order that the film was presented.
ii. Since you will not have time to plan how to tell the story, you should be thinking
about the story retelling even when you are watching the film.
iii. Please narrate the story from a 3rd person view point. Instead of using "I", please
use "Shaun", "Bitzer" for example, as the names of the main sheep. (A card of the
main sheep characters with their names is provided.)
iv. Please use the past tense. Here are some examples:
Shaun entered the room.
He wanted to call his friends.
Instructions for Here-and-Now narration

You will see several short cartoon films. Each film lasts for about 5 minutes or so.
While watching, you are required to tell the story to someone who has never seen the
film. You are required to:
i. Since you are required to tell the story simultaneously while the film is playing,
when the film starts (after the theme song), you should be ready to start telling the
story; when the film stops, you should be finishing your story too.
ii. Tell the story as clearly as possible and in the order that the film is presented.
iii. Please narrate the story from a 3rd person view point. Instead of using "I", please
use "Shaun", "Bitzer", for example, as the names of the main sheep. (A card of the
main sheep characters with their names is provided.)
iv. Please use the present tense. Here are some examples:
Shaun enters the room.
He wants to call his friends.
chapter 7
Structure and processing condition in video-based narrative retelling
Peter Skehan & Sabrina Shum
St. Mary's University, Twickenham / Chinese University of Hong Kong
This chapter reports on a study of video-based narrative retellings, in which the
major variables are degree of structure and the nature of the processing conditions
under which the retellings were done. The two variables were manipulated in a 4 × 4
design. Four Mr. Bean video clips were used, with different levels of structure, ranging
from no structure to a clear, well organised problem-solution structure. In addition
to a control group, there were two online processing conditions (opportunity to
pause, and provision of a summary before the task), and one offline Watch-then-Tell
condition. The results of the study show that two of the online conditions had some
mitigating influence, that is, the opportunity to pause the video, and the provision of
a summary before the video was seen. More structured narratives and less pressured
processing conditions produced more accurate and more complex performances.
The same influences led to less end-of-clause pausing but more reformulations.
The results are discussed in terms of the Levelt model of speaking, applied to second
language performance.
Introduction
In some earlier second language task-based research, Skehan and Foster (1997, and
see also Foster & Skehan 1996) proposed that tasks which contain structure are associated with more accurate performance. They made this inference, post-hoc, after retrospectively analysing tasks which were found to contain structure and elicit higher
accuracy in performance. This inference led to the design of a new study (Skehan &
Foster 1999), in which task structure was varied more systematically. In the earlier
studies, the tasks were either personal information exchange tasks or narrative retellings based on a set of cartoons. In Skehan and Foster (1999), the stimulus material
consisted of two Mr. Bean videos, which differed in the structure of the story that was
shown. In one, "Mr. Bean plays Crazy Golf", after a conventional beginning, Mr. Bean
hits the golf ball outside the course and, slavishly following an admonition not to
touch the ball under any circumstances, endures a series of chaotic adventures which
have no development or connection, until he finally completes the course. In contrast, "Mr. Bean goes to a restaurant" follows a familiar schema (the restaurant script)
in which unpredictable things happen, but against the structure of a familiar set of
events, that is, dining out in a fancy restaurant. The results in this study partly followed
the predictions, but not wholly. There was an increase in accuracy, but not as consistently as was predicted.
Two possible interpretations of this study presented themselves. The first is that
video-based retellings are, for some reason, not so supportive of demonstrating experimental effects of structure. A second possibility is that the concept of structure should
be more carefully analysed than had been done for the Skehan and Foster (1999)
study. Accordingly, Tavakoli and Skehan (2005) analysed structure more subtly. They
suggested a number of levels of structure. These were:
i. no structure: an unrelated series of events
ii. a story which has a clear beginning, middle and end
iii. a story which has a familiar script, e.g. a restaurant script
iv. a story which has clear causal links to drive development
v. a story which has a problem-solution structure (and therefore contains causal links, but within a tighter and larger structure)
It was reasoned that there is a scale of structure in these different forms of story, and
that the degree of looseness is reduced as one works through the scale.¹ In other
words, the second and third steps in the scale have a certain arbitrariness, but conventions dictate how the story will develop, or at least the framework within which
it is told. The fourth and fifth steps, though, have a more logical, story-coherent progression, and therefore can be considered to be tighter. They derive from discourse
analysis work, e.g. that of the Winter-Hoey analysis of problem-solution structures
(Winter 1976; Hoey 1983), and reflect the way a story develops through a problem
arising from a situation, followed by a solution (or solutions) and then perhaps an
evaluation of the entire story.
¹ Tavakoli and Skehan (2005) proposed, for cartoon picture series, that this looseness can be
operationalised by the number of pictures in the series, other than the first or last, whose order can
be changed without impairing the capacity to tell the story.
Tavakoli and Skehan (2005) designed a study to explore the relevance of the scale
(omitting the third step, since it had been included in Skehan & Foster 1999). The study
used cartoon picture series, designed to represent the other four steps. The results were
partly consistent with the predictions. A scale with four distinct steps of structure
did not emerge clearly, but it was evident that discourse structure (i.e. the final two
steps) did lead to greater accuracy, with or without planning, and at both proficiency
levels included. So there was something of a split between no structure and beginning-middle-end organisation, on the one hand, and a causal (and problem-solution) structure, on the other. While the performance under the beginning-middle-end condition
was a little disappointing, more importantly, the notion of tight structure did survive.
Another aspect of the results is interesting. In one of the structured tasks (with a loose
problem-solution structure), there was also an increase in structural complexity. In
this particular narrative, to tell the story effectively, background and foreground information in various cartoon pictures had to be integrated. This was necessary in only
one of the four narratives used, and it was associated with the greater degree of subordination in the stories used, suggesting that the details of the story structure can raise
complexity in addition to accuracy. These factors also functioned similarly in Foster
and Tavakoli (2009) and Tavakoli and Foster (2008). These researchers report on a
study which included additional variables. Using cartoon picture series as in Tavakoli
and Skehan (2005), they included the variables of information organisation, native
versus non-native speakerness, and foreign language vs. second language contexts as
independent variables. For our current purposes, it suffices to say that these studies
confirmed the importance of structure as leading to increased accuracy in performance, and of information organisation as raising complexity.
Naturally, this raises the question as to why more structured narrative retellings
lead to increased accuracy. Skehan (2009a) draws on Levelt's (1989, 1999) model of
first language speaking to account for this finding. Levelt distinguishes three broad
stages in speech production: Conceptualisation (where ideas are developed), Formulation (where ideas are transformed into language), and Articulation (where speech
is actually produced). In turn, the Formulation stage consists of a lemma retrieval
stage, followed by a morpho-syntax building stage which is contingent on the preceding lemma-retrieval sub-stage. The model is consistent with much evidence on speech
errors, pausing and hesitation phenomena, and slips of the tongue. The model has also
been widely applied to second language speech (de Bot 1992; Kormos 2006). Drawing on this analysis, Skehan (2009a) (and see also Wang & Skehan this volume, and
Skehan, Chapter 8, this volume) argues that structured tasks provide a clearer macrostructure for the story retelling. In this way, such tasks first circumscribe what needs
to be said (less demanding Conceptualisation). Next, Skehan (2009a) proposes that in
second language speaking, the basis for more accurate speech comes from the Formulation stage, and that structured tasks allow the speaker to draw upon the clearer macrostructure to devote more attention to the Formulator stage and, as a result, have more
attention for lemma retrieval (solving problems of retrieval, enabling more complete
lemma information) and also devote more attention to syntax building and monitoring
of performance. All this implies a view of attention and working memory as limited. In
other words, factors which ease the speech production task (e.g. structured tasks and
clearer macrostructure, which reduce the need for demanding Conceptualiser operations) release more attention which can be devoted to other speech production stages.
Reflecting on this literature, however, it seems that clearer effects for structure
have been found with cartoon story retellings, compared to video-based narratives.
However, the more sophisticated views of structure have largely been confined to such
cartoon retellings. This raises the question as to what would happen with video-based
retellings based on more theoretically defensible characterisations of structure. This is
more than a passing question. Video-based retellings are more demanding in processing terms. As the video is running, the speaker is exposed to a considerable amount of
input, which has to be understood and then repackaged as production. In a theoretical
model based on limited attention capacity, anything which eases processing problems
would be particularly welcome to speakers having to wrestle with the demands on
their attentional resources as part of a video retelling. Accordingly, one aspect of the
present research is to explore whether more fine-grained task features related to structure will still have an impact even under more demanding processing conditions.
It is also worth discussing the results related to complexity reported in Tavakoli
and Skehan (2005). In this case, the task contained information which, if the story
was to be told well, required the speaker to link different bits of information, and
this seemed to push the speaker to use more subordination. In Tavakoli and Skehan
(2005), the interpretation of results was in terms of information organisation in that
foreground and background information had to be integrated in order to tell the story
effectively. But problem-solution structures themselves require different propositions
to be linked, and their causal connections to be clarified. So, a problem-solution structure may have multiple effects: the easing of processing through clear macrostructure, leading to greater accuracy, and the push through the structure itself to express
relationships between elements more explicitly, thereby fostering subordination, and
raising measures of complexity based on this.
The previous discussion has concerned the nature of the tasks to be done. But it
is also worth exploring how differences in conditions can impact upon performance.
Once again, here, the starting point is the viewpoint that attentional resources are
limited, and so the conditions under which a task (e.g. a narrative retelling) is done
can be explored in terms of the attentional demands that they make. The most obvious contrast here is between what Robinson (2011) has termed "Here-and-now" and
"There-and-then". In the former, the stimulus material is present and can be used while
the task is running. In the latter, the stimulus material is absent, but has been seen, and
so the task is to speak without the input material available. Robinson (2001) uses this
analysis of time perspective to make some challenging predictions about the nature
of performance. Essentially, he characterises the Here-and-now condition as simpler,
and the There-and-then condition as more complex. Then he proposes that the more
complex condition will lead to better performance, and specifically raise accuracy and
complexity. This follows the Cognition Hypothesis claim that greater task complexity
pushes towards higher performance in particular areas (Robinson 2011). However,
most studies motivated by the Cognition Hypothesis have used static visual materials.
Most studies which explore this contrast in time perspective use tasks such as giving
map instructions, where the stimulus material does not change, even though it can
vary in degree of complexity.
The question is raised, then, as to what would be the influence of using video-based presentations: in this case, the Here-and-now condition is one where the input
material is available and the task is to tell the story in a way which is appropriate
to what is happening on screen. The There-and-then condition would be one where
the video has been seen, and immediately afterwards, the speaker has to retell the
story. All this introduces a new set of psycholinguistic demands, especially concerned
with time pressure. Skehan (2009a) argues that, in this case, there is a genre difference
in time perspective and that the There-and-then condition is simply different from (rather
than more complex than) the Here-and-now condition. In other words, although
memory demands are greater under the There-and-then condition, the opportunities
for shaping the task and repackaging what one wants to say are much greater than in
the Here-and-now condition.
Basically, the driving force for the comparison at issue is the impact of time
pressure. This raises the possibility of retaining the time pressure dimension of Here-and-now but trying to attenuate it in some way. Two such methods can be proposed
here. The first directly addresses the issue of time pressure itself. Ellis (2005) and Ellis
and Yuan (2004) have proposed the construct of on-line planning to capture what
happens when speakers, who might otherwise be pressured by time, are able to operate
in less demanding time conditions, and so plan and regroup on the fly (i.e. as they are
speaking). In other words, rather than using strategic or pre-task planning, they are
able to use planning-while-speaking. Ellis (2005) argues that such on-line planning is
associated with greater accuracy in performance. Wang (this volume) draws attention
to some research design concerns regarding this original research, which she addresses
in her own research by slowing down the videos to be retold (and so standardising the
conditions under which the online planning opportunities are created). But this is not
the only way to provide opportunities for on-line planning, especially as far as video-based narratives are concerned.
The present study incorporates a different method of supporting preparedness
and opportunity for online planning. Participants can simply be given the power to
stop the video whenever they choose, so that they can compose themselves, clear the
decks as it were, and prepare for what they are about to say. In this way they could, at
any point of difficulty, free themselves from the remorseless pressure of synchronising
the development of ideas and their generation of actual speech. While one cannot be
sure what learners will do while the video is paused, the fact that they can pause the
video at will means that they only need to have a fairly limited ambition for what they
hope to achieve from their current pause since they know they can pause again soon.
It is likely, in other words, that they can focus on accuracy.
A contrasting approach to assisting learners while a task is running (and which is
different from strategic planning) is to give them an overview of what the video will
consist of. In other words, they can be provided with the outline of events, and thereby
be given a macrostructure. In this way, what subsequently happens may contain surprises at the level of detail, but not in terms of the general things which will happen
while the video is running. As a result, they are being provided with a structure within
which they can operate. One would predict in these circumstances that knowledge
of this overall structure would assist learners and enable them to use the Formulator
stage more effectively, and so achieve greater levels of accuracy and lexical retrieval,
the aspects of speech Levelt associates with this stage.
Next, we focus on the nature of performance itself, and how it can be measured, a
concern that has permeated all chapters in this book. Most studies have used general
measures, although there are also approaches, such as Crookes (1989) and Robinson
et al. (2009), which use more specific measures. The general measures have shown some
development over the last two decades. Skehan (2009b) discusses how fluency measures have changed, and how they relate to different aspects of fluency (breakdown fluency, repair fluency, speed). He also discusses pause location as particularly important
for second language speakers, addressing Segalowitz's (2010) proposal that measures
which distinguish effectively between native and non-native speakers are particularly
useful (Skehan 2009b). Regarding complexity, Norris and Ortega (2009) propose that
the commonest measure in task-based research, the use of some sort of subordination index, is not ideal and that it does not reflect improvement at higher proficiency
ranges, where complexification may occur through other processes, such as lengthening of nominal phrases (indeed, research suggests that subordination does not reflect
changes in the complexity of texts among higher proficiency learners, where other measures do). They propose that researchers should also use a measure based on number
of words per clause, which would reveal such finer-grained distinctions (and they also
suggest using larger-scale measures such as mean length of utterance or T-Unit, which
capture complexity more holistically). In this way, different aspects of complexity can
be captured. At the very least, the use of additional measures seems worth exploring,
alongside the more conventional measure of subordination per speech unit (Foster,
Tonkyn & Wigglesworth 2000; and also see discussion in Chapter 1).
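The two complexity indices contrasted above can be illustrated with a minimal sketch. It assumes that a transcript has already been segmented into AS-units and clauses by hand; the `ASUnit` container and the function names are hypothetical and are not taken from any software used in the studies cited.

```python
from dataclasses import dataclass


@dataclass
class ASUnit:
    """One AS-unit, stored as a list of clauses; each clause is a list of words.

    Hypothetical container for illustration only.
    """
    clauses: list[list[str]]


def clauses_per_as_unit(units: list[ASUnit]) -> float:
    """Subordination index: total clauses divided by total AS-units."""
    total_clauses = sum(len(u.clauses) for u in units)
    return total_clauses / len(units)


def words_per_clause(units: list[ASUnit]) -> float:
    """Mean clause length in words, pooled over all clauses."""
    lengths = [len(clause) for u in units for clause in u.clauses]
    return sum(lengths) / len(lengths)


# Invented two-unit sample: one simple unit, one with a subordinate clause.
sample = [
    ASUnit([["Bean", "goes", "to", "the", "park"]]),
    ASUnit([["he", "asks", "a", "man"], ["who", "steals", "his", "camera"]]),
]
print(clauses_per_as_unit(sample))  # 3 clauses over 2 AS-units
print(words_per_clause(sample))     # 13 words over 3 clauses
```

The two indices can move independently: longer nominal phrases raise words per clause without adding any clauses, which is the kind of distinction the additional measure is meant to reveal.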
The remaining area which is consistently measured in task performance is that
of accuracy. The standard general measure has been to compute the percentage of
error free clauses. But there are alternatives. Gilabert (2007), for example, uses reformulations as a measure of accuracy. Mehnert (1998) used a measure of errors per 100
words, arguing that this was more appropriate for German, the language used by her
participants. In addition, Skehan and Foster (2005), who were concerned that measures of the proportion of error-free clauses might be inflated if a speaker used lots of
short clauses correctly, proposed a measure based on the length of clause that can be
accurately handled. In this measure, all clauses from a speaker are ranked for length,
and then accuracy is established for all clauses of particular lengths. Then the greatest
clause length that meets a criterion of accuracy (generally when 70% of the occurrences of that length of clause are error free) is taken as the measure of accuracy. This
measure does not correlate with complexity and so is not a confounded measure of
that construct. Instead, Skehan and Foster (2005) propose it as a more valid measure
of this aspect of performance.
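The logic of the clause-length measure can be sketched as follows. This is an illustrative reading of the description above, not the authors' own procedure: the input format (one length and one error-free flag per clause) and the function name are assumptions, and no minimum number of clauses per length is imposed, although an actual coding scheme might require one.

```python
from collections import defaultdict


def clause_length_accuracy(clauses, criterion=0.70):
    """Return the greatest clause length handled at or above `criterion`.

    `clauses` is an iterable of (length_in_words, is_error_free) pairs.
    Returns 0 if no clause length meets the criterion.
    """
    tallies = defaultdict(lambda: [0, 0])  # length -> [error_free, total]
    for length, error_free in clauses:
        tallies[length][1] += 1
        if error_free:
            tallies[length][0] += 1
    passing = [
        length
        for length, (ok, total) in tallies.items()
        if ok / total >= criterion
    ]
    return max(passing, default=0)


# Invented codings: (clause length in words, error-free?)
coded_clauses = [
    (3, True), (3, True), (3, False),              # 67% error-free
    (4, True), (4, True), (4, True), (4, False),   # 75% error-free
    (6, True), (6, False),                         # 50% error-free
    (8, False),                                    # 0% error-free
]
print(clause_length_accuracy(coded_clauses))  # 4
```

In this invented sample only four-word clauses reach the 70% criterion, so the score is 4; many short correct clauses would not raise it.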
Whichever of these measures is used, however, error-free clauses or errors per
100 words or clause length accuracy, they do not take error gravity into account in
any way. There are two reasons for building in some notion of error gravity to the
coding of second language performance. First, it may be that error scores are misleading if they reflect many but superficial errors. In other words, if one speaker produces
a large number of such errors, but few serious errors, but another speaker produces
the same overall error-free clause score based on a much higher proportion of serious
errors, a misleading index of accuracy is being created. Accordingly, separating error
scores as a function of gravity becomes desirable for reasons of validity. But second,
there is the more pragmatic goal of discrimination. Even if the general ratio of serious
to superficial errors is relatively constant across speakers, there is the problem that
lower-level students may achieve quite low scores if any error in a clause causes that
clause to be deemed incorrect. As a result, potential discrimination between speakers
may be lost.
For both these reasons, therefore, it may be desirable to code transcripts not simply for error, but also for error gravity. The clearest advocates of this procedure are
Foster and Wigglesworth (2010), who suggest that it is appropriate to separate three
levels of error gravity (low, medium, and high), and that these three
levels should be defined in terms of the extent to which communication is impaired.
If an error does not really make the extraction of meaning difficult, they propose that
the error should be regarded as minor. On the other hand, if meaning cannot or can
hardly be extracted, then the error should be regarded as serious. Between these, a
level of error is one where meaning can be retrieved, but with some effort, and they
propose that these are best regarded as intermediate errors. Their system was used in
the present study, and so for all candidates we have two types of error score (error-free
clauses, length of clause error) with each of these represented at two difficulty levels
(since scrutiny of the data suggested that the intermediate and serious errors should
be combined, in that there were relatively few serious errors).
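One way to picture how a single coded transcript yields an accuracy score at two gravity levels is sketched below. The numeric coding of gravity, the names, and the choice of thresholds are assumptions made for illustration; the chapter does not specify how the two levels were computed.

```python
from enum import IntEnum


class Gravity(IntEnum):
    """Worst error found in a clause (hypothetical coding scheme)."""
    NONE = 0
    MINOR = 1
    INTERMEDIATE = 2
    SERIOUS = 3


def percent_error_free(worst_error_per_clause, threshold):
    """Percentage of clauses whose worst error falls below `threshold`."""
    clean = sum(1 for g in worst_error_per_clause if g < threshold)
    return 100.0 * clean / len(worst_error_per_clause)


# Invented codings for eight clauses.
clauses = [
    Gravity.NONE, Gravity.MINOR, Gravity.MINOR, Gravity.INTERMEDIATE,
    Gravity.NONE, Gravity.SERIOUS, Gravity.NONE, Gravity.MINOR,
]

# Strict score: any error at all disqualifies the clause.
strict = percent_error_free(clauses, Gravity.MINOR)
# Lenient score: only intermediate or serious errors disqualify the clause.
lenient = percent_error_free(clauses, Gravity.INTERMEDIATE)
print(strict, lenient)  # 37.5 75.0
```

The gap between the two scores shows how a speaker who makes mostly superficial errors is separated from one whose errors impair communication.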
Drawing on the previous discussion, we can formulate a number of research questions, and associated hypotheses for the present study.
Research Question One: In the context of video-based presentation, what will be the
influence of varying the degree of structure in narrative retellings? This leads to:
Hypothesis One: Greater structure in the video-based narratives will lead to greater
accuracy in performance.
Hypothesis Two: Greater structure in the video-based narratives will lead to greater
complexity in performance.
Research Question Two: What will be the effects of varying and attenuating the time
pressure while narratives are being told? This leads to a series of sub-hypotheses:
Hypothesis Three (A): Unmediated narration while a video is running will lead to the
lowest levels of accuracy and complexity.
Hypothesis Three (B): Mediated narration while a video is running will raise
accuracy, but it is an open question whether this will be higher for the condition
providing the opportunity to pause, or the condition giving provision of a summary
(pre-narration).
Hypothesis Three (C): There-and-then narration will lead to higher levels of accuracy,
complexity, and fluency.
Method
The present study used a series of Mr. Bean video-based narratives, which participants
then had to narrate, under various conditions.
Materials
A wide range of excerpts from the Mr. Bean television series were examined, and after
trialling, four were selected to capture different degrees of structure. The original narratives were video-edited to reduce their length so that they ran for some 5–7 minutes.
The four narratives were:
Mr. Bean plays Crazy Golf: In this story, Mr. Bean plays a round of Crazy Golf. The
attendant instructs him that he shouldn't touch the golf ball under any circumstances. He then accidentally knocks the golf ball out of the Crazy Golf area, even
outside the park it is situated in. A series of unconnected misadventures results,
which culminate in Bean arriving back at the golf course after dusk, to finally
complete his round. (This video was used in Skehan & Foster 1999.)
Mr. Bean at Christmas: In this story, Mr. Bean meets his girlfriend on Christmas
Eve. In the window of a jewellery store, she sees a ring she would like. Afterwards
he goes home and prepares for Christmas rather eccentrically. Mr. Bean wakes up
on Christmas Day, receives the present he sent himself, and then prepares dinner.
His girlfriend arrives, gives him a nice present, and he, having misunderstood
her interest in the ring they looked at on Christmas Eve, gives her a picture and a
hook, both of which he also saw in the window of the jewellery store.
Mr. Bean visits the Funfair: On a trip to a funfair, Mr. Bean's car accidentally gets
hooked to a pram, containing a baby. He takes the baby with him to the funfair,
but still wants to have fun. So he parks the baby in a rocking car, putting in lots
of money. He then tries out various rides in the funfair, before coming back to
find the baby who is still crying. He buys some (helium filled) balloons to amuse
the baby, but they carry the baby up into the sky. He responds by using a bow and
arrow to burst the balloons, causing the pram to sink to earth, as it happens right
next to the baby's mother.
Mr. Bean catches a thief: Mr. Bean visits a park and wants to take a photo of himself
with a statue. He fails, and recruits a passer-by who deceives him and steals his
camera. Mr. Bean then searches for the thief, finds him, and puts a rubbish bin
over his head to immobilise him, and jabs the man with a pencil, causing the man
to shout in pain. He fetches a policewoman but the immobilised thief has run
away. Later, he goes to a police station where suspects, including the actual thief,
have been rounded up. In an identity parade he fails to recognise the thief visually,
but then gets the police to put a rubbish bin on each suspect, and jabs each, finally
recognising the thief from his squeals.
The first of these videos was unstructured, but the remaining three were structured, to
different degrees. The Christmas story has a beginning, a middle and an end, but there
is no strong causal structure. The story concerns the relationships between two people,
and how these play out against the backcloth of Christmas conventions. The other two
stories, though, have causal links, and in each case, there is a problem which is eventually solved. However, in the Funfair story, there are random diversions, such as when
Mr. Bean goes on the roller coaster, or plays darts, before the thread of the kidnapped
baby is resumed. In the Thief story, in contrast, there are no diversions: everything is
concerned with the theft or the catching of the thief.
Materials were also involved in one of the processing conditions: Summary.
Under the Summary condition, participants were provided with a brief summary of
the main events in the story. A typical summary is given below:
Funfair: In his car, Mr. Bean goes to a funfair, on his own. By accident, when he
arrives he takes a baby in its pram away from its mother. He then thinks that the
best thing will be to try to look after the baby at the same time as he is having fun
in the funfair. So when he is going on the different rides and doing different things
in the funfair, he has to think of ways of looking after the baby. In the end, and
with a lot of luck, he is able to return the baby to its very worried mother, again by
accident.
Each of the summaries was just over 100 words.
Research design
The present study has two factors, and each of the factors has four values. The study
uses four narrative tasks each chosen to exemplify a particular form of structure, as
indicated in the Materials section. These were (a) no structure, (b) a clear beginning-middle-end structure, (c) loose problem-solution structure, (d) tight problem-solution
structure. Structure was a within-subject variable, in that all participants completed
all of the tasks, although in counterbalanced order. The second factor is processing
condition, and this is a between-subjects variable. The four conditions were: Watch-and-Tell; Pausing; Summary; and Watch-then-Tell. In Watch-and-Tell participants
viewed the video and were required simultaneously to tell the story that they were
watching. In the Pausing condition, they were provided with the video control which
enabled them to pause the video whenever they wanted. Otherwise they had to tell the
story as the video was running. In the Summary condition, participants had to view
the video and simultaneously tell the story, as in Watch-and-Tell, but prior to viewing
the video they were provided with a summary of the story (as shown above), and given
as much time as they needed to read and understand this version, which was presented
in English. Finally, in the Watch-then-Tell condition, participants viewed the video,
but were not allowed to make notes. Then they had to recount what they had seen in
the video.
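The shape of this mixed design can be made concrete with a short sketch. It is only an illustration: the chapter states that task order was counterbalanced but does not say how, so the cyclic Latin square used here, the equal group sizes, and the round-robin allocation to conditions are all assumptions.

```python
TASKS = ["Golf", "Christmas", "Funfair", "Thief"]            # within-subject
CONDITIONS = ["Watch-and-Tell", "Pausing", "Summary",
              "Watch-then-Tell"]                              # between-subjects


def latin_square(items):
    """Cyclic Latin square: each item appears once in every row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign(participant_ids):
    """Map each participant to one condition and one task order.

    Conditions are allocated round-robin; within each condition,
    successive participants take successive rows of the Latin square.
    """
    orders = latin_square(TASKS)
    plan = {}
    for i, pid in enumerate(participant_ids):
        condition = CONDITIONS[i % len(CONDITIONS)]
        order = orders[(i // len(CONDITIONS)) % len(orders)]
        plan[pid] = (condition, order)
    return plan


plan = assign(range(1, 29))  # 28 participants, as in the present study
for pid in (1, 2, 5):
    print(pid, plan[pid])
```

Every participant completes all four tasks but belongs to only one processing condition, which is what makes structure a within-subject factor and condition a between-subjects factor.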
Procedure: Interacting with students

One-on-one interviews were arranged to collect speech samples from the participants. First, a member of the research team explained to the participant what was to
be done (watch-and-tell, watch-then-tell, watch-and-tell with summary preview, or
watch-and-tell with pausing). The Mr. Bean videos were shown on a computer screen.
The participant, wearing a headset with a microphone which did the recording, then
followed the instructions which had been given. As an additional means of recording,
whenever a participant gave his/her consent, a video recording of the procedure was
also made.
Data processing
Each narrative video retelling was recorded onto an MP3 player, generating a sound
file which typically lasted three or four minutes. Subsequently the sound file was
transferred to computer, and then transcribed, broadly, using Soundscriber. Next,
the transcription was put into modified CHAT format (MacWhinney 2000), with
appropriate headers and formatting. Each AS turn was represented in four lines. The
first was the CHAT line, and simply followed CHAT conventions. The second line
eventually became the part-of-speech coded line, using the Mor and Post subroutines
from within CLAN.² The third line contained timing information, in milliseconds,
regarding the start and finish times of the AS turn. The fourth line was a repeat of the
transcript from the CHAT (first) line, but was coded according to TaskProfile conventions. The fourth line was processed by the computer software written by the first
author to analyse coded second language task performance. This line contained all
codings for clause boundaries, for error (including error gravity), for repair fluency,
for filled pauses, and for silence-based pausing.
Participants
The speech samples of 28 non-native speakers of English (NNSs) were included in this
study.
These were all second-year university students, who, at the time of the data collection, were studying at South China University of Technology (SCUT), Guangzhou,
China. They were chosen following teacher recommendation that these students were
likely to accomplish the story-telling task. There were a few cases in which the student
had difficulty with the task and failed to produce a satisfactory quantity of speech.
These data were not included in any analysis. The participants all came from mainland China, with L1s of Cantonese and Mandarin Chinese, and ranged in age from
nineteen to twenty-two, with a mean age of twenty-one. Fifteen were female and thirteen were male. Their proficiency, based on their College English Test scores, was low
intermediate.
Measures
The dependent variables in the present study were of fluency, accuracy, and complexity. Fluency was measured with two indices. Previous research (Skehan & Foster 1997;
Tavakoli & Skehan 2005) has indicated a distinction within fluency between measures
based on pausing (i.e. unfilled pauses) and measures based on some sort of interruption to the message being expressed (e.g. reformulation). These have been termed
breakdown and repair fluency, respectively (Skehan 2001). In this study, breakdown
fluency was measured through the number of AS-boundary and mid-clause pauses,
each standardised per 100 words. A pause was defined as an interruption to the speech
flow of more than 40 milliseconds. Skehan and Foster (2008) show that these measures are appropriate ones to capture breakdown fluency for non-native speakers.
2. These subroutines apply a part-of-speech tagger to the CHAT line, and accomplish this
in two stages. The Mor subroutine makes initial tagging judgements, but flags cases of ambiguity,
where more than one part-of-speech is possible. The Post subroutine then addresses these cases of
ambiguity, generating the final, part-of-speech tagged line.
Reformulations, as a measure of repair fluency, were based on the number of times
that a participant changed the nature of their utterance in a manner that was syntactic
or morphological (reformulation). This measure, too, was standardised per 100 words.
(See Wang & Skehan this volume, for more extensive description.)
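The standardisation per 100 words used for the fluency measures is simple arithmetic, sketched below in Python. The function names are hypothetical and the default threshold merely restates the pause definition given above; this is an illustration of the calculation, not the software used in the study.

```python
def count_pauses(silences_ms, threshold_ms=40):
    """Count silences longer than the threshold (both in milliseconds)."""
    return sum(1 for s in silences_ms if s > threshold_ms)


def per_100_words(count, total_words):
    """Standardise a raw count (pauses, reformulations) to a rate per 100 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * count / total_words
```

For instance, twelve reformulations in a 300-word retelling would give `per_100_words(12, 300)`, that is, a rate of 4.0 per 100 words, which makes speakers who produce different amounts of speech comparable.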
The first measure of complexity used was calculated by dividing the data into syntactic clauses and AS-units (Foster et al. 2000) and expressed as the ratio of clauses
to AS-units. In other words, the more clauses per AS-unit, the higher the complexity score. An AS-unit is minimally defined as a single speaker's utterance consisting
of either an independent clause, or sub-clausal unit, together with any subordinate
clause(s) associated with either. The second complexity measure, drawing on the work
of Norris and Ortega (2009), was of the number of words per clause. It was reasoned
that this reflects (unlike the subordination measure) the extent of internal
complexity within clauses.
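The two complexity ratios can be illustrated with a short Python sketch. It assumes, purely for illustration, that a performance has already been segmented by hand into AS-units, each holding one or more clauses, each clause being a list of words; the function name and the data layout are hypothetical.

```python
def complexity_measures(as_units):
    """Return (clauses per AS-unit, words per clause).

    as_units: list of AS-units; each AS-unit is a list of clauses;
    each clause is a list of words.
    """
    clauses = [clause for unit in as_units for clause in unit]
    if not as_units or not clauses:
        raise ValueError("need at least one AS-unit and one clause")
    n_words = sum(len(clause) for clause in clauses)
    return len(clauses) / len(as_units), n_words / len(clauses)
```

With two AS-units, the first a single four-word clause and the second a three-word main clause plus a five-word subordinate clause, the function returns 1.5 clauses per AS-unit and 4.0 words per clause. The first ratio rises only when subordination is added, while the second rises only when clauses themselves get longer, which is why the two measures can diverge.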
Given the exploratory nature of the measures of accuracy, two different measures
were used, with each being calculated in two ways. The first accuracy index, widely
used in the task performance area, was the proportion of clauses which were error-free (Foster & Skehan 1996). However, additional measures were used to relate accuracy to the length of clause which could be handled without error (see Skehan &
Foster 2005, for additional discussion of this measure). Clauses were categorised
for length in words, and then the accuracy level for clauses of different lengths was
established, for example, three word clauses, four word clauses, five word clauses,
etc. Previous research (Skehan & Foster 2005) has shown that an accuracy criterion
of 70% (i.e. that 70% or more of clauses of that length are accurate) is both demanding and discriminating. Accordingly, this criterion will be used here. In this way, a
measure of accuracy is available which avoids the possibility that a participant can
obtain a high score which is inflated because it is disproportionately influenced by
error-free clauses that are quite short in length, for example, "I know". Both measures
were calculated first by including all errors, however small, but each measure was also
calculated counting as error only those which to some degree impaired communication. Ten per cent of the sample was double coded, and all measures produced more
than 90% agreement.
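A minimal Python sketch of the two accuracy indices follows. It rests on one loudly stated assumption: the length-accuracy score is taken to be the greatest clause length at which the proportion of error-free clauses reaches the criterion, regardless of how shorter lengths fare. The source does not spell out this detail, and the function names and data layout are hypothetical.

```python
from collections import defaultdict


def error_free_proportion(clauses):
    """clauses: list of (length_in_words, is_error_free) pairs."""
    return sum(1 for _, ok in clauses if ok) / len(clauses)


def length_accuracy(clauses, criterion=0.70):
    """Greatest clause length whose error-free proportion meets the criterion.

    Assumption: each length is judged on its own; returns 0 if no length
    reaches the criterion.
    """
    by_length = defaultdict(list)
    for length, ok in clauses:
        by_length[length].append(ok)
    best = 0
    for length in sorted(by_length):
        outcomes = by_length[length]
        if sum(outcomes) / len(outcomes) >= criterion:
            best = length
    return best
```

Suppose a speaker produces four three-word clauses (all correct), four four-word clauses (three correct), four five-word clauses (two correct) and two six-word clauses (one correct). The error-free proportion is 10/14, about 0.71, but the length-accuracy score is 4, because only clauses of up to four words reach the 70% criterion. The second index is therefore not inflated by a large number of short, safe clauses.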
Analyses
There were two independent variables in this study: structure (a within-subject
variable with four values) and processing condition (a between-subjects variable, again
with four values). In addition, there were six dependent variables (two measures of
complexity, two of accuracy, and two of fluency). In addition to the calculation of basic
descriptive statistics, the central analysis used in this study was a repeated measures
multivariate analysis of variance, followed by appropriate univariate tests. Effect sizes
are also provided.
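The effect size reported with the univariate tests (Table 2) is partial eta squared, which can be recovered from an F ratio and its degrees of freedom through the standard relationship eta squared (partial) = F x df(effect) / (F x df(effect) + df(error)). The Python sketch below illustrates that relationship only; the function name and the example numbers are illustrative and are not taken from this study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df values positive")
    effect = f_value * df_effect
    return effect / (effect + df_error)
```

For example, an F of 4.0 with 3 and 36 degrees of freedom corresponds to a partial eta squared of 0.25, meaning that the effect accounts for a quarter of the variance left once other effects are set aside.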
Results
To gain a general picture of the results which were obtained, Table 1 shows the descriptive statistics for each combination of the four tasks and the four processing conditions.
Standard deviations are given in parentheses. The dependent measures included are
two of complexity (clauses per AS-unit, and number of words per clause (ASComp
and WdsClause respectively)); two of accuracy (the proportion of error-free clauses and
the greatest length of clause, in words, that could be handled accurately at the 70%
level (EFC and LenAcc respectively)); and two of fluency (the number of pauses at the
end of a clause, standardised to 100 words, and the number of reformulations per 100
words (EoCPaus and Reform respectively)).
Table 1. Descriptive statistics

Golf (N = 28)
Condition             ASComp        WdsClause     EFC           LenAcc        EoCPaus       Reform
W+tell (N = 7)        1.27 (0.14)   5.84 (1.11)   0.66 (0.05)   3.43 (2.57)   8.74 (1.63)   2.75 (1.30)
Pause (N = 7)         1.31 (0.18)   6.20 (0.56)   0.68 (0.10)   3.00 (4.04)   7.24 (2.59)   1.77 (1.06)
Summary (N = 7)       1.29 (0.09)   6.37 (1.76)   0.73 (0.06)   3.14 (2.79)   8.66 (2.52)   2.82 (2.11)
WthenTell (N = 7)     1.33 (0.12)   7.68 (1.71)   0.77 (0.07)   6.29 (3.04)   5.71 (1.54)   5.53 (2.52)
Mean                  1.30 (0.13)   6.53 (1.48)   0.71 (0.08)   3.96 (3.28)   7.59 (2.37)   3.17 (2.20)

Funfair (N = 28)
Condition             ASComp        WdsClause     EFC           LenAcc        EoCPaus       Reform
W+tell (N = 7)        1.31 (0.05)   5.33 (0.40)   0.80 (0.08)   6.43 (2.87)   9.24 (1.14)   5.40 (2.95)
Pause (N = 7)         1.40 (0.16)   5.76 (0.50)   0.80 (0.06)   6.29 (2.92)   7.67 (3.54)   2.83 (1.16)
Summary (N = 7)       1.36 (0.11)   5.46 (0.80)   0.85 (0.08)   7.29 (1.98)   8.65 (2.49)   4.20 (2.71)
WthenTell (N = 7)     1.41 (0.14)   5.98 (0.44)   0.81 (0.06)   7.71 (3.45)   5.56 (2.56)   6.71 (3.15)
Mean                  1.37 (0.12)   5.64 (0.58)   0.81 (0.07)   6.93 (2.76)   7.78 (2.82)   4.79 (2.87)

Christmas (N = 28)
Condition             ASComp        WdsClause     EFC           LenAcc        EoCPaus       Reform
W+tell (N = 7)        1.28 (0.06)   5.30 (0.49)   0.72 (0.21)   4.00 (3.65)   9.65 (1.00)   3.13 (1.86)
Pause (N = 7)         1.30 (0.16)   5.57 (0.29)   0.73 (0.08)   4.57 (3.55)   8.14 (3.20)   3.39 (1.62)
Summary (N = 7)       1.24 (0.07)   5.45 (0.29)   0.81 (0.06)   7.14 (1.57)   10.28 (2.96)  2.97 (1.81)
WthenTell (N = 7)     1.40 (0.13)   5.85 (0.28)   0.80 (0.03)   7.43 (1.62)   5.45 (1.84)   5.47 (2.39)
Mean                  1.31 (0.13)   5.54 (0.39)   0.77 (0.28)   5.79 (3.05)   8.38 (2.97)   3.74 (2.10)

Thief (N = 28)
Condition             ASComp        WdsClause     EFC           LenAcc        EoCPaus       Reform
W+tell (N = 7)        1.58 (0.12)   5.06 (0.30)   0.82 (0.06)   6.00 (2.52)   6.62 (1.89)   4.33 (1.78)
Pause (N = 7)         1.49 (0.17)   5.44 (0.54)   0.77 (0.09)   5.43 (3.41)   6.23 (2.80)   2.12 (0.77)
Summary (N = 7)       1.45 (0.11)   5.30 (0.51)   0.86 (0.06)   7.14 (3.39)   7.71 (3.02)   3.08 (2.17)
WthenTell (N = 7)     1.61 (0.10)   5.71 (0.39)   0.85 (0.07)   7.43 (3.15)   3.60 (1.69)   6.35 (3.20)
Mean                  1.53 (0.14)   5.38 (0.48)   0.83 (0.08)   6.50 (3.07)   6.04 (2.75)   3.97 (2.60)

For each cell in this table, N = 7, so that N = 28 for each task, across processing conditions.

Given the two-dimensional nature of this research design, with a mixture of
between (processing condition) and within (task) factors, as well as the use of six
dependent variables, the data were subjected to a repeated measures multivariate
analysis of variance. The data showed significance at the .001 level for processing condition (Pillai's trace = .999; F = 4.56; d.f. = 6), and at .001 for task structure (Pillai's
trace = .944; F = 9.29; d.f. = 18). There were no significant interactions.
Moving on to the univariate tests, all tests for task structure were significant, as
shown in Table 2, and all of them generated effect sizes which were large.
The clearest pattern here is for accuracy. Error-free clauses generates an extremely
clear trend, in the order Golf < Christmas < Funfair < Thief, with values of 0.71, 0.77,
0.81, 0.83, with all values significantly different from the others. Length Accuracy 70%
is close to this, with values of 3.96, 5.79, 6.93 and 6.5 respectively, with Golf significantly lower than Christmas, and then both of these significantly different from the
remaining two tasks, with the latter not differing from one another. In other words,
there seems to be a clear relationship between structure of task and accuracy of performance, especially as this relates to the two tasks which contain some degree of
Problem-Solution structure, a comparable result to Tavakoli and Skehan (2005).
Table 2. Univariate tests for task structure and associated effect sizes
Measure                            F        df    Significance   Effect size (partial Eta squared)
AS Complexity                      38.4           .001           0.59
Words per clause                   12.97          .001           0.32
Error Free Clauses (proportion)    15.95          .001           0.37
Length Accuracy 70%                7.36           .001           0.21
End of Clauses Pauses per 100      83.2           .001           0.41
Reformulations per 100 words       37.8           .001           0.23
The pattern for complexity is reasonably clear, but with one slightly anomalous
result. Regarding AS Complexity, the most widely used measure in the literature, the
results are broadly similar to accuracy. Golf and Christmas, the least structured narratives, do not differ from one another but they both differ from Funfair which itself
differs from the most structured task, Thief, with the respective values being 1.30, 1.31,
1.37 and 1.53. The pattern with words per clause contrasts with these results. The highest score, 6.53, is with the unstructured Golf task, whereas the other three tasks have
lower scores (Christmas, 5.54; Funfair, 5.64; and Thief, 5.38), and these do not differ
from one another. Structure, in other words, is associated with more subordination,
but shorter clauses. So, while there is clearly a trend towards a relationship between
complexity and structure, there are also some ways in which this trend is looser than
that for accuracy, and the results suggest that the two measures of complexity reflect
different facets of this construct, as Norris and Ortega (2009) argue.
The two fluency measures present a mixed picture. Reformulation is clearer, in
that Golf is significantly different to Christmas and both are significantly different to
the other two tasks, which themselves do not differ, with values of 3.17 (Golf) < 3.74
(Christmas) < 4.79 (Funfair) and 3.97 (Thief). In other words, the more structured
the task, the more reformulation there is, which is an intriguing result. The results for
End-of-Clause pausing present an unclear pattern. Here Funfair and Christmas, the
intermediate structured tasks, generate the highest values, which do not differ from one
another, but which do differ from the lower values Golf (unstructured) and Thief (most
structured). These become a set of results which present a challenge for interpretation.
Next, we turn to the between-subjects variable of processing condition. It should
be borne in mind that the cell size for any comparison is only seven, with the result
that the power of the statistical testing is much less than with Task Structure, where
the cell size was twenty-eight. With complexity, the pattern of results suggests that the
two conditions which eased immediate processing, Pause and Watch-then-tell, had the
highest scores with both AS Complexity and Words Complexity, with Watch-then-tell
producing the highest value in each case (consistent with Wang and Skehan, this
volume). Even so, there are no significant differences for the AS measure, with means of 1.36
(Watch-and-tell), 1.38 (Pause), 1.34 (Summary), and 1.44 (Watch-then-tell). There
are significant comparisons, though, with Words per Clause Complexity, between
Watch-and-tell (7.06) and Summary (7.20) versus Watch-then-tell (8.80). Watch-and-tell and Summary do not differ from one another, and neither do Pause and Watch-then-tell. In
other words, it appears as though less pressured conditions, broadly, are supportive of
greater complexity in spoken performance, but that this relationship is not particularly
strong. (Once again, a result consistent with Wang & Skehan this volume).
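The condition means for AS Complexity quoted above can be checked against Table 1. Because every cell has the same size (N = 7), the mean for a condition across the four tasks is simply the unweighted mean of its four cell means. The Python sketch below does this arithmetic with exact decimals; the variable and function names are hypothetical, and since the cell means in Table 1 are themselves rounded, agreement with the reported values can only be approximate.

```python
from decimal import Decimal

# AS Complexity cell means from Table 1, in task order
# Golf, Funfair, Christmas, Thief.
AS_COMPLEXITY = {
    "Watch-and-tell": ["1.27", "1.31", "1.28", "1.58"],
    "Pause": ["1.31", "1.40", "1.30", "1.49"],
    "Summary": ["1.29", "1.36", "1.24", "1.45"],
    "Watch-then-tell": ["1.33", "1.41", "1.40", "1.61"],
}


def condition_means(cells):
    """Unweighted mean of the cell means for each processing condition."""
    means = {}
    for condition, values in cells.items():
        numbers = [Decimal(v) for v in values]
        means[condition] = sum(numbers) / len(numbers)
    return means
```

The results, 1.36, 1.375, 1.335 and 1.4375, agree to within rounding with the reported means of 1.36 (Watch-and-tell), 1.38 (Pause), 1.34 (Summary) and 1.44 (Watch-then-tell).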
There is a loose pattern also with accuracy. With the Error-free Clauses measure,
Watch-and-tell (0.75) is significantly different to Summary (0.83), and Pause (0.74)
contrasts with Summary and Watch-then-tell (0.81). Regarding Length Accuracy
70%, Watch-and-tell (4.96) and Pause (4.82) are significantly lower than Watch-then-tell (7.2). Summary, at 6.5, occupying a sort of middle ground, does not contrast
significantly with any of the other groups. It appears that Watch-then-tell and, to
a lesser extent, Summary support a focus on accuracy while language is being produced, although again, the relationship is not strong.
The two fluency measures generate exactly the same significant contrasts, but with
interestingly different directions. Essentially Watch-then-tell contrasts with all other
conditions, but the other conditions do not differ from one another. This is shown
most clearly in Table 3.
Table 3. Significant contrasts with fluency measures
Condition          End of clause pauses per 100   Reformulations per 100 words
Watch-and-tell     8.57                           3.9
Pause              7.32                           2.53
Summary            8.27                           2.76
Watch-then-tell    5.08*                          5.96*
Values marked with an asterisk differ from all other conditions. Other conditions do not differ significantly.
Clearly the There-and-then condition, operationalised as Watch-then-tell, is the
decisive influence here. It leads to significantly fewer pauses per 100 words, suggesting
that under this condition, second language speakers are more able to control where
they need to pause, rather than having to regroup through the demands of the processing conditions. But in contrast, they reformulate more. In other words, they are
engaged, mid-clause, with finding ways of expressing what they are saying in different
ways. This, possibly, reflects a greater on-line engagement with the content of their
speech (Skehan & Foster 2005).
Discussion
We can start this section by restating the main results which were found.
Degree of structure, broadly, leads to greater accuracy.
Degree of structure also has a relationship with complexity, more clearly with the measure of subordination, but this relationship is weaker than with accuracy.
Structure is associated with more reformulation and repetition.
Structure is associated with fewer end-of-clause pauses.
The Summary and Watch-then-Tell conditions lead to higher accuracy than the Watch-and-Tell and Pause conditions.
Watch-then-Tell leads to higher levels of complexity than the other conditions.
Watch-then-Tell leads to higher levels of reformulation and fewer end-of-clause pauses.
The first research question, and the associated hypothesis, concerned the relationship
between structure and accuracy. This hypothesis was broadly confirmed. The
unstructured narrative, Golf, consistently produced the lowest level of accuracy. But
beyond this, although all three more structured narratives produced higher levels of
accuracy, there was something of a split between the two problem-solution narratives (Thief, Funfair) and the more conventional story structure narrative (Christmas)
which contained development, but with some arbitrariness. The two problem-solution
narratives produced higher accuracy, with the Christmas video generating accuracy
levels between the unstructured narrative and the two tightest narratives. The scale
of structure worked very well with the error-free clauses accuracy measure and reasonably well for length-accuracy, although with the latter, there was little difference
between the two problem-solution videos. Obviously, it is interesting that the effect has
been found with video narratives, in addition to what has been reported in previous
studies with cartoon picture series. Once again, one can offer the interpretation that
tasks such as Funfair and Thief provide a clearer macrostructure, so that second language speakers can organise what they say in terms of this macrostructure, and do not
need so much to work at the highest discourse level while speaking. This, then, enables
them to allocate more attention to the Formulator and focus on the surface, the accuracy of what is being said, even under video-based time pressure.
There is also a clear effect of structure on complexity, particularly with the subordination measure. In this case, the major opposition is between the two problem-solution
narratives and the two others. The combination of problem-solution structure
and the subordination measure is particularly interesting here. It seems as though
this structure pushes speakers to express connections between elements through
clause relations, pushing up the subordination measure. In contrast, the words-based
measure of complexity is not so sensitive to these task changes, suggesting that the
factors which impact upon needing to use more words within clauses are different
from those which raise subordination. Indeed, with the words-based measure, it is the
Golf (unstructured) task which produces higher values than any except Thief. So in
this case, it seems as though specific task design qualities push the speaker to develop
clauses internally.
Turning to fluency, we have an interesting contrast in the two measures which
were used, end-of-clause pauses and reformulations. In the former case, structure
is associated with greater fluency, with fewer pauses, while in the latter, it is associated with less fluency, since there are more reformulations. The first case is perhaps
clearer and also more consistent with the available literature. It appears that pausing is more controlled when tasks are structured. It seems as if clear macrostructure
and a straightforward and organised set of events to narrate facilitate organisation
of speech units, and a capacity to pause effectively, that is, not at the end-of-clause
points. So, it seems as though, when tasks are structured, the non-native speakers
are more able to use pausing opportunities while maintaining a good flow of speech.
This, perhaps, suggests effective Formulator functioning. But then there are the greater
reformulations to account for, which at first sight seem to tell a different story, since
these represent interruptions to the speech stream. We believe it is most likely that the
place of reformulation within fluency measures itself is ready for re-analysis. This is
because reformulations can be both negative and positive. In the former case, they are
thrust upon the speaker as trouble is encountered. However, one can also put a more
positive spin on reformulations. When there is effective overall discourse functioning,
sufficient attention may be available for changes in what is said, within the clause, that
reflect improvements and edits to be more precise in the message being conveyed. It is
possible that this is what is happening here. Structure, then, may be associated with an
effective flow of the discourse, and simultaneously with polishing what is said within
the clause, but without challenging the broader organisational structure of what is
said. If that is the case, then the two fluency measures seem to function as different
sides of the same coin.
Next, we turn to consider the effects of the different conditions under which tasks
were done, and so address the issues raised by Research Question 3 in its various forms.
Regarding accuracy, there is a clear contrast here between the Summary and Watch-then-Tell conditions, on the one hand, which are associated with greater accuracy, and
the Watch-and-Tell and Pause conditions, which are associated with lower accuracy.
The predictions and the actual outcomes are shown in Table 4.
Table 4. Predictions and outcomes for task conditions
Predictions          Outcomes
Most accurate        Most accurate
Watch-then-Tell      Watch-then-Tell
Summary: Pause       Summary
Watch-and-Tell       Pause: Watch-and-Tell
Least accurate       Least accurate

It was hypothesised that the Watch-and-Tell and Watch-then-Tell conditions
would differ most, which is, in fact, what has happened. The former is an online condition, whereas the latter is not, and this is the difference that underlay the prediction in
favour of the latter. But the remaining conditions are both online in nature, with the
need to retell a video which is running. Compared to the Watch-and-Tell condition, however, each is mediated to some degree, to ease the processing conditions relative to
the online demands. So the prediction was exploratory here, and it was not predicted
which would lead to higher accuracy, but it was assumed that they would both generate higher accuracy than the Watch-and-Tell condition.
In the event, the Summary condition has produced higher accuracy, although not
quite as much as the Watch-then-Tell condition, while the Pause condition has not
generated greater accuracy, and is usually around the same level as the Watch-and-Tell condition. This is slightly surprising, but nonetheless interesting. We can discuss
the Summary condition first. In this respect, it is worth noting that this condition
produced the lowest complexity values (lower than Watch-and-Tell and Pause), and
slightly higher values for breakdown dysfluency. It appears that the Summary condition functioned so as to organise what was going to be said, without that content being
challenged, with the result that speakers accepted the outline of their narration, and
thereby were helped to focus on the surface and on the detail. The speakers in this
condition seemed to have the confidence to pause at clause boundaries because they
were surer of the overall organisation of what they were going to say.
In contrast, the Pause condition did not seem to confer any particular advantage
for accuracy. Mediation of the online nature of the task by offering the opportunity to
pause the videotape did not lead to any form of assisted online planning, and indeed
for some of the accuracy measures, the Pause condition is associated with the worst
levels of accuracy! However, this condition does lead to the lowest values for reformulation, suggesting that speakers in this condition were using the opportunity to pause
to achieve some gain. But this was not translated into accuracy, which is something of
a disappointment. The only additional point to make here is that this condition was
an optional one. Participants were provided with the opportunity to pause, but they
did not have to avail themselves of this possibility. Therefore, the operationalisation of
this condition may need to be rethought in any future research.
Finally, it is particularly interesting that the effects found for structure and for
processing are remarkably similar (and independent, with no interaction effects). Both
influence accuracy, in broadly similar ways. Both also influence complexity, although
structure has clearer effects with the subordination measure, whereas processing
shows more clearly with the Words per Clause measure. Finally, both have similar
effects on the two fluency measures, bringing about fewer end-of-clause pauses and
more reformulations. In the final section, therefore, we will explore why these patterns
are so similar and what brings structure and processing together.
Conclusions
Narrating a video-based story is a very difficult thing to do, and puts considerable
pressure on a second language speaker. It is the sort of predicament which can be very
revealing about how second language speakers wrestle with communication problems.
The flow of input is considerable, and the time pressure under which it arrives is also
unforgiving. So, anything which mitigates these fundamental influences is of interest
to the psycholinguistics of second language speech production as well as, ultimately, to
pedagogy.
What we have seen in this study is that there are indeed options available which
can mitigate these pressures. First and foremost is how structured the narrative task
is. Structure, we have seen, is related to higher accuracy. This, in turn, leads to the
interesting question as to why this should be the case. Three interconnected factors
are proposed here. First, there is the issue that structured tasks circumscribe what is
to be said. In other words, if there is a structure to a task, the degrees of uncertainty
and unpredictability are reduced. What has to be said is more clearly demarcated,
and so the speaker has the task of being workmanlike, and getting the job done, in
the greater certainty of what the job actually is. In other words, blind alleys are less
likely in the discourse. Moreover, to the extent that the structure is a fairly universal one, such as problem-solution, the assistance is even greater, since the point of
the communication becomes clearer, and the imagined listener becomes more real.
Second, there is the issue of organisation and framing. If there is a broader structure,
the speaker can make connections much more easily between where they are currently in the discourse, and the wider goals that they have in telling a story. In other
words, they can focus on what they are currently saying, for example, Mr. Bean trying to find the policewoman, confident that they can make links with the broader
story and line of development once the current sub-section is finished. Both of these
influences, in other words, attenuate the demands placed upon the Conceptualiser.
The third influence, then, follows from the previous two. The speaker, cushioned by
this structure, is likely not to have used considerable attentional resources in dealing with planning, and so can focus more clearly on what is currently being said. In
other words, attention is available to do immediate things like avoiding error and
monitoring performance. This, then, underlies the greater degree of accuracy which
is achieved.
One might go even further with this, and propose that the greater time available
enables even better things to happen, in that time can be used to make choices which
lead to easier discourse, easier lexical selections, and even easier syntax, so that the
favourable conditions are magnified even further. The fluency effects are connected
here. Less pressure, through clearer structure, enables speakers to organise their
contributions more completely, and then pause, appropriately, at clause boundaries.
Within the clause, they are then able to concentrate on the surface of language, on
avoiding error, with more opportunity to reformulate.
This, though, does not do justice to the complexity effects which were found, with
structure and with processing. In the former case, it seems that the specific effects of
problem-solution structure play themselves out through greater subordination (and
not so much through more words per clause), and speakers do justice to the need to
express the relation between different elements in the story. With processing, the lack
of time pressure in the Watch-then-tell condition (and see Wang and Skehan, this volume) enables repackaging and greater complexity, but in this case with a clearer effect
with the Words per Clause measure.
What structure seems to do here is produce a relationship between Levelt's
Conceptualiser and Formulator that is helpful for accuracy in second language speech
performance. It limits the work that the Conceptualiser has to do initially, in developing a general plan for the story. It also eases the Formulator's access to Conceptualiser
operations, since these are likely to connect with the broader structure of the task
and change less than might be the case in other communications. So, from the overall
amount of attention available, the Formulator does not have to compete as much as is
usual with the Conceptualiser, and can get on with doing what it does best: shaping
current language, retrieving lemmas, and building syntax. And these can, accordingly,
be done a little better, and lead to higher accuracy rates.
The other factor in the research design fits in well with this analysis. The different processing conditions yielded different results, with the main opposition
between the Watch-and-Tell and Pausing conditions on the one hand, and the
Watch-then-Tell and Summary conditions on the other. As elsewhere (e.g. Wang &
Skehan this volume), the There-and-then condition produces higher performance,
reflecting the lower processing pressure that is involved, associated as this is with
more opportunity for the Formulator stage to function effectively. But even more
interesting are the performances for the Summary condition. What this condition
does, in effect, is to provide the speaker with something akin to structure, in that a
broad outline is given, and then the speaker can take this outline, and use it as the
structure to be recounted. It provides the sorts of conceptual hooks that structure
can provide on its own. In this way, although there is the constant pressure of the video
which is rolling, the speaker has the general macrostructure which has been given to
cope more effectively with the potential for derailment that the constant video input
provides. Once again, the issue is the relationship between the Conceptualiser and
Formulator stages.
We can connect this discussion to an even broader perspective. Skehan (2009a)
offers an account of second language speaking which is more general than that provided here. Using the Levelt model as the spine of this account, he explores groups of
influences which are categorised as Complexifying, Pressuring, Easing, and Focussing
second language performance. Essentially, the variables which have been explored in
this study are all concerned with Easing, in that structure and favourable processing
conditions in this study are those which simultaneously limit the demands coming
from the Conceptualiser, while giving more attentional resources to the Formulator.
They provide a piece in the puzzle for understanding how we can foster more effective second language performance. They also provide hints as to how, pedagogically,
tasks can be chosen and implemented which enable learners to perform at a higher
level. They also, of course, indicate how, in reverse, the task of the learner/second language speaker can be made more difficult. But above all, they do indicate that there are
choices that can be made here, and that knowledge of these options can make pedagogy more targeted and effective.
A couple of points are still worth commenting on, one a limitation and the other a
suggestion for further research. The limitation concerns the Processing variable which
has been manipulated in this research. The variable was interesting and suggestive in the
results found, and encouraging for future research, as the earlier discussion indicates.
But one has to admit first that the sample sizes, of only seven per cell, were small, with
the result that any comparisons were fairly weak in statistical power. Even within the
processing variable, the pause condition was particularly problematic. In addition to
small cell size, it is clear that the condition requires more careful monitoring than was
used. There was clearly variation in the extent to which participants availed themselves
of the opportunity to pause, but we do not have data on this, and cannot explore whether
those who exploited this possibility more performed differently than those who did not.
It would be worthwhile to carry out future research which addresses this limitation.
The suggestion for further research concerns the measures of complexity which
were used. What is interesting is the way they were similar, but also how they diverged.
Both showed an impact of structure and of processing, but it is interesting that structure had a clearer impact with AS subordination and processing with Words per
Clause. The former variable, in its Problem-Solution operationalisation in the present research, intrinsically supports more explicit and complex clause relations and the
AS subordination measure picked this up. The processing variable was interesting in
that greater time to build language seemed to push learners to develop more complex
clauses internally. This will be a fascinating contrast to probe in further research.
References
11, 367–383.
de Bot, K. (1992). A bilingual production model: Levelt's 'Speaking' Model adapted. Applied Linguistics, 13, 1–24.
Ellis, R., & Yuan, F. (2004). The effects of planning on fluency, complexity, and accuracy in second
language narrative writing. Studies in Second Language Acquisition, 26(1), 59–84.
Studies in Second Language Acquisition, 18(3), 299–324.
Foster, P., & Tavakoli, P. (2009). Lexical diversity and lexical selection: A comparison of native and
non-native speaker performance. Language Learning, 59, 866–896.
Foster, P., & Wigglesworth, G. (2010). Towards a new measure of accuracy in task-based second
language performance. English Department, St. Mary's University, Twickenham.
Gilabert, R. (2007). Effects of manipulating task complexity on self-repairs during L2 oral production. IRAL, 45, 215–240.
Hoey, M. (1983). On the surface of discourse. London: George Allen and Unwin.
Erlbaum Associates.
Levelt, W.J. (1999). Language production: A blueprint of the speaker. In C. Brown & P. Hagoort
(Eds.), Neurocognition of language (pp. 83–122). Oxford: OUP.
MacWhinney, B. (2000). The CHILDES Project: Tools for analysing talk, Volume 1: Transcription format and programs (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Mehnert, U. (1998). The effects of different lengths of time for planning on second language performance. Studies in Second Language Acquisition, 20(1), 83–108.
Norris, J., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA:
The case of complexity. Applied Linguistics, 30(4), 555–578.
Robinson, P. (2001). Task complexity, cognitive resources, and syllabus design: A triadic framework
Robinson, P. (2011). Second language task complexity, the Cognition Hypothesis, language learning,
and performance. In P. Robinson (Ed.), Second language task complexity: Researching the Cognition Hypothesis of language learning and performance (pp. 3–38). Amsterdam: John Benjamins.
Robinson, P., Cadierno, T., & Shirai, Y. (2009). Time and motion: Measuring the effects of the conceptual demands of tasks on second language speech production. Applied Linguistics, 30, 533–554.
Segalowitz, N. (2010). Cognitive bases of second language fluency. London: Routledge.

Skehan, P., & Foster, P. (2001). Cognition and tasks. In P. Robinson (Ed.), Cognition and second
language instruction (pp. 183–205). Cambridge: CUP.
Skehan, P., & Foster, P. (2005). Strategic and on-line planning: The influence of surprise information
Skehan, P., & Foster, P. (2008). Complexity, accuracy, fluency and lexis in task-based performance:
A meta-analysis of the Ealing research. In S. Van Daele, A. Housen, F. Kuiken, M. Pierrard, &
I. Vedder (Eds.), Complexity, accuracy and fluency in second language use, learning and teaching
(pp. 263–284). Brussels: Royal Flemish Academy of Belgium for Sciences and Arts.
Tavakoli, P., & Skehan, P. (2005). Planning, task structure, and performance testing. In R. Ellis
Benjamins.
Tavakoli, P., & Foster, P. (2008). Task design and second language performance: The effect of narrative
type on learner output. Language Learning, 58(2), 439–473.
Winter, E. (1976). Fundamentals of information structure: A pilot manual for further development
according to student need. Hatfield, Herts: The Hatfield Polytechnic Linguistics Group, School
of Humanities.
chapter 8
Limited attentional capacity, second language performance, and task-based pedagogy
Peter Skehan
This volume has reported a number of original empirical studies of task-based second
language performance. The main site for the studies was Hong Kong, and specifically
the Chinese University of Hong Kong, although some studies were actually conducted
in neighbouring areas (e.g. Macao, Guangdong). All the chapters emanate from the
Task-Based Performance Research Group which functioned at that university for some
six years, from 2004 to 2010. The studies all took a Complexity-Accuracy-Fluency-Lexis
framework as a starting point, and examined performance in these terms (with one
or two extensions here and there). They also all took a Tradeoff perspective to second
language performance, in that they assumed that attentional and working memory
resources are limited, and that the interest in studying such task-based performance is
to better understand the consequences of such limitations, as well as the factors which
lead to higher performance and overcome the limitations.
Sharing assumptions in this way across the different chapters makes this unusual
as an edited volume. Such volumes are often made up of disparate contributions which
may have some loose connections to one another, but essentially make individual and
possibly disconnected contributions. In the present case, the unified viewpoint means
that the different chapters cohere and provide a cumulative perspective on current
developments within the Tradeoff Hypothesis. Accordingly, it is the function of the
present chapter to bring together these different contributions, and to explore how,
taken together, they provide a clearer picture of how we can understand second language task-based performance from within this framework.
There are three main sections to the chapter. The first, and by far the longest,
explores major themes that have emerged from the different studies, and covers issues
such as planning, structure, processing and cognition, selective attention, and working memory. A second brief section attempts to summarise the findings that have
been reported in terms of positive and negative influences. The third and final section
explores implications of the various chapters for pedagogy.
Major themes
This section will try to summarise the various chapters, but it will not do so sequentially, taking things chapter by chapter. Rather, different themes will be explored and
then illustrated through the contributions that relevant chapters make. In this way,
common themes will be clearer, and studies which explored more than one variable
(i.e. most of them!) will be considered in more than one place. The discussion starts
with planning, which is the main focus for three of the empirical chapters, and then
moves on to explore the major task characteristic researched here, task structure. This
leads to a consideration of processing issues as well as notions of task complexity
and the Cognition-Tradeoff debate, before concluding with a discussion of selective
attention.
Planning, preparedness, and readiness

Introduction
The literature on planning has grown enormously over the last twenty-five years,
and one wonders what the authors of the two seed articles, Ellis (1987) and Crookes
(1989), now think of the beast that they have unleashed. So it is timely, given the
range of studies now extant, to stand back a little, at the outset, and think about what
planning consists of. The table below shows the different conceptualisations of planning that have motivated studies, and also which individual researchers have used the
different conceptualisations (with, of course, some researchers figuring in more than
one place).
Even though the literature on planning is now extensive, the table makes clear
that it is difficult to form generalisations given the range of different interpretations
of pre-task influence that have been used. The question then becomes whether one
can link these antecedents to different performance profiles. The chapters in this book
make some, albeit limited, contributions to clarifying some of these issues. But at the
outset one might even question whether it is appropriate to use the term 'planning' to
cover the different possibilities. That is the focus of Bui's claim (this volume) that the
term 'readiness' might be more appropriate, and of the proposal in Chapter 1 that we
need to discuss 'preparedness'. Readiness, for Bui, contrasts task-internal factors (content familiarity, schematic familiarity, task familiarity) with task-external readiness
(rehearsal, strategic planning, and within-task planning). The former concerns what
the speaker already knows that is relevant for the task (and so readiness is exactly the
correct term here), while the latter is concerned with manipulable factors irrespective
of the familiarity the speaker might have with the task and its content.
Table 1. Different conceptualisations of planning

Conceptualisation | Typical researchers
Using time to plan about something 'cold': guided or unguided; different foci (preparation for language, preparation for ideas, preparation for organisation and stance, preparation for other people's contributions) | Crookes, Skehan, Pang, Foster, Tavakoli
Speaking about something from experience: something familiar vs. unfamiliar | Foster and Skehan
Speaking about something which has been studied | Bui
Speaking about something similar in schema, or task type, to what has been spoken about | Bui, Bygate
Speaking about something which has been spoken or written about before: authentically, as part of life; as a previous task (immediately afterwards, or after a time interval) | Ellis, Wang, Bygate
Speaking about something which someone else has spoken about before | Ortega
Speaking under conditions which enable 'on the fly' or on-line planning | Ellis, Wang

The notion of preparedness is, of course, very close to this idea of readiness, but
there are some differences in emphasis. The major contrast is what might be
termed 'cold' versus 'involved'. With the former, the information to be conveyed does not particularly relate to the speaker's previous life or experiences. With
the latter, the speaker has some relationship with what is said. This could come from
the relevance of the task. It could also concern the relationship between previous experience and what is being talked about, reflecting whether similar things have happened
to the participant or not, or whether what is being said in the task has been spoken
about before, for example, recounting a near-death experience. But essentially whether
we are talking about readiness or preparedness, the intention is the same: to explore
what happens before a task as an organising concept to enable functional relationships
to emerge more clearly. Bui's demonstration that readiness, in his study, has a lexical, accuracy, and fluency impact while strategic planning (or preparedness) has more
impact on structural complexity is a clear example of this. We have to wait and see
what future research tells us about the sort of fine-grained relationships with different
aspects of performance that these new concepts might reveal.
The three studies

Against this background, what do the chapters in the present volume contribute? Three
chapters are relevant here: Bui (Chapter 3), Wang (Chapter 2) and Pang and Skehan
(Chapter 4). Bui reports on a study where two variables from Table 1 are included:
planning-as-time and planning-as-familiarity, the latter in the sense of speaking
about something which has been studied. The results are very interesting. Familiarity
of this kind leads to significant but small effects on accuracy and fluency, as measured through effect sizes. There is no complexity effect here, i.e. speakers were no
more or less complex when they were speaking about matched viruses or mismatched
viruses (the speaking topics of Bui's study). It can be argued that both of these effects
sit within Levelt's Formulator stage, and so suggest that with familiarity, more attention was available for ongoing performance. But there is also a medium-size effect on
lexical sophistication, suggesting that when talking in the matched condition, speakers
access less frequent (and more appropriate and technical) words. So, while syntactic
complexity is not increased, lexical complexity is boosted. Speaking about something
with which one is familiar allows the mental lexicon to be searched and accessed in
a different way to what happens with less familiar material. One wonders, as a result,
whether similar results would be found with familiar topics which are less bookish.
(Even so, it must be stressed that this research design was not easy to implement since
finding matched and mismatched conditions is fiendishly difficult.) Bui (Chapter
3, this volume) also researched conventional planning based on time, and here he
found fairly standard things: planning opportunity produced greater complexity
and fluency, but did not lead to any increase in accuracy. In addition, the effect sizes
here were medium in nature, suggesting very interestingly that providing planning
time produces effects which are slightly greater than those produced by familiarity. This is a
non-trivial finding, since it could not really have been predicted; one might, in fact,
have expected the opposite.
Wang (this volume) reports a wider ranging study of planning which essentially
tries to explore three senses of planning from Table 1. She explores conventional pre-task planning, Ellis's notion of online planning, and also planning as previous performance in the L2, through repetition (although this is immediate repetition, unlike
Bygate's (2001) delayed repetition). Like Bui (this volume), she found that strategic planning yielded what might be considered to be conventional results: raised complexity and fluency, with large effect sizes. But the two other areas she researched extend
our understanding of planning considerably. Using a more controlled implementation
of online planning than has been used in previous research, she showed that online
planning by itself did not produce greater accuracy, with no significant differences
whatsoever. But when online planning was preceded by strategic planning there was
a significant and large effect on accuracy, and complexity also. It appeared that online
planning needs supportive conditions for it to be effective. The benefits of strategic
planning do not translate consistently into greater accuracy (although that is not to
say that strategic planning never generates effects in this area; Ellis 2009), perhaps
because the residue of the effects of strategic planning is swamped by the immediacy of the processing conditions while a task is running. However, if the potential
of such planning is given scope to operate, as when online conditions prevail, a joint
accuracy-complexity effect can be achieved. The effects of strategic planning need to
be nurtured, in other words, just as the effects of online planning need the stimulus of
the wider perspective provided by earlier strategic planning.
While this is a very interesting result in itself, the findings with repetition are
even more striking. Here, unusually, all aspects of performance that were measured,
complexity, accuracy, and fluency, were significantly increased simultaneously. In
addition, the effect sizes were generally large, and as Table 7 in Wang (Chapter 2, this
volume) makes clear, this was not due to any anomalies in measurement or low standard deviations. In other words, getting participants to repeat a task immediately leads
to dramatic improvements in performance. Repeating what has been said in some
way prepares the speaker for the second performance, and this preparation seems to
enable errors to be avoided in the second telling, as though they were noticed on the
first occasion, but too late to change. Alternatively it may simply be that more attention
is generally available, and some of this is targeted at monitoring and avoiding error
(see Li, this volume). In addition, it appears that the engagement with the ideas to be
expressed in the first performance enables a more effective method of information
transmission in the second, more densely organised, with fewer elementary propositions. Finally, the fluency effects are well worth commenting on. As Wang (this volume) indicates, actually producing the language which will be re-used smoothes the
subsequent performance hugely, as though speakers wrestling with a second language
are helped considerably at the processing level by having actual phonological plans
more readily available. The smoothing may also release some attentional resources
for Conceptualiser and Formulator operations. The last thing to note here is the contrast between these results and those of Bygate (2001) with more delayed repetition. It
appears that Wang's findings indicate a clearly greater effect than he reports (encouraging though that was). So immediacy here confers considerable benefits, and now it
would be fascinating to vary time between performances to explore how rapidly this
advantage fades, and whether there is an optimum time for the repeated performance
to occur.
Pang and Skehan (Chapter 4, this volume), in contrast, take a much more qualitative approach to investigating planning. Participants in their study completed a
narrative retelling task, after being given ten minutes' planning time, and then subsequently engaged in retrospective interviews (RIs). A coding scheme was developed
to capture the reported planning behaviours, one which had linkages with the Levelt
model of first language speaking. The codings so produced were then categorised,
into macro and micro codes, lexical-grammatical codes, and metalanguage codes, a
coding scheme which contrasted with that developed by Ortega (2005), which was
heavily influenced by O'Malley and Chamot's (1990) coding scheme for learning
strategies. Even so, there is quite a lot of overlap between the two approaches. The
major change is that macro and micro codes figure more prominently in Pang and
Skehan's (this volume) system. Monitoring and evaluation are not emphasised as in
her scheme, while Pang and Skehan (this volume) emphasise lexical planning codes
more, as well as some rehearsal codes as these relate to subsequent performance, an
area not explicitly included in the Ortega (2005) study. All the various codes were
also related to performance, and generalisations offered as to the sorts of planning
behaviours associated with success. These were:
Accuracy
Positive: Retrieve specific words; structure the story; rehearse in a targeted way
Negative: Create lexical difficulties for yourself; make the story more difficult than you can handle
Complexity
Positive: Organise the story; link the big to the small; let the ideas lead the way
Negative: Be fancy with words; focus on grammar; be overambitious with planning
Fluency
Positive: Be resourceful and flexible with words; avoid a focus on form; rehearse and work small; organise ideas; avoid difficulty; avoid getting stuck
Negative: Be fixed and inflexible with words; concentrate on form; try to be fancy; plan generally; take notes seriously
Lexis
Positive: Focus on words; be general with ideas
Negative: Be unambitious; rehearse generally not specifically
The most surprising thing here is how little in these categories is directly positive.
Building structure and being led by ideas are positive, but there is little to assist accuracy or even fluency in any direct way. Instead the emphasis is on things to avoid
doing, or to do in moderation. It seems that planning can easily contain pitfalls, so
good use of planning means avoiding the pitfalls just as much as doing useful things.
It also helps to have realism about one's abilities so as not to take on too much. Being
flexible and resourceful when things (inevitably) go wrong helps too.
So with these three chapters on facets of planning, where do we end up? First,
there is the issue of needing a wider framework, either readiness (Bui, Chapter 3) or
preparedness (Skehan, Chapter 1), a framework which subsumes planning but ranges
more widely. Bui (this volume) shows that familiarity is a useful form of preparedness,
whose strongest impact is on lexical sophistication, but which also influences accuracy
and fluency, suggesting a clear Formulator focus. It seems that knowing an area, at least
in his study, enables the second language speaker to deal with pre-verbal messages with
attention, but does not seem to be associated with greater Conceptualiser operation.
Another facet of preparedness is to tell a story one has told before. In the present case
(from Wang's chapter), that means retelling (repeating) a story which was previously
about something new. (One can imagine all sorts of other retellings of things which
were familiar, to various degrees, in the first place.) In her study, the retelling was immediate, although obviously one can imagine other research designs with different time
intervals between the original and the retelling. In any case, in her study, the results
were very clear, leading to increases in performance of considerable magnitude in accuracy and fluency, and a smaller, but nonetheless very large, effect in complexity.
Interpretations
The repetition effects in Wang (Chapter 2, this volume) are so strong that we have to
search for an explanation. Effective priming might be one such factor. The priming literature in psychology is extensive (McDonough & Trofimovich 2009). An immediate
relevant preceding context can have a major impact on ease of access of related words.
It may be that memory traces from the first performance are still having an effect and
facilitate the subsequent processing. But plausible as this is, the size of the effect seems to
require a more powerful justification. In that respect, it is useful to compare rehearsal/
retrieval as processes within strategic planning, versus actual repetition of performance. The background here is that (from a Leveltian perspective) the Conceptualiser-
Formulator transition is mediated through access to the (second language) mental
lexicon. As mentioned elsewhere in this volume, the first language mental lexicon is
extensive, and the lemmas within it are rich, elaborate, organised and associative. It
contains information on meaning, phonological form, orthographic form, logical relationships to other lemmas, associative and collocational information, and so on. In the
second language mental lexicon, this is not the case. Lemmas, even where they exist, are
nothing like as rich, elaborate, organized, and associated (although, of course, at higher
proficiency levels this is much less the case). The implication of this for retrieval and
rehearsal as part of strategic planning is that only limited information may be retrieved,
partly because only limited information is available, and partly because even what is
there may not be so accessible as in first language lemmas, and in any case the process
may be very effortful. But this, from the speaker's perspective, may well be a lot better
than nothing, and may then raise the level of performance subsequently.
In the case of repetition, though, (as compared to strategic planning), different processes may be triggered. The partial or incomplete lemma access that may
be regarded as temporarily sufficient as a result of strategic planning is revealed as
inadequate during a first performance, and the need to actually produce language
means that the speaker has to engage with the problem of retrieving everything that can be
retrieved, including all the aspects of lemma information mentioned above. This is
what becomes the basis for syntax-building, and even making decisions that go further
than current knowledge, e.g. countability vs. non-countability. So the process required
by repetition is deeper and more comprehensive than is rehearsal. Not only may the
act of speaking establish a better trace for subsequent performance (in that it is more
enduring), it also, vitally, is a better preparation for subsequent performance, since the
speaker is pressured to solve problems, e.g. of phonological form, of likely agreement
implications, of morphology, and indeed of lexical or collocational selection. Indeed
Wang (this volume) also argues that there is a strong Articulator effect too, in that
having produced actual language eases the subsequent phonological performance, an
influence that might well facilitate the maintenance of parallel processing since otherwise articulation itself could consume limited attentional resources. The phonological
solutions may be correct or incorrect, but they at least ease subsequent processing. The
results in Wang (this volume) indicate how effective this process is, for fluency (as one
might first expect) but also for complexity and accuracy, as the engagement with form
in the first performance pays dividends in the second, since the earlier retrieval depth
functions as a sort of investment for the later performance.
One final point is worth making about repetition, even though it might be slightly
misplaced since there is a section on pedagogy later in this chapter. It is clear that the
impact of repetition is greater than that of other conditions reported in this volume.
This suggests considerable potential for repetition as a pedagogic technique. In general, within a communicative approach to teaching, it tends to be frowned on as lacking in authenticity. This, though, assumes (a) that we do not engage in repetition in
real life, and (b) that since no new meanings are being conveyed, the language which
is used is not normal communication. Life, though, can often be repetitive, and so can
the language which is used (Tannen 1989). And despite possible claims of inauthenticity, one needs to explore the reactions and perceptions of second language speakers
(rather than reactions of teachers) (Pinter 2005). Second language learners may well
derive satisfaction from a job done far better through repetition. At the same time,
they may be preparing themselves for even better communication in the future. As
a teaching technique, therefore, repetition may have a lot going for it: boring from
a teaching perspective, perhaps, but richly satisfying from the learner's point of view
(Bygate 2006).
So, if other forms of preparedness such as repetition and familiarity are relevant,
what of conventional strategic planning itself? Bui (this volume) and Wang (this
volume; in her basic planning condition) confirmed the general finding from the literature: a complexity-fluency effect, but nothing for accuracy. Pang and Skehan (this
volume) also researched strategic planning and raise a different set of issues with their
hybrid qualitative-quantitative study. They record self-reports of behaviour of participants focusing on form, but very often these do not actually relate directly to performance. Some of their codes (building structure, ideas) are positive in their impact and
reasonably clear to understand. But many are more complex and highlight avoiding
trouble, dealing with trouble, and the unhelpful effects of being over-ambitious. So
some planning behaviours contribute because they support actual performance, while
others are more concerned with what not to do. There are several implications here,
and most of these bear on the fragility of the accuracy effects in the literature. The most
general point is that it is clear that different people do different things when given the
opportunity to plan, and that some of these things raise performance in different areas,
some are neutral, and some are associated with reduced levels of performance in different areas. This being so, it is likely that if planners do the positive things which link
with accuracy (retrieving specific words, structuring the story, rehearsing in a targeted
manner), then error may be reduced. But if they do other things during planning then
it is unlikely to be. In other words, there is a certain degree of vulnerability in planning
behaviours linked to accuracy, and possibly chance factors in the ones which may be
used. The result is that different studies may report different effects depending on how
participants use their planning time. On some occasions the positive influences will
predominate. On others, the negative. Other performance areas, such as complexity
and fluency, do not seem so vulnerable to this variation, although with each, the qualitative data reveal potential negative factors; it seems, though, that they do not carry
the same weight. The general influence of over-extension as a result of over-ambition,
though, is there for all performance areas. It would seem that planners' views of their
own abilities, and their capacity to match planning ambition to realism, which is linked
to ability, may be a key factor, and one which has a particular connection to accuracy.
Related to this discussion is another conclusion one can draw from the Pang &
Skehan (this volume) study. This concerns the issue of memory when we examine
the transition from planning to performance. It appears that ideas, structure, and
organisation fade less, and make the transition better from planning to performance,
whereas form, most of the time, does not endure so well and transfers less effectively
(Bygate & Samuda 2005). So there is the possibility that, if ideas and form receive equal
focus during planning, it is the former far more than the latter which impact on performance. This too may have connections to the smaller and less consistently reported
accuracy effects.
In this regard, there is a link to be made with another facet of the chapter by
Wang (this volume). She showed that on-line planning, in itself, was not effective, but
that her online planning condition, when preceded by a Watched condition (which
enabled preparedness) did raise accuracy, and complexity also, both with large effect
sizes. What we seem to have here is a synergy between ideas to be expressed and
the means to achieve such expression (Conceptualiser and Formulator working in
harmony), the first coming from the opportunity to prepare, and the second, feeding off the first, coming from the supportive conditions. Having less pressure when
speaking is associated with higher levels of performance, but only with the previous
opportunity to plan, and not merely when there is more time available. But this suggests
a slightly different interpretation of what happens during the supportive conditions,
which would be that less pressuring performance conditions create a better context to
remember what has been planned. In other words, the key could be better retrieval of
what has been prepared because the performance conditions provide enough time for
this to happen. The planning lays the foundation. The supportive conditions enable the
fruits of the planning (accuracy as well as complexity) to be realised.
Pang and Skehan (this volume) also shed light on this. In their study, self-reported
use of planning time to focus on form did not have strong relationships with actual
performance, and indeed the only codes which did relate positively to accuracy were
retrieving specific words, rehearsing in a targeted way, and organising the story. They
proposed that form is vulnerable to fading more easily than is the planning of ideas.
If this is the case, it fits in well with Wang's results on supported on-line planning. In
this condition, there has been planning, quite possibly of form, and the gentler performance conditions may allow that form to be retrieved and used. More time in itself is
not the key; what matters is that foundations have been laid and there is time. It may
well be that in conventional planning studies, participants have used the planning time
to focus on form, but unless this was very specific, the fruits of planning may not have
transferred to performance. Hence, possibly, the variations in reported results.
To sum up this section, it appears we now have to consider what might be called
a planning efficiency factor. Not only do we need to consider what happens during
planning; we also need to consider how the potential benefits from this may or may
not be transferred into actual performance. To be sure, this is speculative, but it is
consistent with the results reported in this volume. What it does, fundamentally, is
suggest a need for more qualitative research (Ortega 2005), but research which is now
more focused in how it explores processes of planning and their subsequent impact
on performance.
Serial and parallel processing

Next we will consider the extent to which these different senses of preparedness or
readiness might have an impact on the serial versus parallel modes of processing which
we discussed earlier. Clearly, there is a basic serial quality to communication: one idea
has to follow another. But the Levelt model (1989, 1999), in addition to proposing
different stages in first language speech production (Conceptualisation, Formulation,
Articulation), also claims that these stages are modular, and can function in a (fairly)
encapsulated way in parallel (Kormos 2006). This means simultaneous operation.
Wang (Chapter 2, this volume), in her Table One, brings out, for example, how at any
one time the current Formulation will be (a) the result of previous Conceptualisation, and (b) the basis for future Articulation. Such simultaneous functioning is only
possible if each module does its job very quickly, and requires relatively few attentional resources. When these conditions prevail, modularity is maintained, and fluent
speech production is possible, with viable unit length at the level of phrase, minimally,
but more probably the clause or even AS-unit. If, on the other hand, one of the stages
encounters a problem (such as difficult ideas to express in Conceptualisation, or word
or syntax retrieval problems in Formulation, or difficult phonological realization in
Articulation) then parallel functioning is compromised, as attention is diverted to the
problem stage. The problems have to be repaired, and the thread of speaking returned
to. First language speakers typically handle such problems without parallel processing being disrupted. Second language speakers are less likely to do this (although
proficiency level is an important factor here, as will be discussed below).
Interestingly, one of the goals of strategic planning is to complexify the ideas that
are to be expressed (leading to the very consistent finding that language complexity increases are associated with such planning time). In other words, one interpretation is that planning makes serial processing more likely (other things being equal),
as Conceptualiser operations become more demanding, and probably, more difficult
lemmas need to be retrieved to handle the more complex pre-verbal ideas that are
generated. (This is very close to the interpretation offered by Crookes 1989, as to why
accuracy effects are not found in planning studies. He suggested that speakers created
difficulties for themselves by using planning to implicate more advanced, and so more
difficult, morphosyntax.) Of course there are beneficial aspects to this triggering of
more complex language, but one has to be alert to the way there may be consequences
for other aspects of performance, and one of these may be a greater need to use serial
modes of processing. Interestingly, the other proposed goals of pre-task planning,
rehearsal and organisation, do not pose similar problems for ensuing communication,
but instead equip learners (with greater probability but not certainty) for more parallel processing. With rehearsal, including retrieval of relevant lemmas (Ortega 2005),
second language speakers are readied for the demands that will be made on them, and
planning-as-organisation probably functions similarly to structure, in that it provides
learners with an outline of what they are going to say. This may not impact upon parallel processing directly, but it will enable them to regroup more effectively, in the sense
that if they lose the thread of parallel processing, they are better equipped to retrieve
this mode through the organisational structure which they have given themselves to
work within. The organization and structure provide what might be termed pickup
points which can restart more fluent performance. This point is discussed more extensively in the later section on structure.
The other forms of readiness can be interpreted differently, and all seem directed
to enhancing parallel processing. Knowledge of an area, and familiarity generally (see
Bui, this volume), provide organisation of ideas, and possibly, accessibility of lemmas,
as indexed in the Bui study by faster speech and slightly greater lengths of run. One
assumes that such knowledge, and organization and accessibility of ideas, means that
additional Conceptualiser operations are not so necessary, and so Formulator processes can benefit from greater attention, and enhance parallel processing in the sense
that more capacity is available for attending to formulation during the current speech
production process. Previous speech (see the repetition condition in Wang, this volume) seems to promote effective Conceptualiser readiness with major enhancement of
Formulation and Articulation, at least where immediate repetition is concerned. The
impact on accuracy and fluency in Wang's study is very clear, suggesting that drawing
on a repertoire which has been primed by the previous performance is very effective
indeed. This is all the more remarkable in that complexity also benefits hugely, suggesting that repetition even creates space for a repackaging of the ideas that are to be
expressed, to some degree (Bygate 2001). Repetition is the only condition where all
three performance areas benefit in a very clear way, and so we can now say, the only
area where additional Conceptualiser activity does not seem to be at the expense of
Formulation and Articulation.
The final area to consider is on-line planning. Ellis (2005) has argued that this
form of planning is particularly effective for raising accuracy. We have seen in the present volume, through Wang's research, that simply slowing down the performance, in
itself, did not produce higher levels of accuracy; this had to be linked with some way
of involving pre-task planning or Conceptualiser work. When this was done, through
Wang's supported online planning condition, which provided strategic planning
opportunity followed by slowed performance conditions to enable better on-line planning, there was a clear impact on accuracy and complexity, although not fluency. The
provision of greater processing time here does seem to lead to a greater opportunity
to engage in parallel processing, but only if adequate preparation has been achieved
first. Simply providing more time is not enough; there has to be some guidance to
enable this extra time to be appropriately exploited. But this raises another issue:
the relationship between Conceptualiser and Formulator operations. The supported
online planning condition led to the elusive and desirable result of a joint raising of
complexity and accuracy. What seemed to be happening here is that effective thinking for speaking took place in the earlier watched phase, and this was then carried
over into a performance in which error could be avoided to a greater extent because
of the actual performance conditions in which the video speed had been slowed. Most
other task research designs find ways of promoting either complexity or accuracy. This
particular way of generating preparedness, linked to supportive processing conditions,
enabled both to come into play.
The other part of Wang's (this volume) study which is relevant is her repetition
condition. As we have seen, this produced an appreciable improvement in performance relative to the control group condition, in complexity, accuracy, and fluency.
Somehow, actually engaging with a task, with completing it, constitutes the most effective preparation of all. It appears that the first iteration lays the ground for Conceptualisation, Formulation, and Articulation to all proceed more effectively. Somehow
this activity confers a greater benefit for performance than do rehearsal and retrieval,
even though one might have thought they would be equally effective. Actually using
language is more effective than thinking about the language which could be used. So
while there are benefits for Conceptualisation (reflected in higher complexity scores),
Formulation also benefits, with a large increase in accuracy, and also fluency. As Wang
(this volume) points out, this study is unusual in that it engages Articulation as well
as the other stages outlined in the Levelt model. There seems to be something added
by having previously used the actual language that other forms of preparation do not
contribute so well. In a sense, any planning ought to lead to useful priming effects.
Repetition seems to do this particularly well (Wang 2009). This may be because the
act of speaking makes subsequent speaking and articulation easier (and so less attention consuming). It may also be that articulation does prime associated language very
effectively, and also that the deeper lemma access it generates provides a better basis
for the subsequent (parallel) performance. It would seem that having used language
is a way of maximising the chances that parallel processing can be maintained. Possibly similarly to the impact of task structure, having run through the entire task once
enables the speaker to locate new restart points when difficulties are encountered.
Alternatively, breakdowns that occur in much second language speech may be less
likely because the memory of wrestling with the same problem is still available, and
tentative solutions from earlier can be re-applied and applied better. Whatever underlies this effect, this does seem to be a very effective way of getting ready for a subsequent performance, and more research into it would be very much worthwhile, as
Bygate and Samuda (2005) have also argued.
The final study of planning is the qualitative investigation by Pang and Skehan
(this volume). There are no new findings of effects on performance here, so much as
insights into what sort of things planners do, and how these might contribute to performance. The study does, though, have contributions to make regarding serial and
parallel performance. For example, we have seen elsewhere that structure has a beneficial impact on supporting parallel performance. What is interesting here is that there
is some evidence that some learners use planning time to impose their own structure
on what they are going to say. Further, this is then associated with higher levels of
performance, so some second language speakers, when they plan, seem to have come
upon this advantage for themselves. More generally, one would expect that the opportunity
to prepare, to get material ready for performance, would ease attentional demands and
therefore make parallel processing more likely. What is particularly interesting is that
this does not apply equally to all the things that could be planned for. More specific
ideas and content seem to fade less than does a concentration on form, and on planning
generally rather than specifically. So decisions by speakers to concentrate on Conceptualiser preparation seem to pay bigger dividends, and so might also be more effective
at smoothing the operation of speaking and maintaining parallel processing.
The self-reports on planning also convey an awareness, on the part of second language speakers, of their limitations and the need to overcome these. It is clear that
over-ambition, relative to a particular speaker's level of ability, is a major factor. Many
participants report behaviours which over-extend their performance and cause difficulty. Or, to put this another way, behaviours which push learners so that serial processing is more likely because they are trying to be too ambitious, and cannot sustain
parallel performance. In addition, a number of participants show that they are perfectly aware that they will encounter difficulty, and have some idea of how they could
prepare to overcome such difficulties. This too is consistent with the idea that they are
aware that they need to get a higher level of performance (i.e. parallel processing) back
on track. In sum, therefore, the view from the inside where planning is concerned
shows a lot of evidence that maintaining flow in performance, maintaining a parallel
mode of speaking, is a major factor, one that is influenced by self-reported behaviours during the planning period, and also something that the speakers themselves
are aware of.
This section, however, has to be finished with a major caveat. The studies in this
volume have not systematically explored the entire proficiency range. They have, with
one exception, focused on intermediate level learners, with relatively few of these
being at high intermediate level. This even applies to Pang and Skehan (Chapter 4,
this volume) where, even though two proficiency levels were involved, these did not
go outside the intermediate range. The exception is Bui (Chapter 3, this volume) who
did have some low advanced learners in his study, in his higher proficiency condition.
There are important consequences which follow from this limitation. First, it
reduces any claims for generality. The hope in this section, as well as those which follow, is that results and conclusions are applicable more widely, rather than more narrowly. So it would be preferable to be able to claim that insights regarding planning,
for example, are robust and likely to apply widely. The restricted proficiency range,
while not incredibly narrow (intermediate learners are an important group of language learners in themselves) does, though, mean that claims can only be made about
performance on spoken language tasks by this group.
There is, though, an even more specific version of this caveat. The main part of this
section has explored issues connected to serial-parallel performance, and proposed
that it is differences between the first and second language mental lexicons that are
vital in altering the serial-parallel balance. Clearly, as proficiency increases, it is likely
that the second language mental lexicon will develop (have more elements, better
organization, richer information in lemmas) and that the serial-parallel balance will
be strongly affected. This is likely to become very important as high intermediate and
low advanced levels of proficiency are reached. Such a conclusion simultaneously limits the power of generalizations that can be made and also indicates the urgency of the
need for research in this area.
Structure
Introduction
As in the previous section, we can start by asking what structure in a task is. Of course,
taken more broadly this connects with the literature on discourse analysis (Winter
1976; Hoey 1983) and the psycholinguistics of text structure (Kintsch 1994), both
spoken and written. Discourse analysts have tried to explore how some texts have
analysable and familiar structure, a discourse framework which has stood the test of
time and acquired some universality. Psychologists have explored the importance for
comprehension of concepts such as scripts and schemas, and how these influence our
expectations about what is likely to be said, and how, if there are major cross-cultural
differences, they mislead us and make processing difficult. Discourse analysts have
explored how discourse structure might impact on the ways we organise what we
say, as in things like restaurant scripts, or descriptions of a house or apartment. What
happens in these descriptions is not arbitrary, and follows predictabilities which are
important for listener as well as speaker. The different structures which have been
researched demonstrate that we benefit from working within them, since they remove
unpredictability and provide a shell within which the interplay of ideas and language
is facilitated.
The literature that explores relatively brief oral task-based performance has been
concerned with only a subset of the different ways of characterising structure, and has
concentrated on simple narrative structures (beginning-middle-end) or the problem-solution structure (Hoey 1983; Winter 1976). (Structure has not particularly figured
in research into interactive tasks.) With simple narrative structures, the sequence generally consists of some sort of contextualisation, then a development through events,
with some sort of resolution to bring things to a close. The important point here, with
simpler narrative structures, is that there is arbitrariness. There is a need in narratives for development, and a satisfying resolution which brings together and comments upon the events (possibly amusingly). But there is also freedom in what might
happen, and while the development is unfolding all one knows is that there will be an
ending. In contrast, in structures such as problem-solution, the development which
takes place is constantly being related to the problem which has been announced,
and so the degree of arbitrariness is considerably reduced. The final resolution, when
it comes, is then a comment on the satisfactoriness of the solution that is posed to the
problem which was set. The parameters within which things are judged are therefore
far more precise, and correspondingly, the expectations when the narrative is listened
to or produced are much clearer. What this means for the speaker is that development
consists of handling sub-sections of the structured narrative before one returns to the
broader structure that is motivating the story. In other words, there is a license to get
on with minor developments insofar as these minor developments mesh nicely with
the broader development of the story. The speaker, in other words, knows their place
in the story. As a consequence, given the existence of a narrative frame, Conceptualiser
operations are considerably eased, and more attention is available for the Formulator.
The studies in this volume

Three chapters in this volume bear on this analysis, those by Skehan and Shun
(Chapter 7), Wang and Skehan (Chapter 6), and Pang and Skehan (Chapter 4)
(although the third of these addresses issues of structure only partially). The results
from these chapters will be re-presented briefly next, but three points are important
before we get into this level of detail. First, both Skehan and Shun (this volume) and
Wang and Skehan (this volume) used video clips as the basis for the narratives which
were told. This contrasts with a great deal of the literature on narratives within a task-based approach. Most studies have used cartoon picture series (Skehan & Foster 1999;
Bygate 2001 are exceptions), and so the performances which were elicited for each
of these two chapters were under contrasting processing conditions to most in the
literature. The videos in these two studies were natural speed videos, and so if a narrative had to be retold while a video was running, this presented a considerable input
problem for the second language speaker to process and then act upon. To put this
another way, to find an effect for structure here is a significant test because it has to
be strong enough to withstand the real-time processing pressure that is involved. But
a second point mitigates this to some extent. The next section will discuss Processing
factors more explicitly, and this raises a problem, in that it is not totally possible, in
this discussion of structure, to ignore issues associated with Processing factors. The
two sections, in other words, are not as distinct as one would like. In each, there is a
contrast between Here-and-now and There-and-then performances, which is a strong
processing condition. There is also the complication that in the Skehan and Shun (this
volume) study there are three Here-and-now variations, only one of which is classic
Here-and-now. The other two each have some sort of mediation of the intensity of the
processing condition, with a Summary given in one case, and the opportunity to pause
the video in the other. So these factors have to be borne in mind when interpretations
of the effects of structure are provided.
A third point is to consider how comparable the video clips were that were used
in each case. Skehan and Shun (this volume) used four Mr. Bean video clips. Wang
and Skehan (this volume) used four Shaun the Sheep clips. There was no difference in
length, no difference in amount of overt dialogue (virtually none). Obviously Mr. Bean
contains real characters while Shaun the Sheep is an animated cartoon. Yet one could
easily argue that the sheep, dogs, and pigs in Shaun the Sheep have more convincing
human characteristics than many of the people who appear in Mr. Bean! The key comparison we can make, however, is in terms of structure. Skehan and Shun (this volume)
range their four videos along a scale of structure, going from no structure (Crazy Golf)
to beginning-middle-end structure (Christmas) to loose problem-solution structure
(Funfair) to tight problem-solution structure (Thief). Tight problem-solution structure is characterised by clear conformity to Winter's four-step structure of Situation-Problem-Solution-Evaluation, with no deviation or significant extraneous material,
or sub-threads within the narrative (and this is what distinguishes Thief from Funfair). Wang and Skehan (this volume) contrast two structured videos (Tooth Fairy,
Bathtime) with two unstructured videos (Fetching, Off the Baa!). The two structured
videos are most comparable to Thief from Skehan and Shun, in that Winter's problem-solution structure is clearly present in each case, and although there are many amusing
events along the way, they are all tightly bound into the problem-solution sequence (at
the level of abortive trial solutions, or slightly extended solutions). So, for present purposes, it is fair to locate Wang and Skehan's two videos as reflecting a tight problem-solution structure.
We can now explore the results from the two chapters which directly focus on
structure. Skehan and Shun (this volume) report an effect of structure on complexity, accuracy, and fluency, with structure leading to raised performance in each of
these areas. With complexity, the major contrast is between the most structured video
(Thief) and all the others. With accuracy, each degree of task structure raises accuracy,
although perhaps the major difference is between no structure (Golf) and all tasks that
have some structure, whatever it might be. The impact on fluency (end of clause pausing) resembles that on complexity, in that the most structured video produces clearly
the greatest degree of fluency, and the performances on the other videos do not form
a scale. So it is clear that structure is beneficial here, and the obvious point is that the
effect has been found with a video-based narrative retelling, with the time pressure
that that implies. Wang and Skehan (this volume) report a similar but far from identical story. There are main effect influences for complexity and end-of-clause pausing,
but the major effect reported is that structure has its clearest effect in the There-and-then condition in that study. In other words, when there is processing pressure, as
in Here-and-now, the effect of structure, while there, is not strong with complexity
and fluency, and not really evident at all with accuracy and lexis. So, despite the tight
structure which characterises the videos used, the structure condition does not really
overcome processing pressure to any degree. Possibly it is the mediated Here-and-now
performances in Skehan and Shun (this volume) which have generated the significant
effect for structure there. But at least with Wang and Skehan (this volume) structure
does have an effect on complexity and fluency (though not great), and a fairly large
effect on these areas for There-and-then. Structure also has an effect on accuracy, but
only with There-and-then, and this is not particularly large.
A final note in covering the results concerns Pang and Skehan (Chapter 4, this
volume). They did not directly investigate structure, since theirs was a qualitatively driven study, and so what came up in the qualitative results depended on what the participants said. But it is interesting that some participants report using planning time, in
a study based on cartoon picture narrative retellings, to impose some organisation and
structure on the narrative they were going to produce. They seemed spontaneously to
regard structure as valuable. In addition, very importantly, this use of planning time
was associated with higher levels of performance, confirming the results reported by
Skehan and Shun (this volume) and Wang and Skehan (this volume).
Interpretations
The next task is to try to account for these results, and to understand what is going on
psycholinguistically. At the simplest level, it appears that the existence of a macrostructure does release attention for the speaker, and that the focus for such free attention
will be form, but it would seem now that different aspects of form may receive priority.
Some second language speakers may exploit the released attention to achieve greater
accuracy, in the way that has been predicted in the past. The less pressure that follows
from having a clearer macrostructure means that capacity can be directed towards better online planning, and also monitoring and repair. Hence the accuracy effects which
have been found: attention has been directed to the Formulator. But equally, any
available attention can also be directed towards rethinking ongoing conceptualisation
and achieving a higher level of structural density, or complex phrasal structure. In fact,
the notion of structure may be directly linked here, in that if the task itself is structured, there may be more scope to use more complex language, for instance through
subordination, to bring out more clearly the connections between different aspects
of the material in the task. This may be akin to the way planning time interacts with
task complexity, with planning having greater effects when tasks have greater potential through more elements or the need to transform elements. Task structure may be
functioning a little in the same way, giving learners something more challenging to
express. The greater complexity is a response to the demands of the task, and to the
attentional resources available.
It is also helpful to revisit the Levelt model for L1 speech, and to explore how it
operates differently in a second language context. As mentioned earlier, in the first
language case, one assumes parallel processing, in that the different modules work
together, so that each is doing its work at the same time, dealing with developing
ideas (the Conceptualiser), clothing the ideas in language, in lexis and in syntax (the
Formulator), and then producing that language as actual speech (the Articulator).
The model implies a sequence of speech production, but one where the product of each
module is then handed over to the next to accomplish actual speech. All the while
the Conceptualiser continues to produce pre-verbal messages to replace the ones that
the other modules have completed work on, in turn. The only other essential point to
make is that the mental lexicon is accessed to help in the overall process of translating
ideas into language, so that the lemmas in that lexicon (a) are an essential element and
(b) contain the information that is needed to enable language production, including
rich information about meaning, syntax, morphology, collocation, and even articulation. With the efficient operation of this system, the result is that parallel processing is
achieved, in that stages do their jobs very quickly, and so do not consume attentional
resources, in the normal case, enabling the way each stage is simultaneously operative.
The central difficulty in the case of second language performance is that the different stages can each encounter difficulties, independently, but in each case with the
result that the parallel flow of operation is disrupted since attention has to be allocated
disproportionately to one of the stages. (It is assumed, though, that the Conceptualiser, which is more language independent, is least vulnerable to such disruption.) As
a result, a serial mode of performance is frequently necessary, as the flow and cyclical
progression cannot be sustained.
This analysis of the pitfalls of second language speech production leads to two
central questions:
For the second language case, what conditions make it most likely that parallel
functioning can be maintained smoothly?
For the second language case, when there is breakdown, and serial processing
results, how can parallel processing be re-established?
A first major advantage of discourse structure is that it eases the relationship between
the Conceptualiser and the Formulator. Ongoing speech requires parallel operations
in which the speaker simultaneously has to keep track of immediate propositional
content, and also the relationship between that content and the wider discourse. As
a result, Conceptualiser operations can be demanding, and the relationship between
Conceptualiser and Formulator is ongoing and often attention consuming. On occasions where there is structure, however, the speaker has a clear overall framework
within which to speak and so is more able to give attention to the more detailed level
of ongoing pre-verbal messages. The result is that, other things being equal, more
attention is available for Formulation, and so there is a little bit of spare capacity in the
system should occasions arise, as they will, where accessing the mental lexicon, and
retrieving and exploiting lemma information is more demanding. As a result of this, it
is less likely that the Formulator will require excessive attention, and parallel processing can be maintained more often.
But the second advantage of structure to consider relates to retrieval from breakdown. It is a problem when one is speaking a second language that when things go
awry, it can be difficult to retrieve or keep track of the general plan one was trying to
put into operation. The second language mental lexicon is not as extensive, or elaborate, or as well organised, or as fast-access, as the first language lexicon. The demands
placed upon it during continuous speech cannot easily be met, since its operation is
effortful and slow. The result is that one has to find attentional resources not only for
ongoing communication, but also for the route back, so to speak, to enable the original
plan to be recovered and executed. This can have a very considerable effect on the
harmonious production of language.
A major advantage of structure in the speech production process is that a clear
macrostructure to what is being said provides the speaker with multiple opportunities
to restart from a reasonably well-defined point. For example, if a narrative is based on
a tight situation-problem-solution structure, a particular sub-section may cause problems, but then when the next section is reached, the decks are cleared, as it were, at a
well-defined point in the overall structure, and smoother speech production can be
restarted. This support-through-structure avoids the need to have to reorganise the big
picture of what is being said (a difficult task indeed) and enables the speaker to get on
with relatively local proposition expression. In other words, although a parallel mode
of processing may have been disrupted, a new starting point can enable parallel processing to be regained. The result is that once again Conceptualiser-Formulator relations become more harmonious, and the second language mental lexicon has attention
available to handle the pre-verbal message demands made by the Formulator. And of
course, this can be done more than once! So we see that structure can make a useful contribution to both the questions posed earlier in relation to the maintenance
of parallel processing: easier availability of attentional resources for the Formulator,
and also multiple potential re-entry points to recover parallel processing in the wider
discourse.
Processing, task complexity, and cognition

All the chapters in this volume have subscribed to a view of attentional and working
memory capacities as limited and have explored consequences of such limitations.
The next section tries to bring together issues which derive from a central aspect of
this: the impact of processing conditions on second language task performance. We
have touched on this briefly already in the section on planning, where on-line planning
research is essentially concerned with processing conditions (and so we will not revisit
that research in this section again). Here we will look at the chapters which make processing more central in their research designs, and which, by so doing, shed light on the
Cognition-Tradeoff debate.
The Cognition Hypothesis (Robinson 2011) proposes that task complexity is an
important influence on performance, and that (counter-intuitively for me) greater task
complexity pushes speakers (or writers) to achieve higher complexity and accuracy
simultaneously, as they mobilise attention to do justice to the greater complexity of
the task which is being done. Language complexity is raised through the demands of
the greater task complexity, and accuracy is raised as the greater need for precision
becomes more influential. The specific influences derive from what Robinson (2001)
terms resource-directing factors, such as time perspective, number of elements, and
reasoning demands, which he links to particular language needs. In contrast, resource-dispersing factors, such as plus-or-minus planning time, prior knowledge, and single
tasks, affect attention generally and do not link with pushing attention to respond to
particular language demands. To restate this, first, the Cognition Hypothesis (CH)
proposes a particular set of task features, such as time perspective, which share the
quality of being resource directing (although what constitutes that common quality
is not always clear). Second, these factors drive task complexity. Third, when there is
greater task complexity, accuracy and language complexity are jointly raised. It should
be noted, therefore, that the Cognition Hypothesis is not simply saying that task complexity can be defined variously: it has to be associated with a resource-directing
feature. In addition, it is not saying that task complexity leads to greater language complexity alone: there needs to be an impact on language complexity and accuracy.
These stringent predictions are what give the Cognition Hypothesis its greatest distinctiveness and appeal.
The Limited Capacity (or Tradeoff) approach does not make any distinction
between resource-directing and resource-dispersing factors. Instead its central premise is that attention and working memory capacity are limited, and that these limitations constrain what is possible in performance. There will be times, in other words,
when general available attention will be lower, and this will have a damaging effect
on performance. The Limited Capacity approach also proposes that task features and
task processing conditions can impact on selective aspects of performance (akin to
resource-directing factors) but does not propose any quality of resource directedness.
Where there are influences on specific aspects of performance, for example, complexity, accuracy, it proposes that these have to be accounted for by specific independent
variables acting on specific performance measures. The areas explored in the previous
two sections (planning and task structure) are examples of this. Each of them has
a complex relationship with aspects of performance. Planning, as we have seen, can
push for greater task complexity (the opposite of a CH account), or it can ease processing demands, depending on the particular nature of the planning undertaken. Structure can influence language complexity and accuracy, but, like planning, is regarded as
a resource-dispersing variable within the Cognition Hypothesis, which, from the present perspective, is a little curious. Interestingly, a Limited Capacity account accepts
that there will be times when accuracy and complexity will be jointly raised, but proposes that this is because of the separate effect of different task factors or processing
conditions. To justify these interpretations, it makes links with the psycholinguistics of
speaking, here also attempting to generalise Levelt's first language model of speaking
to the second language case. A great deal that happens then follows from the ways a
second language mental lexicon is less adequate than that in the first language.
The chapters to be discussed in this section focus on processing and time perspective. Two are experimental studies, and one is largely (but not wholly) based on
qualitative research techniques. The two experimental studies used video-based narratives, as indicated earlier. This research design decision was made for two reasons.
First, given that the majority of studies of narrative task performance are based on
picture cartoon series, it simply widens the database of findings that is available, and
so might permit more robust generalisation. Second, the use of video-based narratives, especially under conditions of simultaneous story telling, raises the processing
demands, since it gives the participants in the studies interesting input-handling problems, which might well make experimental effects more difficult to obtain. In other
words, the need to handle the remorseless flow of new input may swamp the learner
and give other influences such as structure little spare capacity to work with. The pressure is on, in other words, for the variables which are used in the design of the studies
to show their worth.
The relevant studies here are Skehan and Shum (Chapter 7), Wang and Skehan
(Chapter 6), and Pang and Skehan (Chapter 4). The first two are also studies that figured in the discussion of task structure, and this is a good indicator that the two factors, structure and processing, are linked, so that the present section will simply have
a different emphasis and try to concentrate on processing issues even though they
are bound up with structure. The section is also the closest direct investigation of the
Tradeoff and Cognition Hypotheses, since the major implementation of a processing
variable concerns a contrast between Robinson's Here-and-now and There-and-then
conditions, an example of his resource-directing variables.
Skehan and Shum (this volume) contrasted four processing conditions, three of
which were Here-and-now and one of which was There-and-then. A major focus of
their research concerned the different Here-and-now conditions. In addition to a base
condition, they also used the Here-and-now format, but with some mitigation, either
in the form of a Summary given to participants before they started a video narrative
retelling, or by having the ability to pause the video. These two conditions were associated, slightly, with higher levels of performance, with the provision of a summary
before the narrative was done having a good effect on accuracy. The clearest effects
here were with the There-and-then condition, which generated raised performance in
all areas, but especially fluency: end-of-clause pausing was significantly lower, and
reformulations were higher (i.e. more dysfluent), although one could
argue that the greater number of reformulations indicated a greater engagement with
the discourse that was being produced. However, and this is a very important qualification, the cell numbers for the comparisons of processing condition were small, so
the conclusions one can draw from this study are only tentative.
Pang and Skehan (this volume), in their qualitative study of reported planning
behaviours, draw attention to two types of report which bear upon processing. First,
there were participants who reported, in effect, overdoing their ambition when they
were planning, so that the subsequent performance was pressured because of their
own behaviour. Second, there were reports of participants anticipating the problems
of pressure, and preparing for it, to some degree. This study does not directly address
processing influences, but it is interesting that some of the participants themselves
seemed to be aware of the importance of processing pressure, and better performance
was associated with those participants who were realistic and prepared.
The major study in this volume which directly addressed the processing issue is
Wang and Skehan (this volume). As we have just seen, they used a research design
which manipulated structure (discussed above), vocabulary difficulty, and time perspective (as the variable bearing upon processing). They report a general and strong
effect for time perspective, with There-and-then performances showing more structurally complex, more accurate, more lexically complex, and more fluent language. In
addition there is an effect for structure, but only with the There-and-then condition,
in that Structured There-and-then conditions produce the highest performance of all.
This is a much clearer effect than in Skehan and Shum (this volume) and it shows that
processing pressure, which comes from the need to retell a narrative while a video is
running and therefore providing considerable and ever-changing input, has a clear
effect on performance, and one which is not good, in any performance area.
Broadly, then, processing pressure is an issue. One minor and two major factors
will be proposed here as relevant. The minor factor is that of vocabulary difficulty.
It was interesting to manipulate this factor for the first time, but unfortunately, its
impact was limited. Earlier post-hoc work with previous studies had suggested that
vocabulary difficulty can have a disruptive effect on processing, and push second language speakers into more serial processing modes, as second language mental lexicon
problems have a strong impact. The evidence in Wang and Skehan (this volume) was
not supportive. Further research here may be warranted, but in a perverse way, we can
conclude for the present that vocabulary burdens for second language speakers were
not as troublesome as was expected, which may have its attractions, pedagogically! If
task input lexical demands had a strong impact on performance irrespective of other
variables, considerable effort would have to be put into scrutinizing pedagogic tasks
in order to avoid accidental points of difficulty. The first major factor is the issue of
quantity/speed of input that has to be handled for a task to be effectively transacted.
The faster and more extensive the input, the greater the problem that is posed for the
speaker. In a sense, this is the opposite of the claim made in the previous section about
structure: that it enables effective restarts after trouble has been encountered. Here,
the problem is the remorselessness of input, and the way this poses serious problems
for the second language speaker. More input means that the speaker has to struggle
more and more to keep up with the input, and its lexical and propositional demands,
with the result that a serial mode of speaking dominates. Restarting is possible, but
not with anything other than a reactive mode. There are no opportunities to regroup,
and link to any general discourse macrostructure: the speaker simply has to deal with
whatever is new in the input, with the result that effective parallel processing becomes
close to impossible.
Related to this is the second major factor: that of non-negotiability. In simultaneous narrative tasks, the input has to be heeded. But with other types of task, there are
times when the speaker can shape what is to be said, and in so doing, make things easier for themselves. In a way, an aspect of the difficulty of simultaneous narrative tasks
is that they are based on input that is non-negotiable and with no time to interpret or
reframe it: the speaker then has the role of describing that input in its own terms. In
other tasks, selection, reorganisation, and alternative orientations become possible.
These have the considerable advantage that the speaker can then play to their strengths
and away from their weaknesses (Foster & Skehan 2013). Since narratives provide less
scope for this to happen, it is clear that they make serial processing more likely, independent of amount of input: they deprive the speaker of methods of personalising
the task. Perhaps discourse structure has a role here. At least with some awareness of
the overall structure the speaker can free themselves of input dominance, and decide
on narrative paths of their own, thereby enabling them to find a pathway through the
task that is strategically easier to manage, and so making parallel processing just a little
more likely. In that respect it is worth noting that the chapters in the present volume
(with the exception of Li, this volume) only used narrative tasks. It is likely that interactive tasks are going to be much less susceptible to this influence of non-negotiability.
They are much less input-based, and also more likely to be shaped by the directions
suitable to the participants.
The results reported in Wang and Skehan (this volume) do provide some
encouragement for Cognition Hypothesis advocates. The There-and-then condition, proposed as more complex by this hypothesis, did generate higher accuracy
and complexity, as predicted by the Cognition Hypothesis. Unfortunately for the
hypothesis, it also generated greater fluency, contrary to predictions. In addition,
Skehan (2009a) proposes that for the joint accuracy-complexity effect to be supportive of the Cognition Hypothesis, there needs to be, in addition to statistical
group effects, a correlation between accuracy and complexity scores to demonstrate that the effect operates at the individual level. The correlations reported were
not supportive of this. So this raises the question as to why there is such a joint
accuracy-complexity group effect. As argued in Chapter 6, it is proposed here that
we have a processing effect, which contrasts the communicative pressure of a Here-and-now narrative (despite the memory easing that might be involved) with the lack
of immediate processing pressure in the There-and-then condition, and the time to make communicative choices,
which leads to much higher performance even though memory issues are involved.
In other words, the input dominance of the Here-and-now condition is avoided,
and the speaker has the space to shape and repackage the story, especially when a
structured task is concerned (which itself perhaps eases memory difficulties). Here-and-now is non-negotiable. There-and-then is negotiable. This contrast seems to
have more importance than time perspective linked to memory, the factors that the
Cognition Hypothesis would propose as vital.
So once again a processing, Tradeoff-consistent interpretation is adequate,
and supports the claim (Skehan 2009a) that it is the conjoint influence of different
processing-linked variables which can account for results. The lack of correlation
between accuracy and complexity in Wang and Skehan (Chapter 6, this volume) is
consistent with this. The lack of correlation (and see Skehan (2009a) for comparable
results in other studies) suggests that second language speakers can prioritise one
of these aspects of form, but find it difficult to do well with both. (And indeed, the
individual differences aspect of task performance beckons as one of the major unresearched aspects of the field.) Once again, we can relate this to the Levelt model, and
serial and parallel processes. A Here-and-now condition, with its remorseless input,
is constantly making fresh processing demands on the speaker. The immediate presence of input material, in its way, then constitutes a hindrance, since it provides the
speaker with even more information that could be encoded (since selectivity with the
input is more difficult to achieve), rather than an easing factor for memory. A serial
mode is then unavoidable, and with that the unsatisfactoriness of trying to relaunch a
flow of discourse which is soon, in turn, disrupted. There-and-then, in contrast, is not
input pressured, and enables shaping and negotiating on the part of the speaker. Being
derailed by new input is then less likely, and a plan can be formulated which is more
likely to be stuck to. There are memory demands, it is true, but these are compensated for by the other features of the There-and-then condition. When the narrative
is structured, so much the better, since the structure, presumably, makes the memory
demands less acute. The organisation of the structured story enables the speaker to
avoid getting lost, and to get on with immediate performance.
Selective attention and task conditions

We have looked, so far, at task features and conditions which impact on performance.
The previous section was concerned with processing issues more directly. We continue that emphasis here, but the focus is more on the act of speaking itself rather
than the input conditions which surround that speaking. The broad theme is how
attention can be used selectively during task performance, and specifically how it can
be nudged towards a focus on form. Two related questions underlie this discussion.
First, we have to ask whether attentional functioning and prioritising is influenced
by the learner, or by the task, or the conditions under which a task is done, or some
combination of all of these. Second, we need to explore whether, for performance
on second language speaking tasks, there is any sort of default position regarding
where attention will be directed: to meaning and fluency, to lexical or structural
complexity, or to accuracy.
Learner characteristics have been remarkably under-researched in the task
literature. Working memory has been the most examined, but there are many
other possibilities. Skehan (1986) has discussed analysis-oriented learners versus
memory-oriented learners, and there are many other learner characteristics which
might be relevant here (Skehan 1989; Dörnyei 2005), such as learning style, field
independence, personality, and so on. For now we can only recognise the possibility that these various factors might have some impact on typical approaches to
attentional priorities, and we can hope for future research to explicate matters.
More central to task explorations have been research studies exploring how different task characteristics can influence performance. Robinson (2001), through
the Cognition Hypothesis, proposes task complexity as the driver here, with its
joint impact on accuracy and complexity. Skehan (2009a), using a limited capacity account, prefers to look for specific potential influences, explored separately or
in combination, to see if there are generalisations that can be made regarding task
quality-by-performance associations. For example, with complexity, tasks which
require information integration (Foster & Skehan 1996) or tasks with problem-solution structure lead the speaker to encode the relationships which characterise the task in more complex syntax, essentially a resource-directing influence. In
fact, the limited attention approach is perfectly comfortable with the concept of a
resource-directing influence: what it does not extend to is a joint influence of task
complexity on accuracy and structural complexity, as a bundle. In any case, the
literature contains many reviews of systematic task influences on performance (e.g.
Skehan 2001; Ellis 2009). The basic point being made here is that task characteristics induce speakers to devote attention to particular areas of performance, and so
the decision to choose a particular task, with its attendant qualities, is a decision to
push for higher levels of performance in particular dimensions. Similar arguments
can be put forward regarding task conditions, as we have seen in this volume, and
factors such as planning or time perspective can be argued to lead to attention
being directed to particular performance areas.
But a wider issue with tasks is the notion of task difficulty, an issue of obvious
relevance to assessment, but one which also has a strong role to play in how attention is used. The difficulty with difficulty is that performance is multi-dimensional:
good tasks can have multiple interpretations, and so establishing a simple scale along
which tasks can be ranged has resisted clear progress. Even so, we would all accept
that some tasks are more difficult than others, and more demanding of attentional
resources when they are being completed. So, even in the absence of clear definitions
of difficulty, we will proceed on the assumption that tasks vary in this regard. For the
sake of this argument, which is concerned with identifying appropriate tasks to work
with in research and pedagogic situations, we only need to describe tasks as easy,
intermediate, and difficult (with the meaning of these words obviously depending
on the abilities of the second language speakers concerned). We can characterise the
three levels as follows. Easy tasks are those where the challenge of the task (ideas, language, etc.) does not pressure the attentional resources that are available. Difficult tasks
are those where there is serious challenge to attentional resources, to the extent that
performance may be difficult to sustain, and serial processing inevitable. Intermediate tasks, the most interesting tasks, are those where there is pressure on attention,
but at least some of the goals of the task can be met, provided that effective decisions
are made as to where to allocate attention. It is such tasks that we will assume are
operative in this section.
What this implies is that intermediate difficulty tasks are susceptible to influence
through the manipulation of task conditions. If attention is limited and if tasks can be
designed, or task conditions can be arranged, so that no undue demands are made on
attention, the consequence could be that there is greater accuracy. The link between
content familiarity and accuracy would attest to this (Foster & Skehan 1996). If second
language speakers would like, ideally, to achieve accurate performance, and if a reason they do not do this is because things get in the way (as when excessive demands
are made on attentional resources during speaking), then easing tasks and conditions could create space for attention to focus on form and, if not pushed to handle
greater complexity, then to achieve higher levels of accuracy. In this case, accuracy
would be the goal, but also something of a luxury, since it is a lower priority to achieve
than getting a task done. It is raised when all the conditions are right, but possibly at
the expense of stretching the second language speaker in other dimensions of performance. It implies that the language which is chosen and the accuracy with which
language is used may be affected through the conditions under which the task is done.
So such tasks are vital pedagogically, since decisions can be made which push the
learner in desired directions as far as development is concerned.
But a more challenging and interesting goal here is not simply to create unpressured conditions and to hope that accuracy will be raised, as a sort of default for
attention, but instead to try to guide attention in some way, so that accuracy, specifically, is fostered and made more important for the speaker. The two studies to be
reviewed in this section are concerned with exactly this challenge, and they explore
ways in which that most difficult aspect of performance to influence, accuracy, can be
nurtured deliberately, rather than simply as a somewhat lucky consequence of a less
demanding task.
Wang's study (this volume) manipulated several variables, as we have seen. Relevant to the present section is her supported online planning condition. She reported
that when online planning was facilitated, with a slowed video retelling, there was no
impact on performance. However, when this online planning condition was preceded
by the opportunity to engage in strategic planning, it produced higher accuracy and
complexity. In other words, simply having more time was not enough here. It was also
necessary that there was some push towards an orientation towards form. As we saw
in Wang's chapter, the planning probably pushed speakers to have more ideas that they
wanted to express. It may also have pushed them to retrieve and rehearse material,
material which could then be more effectively recalled and used because of the less
pressured performance conditions. A default view of attention here would have been
to argue that simply having more time should have led to an increase in accuracy. This
did not happen. It is clear that something more was needed to bring out the potential
for a focus-on-form. Language had first to be mobilised, and then it assumed greater
importance for the speaker. In other words, a form of guiding was necessary to exploit
the greater attentional potential that the on-line planning condition permitted. Wang's
on-line planning condition when supported by pre-task planning raised form selectively. Here it appeared that the organisation and preparation facilitated by the pre-task planning needed the space provided by the on-line planning condition (a slowed
video) to release attention which could raise accuracy and complexity. There was nothing in the conditions here which oriented learners towards form: it was rather the
availability of attentional resources (released by the strategic planning) which could
be channelled in this way. One assumes that attentional availability was used to enable
more effective monitoring to be carried out.
The one chapter which directly addresses the issue of selective attention is Li
(Chapter 5). Following earlier research by Foster and Skehan (1997, 2013) she explored
whether anticipating the need to do a post-task transcription of one's own performance
would have a selective impact on accuracy in performance. Confirming Foster and
Skehan (2013), she demonstrated such an accuracy effect. This suggests that within the
attentional resources that are available during task performance, where tasks are of the
appropriate level of difficulty, it is possible to prioritise particular areas. Li (this volume) showed that learners who were anticipating post-task transcriptions produced
significantly more accurate language. Knowing that they would be confronted by their
own voices seemed to alert them to the problem of error and the way mistakes they made
would have to be embarrassingly transcribed. So they devoted attention to avoiding
having to do this. This simply confirmed Foster and Skehan (2013). But the additional
feature of Li's study is that different contexts of post-task transcription were used.
Two of these conditions seemed to induce a focus on complexity, in different aspects.
Individually-based transcription (as used in Foster & Skehan 2013) led to particularly
raised lexical complexity, as measured by the use of less frequent words. Pair-based
transcription, where collaboration was required, was associated with greater syntactic
complexity, as though this was the performance area which the act of transcription
highlighted and caused to be the focus of attention. Finally, transcription with revision
was, for the decision-making task, associated with greater accuracy. It appears that the
act of revision focusses on correcting errors, and this then becomes the performance
area, during tasks, which benefits from attention.
These two studies, the supported online condition from Wang (this volume) and
the post-task activities from Li (this volume), have in common that the conditions
under which a task is done can have an impact on accuracy, that most elusive of performance goals, and also complexity. In reflecting on these two studies, it may seem
strange to start by exploring why second language speakers do tasks, but this is a useful point of departure. After all, in research studies (and in many pedagogic activities) we ask participants to engage in task performance, cold, and so gaining some
insight into their motivations for their performance is hardly an irrelevant issue. Of
course there is the motive of reimbursement! Researchers recompense participants,
even if only modestly, and so one would hope this payment engages the participants
to some degree. But there is the issue that second language speakers often have to do
something, with language, which has no real relationship to their lives or their real
personalities. One can ask therefore how they react to this request, and whether there
are differences between learners, maybe across studies, because they make decisions
about how much to be engaged in different ways. A range of possibilities exist for how
participants behave:
To placate the researcher: We have just mentioned payment, and more generally,
participants may view a minimum level of cooperation as being involved, and
even calculate how little they can get away with. Alternatively, if the researcher has
engaged their interest, they may try harder.
To handle the input: Obviously this only involves input-heavy tasks, but we have
seen several of these in the present volume.
To say something you want to say: The clear starting point here is that the chosen
task engages the speaker, who has relevant things that can be said. A decision-making task could be of this nature, if the decisions connect with the value system
of the participant. Planning might make this more likely if it enabled the speaker
to bring to bear on the task their own personal opinions more cogently.
To look good, or at least, to avoid looking silly: In effect this would concern situations where the speaker has awareness of their own speaking, and might want to
look good in the most obvious way: to avoid error. A post-task condition might
fit in here.
To do better: Clearly here the starting point is to ask, better than what? Which
brings out that the repetition condition in Wang (this volume) would illustrate
this situation: a reference performance has been established which is still likely
to be in the memory of the speaker.
To get a good grade: Here the focus is clearly on being tested. This motive is worth
including because it draws attention to the split between findings in the task literature itself, and studies which have been done with tasks-as-tests (Iwashita et al.
2004). The difference may well be linked to differences in perceived purpose in
doing a test-task.
This discussion of purposes for doing tasks frames the discussion of the two studies
reviewed in this section. The discussion is proposing, essentially, that it is important to
have drivers for tasks which are done, influences, that is, which inject some purpose
(Bygate & Samuda 2009). The purpose may come from the participant, or the purpose
may be contrived through the conditions which are used. When there is such a purpose,
either inherent to the participant, or contrived by the researcher, it is assumed that the
degree of focus that the participant brings to doing the task is heightened, and performance may be influenced. In Wang (this volume), under the supported on-line condition, we see that the strategic planning gives the speaker something to say, something
which then energises the performance later, so that later the supportive time conditions are exploited to do greater justice to what has been planned, under the conditions
where retrieval and rehearsal are more accessible. With Li (this volume), the post-task
conditions signal effectively to the speaker that doing the task is not everything, and
that there will be consequences later. As a result, anticipation of what will come later
is the motive which causes speakers to allocate attention to the aspects of performance
that they have been induced to value because of the later condition. An accuracy effect
was found for all conditions, but in addition, it was interesting that individual transcription led to more impressive lexical performance, that pair-based transcription
pushed for greater complexity, and that transcription with revision strengthened the
accuracy effect. In broad terms, Li (this volume) is consistent in her findings with the
same broad class of influence reported in Skehan and Foster (1997) and Foster and
Skehan (2013), although the detail of her different conditions adds to our knowledge
considerably.
It is important to stress here that we assume that the tasks involved in both these
studies are neither very easy nor excessively difficult. But for speakers of the intermediate level of proficiency involved, they do constitute a challenge. The speaker has to
make choices to allocate attention to particular performance areas, so the conditions
which were used are important: they induce selectivity, to some degree, because the
purposes and priorities that come from the conditions lead the speaker to value particular performance areas which otherwise they might not.
Interestingly, though, there is a connection here with the Cognition Hypothesis,
and one which can slightly modify the above conclusion. Robinson (2001, 2011), who
takes a radically different view to that represented here about the limitations in attentional capacity, argues that attention can expand to meet the demands placed upon it.
For him, this is central in enabling accuracy and complexity to be jointly raised, when
pressured by task complexity. The present analysis suggests, in contrast, that attention
does have a maximum capacity, and that whatever the task complexity, that cannot
be exceeded. After all, limitations on working memory are very well established. But
it may be the case that the maximum available attention is not always used. In other
words, we may wish, as researchers, that participants in our studies give 100% all the
time. It is possible, though, that they are not galvanised to do so all the time, or even a
lot of the time, whether by the token recompense they are receiving or by the low level
of inherent interest of the task. If variable commitment in maximum potential attentional resources becomes a factor, then interpretations of findings in research studies
become even trickier!
And here the purpose of the task, as perceived by the participant, becomes vital.
If they are engaged fully by a task (through its interest) or if the purpose of the task is
clear to them (and they fully accept this purpose), then their performance on the task
may change. General attentional availability may increase, but so could the attention
devoted to a particular performance area, such as error avoidance. In that respect, the
SLA literature on monitoring is extensive (Kormos 1999), and it is generally accepted
that it is an important process in second (and first) language performance (indeed, it is
integral to the Levelt model). What Li (this volume) has shown is that task purpose can
be manipulated to raise accuracy, and this presents strong evidence that what is happening is that a greater degree of monitoring is engaged in by these second language
speakers. The monitoring seems mainly to be directed towards avoiding error (which
would otherwise have to be embarrassingly transcribed), but it is clear it can lead to
other aspects of performance too. It appears that in the hurly-burly of task completion, monitoring is not the first call on scarce attention. With appropriate conditions,
though, its use can be enhanced.
This is the general effect that has been found, but there is now enough evidence
to go a little beyond such a generalisation. Two issues are relevant. First there is the
issue of task. Broadly, post-task research has mainly explored narrative and decision-making tasks, and indeed, has demonstrated an accuracy effect with both sorts of task.
But it is intriguing that the effect tends to be larger with the decision-making tasks, as
if there is something about such tasks which is particularly supportive for the accuracy
focus engendered by a post-task activity. Second, and even more interesting, though,
is that there is now evidence that when a decision-making task is concerned, there
can also be effects for structural complexity. In other words, the focus on form is not
simply on accuracy, but also includes a sensitivity to using more complex language.
Once again there is no correlation between accuracy and complexity, suggesting that
while form is in focus, typically speakers can only manage to achieve more highly in
one area, not both.
We can return to the earlier discussion of the accuracy effect found in Wang's
study for supported online planning. There it was argued that such a condition makes
it more likely that there will be a Conceptualiser-Formulator balance, and that clarity about the general task will free up attention during performance if conditions
are appropriate, as when the video on which the narrative is based is slowed. It may
be that decision-making tasks, because of their turn-and-turn-about nature, create
similar conditions. The opportunity to regroup while one interlocutor is speaking
can give a speaker the opportunity to create the equivalent of a structured task, and
with each new turn, embark on a parallel processing approach to speech. It appears
that with Li's revision condition, the interactive nature of the decision-making task
gives participants time to focus on language, and the revision condition directs this
attention to avoiding error. It is a combination of circumstances which produces a particular result. So in their way, Wang (this volume) and Li (this volume) have achieved
a similar thing: a good Conceptualiser-Formulator balance, and greater chances of
second language speakers engaging in parallel-process-based speech production.
A problem here though is that there are too many explanations for this one effect.
Li (2010) proposes a sociocultural account: participants collaborate and build an
encounter which takes them collectively further than they would go if operating alone.
At a lower level, and focussing on interactive opportunities, Foster and Skehan (2013)
propose that decision-making tasks enable stealing of the interlocutor's language, a
finding which would perhaps apply to complexity and accuracy. They also suggest that,
within interaction, when it is the turn of the interlocutor, one could feign listening,
and thereby finesse planning time. Possibly, also, the immediate and obvious presence
of an interlocutor pushes for precision, and therefore greater accuracy. Finally, if one
has an interactive task, which is not driven by input in the same way as a narrative,
then there is greater negotiability as to what might be said, with the result that difficulties can be avoided and strengths exploited.
This list of possibilities has been briefly enumerated so that we can make a link to
Wang's (this volume) study. Her supported on-line planning condition linked strategic
planning (and opportunity for Conceptualiser work linked to Formulator retrieval and
rehearsal) with an opportunity to speak where unpressured conditions enabled what
had been prepared for to be retrieved and utilised. Here, within interaction, we have
opportunity to plan, during interaction when one's interlocutor is speaking, or opportunity to steal (similarly while one's interlocutor is speaking) and then immediately
an occasion to use what has been prepared for during this rest period from speaking.
Given the focus on form injected into the task (through the post-task
condition) plus the useful planning-to-performance conditions, it is perhaps not so
surprising that interactive tasks are a good locus for experimental effects to be found.
These, following the above analysis, can often involve both complexity and accuracy,
but not often both for the very same participant.
Second language performance: Positive and negative influences

We have now reviewed the studies which form the heart of this volume. The survey has
been extensive, not least because the range of variables which have been studied has
also been extensive. Accordingly, Table 2 shows, in summary form, the range of findings from the studies. Setting them out in this way can then be a prelude to a reflection
on the nature of second language performance and the ways it can be supported, but
equally the ways that difficulties can be caused.
Table 2. Summary of influences on second language performance

Planning

  Familiarity
    Effects: Associated with greater lexical sophistication. Little effect on other performance areas.
    Interpretation: More familiar topics enable more specialist vocabulary to be accessed.

  Conventional pre-task planning
    Effects: Greater structural complexity and fluency. Few effects on accuracy.
    Interpretation: Ideas make the transition from planning to performance most generally and most dependably. Dangers of excessive ambition.

  Supported on-line planning
    Effects: Greater structural complexity and greater accuracy.
    Interpretation: Good Conceptualiser engagement, and then good Formulator conditions for use of rehearsal and retrieval from planning.

  Repetition
    Effects: Strong effects on complexity, accuracy, and fluency.
    Interpretation: First performance (a) enables ideas and language to be made more salient, and (b) triggers deep lemma activation which is still available for subsequent performance.

Structure
    Effects: Raises accuracy and complexity, especially under There-and-then condition.
    Interpretation: Clarifies what Conceptualiser needs to do. Releases attention for Formulator. Enables restarts after serial processing.

Processing
    Effects: Accuracy and complexity benefit when processing pressure is reduced, through There-and-then tasks.
    Interpretation: Less input pressure enables more focus on form, as does the opportunity to choose how to narrate the story, with freedom from input dependence.

Selective Attention

  Supported on-line planning
    Effects: Raises accuracy and complexity.
    Interpretation: Less pressured conditions are vital because the planning has prepared the ground, and the retrieval conditions exploit this planning.

  Post-task conditions
    Effects: Raise accuracy, and also lexical and structural complexity, the latter especially with an interactive task.
    Interpretation: Pedagogic norms are emphasised through anticipation of the post-task, and attention, even though under some pressure, is directed towards form, especially accuracy.

Skehan (2009a), revised in Skehan, Bei, Li and Wang (2012), attempts to characterise influences on second language spoken performance through two interlocking
systems. First, influences are organised according to Levelt's stages of speech production (Conceptualisation, Formulation-lexical, Formulation-morphosyntactic, Articulation), and second, task and task condition influences are grouped as leading (a) to
complexification, (b) to pressured performance, (c) to eased performance, and (d) to
focussed/monitored performance. This same approach underlies Table 2. The table
attempts to organise what we have learned about influences on task performance,
largely following a Tradeoff account. The discussion which follows recapitulates the
evidence and argument from several publications (Skehan 2009a, 2009b; Skehan et al.
2012), but emphasises what is new, especially what derives from the chapters in the
present volume.
It is helpful to restate the basics of the Levelt model. Conceptualisation is concerned with the ideas to be expressed, and delivers what is termed the pre-verbal
message. This is the starting point for Formulator operations, involving retrieval
from the (second language) mental lexicon, the building of morphosyntax, and the
preparation of phonological representations. Finally, the Articulator takes the output of the Formulator and produces actual phonological realisations to capture what
the speaker is trying to say. Each component in the model is meant to function in
modular fashion, so that all three components are working simultaneously, but on
different things. Importantly, of course, this process is ongoing, as speaking continues, so that Conceptualisation continues to drive forward Formulator and Articulator operations.
A major aspect of Conceptualiser operations, especially for the second language
case, is that it has both a general and also an ongoing-specific aspect. If, for example,
there is planning, the speaker effectively tries to load the Conceptualiser with material which will continue to have an impact later in performance. The Conceptualiser,
in other words, has a slow burn impact on communication that is not immediate.
On other occasions, and this is obviously the norm, cycles of operation mean that
Conceptualiser operations at Time 1 are passed on to the Formulator at Time 2 (and
to the Articulator at Time 3) while new material will be occupying the Conceptualiser at Time 2, and so on. In this case, the ideal scenario is that the pre-verbal message demands of the Conceptualiser are met by a rich mental lexicon, and speaking
modules proceed in parallel. As we have seen, this is often not the case with second
language speakers, especially those below advanced levels of proficiency. For them,
speaking is often a process of rescue, as the ideas they would like to express have to be
modified or expressed much more slowly.
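The cycle just described, in which each module works on a different message at the same moment, can be pictured as a simple pipeline. The following Python sketch is purely illustrative and is not part of the Levelt model itself: the function names, the message labels, and the assumption that every stage takes exactly one time step are simplifications introduced here.

```python
from typing import List, Optional, Tuple

# The three modules named in the text, in processing order.
STAGES = ("Conceptualiser", "Formulator", "Articulator")


def parallel_schedule(messages: List[str]) -> List[Tuple[Optional[str], ...]]:
    """For each time step, list which message occupies each stage.

    Assumption (not from the source): every stage takes one time step,
    and a message moves to the next stage at each step.
    """
    steps = []
    total_steps = len(messages) + len(STAGES) - 1
    for t in range(total_steps):
        row = []
        for stage_index in range(len(STAGES)):
            message_index = t - stage_index
            if 0 <= message_index < len(messages):
                row.append(messages[message_index])
            else:
                row.append(None)  # the stage is idle at this step
        steps.append(tuple(row))
    return steps


def serial_step_count(messages: List[str]) -> int:
    """Steps needed if each message must clear all stages before the next starts."""
    return len(messages) * len(STAGES)


if __name__ == "__main__":
    plan = ["m1", "m2", "m3"]  # hypothetical pre-verbal messages
    for t, row in enumerate(parallel_schedule(plan), start=1):
        occupied = ", ".join(
            f"{stage}: {msg or 'idle'}" for stage, msg in zip(STAGES, row)
        )
        print(f"Time {t}: {occupied}")
    print("Serial steps needed:", serial_step_count(plan))
```

With three messages, the parallel schedule finishes in five time steps, whereas strictly serial processing (each message passing through all three stages before the next begins) would take nine. This is one way of seeing why a breakdown into serial processing is costly for the speaker.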
However, there is a third manner in which the Conceptualiser can have an impact
on performance which is mid-way between the two just outlined. This occurs when
the Conceptualiser, through planning or through quick-thinking, is able to exploit
macrostructure in what is being said. Conceptualiser operations are partly concerned
with retaining the general structure, and the speaker's place in it, and partly with the
ongoing detail of current speech. In this case, there is considerable potential for Conceptualiser operations to span several time periods if we are thinking about the macro
planning role that it may discharge.
Against this background, we can discuss the findings in this volume as they impact
upon the second language speaker's balance of parallel and serial speech performance.
It would seem that the following influences promote a parallel mode of functioning:
- doing tasks which are structured, with this impacting on Formulator operations, as there can be focus on what is being said at a particular moment because the speaker does not have to wrestle with wider organisational issues
- doing structured tasks in which the speaker is pushed down to serial processing, but where the task structure enables parallel functioning to be regained, since a fresh starting point can be identified through the clarity of task structure
- preparedness, which can promote parallel processing variously
  - through retrieval and rehearsal operations which are recalled during the actual performance, and which then ease Conceptualiser and Formulator operations
  - through supported online planning, which combines effective use of planning with unpressured retrieval conditions while speaking
  - familiarity, through greater access to relevant lemmas and the information that they contain
  - immediate repetition, which activates all aspects of performance, and which specifically triggers lemma access to the greatest extent possible, thereby advantaging Conceptualiser, Formulator, and Articulator
- unpressured performance conditions, where there is not a constant (and possibly rapid) flow of new input (included here would be There-and-then time perspective tasks)
- mediation in some way which introduces some level of organisation to what is being said
- the opportunity to negotiate what is going to be said, rather than being forced to express whatever the input contains
Conversely, of course, there are conditions which make it more likely that serial processing will be engaged in. At the risk of repetition, these points simply reverse
those covered above:
- ineffective preparedness
  - where the speaker has been overly ambitious in the planning which is done, with the result that the speaker tries to take on language which is too demanding
  - where the speaker has tended to focus on material (e.g. specific forms) which tends to fade and then not be usable in actual performance
- unstructured tasks, in which there is little clear overall structure that the speaker can use for guidance, or as the basis for retrieving parallel processing after it has become unsustainable
- heavier processing pressure, such as quantity or speed of input, typical, in fact, of Here-and-now conditions
This long section has surveyed the findings from the different research studies in the
book, and shown that portraying performance in terms of complexity, accuracy, lexis,
and fluency is still viable and useful. It has also brought out what progress has been
made through the different chapters. Finally it has attempted to relate these findings to
the way second language speakers are supported or frustrated in achieving the sort of
parallel processing that is the norm for first language speakers. We turn next though
to application, and how the sort of research which has been described might have an
impact within the classroom.
Pedagogy
There are two parts to the Pedagogy section. The first explores issues operative at the
within-lesson level and tries to relate the research-based discussions from the rest
of the book to decisions which have to be made in this context. The second section
outlines some wider principles for the use of tasks over more extended pedagogic
sequences. However, note that it is not the intent of this chapter or book to provide a
detailed discussion of issues of task sequencing (Robinson 2007) or of how a series of
tasks can be linked, within or across lessons, as in a scheme of work (Van den Branden
2006), or of how project work could be a wider framework within which tasks could
operate (Skehan 1998, 2013). The focus here is on how the kinds of tasks utilized in the
research reported in this volume, and the findings from that research, might provide
one basis for informing task-based pedagogy, though certainly not the only basis.
Tasks within lessons

The emphasis in this chapter so far has very much been on performance, on the task
and variables which influence it, on the dimensions which capture it, and on the theories which might account for it. What, then, can be said about pedagogy? This is indeed
a difficult question, since considerable extrapolation is required from these earlier sections. But it is argued here that we have a responsibility to try to tease out what relevance such research has, even if this only means to understand its limitations. So this
final section of the final chapter will confront the pedagogy problem, and try to be
even-handed in its approach. It will offer an account of pedagogy where task-based
approaches have a major contribution to make. It will also argue that there are areas of
pedagogy where a task-oriented approach has less to say. But it will, wherever possible,
try to ground any claims that are made on research evidence, and try to make links
with what we currently know about second language acquisition.
To that end, we can have in mind a view of pedagogy in terms of the following
stages:
- Pre-task activities, where these might be planning or a range of other activities which prepare learners to do tasks more effectively.
- During-task activities, where this can mean choosing particular tasks (on the basis of criteria of some sort) and then implementing such tasks under different choices; a variation here (Willis 1996) would be to re-do a task, within the task cycle, in such a way that there is opportunity for reflection or even teacher input before the task is re-done.
- Post-task activities, where these can be brief, as when, for example, a post-activity such as transcription is involved, or more extensive, when some focussed work may be done which develops something which has come up within task performance.
One additional feature of a task-based approach is important. There are assumptions
here about the role of the teacher and the pedagogic activities s/he orchestrates. It
is assumed, for example, that the teacher is able, at the pre-task stage, to devise and
organise activities that are relevant to task completion other than explicit presentation and teaching. In addition, at the during-task stage, it is assumed that the teacher
will not be intrusive, but will nonetheless be very alert and in some way paying attention to the language which is used, and even possibly finding ways of recording it
without interfering in the way tasks are completed. Finally, at the post-task stage, it
is assumed the teacher will have some knowledge of what has happened while the
task was taking place and can draw on this knowledge effectively when any focussed
language work is carried out. The discussion which follows will presuppose such a
(fairly obvious) set of teaching possibilities and teacher behaviour and will link different stages to ways in which tasks can be used more effectively. This is obviously quite
restricted in treatment, and a good, broader account of task-based teaching
can be found in Norris (2009).
The framework for this discussion is a series of stages which can be proposed for
second language acquisition (Skehan 2002). The stages are intended to capture how
new language develops, and then how progressively greater control is achieved over
that language. The sequence implied is meant to apply to any particular element in
an emerging interlanguage system, but it is assumed that different elements of the
language being learned will be at different points on this sequence. The sections which
follow will clarify each of these stages. They are:
- noticing
- hypothesising
- complexifying/extending
- restructuring/integrating
- repertoire creation, availability, accessibility
- achieving supported control, avoiding error
- automatizing
- lexicalising
We can discuss each of these in turn. The importance of noticing has been recognised
through the work of Schmidt (1990), particularly for noticing in input, and Swain
(1985, 1995) for noticing in one's own output. Schmidt (1990) argued that noticing is
a necessary precursor to subsequent acquisition: that which will be acquired has first
to be noticed. He emphasised the way input may lead to noticing, and argued that if conditions can be created where noticing is more likely to occur, then there are greater possibilities for intake (Corder 1981) and processing. Swain, in contrast, was concerned
with the idea of noticing the gap where, through communication, a speaker becomes
aware of a deficiency, and only then may do something to address this deficiency.
Clearly, for each of these possibilities, noticing is only a starting point, but it is a very
important starting point. Two additional points are worth bringing out in that regard.
First, to repeat: noticing is a necessary but not a sufficient condition. More needs to
happen, particularly in developing and consolidating what has been noticed (and see
below for more discussion of this). Second, noticing meshes rather neatly with notions
of developmental readiness (Pienemann 2003). If one assumes, following much second language acquisition research, that there are sequences of development, then it is
important not simply to notice, but also to notice the right thing, as it were, in terms
of development. In this way, the learner is more likely to be able to make progress with
what has been noticed, as opposed to something coming into awareness which then
leaves awareness just as quickly.
A benign view of second language acquisition would be that interaction contains
all that is necessary for development to proceed. I am assuming that this is not the
case: a cornerstone of the proposals being made here is that something more needs
to happen. In that light, what is important is that noticing can be built on, and nurtured (Skehan 2013). The noticing has to come from the learner, but the teacher can
attempt to trigger and/or elicit and certainly respond to such noticings, and return
to them, with the broad aims of developing and consolidating them. In the stages
indicated above, noticing could easily occur at the pre-task stage, for example, when
planning is taking place or when pre-task input, such as text, is being provided. This
perhaps would be more likely to be a noticing-the-hole in projected output for the
task-to-come. It could also occur at the actual task stage, as input is received from
an interlocutor, input perhaps which is particularly salient because it is important in
task fulfillment, and so the form-function mapping of particular input will be clearer.
In any case, at either of these stages, pre-task or task, noticing could easily occur. We
return then to the role of the teacher. It is important that the noticing does not occur
and then disappear: teacher activity can be good at reminding learners about their
own insights, and then working with these insights.
Similar considerations apply with hypothesising. Once again, it is entirely likely
that hypothesising will take place at the pre-task or task stages. Learners may want
to say something (or they hear something) and realise that this prompts reflection
on interlanguage structure. Once again the motive is the language made salient by
the need to do the task. For example, they may realise that they can extrapolate from
some particular item of language because they see how it may be connected to a wider
rule. Context, as with noticing, is the key, since the language is related to what they
are trying to achieve in doing the task, but hypothesising is potentially more powerful than noticing. It indicates a greater breadth and depth regarding the target language. And the key here is even more clearly the post-task stage. Of course, during
task preparation the learner may formulate a hypothesis and do so very well, so that
little more needs to be said, but it is more likely that there is a tentative or even unnecessarily circumscribed nature to the hypothesis. For these reasons, teacher-focus on
this hypothesis afterwards is crucial. Given that the language involved has been made
salient, and given that there is every chance of readiness, since it was the learner who
formulated the hypothesis, the moment is ripe for teacher follow-up. In other words,
in these circumstances, where the language has been made salient by the learner doing
a task, it may now be appropriate for the teacher to be explicit (where being explicit
earlier risked flirting with the irrelevant). The teacher in other words can now reinforce the hypothesis, extend it, link it with other parts of language, or even correct a
mistaken hypothesis. Naturally the teacher will have to be judicious in the judgements that
are made: one learner's hypothesis risks being another learner's boredom or confusion. But assuming the teacher can make good judgements here, the post-task stage,
once again, is vital for ensuring that a good insight, a perceptive hypothesis, is not
abandoned but built upon.
So far, with noticing and hypothesising, the use of tasks has been essentially as a
vehicle so that certain processes occur, and then these processes are exploited at the
post-task stage. At this stage, the teacher can actually be a teacher! In fact, we continue
this pattern (although it will change soon!) with the next couple of stages. Learning a
language is a complex undertaking. Languages are complex systems and sub-systems
and so making inroads into such systems is not easy. Noticing and hypothesising are
good, but only go so far. They are also likely to be limited in scope. If what has been
noticed or hypothesised about is small in scope and self-contained, then perhaps little
more is involved in real development, but often (think of the development of tense
systems, or modality) what is noticed fits into a larger whole. So while the noticing is
essential, what is even more important is that the outcome of that noticing is extended
and connected with other parts of the developing interlanguage system. In other
words, following noticing, there may be a need to complexify, and to see that what is
new bears a relationship to other aspects of the language being learned.
This analysis, though, does not cover all types of development. Sometimes progress means understanding that previous understanding was partial, and that a larger
system is involved, which pushes the learner to restructure and reorganise. The past
tense in English would be a good example, where at some point the coexistence of
regular and irregular past has to be organised into a more complex system than the
separate item-based or rule-based systems that were previously dominant. In other
words, there is a need to take two steps back to go three steps forward. In such
cases, restructuring a developing system, or integrating what was regarded as a self-contained
and independent system into some other larger system, may be a fairer way
to capture what is going on in development. Once again, the pre-task and task phases
may provide the insights which are important, particularly if we still think in terms of
readiness. But it is likely to be the post-task phase which is most effective. Then language which has been made salient during earlier phases (provided that some record
is available of that language) can come into focus and enable the teacher to help learners deal with what was uncertain and only partially understood. Once again one has
to emphasise the importance of this being the agenda announced by the learner so
that the teacher, in helping restructuring and integration to occur, is counterpunching to the input that is relevant to the learner. The importance of the post-task phase
cannot be overstated for these developments to occur. Equally, the central way in
which learners can recall the insights that emerged in earlier preparation or communication is vital. They are what drive the usefulness of teacher contributions at the
post-task stage. This is where the teacher can contribute expertise about language,
and consolidate and clarify so that learners have some confidence in the learning that
has taken place (Willis & Willis 2007).
To this point the reader may be thinking that this is an odd presentation of what
happens in task-based approaches to instruction. The focus has been on using tasks,
certainly, and preparing for these tasks, but then the real action seems to come at the
post-task stage. The tasks have been important as vehicles to enable useful language to
emerge, but then the significant work is deferred until later. The reason for this is the
problem of new language, sometimes erroneously regarded as a deficiency in a task-based approach (Swan 2005). What the previous discussion brings out is that there are
ways in which such new language can be brought into focus, something which it seems
has to be made clear for critics of a task-based approach. The other side of this is that
it is vital for the methodology outlined above to ensure that what
is announced as the focus for such language work is material that makes sense to the
learner and which the learner is ready for. The problem in instruction (task-based or
otherwise) is not new language in itself, but which new language. The view taken here
is that it is important that it is the learner, not the syllabus designer or materials writer,
who is influencing what will be done. It is considered that this is crucial, and consistent with contemporary second language acquisition. That the treatment comes a little
later than the need was announced is not the issue (although, of course, the teacher
may also respond to learners mid-task, if that is appropriate, and not disruptive or too
extensive). The central point is that the language concerned makes sense given the
learner's current stage of development (Skehan 2007).
Now we can move on, and in so doing, bring conventional approaches to task
performance more into focus. What we have done so far is look at the language which
emerges from completing tasks, and how that language can be made less transitory
and instead contribute to an evolving and complexifying interlanguage system. However, now we are assuming that some new language has been noticed, hypotheses have
been formulated, and complexification, extension, and integration have taken place,
wherever they are appropriate. In other words, the learner is now clearly aware of features of the target language of which he or she was not aware before.
Essentially, the next stage consists of exploring how a degree of control can be
achieved with this new language. How, that is, can language which has been wrestled with, possibly laboriously, be converted into language which can be readily used,
appropriately, and with reasonable speed and lack of error? The first stage in acquiring
greater control is to be able to use such new language under supportive conditions
(and possibly not totally accurately at first). In other words, it is assumed that when
some new language is apprehended and linked to previous interlanguage, there is no
magic way in which this language is suddenly available for correct use in real time in
a range of situations. It has to be nurtured and control developed gradually. It is here
that much of the previous discussion on tasks is relevant. We have seen (and this is
captured in Table 2) that a whole range of influences enhance levels of accuracy and
fluency. In effect, these are the conditions which are needed to make the development
of greater control a reality. So, for example:
opportunity for supported on-line planning

familiar information
opportunity to repeat tasks
post-task activity, especially with revision
are all very important here. By choosing tasks which maximise accuracy and fluency,
and task conditions likewise, the learner is being supported to achieve greater levels of
control. This can assist the first stages in making the transition from halting speech production to the capacity to use language in real time. In effect we are dealing here, more
generally, with either creating tasks and conditions for more attentional resources to
be available for the speaker, so that greater accuracy, for example, can be achieved; or
for tasks and conditions to push learners to higher performance in particular areas,
again highlighting accuracy; or for a situation where attention is nudged in particular
directions. So knowledge of research into second language production is relevant to
helping learners to follow desirable pedagogic directions.
An important point needs restating here. It is clear that the use of tasks in this way
draws on the importance of implicit learning and even of practice, since the learner
is being given (supportive) opportunities to gain control over language. We are now
beyond the stages where new language has been noticed, complexified, and integrated, and
instead are concerned with making this newly acquired knowledge implicit. A proponent of a presentation-practice-production (3Ps) approach might then claim that
this view of tasks is no different from the role ascribed to them by task critics such as
Swan (2005) or Bruton (2002), or even task sympathisers such as Littlewood (2004).
The key point, though, is the issue of what language is being used, and that in turn
connects with the issue of pre-selection. A 3Ps approach is characterised by selections
being made by the teacher/materials writer, and then the presentation phase is a phase
working on something selected by someone other than the learner. A task approach
is one where the language which was earlier selected for treatment was selected by
the learner or emerged from learner performance. The use of task-informed criteria
here to promote control is consistent with this view that the selection comes from the
learner. There is no pre-selection of forms for task performance: that is the learner's
choice. The decisions linked to task selection and task implementation are intended
to create conditions to support control of the language which is chosen by the learner.
In fact, developing this point, we can return to the usefulness of the post-task
stage. Where the language that is being focussed on has emerged from the learner,
there is no reason, if a teacher deems this appropriate, for the post-task stage not to
include practice activities. So far we have regarded this stage as one where noticing,
hypothesising, complexifying, and restructuring generate a fairly cognitive view of
language itself, of its patterns and of the emerging rule-governed system, but developing language skill, as we all know, involves more than insight and understanding.
It also involves performance, and if a teacher decides that some aspect of language
is emerging, but could benefit from more traditional practice activities, then there is
no reason not to use them at this later stage in a teaching sequence. The point, a bit
laboured by now, is only that what is being practised is a response to learner need, not
syllabus prescription.
Continuing this analysis of tasks in terms of the development of control, the next
two stages are essentially extensions of what we have already seen. Automatising, following Anderson (2004), consists of speeding up performance while eliminating error.
Supported control, the previous stage, is likely to be characterised by slow performance, and the intrusion of errors. Automatisation does not really involve much that
is different but it does lead to a greater degree of confidence in performance, and even
robustness in the face of contextual difficulties. With tasks, a similar analysis operates. A
range of influences are relevant in creating the conditions in which automatisation is
more likely to occur, and in some ways, these are an extension of what was mentioned
with supported control. But another factor which is important is, in a sense, the reverse
argument. Choosing tasks and task conditions to nurture automatisation is one thing,
but it does not serve learners if they only develop the capacity to use language in supportive conditions. So another goal, where automatisation is concerned, is to choose
tasks and task conditions to put pressure on learners so that they feel more comfortable functioning in the wider range of circumstances they will encounter in the real
world. So automatisation here is a speed factor, but also a generalisation challenge,
which can only be attempted when some degree of automatisation has been achieved.
The final stage which is given here is lexicalisation. This stage is proposed (Skehan
1998) to reflect a dual-mode system in which on the one hand we can use rule-based
language produced quickly as highly automatised, or on the other, the products of such
rule-based language can be lexicalised and then produced as exemplars or chunks.
Such a mode enables not simply speed of processing but also the advantage that there
are not many computational demands, so that attention is, to some extent, freed up
while performance is ongoing (Skehan 2013).
Clearly using tasks is compatible with fostering fluency on the basis of lexicalisation. Using tasks which support Formulator operations, for example, will help in any
process of lexicalisation that might occur. But in all truth, it is likely that it is asking too much of tasks to expect them to contribute significantly to any such process.
Of course they will do much more than many other teaching methodologies, but the
amount of communication that is necessary for any process of lexicalisation to occur is
probably too great for any task-based approach to deliver. There simply is not enough
time for things to develop in this way. Lexicalisation is a desirable goal, but one that
probably can only be achieved by long and extensive exposure to the target language
in question. Task-based syllabuses developed to cover several years may achieve this,
although such a claim is currently speculative. Shorter-term task use would struggle to
achieve wide-ranging lexicalisation.
The sequence which has just been described moves from the new, from something
that is perhaps not understood completely and is used haltingly and sometimes incorrectly, to a point where the formerly new language is now well integrated into a developing system and can be used in real time, fluently and correctly, and even without
undue processing effort. But there is another aspect of the sequence which we have
temporarily left out and which now needs further consideration. This is the issue of saliency
or repertoire creation, what in French can be referred to as disponibilité. The previous
discussion has assumed that there is some aspect of the language system and that its choice
for use is self-evident: if something is known, it will be used. But such an approach
misses a very important point about language learning: one may know a great deal
of language that one does not use. So in addition to trying to teach new things, a goal
of teaching has to be to increase the access that the learner has to what is known, but
whose relevance and usefulness may not be appreciated. In other words, if learners
have found methods of solving communicative problems which are not pretty, or helpful for development, but are nonetheless effective, they may learn new aspects of the
target language, but not use them. They may plateau unnecessarily at a certain level
because they can get by using older methods of solving problems. So the teaching challenge is not simply to introduce new forms effectively, but also to get those new forms
to supplant older language or at least to become part of a communicative repertoire.
For this goal, a task-based approach is very well suited. Tasks, especially tasks
which are reflected upon afterwards, can support learners to develop such a repertoire,
and for them to see how what has been learned has communicative utility. In this case,
the range of variables which have been shown to influence performance can be related
to the promotion of accessibility. The emphasis here will be on Formulator operations,
rather than Conceptualisation, so that either attention is made more available when the
surface structure of language is being put in place, or there is a focus on accuracy to
some degree (e.g. through post-task activities, or through monitoring). These can make
it more likely that the newer forms will not simply be there for a rainy day, so to speak, but have
sufficient salience that they can become usable even in more difficult circumstances.
Principles for using a task-based approach

The last section has tried to clarify the pedagogic contributions that a task-based
approach can uniquely make. The section, though, was driven by the sequence of what
happens during acquisition. In addition to such an account, it is also useful to have
a set of principles for the use of tasks over a more extensive timescale, ranging over
several lessons, or even a period as long as one term. In Skehan (1998), I put forward
five principles for a task-based approach. The five principles were:
1. Choose a range of target language elements¹
2. Choose tasks which meet Loschky and Bley-Vroman's (1993) utility criterion
3. Select and sequence tasks to achieve balanced goal development
4. Maximise the chances of a focus on form through attention manipulation
5. Use cycles of accountability
What this set of principles tried to address is the tension between two statements:
Second language acquisition demonstrates that internal factors have a strong influence
on patterns of development, such that learners do not necessarily learn what teachers
teach
vs.
Some degree of system and completeness in what is being learned is preferable
The principles tried to strike a balance between these two statements. Unrestrained
applications of a task-based approach based on relatively brief speaking tasks would
risk over-valuing the first statement at the expense of the second. Applications of traditional approaches would risk over-valuing the second at the expense of the first. So,
if one regards the first two principles above as somewhat preliminary, the third and
the fourth attempt to nudge learners towards a focus-on-form, and the fifth, final, and
very important principle suggests that teachers have an important role in monitoring
the development of their students and designing pedagogic activities which deal with
the lacunae in learning, and orient the input (e.g. through pre-task work, task selection) towards areas which have not been developed.
But there is vagueness in these principles, and I would like to modify these proposals slightly. First, I would now add to the third and fourth principles a set of sub-principles. These are:
complexifying
pressuring
easing
focussing
monitoring
1. This proposal is not driven by any precept based on learners' functional needs (such as Long &
Crookes 1992). It could accommodate such an approach, but in fact offers greater freedom to the
teacher or course-designer regarding the basis for task choice.
In other words, to make the general principles more accessible, I would suggest using
the sorts of outcomes captured in Table 2, based as this table is on a range of empirical results, to make more specific the sorts of things that could promote balanced goal
development, and a strong focus on form. We have learned quite a lot from research
as to how a general focus on form can be promoted. So the principles are not quite so
abstract now as they were when first proposed in 1998, and the claims follow from a
range of research results.
Second, I would also like to add a new principle, and ideally place it as a new fifth
principle (pushing the old last principle, use cycles of accountability, down to sixth
position). The new principle would be:
5. Use the post-task phase to nurture language made salient by the task, through:
explanation
extension
integration
practice and consolidation
As we have seen in earlier discussion, the post-task phase is vital as the place to capitalise on the language which has been made salient by the task. The language which
emerges in the task is the language which is relevant to learners. But the operations
on that language orchestrated by the teacher can enable the sixth principle, the use of
cycles of accountability, to function more effectively, because it is here that the teacher
can select from the range of language made salient that particular language which it is
most propitious to work on, safe in the knowledge that it will be learner-led language.
This may be developmental language, or it may be language which needs further consolidation and practice. It could even be new language, in the sense that a task may
have created a need to mean, and then the teacher can supply that need in a focussed
manner. This is quite a challenge for the teacher. Learners differ; experiences differ.
As a result, there may be a range of candidate language elements to pick up on at the
post-task stage, not all of which can receive focus. It depends on the teacher's professionalism and training which of these to work with, which to defer until possibly later,
and which to ignore.
We can now represent the set of principles in a more complete, comprehensive
form:
1. Choose a range of target structures
2. Choose tasks which meet Loschky and Bley-Vroman's utility criterion
3. Select and sequence tasks to achieve balanced goal development through
complexifying
pressuring
easing
focussing
monitoring
4. Maximise the chances of a focus on form through attention manipulation through
complexifying
pressuring
easing
focussing
monitoring
5. Use the post-task phase to nurture language made salient by the task, through:
1. explanation
2. extension
3. integration
4. practice and consolidation
6. Use cycles of accountability
So far, we have taken what could be considered to be a micro stance towards pedagogy.
Sub-principles for 3 and 4 concern relatively small-scale tasks, and the task cycle that
is envisaged here would be completed within one or two lessons (and could be broadly
similar to the methodology proposed by Willis & Willis 2007). But teaching extends
over more than just a short time span, obviously, and so, if pedagogic planning is to be
effective, it needs to have means of organising these longer stretches of time.
Project work is one such method of linking a series of tasks in ways that retain
the focus on meaning that tasks provide, but at the same time is susceptible to longer stretches of planned teaching. Projects, and series of projects, can be designed to
occupy long stretches of teaching. But if that is done, the post-task work which has
been described so far needs to be conceptualised slightly differently, since it is here
that the sixth principle becomes important. Micro post-task work takes what has
emerged from a task or group of tasks, and responds to the needs and opportunities
which emerge (Principle 5). In a sense, the teacher's decision is to examine what is
available, what has become salient through the task, and from the range of possibilities,
choose those which would sensibly be worked on. If, though, one has more extensive
task-based performances to work with, which extend over time, then there is the need
to keep records, explore what has been achieved over longer timespans by particular
learners, and make decisions accordingly, decisions which can be collaboratively negotiated and made with students. In other words, the notion of using cycles of accountability, where responsibility is shared between learners and teachers, becomes more
important. It can become clearer, with reflective post-task work of this sort, where there
are still gaps and what needs to be focussed on in the future. The broad parameters of
learners not necessarily learning what teachers teach still apply, but the reflection can
give insights as to what tasks might best be chosen and how they might be exploited. In
this way, the capacity to ensure some degree of systematicity is enhanced very considerably,
and we have a bridge between micro and macro perspectives on tasks.
References
Anderson, J.R. (2004). Cognitive psychology and its implications (6th ed.). New York, NY: Worth.
Bruton, A. (2002). From tasking purposes to purposing tasks. English Language Teaching Journal,
56, 280–288.
Bygate, M. (2001). Effects of task repetition on the structure and control of oral language. In M. Bygate,
P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks (pp. 23–48). London: Longman.
Bygate, M. (2006). Areas of research that influence L2 speaking instruction. In E. Usó-Juan &
A. Martínez-Flor (Eds.), Current trends in the development and teaching of the four language
skills (pp. 159–186). Berlin: Mouton de Gruyter.
Bygate, M., & Samuda, V. (2009). Creating pressure in task pedagogy: The joint roles of field, purpose,
and engagement within the interactional approach. In A. Mackey & C. Polio (Eds.), Multiple
perspectives on interaction (pp. 90–116). New York, NY: Routledge.
Bygate, M., Skehan, P., & Swain, M. (Eds.) (2001). Researching pedagogic tasks. London: Longman.
Corder, S. Pit (1981). Error analysis and interlanguage. Oxford: OUP.
11, 367–383.
Dörnyei, Z. (2005). The psychology of the language learner: Individual differences in second language
acquisition. Mahwah, NJ: Lawrence Erlbaum Associates.
Hoey, M. (1983). On the surface of discourse. London: George Allen and Unwin.
Kintsch, W. (1994). The psychology of discourse processing. In M.A. Gernsbacher (Ed.), Handbook
of psycholinguistics (pp. 721–740). San Diego, CA: Academic Press.
Kormos, J. (1999). Monitoring and self-repair. Language Learning, 49, 303–342.
Erlbaum Associates.
Levelt, W.J. (1999). Language production: A blueprint of the speaker. In C. Brown & P. Hagoort
(Eds.), Neurocognition of language (pp. 83–122). Oxford: OUP.
Li, Q. (2010). Focus on form in task based language teaching: exploring the effects of post-task activities
and task practice on learners' oral performance. Unpublished Ph.D. thesis. Chinese University
of Hong Kong.
Littlewood, W. (2004). The task-based approach: Some problems and suggestions. English Language
Teaching Journal, 58(4), 319–326.
Long, M.H., & Crookes, G. (1992). Three approaches to task-based syllabus design. TESOL Quarterly, 26, 27–55.
Loschky, L., & Bley-Vroman, R. (1993). Grammar and task-based methodology. In G. Crookes &
S. Gass (Eds.), Tasks and language learning: Integrating theory and practice. Clevedon: Multilingual Matters.
McDonough, K., & Trofimovich, P. (2009). Using priming methods in second language research.
London: Routledge.
Norris, J. (2009). Task-based teaching and testing. In M.H. Long & C. Doughty (Eds.), Handbook of
language teaching (pp. 578–594). Oxford: Blackwell.
O'Malley, J.M. & Chamot, A.U. (1990). Learning strategies in second language acquisition. Cambridge:
Cambridge University Press.
John Benjamins.
Pienemann, M. (2003). Language processing capacity. In C. Doughty & M.H. Long (Eds.), The handbook of second language acquisition (pp. 679–714). Oxford: Blackwell.
Pinter, A. (2005). Task repetition with a 10-year-old. In C. Edwards & J. Willis (Eds.), Teachers exploring tasks in English language teaching (pp. 113–126). Basingstoke: Palgrave Macmillan.
Robinson, P. (2001). Task complexity, cognitive resources, and syllabus design: A triadic framework
Robinson, P. (2011). Second language task complexity, the Cognition Hypothesis, language learning, and performance. In P. Robinson (Ed.), Second language task complexity: Researching
the Cognition Hypothesis of language learning and performance (pp. 3–38). Amsterdam: John
Benjamins.
Robinson, P., & Gilabert, R. (2007). Task complexity, the Cognition Hypothesis, and second language
learning and performance. International Review of Applied Linguistics, 45, 161–176.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11,
17–46.
Skehan, P. (1986). Cluster analysis and the identification of learner types. In V. Cook (Ed.), Experimental approaches to second language acquisition (pp. 81–94). Oxford: Pergamon.
Skehan, P. (1989). Individual differences in second language learning. London: Edward Arnold.
Skehan, P. (2001). Tasks and language performance. In M. Bygate, P. Skehan, & M. Swain (Eds.),
Researching pedagogic tasks. London: Longman.
Skehan, P. (2007). Task research and language teaching: Reciprocal relationships. In S. Fotos &
H. Nassaji (Eds.), Form-focused instruction and teacher education: Studies in honour of Rod Ellis
(pp. 55–69). Oxford: OUP.
Skehan, P. (2013). Nurturing noticing. In J. Bergleitner & S. Fotos (Eds.), Festschrift in honour of
Richard Schmidt. Honolulu, HI: National Foreign Language Resource Center.
Skehan, P., Bei, X., Li, Q., & Wang, Z. (2012). The task is not enough: Processing approaches to task-based performance. Language Teaching Research, 16(2), 170–187.
Skehan, P., & Foster, P. (1997). Task type and task processing conditions as influences on foreign
Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. Gass & C. Madden (Eds.), Input in second language
acquisition. Rowley, MA: Newbury House.
Swain, M. (1995). Three functions of output in second language learning. In G. Cook & B. Seidlhofer
(Eds.), Principle and practice in applied linguistics (pp. 245–256). Oxford: OUP.
Swan, M. (2005). Legislating by hypothesis: The case of task-based instruction. Applied Linguistics,
26, 376–401.
Tannen, D. (1989). Talking voices: Repetition, dialogue, and imagery in conversational discourse.
Cambridge: CUP.
R. Ellis (Ed.), Planning and task performance in a second language (pp. 239–276). Amsterdam:
John Benjamins.
Wang, Z. (2009). Modelling speech production and performance: Evidence from five types of planning and two task structures. Unpublished Ph.D. thesis. Chinese University of Hong Kong.
Willis, J. (1996). A framework for task-based learning. London: Longman.
Willis, D., & Willis, J. (2007). Doing task-based teaching. Oxford: OUP.
Winter, E. (1976). Fundamentals of information structure: A pilot manual for further development
according to student need. Hatfield, Herts: The Hatfield Polytechnic Linguistics Group, School
of Humanities.
Author Biodata
BUI Hiu Yuet, Gavin obtained his Ph.D. in applied linguistics from The Chinese
University of Hong Kong. Currently he is Assistant Professor at the English Department of Hang Seng Management College in Hong Kong where he teaches linguistics
and applied linguistics courses with some occasional addition of EAP/ESP classes.
Dr. Bui's research interests include task-based language teaching, psycholinguistics,
and second language acquisition.
LI Qian, Christina, obtained her Ph.D. in applied linguistics from the Chinese
University of Hong Kong. Currently, she is an assistant research professor in English
at Guangdong University of Foreign Studies, China. Her interests include task-based
language teaching and research, the acquisition of formulaic sequences by L2 speakers
and bilingual lexicography. Her most recent articles appeared in Language Teaching
Research (2012) (coauthored with Skehan, Bei, Wang) and Foreign Language Teaching
and Research (2013).
PANG Soi Meng, Francine, obtained her Ph.D. in applied linguistics from the Chinese
University of Hong Kong. She is currently Associate Professor at Macao Polytechnic
Institute, and before that she was Assistant Professor and Postdoctoral Fellow at the
Chinese University of Hong Kong and the University of Macao. Dr. Pang has lectured
in applied linguistics, psycholinguistics and Business English. Dr. Pang's research
interests include second language acquisition, second language reading, and second
language task planning behaviour.
SHUM Sabrina obtained her MA in applied linguistics at South China University
of Technology, Guangzhou, China. She worked as a research assistant for Professor
Skehan in a research project which is the basis for one of the chapters in this volume.
She is currently an Assistant Lecturer in Cantonese at the Yale-China Chinese Language Centre, The Chinese University of Hong Kong. Her current research interests
include Chinese grammar and teaching Chinese as a Foreign Language.
Peter SKEHAN is a Professorial Research Fellow at St. Mary's University College,
Twickenham. He received his Ph.D. from Birkbeck College, University of London. His
major interests are in second language acquisition, especially task-based performance,
and language aptitude. He supervised the Ph.D.s of contributors to this book, and
also directed the Hong Kong RGC research projects that are the basis for three of the
chapters. He has also been a Visiting Professor at the University of Auckland.
WANG Zhan (Jan) is a postdoctoral researcher in the Learning Research and Development Center (LRDC), University of Pittsburgh. She works on projects related to
fostering second language fluency and first language reading development, funded by
the NSF at the Pittsburgh Science of Learning Center (PSLC). She received her Ph.D.
in Applied Linguistics from the Chinese University of Hong Kong.
Index
A
accuracy ix, 24, 7, 10, 1218,
3437, 4142, 4453, 6365, 68,
7172, 7680, 8290, 99100,
102, 107111, 116121, 131133,
138, 140150, 155163, 166178,
187194, 197207, 213223,
227228, 231244, 252, 254
accuracy clause length 7677, 83
accuracy versus complexity
effects 96
analysis-oriented learners 236
anticipating post-task
transcription 238
articulation 10, 2729, 3638,
4749, 79, 221223, 229, 244
articulator 5, 2830, 34, 218,
244245
AS-unit 16, 138, 142143, 147,
168169, 198199, 221
assessment 1, 237
attention xi, 3, 10, 3435, 4853,
82, 96, 126127, 129132,
145150, 178179, 189191,
203204, 206207, 211212,
214, 221223, 228231,
236244, 252255, 257
attentional limitations 3, 13,
156157, 176
authenticity 218
automatic processing 29, 31
automatisation 21, 175, 181, 253
automatising 253
avoiding error 206, 215,
241242, 248
B
background information 156,
190
beginning-middle-end
structure 196, 227
breakdown 1920, 41, 71, 7376,
80, 83, 8587, 132, 167168, 172,
175, 192, 197, 205, 229230
breakdown fluency 19, 71,
7376, 80, 83, 8587, 167, 175,
192, 197
British National Corpus 22
C
causal structure 195
CHAT 1516, 21, 42, 101, 138,
166, 196197
CHILDES 15, 72
Chinese University xi, x, xii,
910, 211, 261262
chunks 22, 79, 8182, 161, 253
CLAN 15, 21, 4142, 138,
166167, 197
clause boundary pausing 19,
113, 118
clause-end pauses 71, 74, 75
cloze test 135
coding scheme 11, 95, 9799,
102104, 116, 125, 215216
Cognition Hypothesis ix, xii,
34, 79, 13, 68, 96, 120121,
155161, 163, 174177, 191,
231232, 234236, 241
Cognition-Tradeoff debate 212,
231
cognitive comparison 31, 145, 148
Cohens d 42, 7277, 83, 86,
141144
collaborative dialogue 130
collaborative transcribing
133134
communicative language
teaching (CLT) 1, 32, 130
complexity ix, x, 27, 910,
1216, 3237, 4142, 4450,
6365, 68, 7172, 7778, 8088,
90, 96, 102103, 111, 116121,
131132, 140144, 146150,
155163, 166179, 181, 189194,
197199, 201208, 211223,
227228, 230232, 234244, 246
complexifying 9, 14, 157, 178,
208, 248, 251, 253, 255257
conceptualisation 10, 99, 107,
157, 179180, 189, 213, 221, 223,
228, 244, 254
Conceptualiser 5, 79, 95,
107, 175176, 178179, 190,
206208, 215, 217, 220222,
224, 226, 229230, 242245
conjoint influence 13, 235
content-based instruction 89
content familiarity 212, 237
control ix, 23, 3335, 3740,
101, 147, 181, 202, 248, 251253
controlled processing 49
critical period 32
cycles of accountability 255257
D
declarative memory 29, 32
default view of attention 238
disponibilit 254
dual-mode system 253
easing 1314, 35, 52, 157, 178180,
190, 208, 235, 237, 255, 257
effect size 4243, 45, 7374,
7677, 139, 142143, 201
emerging rule-governed
system 253
encapsulated 221
encoding specificity
principle 79
end-of-clause pausing 21, 175,
187, 201, 227, 233
error correction 148
error free clauses 41, 46,
107108, 110, 142, 166, 168, 171,
192, 198, 201
error gravity 18, 193, 197
errors per 100 words 1718,
7677, 138, 141, 192193
exemplars 253
exemplar-based system 79
extended pedagogic
sequences 246
F
factor analysis 167, 169
false starts 20, 71, 73, 76,
168170, 172176
familiarity 57, 14, 51, 63, 6571,
7390, 104, 212, 214, 217218,
222, 237, 243, 245
feedback 2, 7, 14, 40, 130, 132,
137, 147148
filled pauses 1921, 71, 197
flow 2, 18, 20, 28, 160, 197, 204,
224, 229, 232, 235
264 Index
flow in performance 224
fluency ix, 34, 10, 1821, 27,
3237, 41, 4450, 5253, 6365,
68, 7176, 7881, 8390, 96,
102, 107109, 111112, 114,
117121, 131132, 156162,
166169, 173176, 178, 192,
194, 197199, 201202, 204,
206207, 213219, 222223,
227228, 233, 235236, 243,
252253, 262
focus-on-form ix, 3, 8, 51, 146,
150151, 238, 255
focussing 14, 157, 178, 208, 242,
255, 257
foreground information 189
formality 82
form-focussed instruction
Formulator 5, 10, 2830, 34,
3738, 7879, 81, 107, 159, 161,
175176, 178, 180, 189, 192,
203204, 207208, 214215,
217, 220, 222, 226, 228230,
242245, 253254
formulation 10, 2938, 4553, 65,
7880, 189, 221223, 230, 244
fragility of the accuracy
effects 219
G
guided planning 51
guided planners 51
H
here-and-now 89, 13, 40, 157,
159163, 165166, 169170,
173181, 184, 190191, 226, 228,
232, 235, 246
Hong Kong Research Grants
Council 9
hypothesising 248250, 253
I
implicit learning 252
inauthenticity 218
incomplete lemma access 218
individual transcription 134, 240
information integration 157,
181, 236
information organisation
189190
inner speech plan 28, 52
input domination
input-handling 232
instructed L2 learning
intake 248
interactional processes 1
interactive peer revision 134
integrating 248, 250
integration 87, 156–157, 181, 236,
250–251, 256–257
interlanguage structure 249
interaction x, 12, 75, 78, 83–84,
89, 129, 144, 146–148, 155,
172–176, 181, 205, 242, 249
intermediate difficulty 237
L
L2 mental lexicon 31
L2 proficiency 31, 78, 84
Lambda 22, 46, 102, 107–108, 115,
138, 160–161, 165, 168–169, 171
lemma retrieval 178, 180, 189
length accuracy 18, 193, 200–201
length of run 20, 73–75, 79,
107–109, 132, 175
less frequent vocabulary 165,
171–172
levels of proficiency 16, 107,
225, 245
Levelt model 5–6, 9–10, 14, 48,
99, 107, 116, 157, 178, 180, 187,
208, 215, 220, 223, 229, 235,
241, 244
Levelt speech production model
33
lexical complexity 68, 82, 150,
214, 239
lexical density 21–23, 37
lexical difficulty 13, 116, 163, 172,
174–176
lexical diversity 21–23, 41–42,
44, 46, 138, 140–142, 171
lexical and grammar planning
104–105, 107, 116, 126
lexicalisation 253–254
lexical planning codes 216
lexical retrieval 97, 107, 120,
176, 192
lexical sophistication 21–23,
37, 46, 82, 102, 108, 115–116,
119–120, 142, 144, 149, 168–169,
171, 173, 214, 217, 243
limited attentional capacity 7, 211
M
macro planning 103–107, 125,
245
macrostructure 81, 159, 175,
178–179, 189–190, 192,
203–204, 208, 228, 230,
234, 245
meaning priority 27, 53
mediated narration 194
memory demands 3, 159–160,
162, 175, 177, 191, 235
memory-oriented learners 236
mental lexicon 4, 9, 23, 28,
31–32, 79, 99, 119, 157, 161, 178,
214, 217, 225, 229–230, 232–233,
244–245
metacognitive planning 104,
106–107, 117, 127
metacognitive strategies 116
metatalk 147
microplanning
mid-clause pausing 19, 21,
111–112, 114–115, 117, 119,
169–170, 172, 174–176
mid-clause pause 74
models of speaking 13, 99
modified output 148
modular 5, 10, 180, 221, 244
monitoring 27–28, 46, 48–53, 65,
80, 82–83, 97, 99, 107, 116, 189,
207–208, 215–216, 228, 238,
241, 254–255, 257
monitoring strategies 97
Mor 196–197
multivariate analysis of variance
N
narrative 11–12, 15, 36, 38–39,
51, 95, 97–102, 108–109, 111,
135–144, 149, 156–157, 164–165,
173, 179, 187, 189–190, 194,
203, 206, 215, 225–228, 230,
232–235, 241–242
native speakers 7, 19, 81, 108,
114, 165, 192, 197, 204
naturalistic L2 learning 129
negotiation of meaning 2, 130
nominal phrases 192
non-negotiable 234–235
non-negotiability 234
non-native speakers 19, 81, 165,
192, 197, 204
notice the gap 32
noticing 129, 132, 145–146,
248–250, 253
noticing the hole 146
noun phrase complexity 82–84
nudged 236, 252
O
on-line planning 8, 10–11, 14,
27, 33, 35–37, 39–40, 42, 45–48,
50–53, 63, 65, 79–80, 87, 96,
118, 159, 176, 191, 219–220, 222,
231, 238, 242–244, 252
operating principles 121
opportunity to negotiate 246
organisational structure 204, 221
over-ambition 95, 111, 219, 224
overt speech plan 28, 52
P
pair-based transcription 239–240
pair transcription 134
pair transcribing 137–138, 140,
142–144, 146–147, 149–150
parallel (mode of) processing
partial lemma access
pausing 18–21, 73, 102, 107,
109, 111–115, 117–119, 168–172,
174–176, 187, 189, 196–197, 201,
204, 207, 227, 233
pause location 192
pedagogic norms 244
pedagogic principles 14
pedagogy x, 1, 4, 13–14, 27, 52,
88, 129, 131–132, 149, 151, 155,
206, 208, 211, 218, 246–247, 257
phonation time 20, 73, 180
phonological plans 215
pickup points 221
planning xi, x, 3, 5, 7–14,
27, 33–40, 42–53, 60–61,
63–71, 73–90, 95–107, 109, 111,
114–122, 124–127, 130, 136, 156,
158–159, 176, 178, 189, 191–192,
205–206, 211–224, 228,
231–233, 237–240, 242–247,
249, 252, 257, 261
planning efficiency 220
planning time 11, 34–36, 39–40,
45, 50, 65, 73, 75, 81, 83–85, 87,
89, 98, 100–101, 111, 118–120,
124, 127, 158–159, 214–215,
219–221, 223, 228, 231, 242
planning-as-familiarity 214
planning-as-organisation 221
planning-as-time 214
planning-while-speaking 191
PLex 22, 165
post xi, 3, 5, 8–9, 12, 51, 96,
129, 131–134, 136–139, 141–148,
150–151, 156, 238–244, 247,
249–252, 254, 256–257
post-task activities 5, 8, 51, 84,
129, 131–133, 137, 150, 156, 239,
247, 254
post-task focus stage
post-task manipulation 12
post-task phase 3, 9, 14, 132, 250,
256–257
post-task stage 129, 131, 136,
144–146, 151, 249–252, 256
post-task transcribing
condition 129, 138
post-task transcription 12,
133–134, 138, 144, 148, 150,
238–239
practice 32, 66, 136, 146, 165,
252–253, 256–257
practice activities 253
prefabricated expressions 81
preparedness 5, 7–8, 11, 14, 65,
89, 191, 212–213, 217–218, 220,
223, 245–246
pre-selection 252
presentation-practice-production 252
pre-task influence 212
pre-task planning 3, 7, 10, 33,
35–36, 49–51, 64–65, 70, 75–76,
80–82, 130, 191, 214, 221–222,
238, 243
pre-verbal message 28, 49,
52, 78–79, 107, 157, 230,
244–245
pre-watching 33–36, 39–40, 47,
49–50
pressuring 14, 157, 178, 180, 208,
220, 255–257
prime 223
primed 78, 222
priming 217, 223
problem of new language 251
problem-solution structure 164,
187–190, 196, 200, 203, 207,
225, 227, 236
procedural memory 29, 32
processing ix, x, xi, xii, 1–3,
5, 8, 10, 13–14, 27–32, 37, 39,
48–49, 51–52, 64, 68, 79–82,
107, 119, 145, 155, 157, 160–163,
171, 175, 177–180, 187, 190,
195–196, 198–202, 205–208,
211–212, 215, 217–218, 220–237,
242–243, 245–246, 248,
253–254
processing approach xi, 68, 242
processing capacity 64, 79, 82
processing conditions 8, 10, 13,
32, 162, 175, 178, 187, 190, 195,
199–200, 202, 205, 207–208,
215, 223, 226, 231–232
processing limitations 3
processing pressure(s)
proficiency range 224
project work 246, 257
propositional demands 234
pruned words 72–73, 168
pseudo-filled pauses 20–21
pseudo-filled pausing
psycholinguistic processes 4–5,
21, 157, 176, 180
R
readiness 11, 63–68, 80–82,
84–90, 212–213, 217, 220, 222,
248–250
reasoning demands 157–158, 231
re-entry points 230
reformulation 41, 44–45, 71, 73,
101, 133, 166, 168, 197–198, 201,
203–205
rehearsal 33, 36, 52, 63–68, 86,
89, 97, 107, 116–117, 120, 127,
159, 212, 216–218, 221, 223, 240,
242–243, 245
rehearsal strategies 97
repair 51, 71
repair fluency 20, 73, 76, 78,
83, 85–86, 166–167, 175, 192,
197–198
repertoire creation 248, 254
repetition 5, 7, 10–11, 20, 27,
34, 36–40, 42–44, 46–48, 50,
52–53, 61, 65–67, 73, 84, 86, 89,
101, 156, 166–168, 203, 214–215,
217–218, 222–223, 240, 243,
245–246
replacement 20, 73, 167–168
resource deficits 30–31
resource directedness 7, 231
resource directing 7, 231
resource-dispersing 7, 8,
157–159, 177, 231, 232
response deadline 39
restructuring ix, 147, 248, 250,
253
retrieval depth 218
retrieval strategies 97
retrospective interviews 11, 95,
97–99, 101–104, 215
Revised Hierarchical Model 31
revising 133–134, 137–138, 140, 148
revision after transcribing 129,
144, 148–149
risk-taking 2, 147
role of the teacher 247, 249
rule-based language 253
rule-based system 2, 79
S
scale of structure 188, 227
schema 66–68, 188, 213
schematic familiarity 66–67, 86,
89, 212
script 188
second language pedagogy 52,
129, 132
selective attention 31, 51,
211–212, 236, 238, 244
self-monitoring 28, 80
self-repair 51
self-reported planning
behaviours 95, 106, 109, 120
self-revision
sequences of development 248
serial processing 221, 224, 229,
233–234, 237, 243, 245–246
silent pauses 19
sociocultural 242
speech fluency 35–36, 44–47,
49–50, 52
speech monitoring 27, 48, 50–53
speech production x, 10, 27–29,
31–34, 36–39, 47–52, 68, 88,
95, 107, 161, 178, 189–190,
206, 221–222, 229–230, 242,
244, 252
speech rate 20, 35–36, 48, 51, 71,
73–75, 79, 168
speed fluency 166–167, 169
speed of input 234, 246
strategic planning 27, 33, 35–36,
38–40, 42–47, 50, 52–53, 61,
63–70, 73–78, 80–89, 96, 192,
212–215, 217–219, 221–222, 238,
240, 242
structural complexity 68, 81,
109, 155, 159, 167, 169, 175–176,
189, 213, 236, 242–244
structural priming
structured narratives 177, 187,
201, 203
student-initiated transcribing 133
subordination 16, 41–42, 44–46,
77, 102, 107–108, 111–112, 117,
119–120, 143, 168–169, 171, 173,
175, 178–179, 189–190, 192, 198,
201, 203–204, 206–208, 228
supported on-line planning 176,
220, 242–244, 252
supported processing conditions
syllables per minute 20
syllabus 4, 6, 251, 253
syntactic encoding 157
T
task-based syllabuses 254
task characteristics xi, 2, 6–7, 9,
12, 84, 155–156, 236
task complexity x, 3–7, 9, 13, 121,
155, 157–158, 160, 163, 176–177,
191, 212, 228, 230–232, 236, 241
task conditions 1–3, 5, 7, 23,
36, 38, 40, 64–65, 84, 96, 132,
155–157, 176, 181, 205, 236–237,
252–253
task cycle 131, 137, 247, 257
task difficulty 56, 96, 157, 237
task external readiness 66, 89
task familiarity 66–67, 86, 212
task input 234, 249
task internal readiness 63
task phases 250
task processing 5, 8, 231
TaskProfile 16–18, 21, 42, 101,
166–167, 197
task readiness 63–64, 66, 87
task repetition 27, 36–37, 52,
65–67, 84, 86, 89
task sequencing 88, 246
task structure x, 6, 9, 13, 111,
118, 155, 157, 159, 161–163, 165,
167, 177, 187, 200–201, 212, 223,
227–228, 231–232, 245
task types ix, 56, 66, 135–136,
142–144, 156
teacher behaviour 248
teacher-initiated
transcribing 133
tempo-naming 39
temporal aspects of speaking 79
test fairness 68, 88
test-task 240
there-and-then 8–9, 13, 155,
157–166, 169–170, 173–179, 181,
184, 190–191, 194, 202, 207,
226–228, 232–235, 243, 245
time perspective 8–9, 13, 96,
155, 157, 159–163, 167, 169–170,
172–173, 175–176, 179, 181,
190–191, 231–233, 235, 237, 245
time pressure 5, 8, 27, 30–38, 40,
44–48, 51, 53, 67, 176, 179, 191,
194, 203, 206–207, 227
topic familiarity 63, 65–71,
73–87, 89–90
total silence 73
trade-off 39, 53, 82, 84, 120, 149,
156–160, 162, 175–178
Trade-off Hypothesis 156–158,
160, 162, 175, 177
transcribing 51, 129, 132–134,
136–150
transcribing-and-revising 134
triadic componential
framework 157
type-token ratio 21, 46
U
unfamiliar tasks 70, 73, 75,
82, 88
unguided planners 51–52
unguided planning 51
unmediated narration 194
unpressured performance
conditions 245
utility criterion 255–256
V
vocabulary difficulty 155, 165,
167, 172–174, 233
VocD 21, 42, 138
W
within-task planning 33, 64–67,
80, 212
words per AS unit 72, 77–78,
81–83, 85, 138, 141–143, 168
words per clause 72, 77, 81–82,
192, 198, 201–202, 206–208
words per minute 41, 68, 71, 168
working memory 3–4, 30, 33,
80–81, 156, 160, 177, 189, 211,
230–231, 236, 241
