The Second Language Acquisition Research series presents and explores issues bearing
directly on theory construction and/or research methods in the study of second
language acquisition. Its titles (both authored and edited volumes) provide thor-
ough and timely overviews of high-interest topics, and include key discussions of
existing research findings and their implications. A special emphasis of the series is
reflected in the volumes dealing with specific data collection methods or instru-
ments. Each of these volumes addresses the kinds of research questions for which
the method/instrument is best suited, offers extended description of its use, and
outlines the problems associated with its use. The volumes in this series will be
invaluable to students and scholars alike, and perfect for use in courses on research
methodology and in individual research.
Of related interest:
Second Language Acquisition
An Introductory Course, Fourth Edition
Susan M. Gass with Jennifer Behney and Luke Plonsky
Second Language Research
Methodology and Design, Second Edition
Alison Mackey and Susan M. Gass
EYE TRACKING IN
SECOND LANGUAGE
ACQUISITION AND
BILINGUALISM
A Research Synthesis and
Methodological Guide
Aline Godfroid
First published 2020
by Routledge
52 Vanderbilt Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2020 Taylor & Francis
The right of Aline Godfroid to be identified as author of this work
has been asserted by her in accordance with sections 77 and 78 of
the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or
reproduced or utilised in any form or by any electronic, mechanical,
or other means, now known or hereafter invented, including
photocopying and recording, or in any information storage or
retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks
or registered trademarks, and are used only for identification and
explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
A catalog record for this title has been requested
ISBN: 978-1-138-02466-3 (hbk)
ISBN: 978-1-138-02467-0 (pbk)
ISBN: 978-1-315-77561-6 (ebk)
Typeset in Bembo
by Deanta Global Publishing Services, Chennai, India
Visit the eResources: https://www.routledge.com/9781138024670
To Koen
CONTENTS
9.3 Getting Started 338
9.3.1 Ideas for Research 338
9.3.1.1 Research Idea 1: Entry-Level: Create a
Sentence-Processing Experiment 338
9.3.1.2 Research Idea 2: Entry-Level: Create a
Text Reading Study 339
9.3.1.3 Research Idea 3: Entry-Level: Study Script
Effects in Reading 341
9.3.1.4 Research Idea 4: Entry-Level: Create a Visual
World Study 343
9.3.1.5 Research Idea 5: Intermediate: Replicate an L1
Reading Study with L2 Readers 345
9.3.1.6 Research Idea 6: Intermediate: Replicate an L1 Visual World
Study with L2 Listeners or Bilinguals 347
9.3.1.7 Research Idea 7: Intermediate: Conduct an
Interaction Study 348
9.3.1.8 Research Idea 8: Advanced: Examine L2 Listening as a
Multimodal Process 349
9.3.1.9 Research Idea 9: Advanced: Study Cognitive Processes during
Intentional Vocabulary Learning 351
9.3.1.10 Research Idea 10: Advanced: Conduct a Synchronous
Computer Mediated Communication (SCMC) Study with
Eye Tracking 353
9.3.2 Tips for Beginners 355
9.3.2.1 About the Equipment 355
9.3.2.2 About Data Collection 356
9.3.2.3 About Data Analysis 361
Notes 363
References 364
Index of Names 401
Index 406
FIGURES
a new field come new questions and possibilities for innovation. At the same time,
many design issues in eye-tracking research do generalize across language-related
disciplines. Therefore, I hope other researchers working across the language sci-
ences will find this book useful as well.
The structure of this book reflects the different stages of the research cycle. I
organized it this way with my graduate students in mind, who, over the course
of a 15-week seminar, learn the ropes of eye tracking in the Second Language
Studies Program at Michigan State University. Depending on where you find
yourself in the research process, you may find it useful to read the corresponding
chapters in the book. Chapter 1 introduces eye-tracking methodology in rela-
tion to other real-time data collection methods. It will be most useful if you are
considering whether eye tracking could enrich your research program and what
other data sources eye tracking can be triangulated with. Chapter 2 summarizes
my reading of the cognitive psychology literature. It boils down nearly 45 years
of fundamental research to a chapter-length, accessible summary of the essential
facts about eye movements that probably every eye-tracking researcher in the
language sciences should know. Chapters 3 and 4 present the findings of a synthetic
review of eye-tracking research in second language acquisition and bilingualism,
highlighting major themes and developments in the field so researchers can situ-
ate their own work and find a topic. Chapters 5 and 6 cover the design of eye-
tracking studies. Chapter 5 is a stepping stone for Chapter 6, in that it introduces
basic principles of experimental design. Equipped with this knowledge, readers
can tackle the eye-tracking-specific information in Chapter 6. After reading these
chapters, you will be able to conceptualize and design your own eye-tracking
project. Chapter 7 provides a comprehensive overview of eye-tracking measures
in second language acquisition and bilingualism. You can focus specifically on
those measures you use in your own research or read with the aim of exploring,
so you can diversify and expand your current selection of measures. Chapter 8
covers topics in data cleaning and analysis. It caters to readers with different levels
of statistical literacy by providing both an overview of current statistical practices
and an in-depth introduction to newer statistical techniques (linear mixed-effects
models and growth curve analysis) that have gained importance in recent years.
Lastly, Chapter 9 brings the reader full circle by providing practical advice on
purchasing or renting an eye tracker, setting up a lab, tips for data collection, and
ideas for research. Overall, this book will explain the details of how and why to best
collect and analyze eye-movement recordings for well-designed and informative
language research.
ACKNOWLEDGMENTS
Writing is a process, and this process would not have been as rewarding and at
times even fun without the help and encouragement of a group of talented and
caring people. I am thankful to the series editors Susan Gass and Alison Mackey
for giving me an opportunity to write this book and to the Routledge editors
Ze’ev Sudry and Helena Parkinson for overseeing the publication process. My
thanks also go to Paula Winke, Co-Director of the Second Language Studies Eye-
Tracking Lab, and to all the previous students in my eye-tracking course and in
particular JinSoo Choi, Caitlin Cornell, and Megan Smith, who have commented
on different chapters in this book. Markus Johnson from SR Research and Wilkey
Wong from Tobii have been most helpful answering my questions about eye
trackers and provided feedback on the final chapter of the book. Chapters 4, 6,
and 8 also benefited from conversations with Gerry Altmann and Denis Drieghe.
I owe a special debt of gratitude to Carolina Bernales, Bronson Hui, and
Kathy MinHye Kim for their numerous contributions to the book, which have
made it better in so many ways. My thanks also go to Dustin Crowther and
Kathy MinHye Kim for their help coding the studies. Elizabeth Huntley, Wenyue
Melody Ma, and Koen Van Gorp proofread all the chapters and made many astute
suggestions that have made this book a better read. I thank all the eye-tracking
researchers who generously contributed examples from their studies and whose
work has had an obvious impact on my thinking. I thank my writing partners in
crime, Patricia Akhimie, Claudia Geist, Gustavo Licon, and Kelly Norris Martin,
for motivating me to show up and do the work week after week. I thank my East
Lansing friends Natalie Philips and John McGuire and my Ann Arbor friends for
making cold Michigan winters a little warmer and lastly, I thank my family in
Belgium and Koen for their unwavering love and support as I pursued my aca-
demic dreams on a new continent.
Aline Godfroid, Haslett, Michigan
1
INTRODUCING EYE TRACKING
In a fast-changing and multilingual world, the study of how children and adults
learn languages other than their native tongue is an important endeavor. Questions
about second language (L2) learning are at the heart of the sister disciplines of
second language acquisition (SLA) and bilingualism, and researchers who work in
these areas have an increasingly diverse and sophisticated methodological toolkit
at their disposal (Sanz, Morales-Front, Zalbidea, & Zárate-Sández, 2016; Spivey
& Cardon, 2015). In addition, it seems that in the 21st century, the preferred way
to investigate questions of language processing and representation is online—that
is, as processes unfold in real time—because the data obtained in this way offer a
more fine-grained representation of the learning process than any offline meas-
urements could (Frenck-Mestre, 2005; Godfroid & Schmidtke, 2013; Hama &
Leow, 2010). This book is about one online methodology that is well suited for
studying both visual and auditory language processing, namely eye-movement
registration, commonly referred to as eye tracking.
Eye tracking is the real-time registration of an individual’s eye movements, typ-
ically as he or she views information on a computer screen. Within the Routledge
Series on Second Language Research Methods, this guide on eye-tracking method-
ology is the third to be devoted to an online data collection method, follow-
ing Bowles’s (2010) meta-analysis of reactivity research regarding think-alouds,
and Jiang’s (2012) overview of reaction time methodologies. This shows how
eye tracking is part of a collection of online techniques that have been gain-
ing momentum in SLA and bilingualism (also see Conklin, Pellicer-Sánchez, &
Carrol, 2018). Across the language sciences, linguists, applied linguists, language
acquisitionists, bilingualism researchers, psychologists, education researchers, and
communication scientists have similarly embraced the recording of eye move-
ments in their research programs. Although the research reviewed in this book is
primarily from SLA and bilingualism, the principles for researching language with
eye tracking generalize to other domains that use similar materials as well, which
gives the methodological part of this book a broad, interdisciplinary reach.
To understand where eye tracking fits within the larger movement toward
online research methodologies, and to appreciate some of its strengths and
weaknesses, I introduce eye tracking along with three other concurrent meth-
odologies—think-aloud protocols, self-paced reading (SPR), and event-related
potentials (ERPs)—which present themselves as complements and sometimes
competitors to the eye-tracking method.
1.1.1 Think-Aloud Protocols
Thinking aloud is when a participant says his or her thoughts out loud while
carrying out a particular task, such as solving a math problem, reading, or taking
a test. That particular task, known as the primary task, is the one researchers want
to study, and thinking aloud is sometimes referred to as the secondary task (e.g.,
Ericsson & Simon, 1993; Fox, Ericsson, & Best, 2011; Godfroid & Spino, 2015;
Goo, 2010; Leow, Grey, Marijuan, & Moorman, 2014), which is used to shed
light on the main task of interest. Thus, think alouds are a tool researchers use
to study cognitive processes as they unfold during some type of human activity,
such as language processing.
Think-aloud protocols stand out among the family of concurrent or online
data collection methods because they yield qualitative, rather than quantitative,
data as their primary outcome. This makes them an interesting supplement for
other online methods, which produce quantitative data, even though it is also
possible to analyze think-alouds quantitatively after data coding (e.g., Bowles,
2010; Ericsson & Simon, 1993; Leow et al., 2014). An ingenious study that trian-
gulated think alouds and eye tracking was Kaakinen and Hyönä (2005). Kaakinen
and Hyönä manipulated L1 Finnish participants’ purpose for reading, by asking
them to learn more about one of two rare diseases that a friend had supposedly
contracted. Using the eye-tracking data, the authors showed that sentences that
were relevant to their participants’ reading perspective (i.e., their friend’s disease)
generated longer first-pass reading times than sentences that dealt with the other
disease also discussed in the text. In addition, participants more often showed evi-
dence of deeper levels of processing in the think alouds they produced after the
longer first-pass reading times.1 An interesting secondary finding was that verbal
evidence of deeper processing coincided with elevated reading times, but not
with the presence of task-relevant information per se (i.e., not all sentences about
the target disease elicited deep processing). This would seem to suggest that the
longer eye fixation durations were the factor that mediated between text infor-
mation and the participants’ depth of processing.
Think-alouds are a versatile research methodology (see Fox et al., 2011, for a
recent review and meta-analysis). Within SLA research, think alouds have been
collected to study questions pertaining to noticing and awareness (Alanen, 1995;
Godfroid & Spino, 2015; Leow, 1997, 2000; Rosa & Leow, 2004; Rosa & O’Neill,
1999) the processing of feedback during writing, including “noticing the gap”
(Qi & Lapkin, 2001; Sachs & Polio, 2007; Sachs & Suh, 2007; Swain & Lapkin,
1995); depth of processing (Leow, Hsieh, & Moreno, 2008; Morgan-Short, Heil,
Botero-Moriarty, & Ebert, 2012); strategy use in vocabulary acquisition (De Bot,
Paribakht, & Wesche, 1997; Fraser, 1999; Fukkink, 2005; Nassaji, 2003) and test
taking behavior (Cohen, 2006; Green, 1998). The prevalent view is that think-
aloud protocols reflect the contents of the speaker’s short-term memory, which
are believed to be conscious (Ericsson & Simon, 1993; Pressley & Afflerbach,
1995). Therefore, unlike the other online methodologies reviewed in this chapter,
which also capture unconscious processes—some would even claim SPR, eye
tracking, and ERPs capture only unconscious processes (Clahsen, 2008; Keating &
Jegerski, 2015; Marinis, 2010; Tokowicz & MacWhinney, 2005)—it is important
to emphasize that think alouds measure primarily conscious processes that involve
information in the speaker’s awareness (Godfroid, Boers, & Housen, 2013).
Leow et al. (2014) reviewed research on the early stages of L2 learning that
relied on either think-alouds, eye tracking, or reaction time (RT) measurements.
Of these three methodologies, think alouds emerged as the only measure that
can differentiate between levels of awareness and depth of processing. In contrast,
Leow et al. concurred with Godfroid et al. (2013) that eye tracking may provide
“the most robust measure of learner attention” (Leow et al., 2014, p. 117). Not
surprisingly, Leow and his colleagues also found RTs shared many properties with
eye-movement data—eye-movement data being a special type of RTs—but noted
RT tasks were less expensive and easier to implement than eye tracking. Thus, a
major strength of think alouds compared to other online methodologies is that
think alouds can illuminate the how and why of (conscious) processing (Leow et
al., 2014), whereas I would argue the same information is represented differently
in eye-movement records and is represented only to a limited extent in RT tasks.
Think alouds differ from eye tracking and ERPs, but not from SPR, in that
they require participants to engage in an additional task: speaking out loud, in the
case of think alouds, and pressing a button for SPR. Secondary tasks like these are
sometimes criticized because they carry the risk of altering participants’ cognitive
processes and changing the primary task under investigation. For think-aloud meth-
odology, this issue is known as reactivity (see Leow & Morgan-Short, 2004); it can
compromise the internal validity of a study, because researchers may no longer be
studying the cognitive process they intended to study. For instance, participants may
perform a task more analytically or with greater focus when they are asked to think
aloud concurrently. The potential reactivity of think alouds has enjoyed a good deal
of research attention in SLA and was the object of a meta-analysis in Bowles (2010).
Using a sample of 14 primary studies, Bowles analyzed how task and participant fac-
tors influenced the reactivity of thinking aloud with verbal primary tasks. She found
an overall “small effect” (p. 110) for think alouds, although this effect differed as a
function of type of verbal report, L2 proficiency level, and the primary task. Bowles
concluded that “the answer to the question of reactivity and think-alouds is not a
simple ‘yes’ or ‘no’ but rather is dependent on a host of variables” (p. 110).
Godfroid and Spino (2015) revisited the reactivity question for think alouds
and extended the concept to eye-tracking methodology. By investigating the
reactivity of eye tracking, the authors tested the widely held assumption that
reading with concurrent eye-movement registration is representative of natural
reading, a claim that had not been evaluated empirically since Tinker’s (1936)
study. Participants in Godfroid and Spino were English college majors at a Belgian
university, who had an upper-intermediate to advanced English proficiency level.
The participants read 20 short English texts embedded with pseudo words in an
eye-tracking, a think-aloud, or a silent control condition. Using both traditional
statistical tests and equivalence tests, Godfroid and Spino found converging evi-
dence that thinking aloud or eye tracking did not affect the learners’ text com-
prehension (see Figure 1.1), which was consistent with the results of Bowles’s
FIGURE 1.1 Converging evidence from traditional null hypothesis significance testing and equivalence tests that neither think alouds nor eye
tracking affect reading comprehension.
(Source: Adapted from Godfroid & Spino, 2015).
(2010) meta-analysis for think alouds. However, thinking aloud had a small, posi-
tive effect on participants’ posttest recognition of the pseudo words, while the
results for eye tracking were mixed. These findings lend some empirical support
to the claim that eye tracking is “considered to be the closest experimental paral-
lel to the natural reading process” (Cop, Drieghe, & Duyck, 2015, p. 2) although
much more research on the potential reactivity of eye tracking, following similar
work for think alouds, is needed.
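The equivalence tests Godfroid and Spino used follow the two one-sided tests (TOST) logic: rather than merely failing to find a difference, the researcher actively tests whether the group difference falls inside a preset equivalence margin. The sketch below illustrates that logic under a large-sample normal approximation; the function name, the margin, and the comprehension scores are invented for illustration and are not the study's actual values.

```python
# Two one-sided tests (TOST) for equivalence of two group means.
# Large-sample normal approximation; data and margin are invented.
from statistics import NormalDist, mean, stdev

def tost_equivalence(a: list[float], b: list[float],
                     delta: float, alpha: float = 0.05) -> bool:
    """True if the mean difference between groups a and b is statistically
    equivalent within +/- delta (both one-sided tests reject at alpha)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    diff = mean(a) - mean(b)
    p_lower = 1 - NormalDist().cdf((diff + delta) / se)  # H0: diff <= -delta
    p_upper = NormalDist().cdf((diff - delta) / se)      # H0: diff >= +delta
    return max(p_lower, p_upper) < alpha

# Two hypothetical comprehension-score samples with near-identical means:
control = [0.80, 0.82, 0.78, 0.81, 0.79] * 4
eye_tracking = [0.79, 0.81, 0.80, 0.80, 0.80] * 4
print(tost_equivalence(control, eye_tracking, delta=0.05))  # → True
```

Equivalence is claimed only when both one-sided tests reject, that is, when the larger of the two p-values falls below alpha.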
1.1.2 Self-Paced Reading
Of all online methodologies, SPR—the self-paced reading of sentences that are
broken down and presented in separate segments—is the one that resembles eye
tracking most closely. Proponents of SPR highlight the practicality of the method
and its fitness-for-purpose (Jegerski, 2014; Mitchell, 2004). Mitchell (2004) put it
most strongly, when he questioned researchers’ natural tendency to go for “nuclear
weaponry” (p. 15) when choosing a research methodology even though “a simple
and apparently crude method is often all that is needed” (ibid.).
Participants in an SPR experiment read sentences or short paragraphs in a
word-by-word or phrase-by-phrase fashion. A new segment appears and, in cur-
rent versions of the paradigm, the previous segment disappears each time the
participant presses a button. Because the participant controls the text presentation
rate, reading is said to be self-paced or subject-paced, unlike rapid serial visual
presentation (e.g., Aaronson & Scarborough, 1977; Forster, 1970; Potter, 1984),
where every word remains on the screen the same amount of time and presen-
tation is researcher- or experimenter-paced. In the 40-year existence of SPR,
researchers have experimented with different versions of the paradigm: centered
or linear text presentation and, in the case of linear presentation, cumulative or
non-cumulative formats. Figure 1.2 displays an example sentence from Hopp’s
(2009) SPR study, (a) with the original linear, non-cumulative display, (b) a linear,
cumulative display, and (c) using a centered format (always non-cumulative).
In a classic study, Just, Carpenter, and Woolley (1982) compared the button-
press times obtained under each of the previously mentioned presentation modes
with the eye gaze durations recorded with an eye tracker for the same texts.
(The eye-tracking data were the object of another oft-cited study, namely Just
and Carpenter [1980]). A total of 49 undergraduate students across the two stud-
ies read 15 short scientific texts in their native language (L1), English, follow-
ing either an eye-tracking or an SPR procedure. Just et al. (1982) modeled the
relationship between ten word- and text-level variables, and the reading times
within each data collection method. They found that the non-cumulative, linear
SPR data resembled the gaze data from eye tracking most closely, in that the
word- and text-level properties influenced both data types in a broadly similar
way (e.g., longer times for longer words and shorter times for more frequent
words; see Table 1.1 and Chapter 2). Centered SPR also reproduced most of the
(a) Non-cumulative, linear display, also known as the moving window technique
___ ______ ____ ___ ______ __ _______ ___ _______ ________ ____
Ich glaube ____ ___ ______ __ _______ ___ _______ ________ ____
___ ______ dass ___ ______ __ _______ ___ _______ ________ ____
___ ______ ____ den Läufer __ _______ ___ _______ ________ ____
___ ______ ____ ___ ______ am Sonntag ___ _______ ________ ____
___ ______ ____ ___ ______ __ _______ der Trainer ________ ____
___ ______ ____ ___ ______ __ _______ ___ _______ gefeiert ____
___ ______ ____ ___ ______ __ _______ ___ _______ ________ hat.
(b) Cumulative, linear display
Ich glaube ____ ___ ______ __ _______ ___ _______ ________ ____
Ich glaube dass ___ ______ __ _______ ___ _______ ________ ____
Ich glaube dass den Läufer __ _______ ___ _______ ________ ____
Ich glaube dass den Läufer am Sonntag ___ _______ ________ ____
Ich glaube dass den Läufer am Sonntag der Trainer ________ ____
Ich glaube dass den Läufer am Sonntag der Trainer gefeiert ____
Ich glaube dass den Läufer am Sonntag der Trainer gefeiert hat.
(c) Centered display (non-cumulative)
Ich glaube
dass
den Läufer
am Sonntag
der Trainer
gefeiert
hat.
FIGURE 1.2
Three presentation formats in self-paced reading. Note: Ich glaube dass
den Läufer am Sonntag der Trainer gefeiert hat, “I believe that the trainer
celebrated the runner last Sunday.”
word- and text-level effects, but cumulative linear SPR did so to a lesser extent.
The reason is that some participants in the cumulative SPR condition pressed
the button multiple times at once to display longer stretches of text, which dis-
rupted the link between the reading time data and ongoing cognitive processing
(also see Fernanda Ferreira & Henderson, 1990). To avoid this issue, current SPR
researchers opt for a non-cumulative, linear display or sometimes a centered dis-
play. Specifically, centered displays have had a place in L2 research when the goal
was to replicate earlier work with native speakers that had relied on centered SPR
(Roberts, personal communication, August 12, 2015). Without such a precedent
in the L1 literature, however, non-cumulative, linear SPR may be the preferred
presentation format.
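The display formats in Figure 1.2 can be generated mechanically: every segment except the visible one(s) is replaced by a length-matched run of underscores. The sketch below is illustrative only (the function names are mine, not from any stimulus-presentation package; real experiments use software such as PsychoPy or E-Prime), but it shows the logic of all three formats.

```python
# Sketch of the three self-paced reading display formats in Figure 1.2.
# Function names and the driver code are illustrative, not from any package.

def mask(word: str) -> str:
    """Replace a word with a length-matched run of underscores."""
    return "_" * len(word)

def linear_display(segments: list[str], current: int, cumulative: bool = False) -> str:
    """Render one frame of a linear SPR display.

    Non-cumulative (moving window): only the current segment is readable.
    Cumulative: the current segment and all earlier segments stay readable.
    """
    frame = []
    for i, seg in enumerate(segments):
        visible = i == current or (cumulative and i < current)
        frame.append(seg if visible else " ".join(mask(w) for w in seg.split()))
    return " ".join(frame)

def centered_display(segments: list[str], current: int) -> str:
    """Centered presentation shows only the current segment, unmasked."""
    return segments[current]

sentence = ["Ich glaube", "dass", "den Läufer", "am Sonntag",
            "der Trainer", "gefeiert", "hat."]
print(linear_display(sentence, 0))                   # moving window, frame 1
print(linear_display(sentence, 4, cumulative=True))  # cumulative, frame 5
print(centered_display(sentence, 2))                 # centered, frame 3
```

Each button press would advance `current` by one; successive non-cumulative frames reproduce display (a) in Figure 1.2.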
Although the moving window procedure fared generally well in Just et al.’s
comparison, a few differences with eye tracking are worth pointing out. First,
goodness of fit (i.e., how well the word- and text-level variables could account
for the reading times) systematically decreased: from R2 = .79 for eye-movement
data (gaze duration), to R2 = .56 for linear non-cumulative SPR times, to R2 =
.45 for centered SPR times, and finally R2 = .39 for linear cumulative SPR times.
Second, readers in all three SPR conditions spent approximately twice as long on
a word as those in the eye-tracking group (see intercept values of 289, 333, and
381 in Table 1.1, which represent the hypothetical reading time on a 0-letter, 0 log
frequency, etc. word). The longer reading times in SPR are a constant finding in the
literature (Rayner, 1998, 2009) that have concerned some proponents of eye track-
ing. Indeed, participants in SPR experiments “have a substantial amount of ‘unallo-
cated time’ (i.e., time not used in the service of word recognition or eye movement
control)” (Clifton & Staub, 2011, p. 905) of which the nature is unknown. Finally,
compared to the statistical output for the eye-movement data, Just et al. (1982)
found that “the moving-window condition appear[ed] to decrease the size of the
word-length and word-frequency effects by a factor of two but to magnify most
other effects [e.g., word novelty, first mention of topic] by a factor of three or four”
(Just et al., 1982, p. 233, my addition). This can be seen by comparing the regres-
sion coefficients for these variables in Table 1.1. How serious these departures
from natural reading are depends on the topic of one’s study. For example, Just and
colleagues’ results do not support using SPR to study the L2 acquisition of new
lexical (and perhaps also new grammatical) forms or discourse-level phenomena,
because new forms, by definition, have a low frequency. In other research areas, the
safe option is no doubt to replicate SPR findings using eye-tracking methodology.
An area in L1 research that has relied extensively on both SPR and eye tracking
is parsing, or the real-time computation of syntactic structure. A major question in
parsing research is whether sentence processing is modular (Fodor, 1983; Frazier,
1987) or interactive (MacDonald, Pearlmutter, & Seidenberg, 1994; Marslen-Wilson
& Tyler, 1987; Tanenhaus & Trueswell, 1995); that is, whether the initial parse (analy-
sis) of a sentence involves only structural (syntactic) or both structural and non-
structural (lexical, semantic, discourse-level) information. Because of the importance
of measuring the parser’s original analysis, rather than a reanalysis, to adjudicate
between these theories, work in this area has produced a fair deal of research that has
used both SPR and eye tracking (see Ferreira & Clifton, 1986; Ferreira & Henderson,
1990; Trueswell, Tanenhaus, & Kello, 1993; Wilson & Garnsey, 2009, for examples).
Wilson & Garnsey (2009) reported on two reading experiments—the first with
word-by-word, non-cumulative SPR and the second with eye tracking—in
which participants read temporarily ambiguous sentences like the following:
(1) (a) The ticket agent admitted the mistake because she had been caught.
(b) The ticket agent admitted the mistake might not have been caught.
(2) (a) The CIA director confirmed the rumor when he testified before Congress.
(b) The CIA director confirmed the rumor could mean a security leak.
In both examples, the post-verbal noun phrase (the mistake, the rumor) is ambiguous because it can represent
either the direct object of the main clause, as shown in (1a) and (2a), or the subject
of an embedded clause, as shown in (1b) and (2b).
Wilson and Garnsey were particularly interested in the sentences with a direct
object interpretation, which are arguably the simpler of the two possible structures.
They wanted to know whether the statistical properties of the main verb—that
is, whether for example, admit and confirm occur more often with direct objects
or with subordinate clauses—influence the speed with which readers parse the
sentence correctly. Both the SPR data and one eye-movement measure (go-past
time; see Chapter 6) suggested this was the case. Thus, their findings lent support
to interactive models of sentence processing. Syntax (a general preference for the
simpler direct-object construction) did not take precedence over verb informa-
tion in the earliest stages of processing. Although Wilson and Garnsey had already
made their point using the moving-window technique, they chose to replicate
their findings with eye tracking in an effort to separate out reanalysis effects from
a reader's initial parse. They explained that “in part because readers cannot go back
and re-read earlier sentence regions when they encounter difficulty, self-paced
reading times are probably influenced by both initial processing difficulty and the
work done to recover from that difficulty” (p. 376).
Mitchell (2004) noted that SPR and eye tracking have provided converging
evidence about L1 sentence processing, whereby the SPR studies often precede
comparable eye-tracking research by a few years. Mitchell’s claim is yet to be evalu-
ated for L2 processing, given that, to my knowledge, no field-specific comparative
research on SPR and eye tracking is available to date. Filling this research gap will
be important for both methodological and theoretical reasons. For instance, in a
review of L2 syntactic processing research, Dussias (2010) concluded that the evi-
dence for or against structure-based processing in non-native speakers coincided
with research methodology. She noted that SPR data support Clahsen and Felser’s
(2006a, 2006b) shallow structure hypothesis that L2 speakers do not compute full
syntactic structures, whereas eye-movement research suggests the contrary (but see
Felser, Cunnings, Batterham, & Clahsen, 2012, for an apparent exception). To clarify
this issue, there is a need for comparative research on the shallow structure hypoth-
esis where the same sentences are read in a self-paced fashion and with eye tracking.
1.1.3 Eye Tracking
Eye tracking is the colloquial term used for eye-movement recordings, which are
typically (but not necessarily) made as participants perform a task on a computer
screen. The interest in eye movements dates back to the 18th century (Wade,
2007; Wade & Tatler, 2005). Given that there was no technology to record eye
and eye-tracking groups took about the same amount of time, even though the
actual time on task was higher in the think-aloud group (unreported data). To
ensure comparability between the two groups, both think-aloud and eye-tracking
participants met with the researcher one-on-one in this study; however, in other
research contexts, recording think-aloud data from multiple participants at once
(e.g., in a lab) could speed things up considerably. A similar observation applies to
SPR experiments, but not to eye-tracking or ERP studies, where only the largest
research facilities have the equipment and staff to support parallel data collection
sessions.
1.1.4 Event-Related Potentials
Whereas think-alouds, SPR, and eye tracking are all behavioral measures, ERP is
a brain-based method that consists of recording a participant’s electrical potentials
directly on the scalp by means of electrodes (for an illustration, see Figure 1.3).
The raw, continuous recording of electrical brain activity is called the electroen-
cephalogram, or EEG; the EEG signal is picked up through a set of 20 to 256
sensors that are embedded in a skull cap (Steinhauer, 2014). The collected signals
are amplified and preprocessed before being time-locked to a critical stimulus in
the input, such as an ungrammatical or unexpected word. The resulting, averaged
Dimigen, Kliegl, & Sommer, 2012; Dimigen, Sommer, Hohlfeld, Jacobs, & Kliegl,
2011; Hutzler et al., 2007; Kretzschmar, Bornkessel-Schlesewsky, & Schlesewsky,
2009). Like ERPs, FRPs are based on EEG recordings, but this time, the onset of
an eye fixation, rather than a stimulus event, such as the presentation of a word,
serves as the reference for aligning and averaging different EEG segments. The
resulting waveforms, therefore, reflect the neural processing that occurred after
the eyes landed in a new location (e.g., a new word) and so natural reading is
part and parcel of this new paradigm. Importantly, natural reading also makes
the paradigm technically more challenging, because eye movements cause large
artifacts in EEG recordings (see Dimigen et al., 2011, for extended discussion).
Using FRPs, Dimigen et al. (2011) replicated the N400 effect associated with a
word’s predictability (less predictable words eliciting larger N400s) and related the
effect to first-pass reading times recorded by the eye tracker. The authors found
that at the peak amplitude of the N400 (384 ms post-fixation-onset), only 25%
of participants were still looking at the target word and the majority of these
cases were refixations rather than initial eye fixations. The number rose somewhat
when the onset of the N400 effect was used as the reference, but overall the data
made it “hard to conceive [of] the measurable neural effects of predictability as
being causal in some way for the behavioral effects, because the bulk of the pre-
dictability effects in ERPs only followed those in behavior” (Dimigen et al., 2011,
p. 14). An interesting secondary finding of Dimigen et al.’s study was that N400-
like negative brain potentials were also observed at earlier time intervals (120–160
ms post-fixation-onset), albeit not in a statistically reliable manner.
While FRP recordings have yet to be introduced into SLA and bilingualism,
a less challenging but still informative approach is to record EEG and eye move-
ments separately, for different participants or with the same participants at different
times (Dambacher & Kliegl, 2007; Deutsch & Bentin, 2001; Foucart & Frenck-
Mestre, 2012; Sereno, Rayner, & Posner, 1998). Foucart and Frenck-Mestre (2012)
reported on three ERP experiments and one eye-tracking experiment conducted
at different times with the same participants. Foucart and Frenck-Mestre revisited
the question of whether late L2 learners can acquire grammatical features—in
this case, noun-adjective gender agreement—in the absence of similar features in
the participants’ L1. The stimuli for the study were French sentences in which the
noun and adjective either agreed or did not agree in grammatical gender. Various
syntactic manipulations made it progressively more difficult to process agreement.
The ERP data revealed that L1 French speakers showed P600 effects (i.e., syntac-
tic reflexes to agreement violations) regardless of where the adjective occurred in
the sentence. Among the L2 French speakers, however, agreement violations elic-
ited a somewhat atypical P600 effect in the easiest condition, an N400 effect in the
intermediate condition, and no consistent response in the most difficult condition.
To clarify the null result in Experiment 3, which tested processing in the
most difficult condition, Foucart and Frenck-Mestre reran the same sentences
by the same participants after a few months, but this time using eye tracking and
natural reading conditions. Unlike in the ERP experiment, the English learners of
French now showed significant and comparable levels of grammatical sensitivity
to native speakers in both early and late reading measures. The learners spent more
time reading ungrammatical than grammatical adjectives. These results generally
supported late L2 learners’ ability to acquire grammatical gender. From a meth-
odological viewpoint, the contrasting findings of the ERP and the eye-tracking
experiment point to potentially reactive effects of word-by-word serial visual
presentation. As Foucart and Frenck-Mestre (2012) observed, serial visual pres-
entation might make higher demands on memory than natural reading, which
could prove especially taxing for L2 speakers, whose memory may already be
taxed more (McDonald, 2006).
1.1.5 Synthesis
In this section, I situated eye-tracking methodology among other online, or
real-time, methodologies that have gained currency in the last two decades of
SLA research (also see Sanz, Morales-Front, Zalbidea, & Zárate-Sández, 2016).
Following a brief introduction of each methodology, I focused on how think
alouds, SPR, and ERPs relate to eye tracking, in the belief that such contrasts aid
the understanding of what eye tracking is and is not, and how it can enrich one’s
research program. Table 1.2 summarizes the main themes that emerged from this
overview and rounds them off with some extra information.
Of the methodologies reviewed in this chapter, only think alouds provide the
researcher with qualitative data, which makes them a useful tool alongside SPR,
eye tracking, or ERPs in any study with a between-subjects design (see Section
5.2). For researchers for whom it is important to obtain verbal data and quantita-
tive measures from the same participants, in a within-subjects design (see Section
5.2), non-concurrent verbal reports such as stimulated recall (Gass & Mackey,
2017) and interviews offer an alternative. Stimulated recall and interviews also
seem to be the go-to methodologies in spoken language research, just like SPR
has a sister methodology—SPL—that can serve this purpose. On the other hand,
eye tracking and ERPs allow for use with either written or spoken language,
although in the case of eye tracking, this also means a new paradigm: the visual
world paradigm (see Chapter 4). In general, eye tracking stands out for being
so versatile, a property it shares to this degree only with think-aloud protocols.
However, the think-aloud methodology and SPR are more practical and cost-effi-
cient than eye tracking and some proponents of SPR (e.g., Mitchell, 2004) would
claim that SPR meets sentence processing researchers’ needs for an online read-
ing measure. Finally, ERPs can elucidate questions about the nature of processing
(i.e., semantic or syntactic) that remain outside the purview of most behavioral
research methods. Like SPR, ERPs offer high temporal and spatial resolution, but
they sacrifice ecological validity in return. Our conclusion is three-fold: (i) there
is a greater need for methodological research that directly compares the different
TABLE 1.2 Comparison of thinking aloud, SPR, eye tracking, and ERPs
FIGURE 1.4 Sample trial in a location-cuing experiment. Participants must press a
button that corresponds to the side of the screen where the target appears
while keeping their eyes fixated on a central point. This is a valid-cue trial
because the cue correctly predicts where the target will appear.
effects of covert orienting, because the attentional focus travels to the periphery
(covert orienting) while the eyes remain in place (overt orienting).
Wright and Ward (2008) described how the location-cuing paradigm can be
used to perform a “cost/benefit analysis” (p. 19) of covert orienting. On valid-cue
trials, such as the one shown in Figure 1.4, participants benefit from cuing because
they are already attending to the target site by the time the target appears. This
leads to faster and more accurate responses. However, on invalid-cue trials, when
the cue and target appear on opposite sides of the screen, participants incur a pro-
cessing cost. This is because on invalid-cue trials, participants must reorient their
attention from the cued location to the target location after the target is displayed,
which slows their response times and increases error rates. Posner and colleagues
captured the facilitative and inhibitory effects of covert orienting in a spotlight
metaphor of attention. As is most apparent on valid-cue trials, “attention can be
likened to a spotlight that enhances the efficiency of detection of events within its
beam” (Posner, Snyder, & Davidson, 1980, p. 172). However, the spotlight meta-
phor also “capture[s] some of the dynamics involved in disengaging, moving, and
engaging attention” (Posner & Petersen, 1990, p. 35), which become important
in invalid-cue trials.
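The cost/benefit arithmetic reduces to two subtractions against a neutral-cue baseline (a cue type not described above, but standard in this paradigm). The sketch below uses invented reaction times purely for illustration; none of the numbers come from Wright and Ward (2008) or Posner et al. (1980).

```python
# Hypothetical mean reaction times (ms) from a location-cuing experiment.
# All three values are invented for illustration.
rt_valid = 310.0    # cue correctly predicted the target location
rt_neutral = 340.0  # uninformative (neutral) cue, used as the baseline
rt_invalid = 385.0  # cue pointed to the opposite side of the screen

benefit = rt_neutral - rt_valid  # speed-up from already attending to the target site
cost = rt_invalid - rt_neutral   # slow-down from reorienting attention to the target

print(f"cuing benefit: {benefit:.0f} ms; cuing cost: {cost:.0f} ms")
```

On this logic, a positive benefit and a positive cost together indicate that the cue shifted covert attention before the target appeared.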
Research using the location-cuing paradigm has shown convincingly that cov-
ert and overt attention need not coincide: when eye gaze is fixed, the attentional
focus can still shift to a different part of the visual field (see Styles, 2006, and
Wright & Ward, 2008, for a review of studies). However, disjoining covert and
overt attention during visual perception requires some effort from the individual
(Wright & Ward, 2008). More importantly, dissociations of covert and overt atten-
tion have been demonstrated in the context of very simple cognitive activities,
such as cue detection, but may be harder to engineer in the context of more com-
plex cognitive tasks such as reading or looking-while-listening (Rayner, 1998,
2009). Looking ahead to the following chapters, the two most influential models
of eye movements in reading only allow for small dissociations between attention
and eye gaze, although it is disputed whether attention is allocated sequentially, to
one word at a time or in parallel, like a gradient (see Section 2.6). Similarly, covert
attention is hypothesized as the link between language and overt eye movements
in theoretical models of the visual world paradigm (see Section 4.1).
Wright and Ward (2008) reviewed three proposals about how eye movements
and covert shifts of attention relate: independent, one common system, or inter-
dependent systems. Because there is a large degree of overlap in the brain areas
underlying covert attention and eye movements (Corbetta, 1998; Corbetta &
Shulman, 1998; Grosbras, Laird, & Paus, 2005), the two mechanisms are likely to
be related to some extent, in line with the common system and interdependent
systems accounts. However, there is as yet no consensus as to the strength of
this relationship. Because eye movements take about 220 ms to plan and execute
(Wright & Ward, 2008), whereas covert attention shifts are faster, attention will
arrive before the overt eye gaze when both are shifted to the same location (Wright
& Ward, 2008). This gives rise to preview effects in reading; that is, the finding that
processing of the next word in a text begins before the eyes have landed on it
(see Figure 1.5A and Textbox 2.1 following). However, proponents of a com-
mon-system account go one step further by positing a causal relationship between
eye movements and attention shifts. This is the central claim of Rizzolatti and
colleagues’ pre-motor theory (Rizzolatti, Riggio, & Sheliga, 1994; Rizzolatti,
Riggio, Dascola, & Umiltà, 1987; Sheliga, Craighero, Riggio, & Rizzolatti, 1997).
The argument is that covert attention is a side-effect of the motor programming
involved in preparing a saccade: it helps encode the spatial distance that the eye
needs to travel during an eye movement. Simply put, if the eyes are moving, the
brain needs to know where the eyes are moving to (spatial location). Shifts in
covert attention may help to encode the destination of the following saccade. In
this view, then, attention shifts are “planned-but-not-executed saccades” (Wright
& Ward, 2008, p. 195). Although pre-motor theory shares many characteristics
with an influential model of eye-movement control (E-Z Reader, described in
Section 2.6), Wright and Ward (2008) reviewed evidence from neuroanatomical
research (e.g., microstimulation studies with monkeys), suggesting the relationship
between eye movements and attention may, in fact, be more complex.
In sum, applied eye-tracking researchers study overt orienting, as reflected in eye
movements, to learn more about covert orienting, which is the “pure” attentional
process. This is because researchers assume that overt and covert attention largely
coincide, even though there are exceptions and the details of that relationship are
FIGURE 1.5 Examples of decoupling between eye gaze and cognitive processing. (a)
Parafoveal processing: although the reader is looking at ate (A2), he or she
is processing a hamburger (A3, A4). (b) Skipping: the reader moves straight
from ate (A2) to hamburger (A4) without looking at a (A3). Researchers
believe that a is processed together with the preceding or the following
word. (c) Parafoveal-on-foveal effects: the reader looks longer at ate (B2)
because the pnzburgers (B3) has an unusual spelling. Pnzburgers influences
processing of ate even though the reader has not looked at word B3 yet.
(d) Spillover effects: The reader looks longer at after (B4) because he or
she is still processing pnzburgers.
still being investigated. In reading, covert and overt attention are decoupled during
parafoveal processing and word skipping (see (a) and (b) in Figure 1.5); in both of
these cases, a word is processed without concurrent eye fixation. Word properties
may also continue to influence processing after a word was fixated, as indicated by
spillover effects (see (d) in Figure 1.5), and could even influence processing prior to
fixation, in parafoveal-on-foveal effects (see (c) in Figure 1.5).
1.3 Summary
This chapter provided an introduction to the what, why, and how of eye-move-
ment recordings. It was argued that the turn toward eye-tracking methodology is
a part of a larger movement in SLA research that emphasizes the use of concur-
rent data collection methods. Eye-tracking researchers are interested in studying
processing as it happens, often because they assume that such a perspective is more
informative than focusing solely on test data or questionnaires. A major appeal of
eye-tracking methodology is that it lends itself to studying many different types
of questions in a relatively unobtrusive way. The financial and time investments
necessary for acquiring an eye tracker and learning how to use it are some of the
methodology’s downsides.
Eye movements are overt orienting responses that signal the alignment of
attention with the object at the point of gaze. Eye movements and (covert) atten-
tion shifts are closely linked, although the jury is still out about whether the two
are actually different expressions of one and the same underlying system. Even so,
most applied eye-tracking researchers assume that eye movements offer a window
into cognition in that, by and large, the eye gaze indicates what information is
currently activated or being processed. This is why researchers record eye move-
ments using sophisticated machines known as eye trackers. Most modern eye
trackers infer gaze position based on video recordings of a participant’s pupil and
corneal reflection. In evaluating different eye trackers, both the intended applica-
tion and the desired data quality ought to be considered.
Notes
1 Participants were prompted to say their thoughts out loud each time a red asterisk
appeared on the screen. Participants did not know whether they would have to think
aloud until they had finished reading a sentence. In this way, the authors were able
to obtain pure reading-time measures and think-aloud data without conflating the
two and, specifically, without inflating the reading times (see Godfroid & Spino, for
discussion).
2 More affordable eye trackers such as iView, Eye Tribe, and EyeSee also exist, which may
be suitable for research that requires less accuracy and precision.
2
WHAT DO I NEED TO KNOW
ABOUT EYE MOVEMENTS?
FIGURE 2.1 Visual acuity in the fovea, parafovea, and periphery. Vision is clearest in
a small area around a person’s point of regard (the fovea) and gradually
degrades for information that is represented eccentrically.
(Source: Rayner, Schotter, Masson, Potter, & Treiman, So much to read, so little time: How do we read,
and can speed reading help? 17, 1, 4–34, copyright © 2016 by SAGE Publications, Inc., Reprinted by
Permission of SAGE Publications, Inc.).
FIGURE 2.2 The two major axes through which light travels in the eye.
cells and amacrine cells, which collect signals from cones and rods respectively,
incoming light will be converted to electrical signals and be transferred to the
brain through the optic nerve (Holmqvist et al., 2011; Wedel & Pieters, 2008). The
fovea consists mostly of cones (see Figure 2.3, solid line). However, cone density
steadily decreases away from the point of fixation with a concomitant drop in
visual acuity, while rod density increases (see Figure 2.3, dashed line). This is why,
to see a dim star at night, you often have to look slightly to the side of it, so the
light activates more rods on the edges of your retina (Springob, 2015). Conversely,
the chances of recognizing a target word at the edges of parafoveal vision (5° away
from fixation) are almost 0%: see Figure 2.3 and Rayner et al. (2012).
To describe the different subregions in the visual field, I introduced the con-
cept of degrees of visual angle (°), which is a common unit of measurement in
eye-tracking research. Vision researchers use angular units, because there is a close
correspondence between angular size and retinal image size (Drasdo & Fowler,
1974; Legge & Bigelow, 2011). Drasdo and Fowler (1974, as cited in Legge &
Bigelow, 2011) found that 1° of visual angle in central vision corresponds to 0.28
mm on the retina. To understand the concept of visual angle, one must imagine
the observer as the center point of a circle. The human vision field is ellipsoid-
shaped, as illustrated in Figure 2.4 (Burnat, 2015). It typically extends about 140°
horizontally (90° temporally and 50° nasally) and 110° vertically (50° superi-
orly and 60° inferiorly) (Spector, 1990) and one degree of visual angle equals
FIGURE 2.3 Cone and rod density in the fovea, parafovea, and periphery. Only when
people look directly at (i.e., foveate on) an object, do they have close to
100% chance of recognizing it. Note: solid line represents cones, dashed
line indicates rods, and dotted line refers to accuracy of identifying a
target word.
(Source: Adapted from Rayner et al., 2012).
What Do I Need to Know about Eye Movements? 27
FIGURE 2.4 The human visual field: peripheral, central, binocular, and monocular
regions, with the optic nerve and angular extents of 5°, 10°, 45°, and 100° marked.
1/360th of the circle. Degrees of visual angle can be divided further into minutes
of arc (arcmin, ′) and seconds of arc (arcsec, ′′): 1° = 60′ = 3,600′′. Therefore, 1
minute and 1 second of arc represent 1/21,600th and 1/1,296,000th of a circle,
respectively.
How many degrees a given object subtends will depend on the object’s dis-
tance from the observer. It is good to know an object’s angular size because only
the center 2° of vision are clear. In reading studies, for instance, researchers will
often report the angular size of a letter or the number of letters per degree of vis-
ual angle (see Chapter 6) to provide readers with an understanding of how much
text can be taken in on any single fixation. To calculate angular size, formula (2.1)
can be used. Note that this formula can be applied to any type of visual informa-
tion (e.g., a letter, picture, or area on the screen, the face of an interlocutor, or a
projector screen in a classroom), as long as the size of the region and the distance
from the observer are known.
tan(θ/2) = x/(2d)    (2.1)

where θ is the visual angle, x is the stimulus size, and d is the viewing distance
(see Figure 2.5).
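Formula (2.1) is straightforward to apply in code. The sketch below solves it for θ and checks one value against Table 2.1 (a 3.88 mm wide Courier character viewed from 500 mm).

```python
import math

def visual_angle_deg(size_mm: float, distance_mm: float) -> float:
    """Solve formula (2.1), tan(theta / 2) = x / (2 * d), for theta in degrees."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))

# A 3.88 mm wide character viewed from 500 mm subtends about 0.44 degrees
# (first row of Table 2.1); 1 degree = 60 arcmin = 3,600 arcsec.
theta = visual_angle_deg(3.88, 500)
print(f"{theta:.2f} deg = {theta * 60:.1f} arcmin = {theta * 3600:.0f} arcsec")
```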
Additionally, researchers often convert visual angle into more readily interpretable
metrics, such as letters, pixels, or length units (e.g., cm, mm). Table 2.1 summarizes
the degrees of visual angle that characters in Courier font (a common font in
text-based eye-tracking studies) subtend at a range of font sizes and viewing dis-
tances that are typical of eye-tracking research. Note that because text is presented
on a computer screen, the font size is somewhat larger than in print materials. For
font size, one of my collaborators converted measurements in points into mm,
following Legge and Bigelow’s (2011) formula: size in mm = (point size/2.86).
Character width was manually measured in points (pt), the smallest unit of measure
in typography, using Adobe Photoshop 7.0 and was converted to mm. Because
Courier is a fixed-width (monotype, monospaced) font, each letter occupies the
same amount of horizontal space. Fixed-width fonts are preferred in eye-tracking
research (see Section 6.2.2), because they ensure equal horizontal angular size for
all characters, and therefore afford a better control of the visual input.
As can be seen in Table 2.1, the degrees of visual angle of each letter hori-
zontally range from 0.28 to 0.57, depending on the viewing distance. Thus, 1°
of visual angle equates to two to four letters. Rayner (1986) and Keating (2014)
FIGURE 2.5 Relationship of viewing distance d, stimulus size x, and the visual angle θ.
TABLE 2.1 Degrees of visual angle of Courier font point 16–24 at common viewing
distances

Columns: viewing distance (in mm), font size (in points), font size (in mm), degrees
of visual angle vertically, font width (in mm), degrees of visual angle horizontally
500 16 5.59 0.64 3.88 0.44
17 5.94 0.68 3.88 0.44
18 6.29 0.72 3.88 0.44
19 6.64 0.76 3.88 0.44
20 6.99 0.80 4.59 0.53
21 7.34 0.84 4.59 0.53
22 7.69 0.88 4.59 0.53
23 8.04 0.92 4.94 0.57
24 8.39 0.96 4.94 0.57
600 16 5.59 0.53 3.88 0.37
17 5.94 0.57 3.88 0.37
18 6.29 0.60 3.88 0.37
19 6.64 0.63 3.88 0.37
20 6.99 0.67 4.59 0.44
21 7.34 0.70 4.59 0.44
22 7.69 0.73 4.59 0.44
23 8.04 0.77 4.94 0.47
24 8.39 0.80 4.94 0.47
700 16 5.59 0.46 3.88 0.32
17 5.94 0.49 3.88 0.32
18 6.29 0.52 3.88 0.32
19 6.64 0.54 3.88 0.32
20 6.99 0.57 4.59 0.38
21 7.34 0.60 4.59 0.38
22 7.69 0.63 4.59 0.38
23 8.04 0.66 4.94 0.40
24 8.39 0.69 4.94 0.40
800 16 5.59 0.40 3.88 0.28
17 5.94 0.43 3.88 0.28
18 6.29 0.45 3.88 0.28
19 6.64 0.48 3.88 0.28
20 6.99 0.50 4.59 0.33
21 7.34 0.53 4.59 0.33
22 7.69 0.55 4.59 0.33
23 8.04 0.58 4.94 0.35
24 8.39 0.60 4.94 0.35
noted that 1° of visual angle will often correspond to three to four letters in
text-based eye-tracking studies, suggesting the smaller font sizes in this chart are
somewhat more common.
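Table 2.1 can be regenerated from the two conversions described above: points to mm via Legge and Bigelow's (2011) divisor of 2.86, and mm to degrees via formula (2.1). A sketch for the 500 mm viewing distance (the character widths are the measured values from Table 2.1, since width cannot be derived from point size):

```python
import math

def visual_angle_deg(size_mm, distance_mm):
    # Formula (2.1): tan(theta / 2) = x / (2 * d), solved for theta
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))

DISTANCE_MM = 500
# Measured Courier character widths in mm, taken from Table 2.1.
WIDTHS_MM = {16: 3.88, 20: 4.59, 24: 4.94}

for points, width_mm in WIDTHS_MM.items():
    height_mm = points / 2.86  # Legge & Bigelow (2011): size in mm = point size / 2.86
    print(f"{points} pt: {height_mm:.2f} mm tall -> "
          f"{visual_angle_deg(height_mm, DISTANCE_MM):.2f} deg vertically, "
          f"{visual_angle_deg(width_mm, DISTANCE_MM):.2f} deg horizontally")
```

The printed values match the 500 mm rows of Table 2.1 (e.g., 16 pt: 0.64° vertically, 0.44° horizontally).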
FIGURE 2.6 A sequence of fixations (circles) and saccades (lines) on a TOEFL®
Primary™ practice reading test item. These are the eye-movement data
of an eight- to ten-year-old child working through the reading test item.
Larger circles indicate longer fixations.
(Source: Ballard, 2017. Copyright © 2013 Educational Testing Service. Used with permission).
analyze, but without which humans could not perform the complex visual tasks
that they do.
Fixations are periods during which the eye is relatively still, and the individual
is looking at a specific area in the visual field. People sample (i.e., take in) the
visual environment during these periods of stillness. During most fixations, the
observer is extracting and processing information from the site where he or she is
currently looking, or foveating. This area is referred to as the point of gaze or the
point of regard. Fixation durations range from ca. 50 ms to over 500 ms (Rayner,
1998; Rayner & Morris, 1991). Fixations relate to the when aspect of eye move-
ments. This is because the duration of an eye fixation is determined by when the
system decides to initiate a new eye movement. At the same time, eye fixations
also tell us something about the where of eye movements; that is, the eyes are fix-
ated somewhere in the environment and fixation location is often taken to be
informative of ongoing cognitive processing (see Section 1.2). A large part of this
chapter will be devoted to exposing the factors that influence the when and where
of eye movements during language processing. Looking ahead, Section 2.5 will be
about the higher-level cognitive factors (frequency, predictability, and contextual
constraint) that influence fixation durations. It is argued that cognitive factors play
an important role in the when of eye movements. Section 2.4 will deal with the
lower-level, visual features of language (e.g., spacing or word length) as well as
oculomotor constraints and how these jointly determine the selection of a fixa-
tion location. Thus, this section will address the question of where the eyes look.
Active vision underscores the importance of eye movements across a range
of different tasks. While a large body of work deals with eye movements during
reading, reading is a very specific and highly specialized task. Relative to the entire
course of human evolution, reading is also a recent human accomplishment. An
interesting question, therefore, is how strongly eye movements during reading
and eye movements in other visual tasks are related. Because of the relative youth
of reading skills, Reichle et al. (2012) proposed that “the processes that guide eye
movements in other tasks … almost by definition have had to be co-opted and
coordinated (through extensive practice) to support the task of reading” (p. 176,
my emphasis). The authors demonstrated through computational modeling that
the basic assumptions of their reading model (E-Z Reader, see Section 2.6) can
be used to model fixation durations and locations across a range of non-reading
tasks. In so doing, these authors offered the first unified account of eye movements
across different tasks.
A complementary approach to computational modeling, and one that in many
ways precedes it, is the accumulation of empirical data. A good approach to disen-
tangle task-specific from general effects in eye movements is to compare viewing
from the same participants across a range of tasks. To the extent that the partici-
pants’ eye-movement metrics differ (or do not correlate) between the tasks, the
measures can be said to be domain-specific. Shared properties, in contrast, indicate
that domain-general mechanisms are at work. Luke, Henderson, and Ferreira (2015)
TABLE 2.2 The range of mean fixation durations and saccade lengths in different tasks

Task | Mean fixation duration (ms) | Saccade length (degrees of visual angle) | Saccade length (letters)
Silent reading | 225–250 | 2 | 7–9
Oral reading | 275–325 | 1.5 | 6–7
Scene perception | 260–330 | 4–5 | n/a
Visual search | 180–275 | 3 | n/a

Note: Fixation duration in scene perception and visual search can vary strongly depending on the exact
nature of the task that participants are asked to perform.
(Source: Rayner, 2009).
processing of language. Thus, it is time we turn to saccades, which are the second
salient characteristic of eye-movement behavior.
“Saccade is a fancy name for eye movement” (Rayner, n.d.). The term saccade
was borrowed from French, where it means a ‘jerk’ or ‘twitch’ (Wade, 2007; Wade
& Tatler, 2005). This is an accurate designation for this type of eye movement,
considering that saccades, which occur in between two eye fixations, are very
fast, ballistic movements of the eye. They are the fastest displacements of which
the human body is capable (Holmqvist et al., 2011): fastest in people’s teenage
years (Johns Hopkins Medicine, 2014), slowing down with age (Johns Hopkins
Medicine, 2014), and potentially related to individual differences in impulsivity
(Choi, Vaswani, & Shadmehr, 2014). Saccades bring the eyes from one location to
the next to provide the cognitive system with new visual information. This is
necessary because the region of sharp vision is limited (see Section 2.1). Therefore, to
increase processing efficiency, humans and animals move their eyes so new infor-
mation falls on the high-acuity region of the eye, known as the fovea (also see
Figure 2.1). To foveate, then, is to look straight at a word or object so it is perceived
with the most sensitive part of the retina, which is the fovea.
Saccades can be described in terms of their amplitude, duration, velocity, acceler-
ation, and deceleration (e.g., Gilchrist, 2011): see Figure 2.7. Velocity represents the
speed and direction of movement. Saccadic velocity is expressed as degrees of visual
angle per second (°/s). Many eye trackers use the information about the velocity of
the eyes to distinguish saccades from fixations (see Section 9.1.3). Wright and Ward
(2008) report that saccades can have peak velocities of 600°/s to 1000°/s. However,
because saccades are brief (30 to 80 ms; Holmqvist et al., 2011), the actual distance
covered by the eye—that is, the saccadic amplitude—tends to be relatively small,
typically from < 1° to 15° (Gilchrist, 2011).1 For shifts in eye gaze larger than 15° or
20°, the head moves along with the eyes. The velocity of a saccadic eye movement
is not a constant, but characterized by a period of acceleration followed by decel-
eration, with peak velocity as the turning point (see Figure 2.7). Acceleration and
FIGURE 2.7 Idealized saccadic profile: eye gaze displacement, velocity, and acceleration.
deceleration are derived mathematically from velocity (see Holmqvist et al., 2011)
and are expressed as °/s2. Saccade amplitude, duration, peak velocity, and accelera-
tion/deceleration are all positively related. This relationship is known as the main
sequence (Bahill, Clark, & Stark, 1975), a term that was borrowed from astronomy.
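The velocity-based parsing of the eye-movement record mentioned above (see also Section 9.1.3) can be sketched as a simple threshold rule: compute sample-to-sample velocity in °/s and label fast samples as saccadic. The 1,000 Hz sampling rate, the 30°/s cutoff, and the toy gaze trace are all illustrative assumptions, not settings of any particular eye tracker.

```python
SAMPLE_RATE_HZ = 1000  # assumed sampling rate: one gaze sample per millisecond

def classify_samples(positions_deg, threshold_deg_per_s=30.0):
    """Velocity-threshold classification: label each inter-sample interval
    as 'saccade' or 'fixation' based on its velocity in degrees/second."""
    labels = []
    for prev, curr in zip(positions_deg, positions_deg[1:]):
        velocity = abs(curr - prev) * SAMPLE_RATE_HZ  # deg/s
        labels.append("saccade" if velocity > threshold_deg_per_s else "fixation")
    return labels

# Toy horizontal gaze trace (degrees): a fixation, a rapid 2-degree shift
# (peak velocity 700 deg/s here), then a new fixation.
trace = [0.00, 0.01, 0.00, 0.50, 1.20, 1.90, 2.00, 2.01, 2.00]
print(classify_samples(trace))
```

Real detection algorithms add smoothing and minimum-duration criteria, but the core idea is this velocity cut.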
To the best of my knowledge, saccade properties, with the exception of regres-
sions during reading, have yet to be analyzed in SLA research. Work on L1 read-
ing development points to a promising role for saccadic amplitude (i.e., saccade
length) in particular. This line of research has revealed that as children become
more skilled readers, their fixation time decreases and saccade length increases,
meaning the children make fewer and shorter fixations on each sentence (see
Blythe, 2014; Blythe & Joseph, 2011; Reichle et al., 2013, for reviews). Thus,
saccade amplitude indexes L1 reading skill. It seems worthwhile extending the
use of the amplitude measure to L2 reading research to investigate if more fluent
L2 readers also make longer eye movements. Preliminary support for this claim
comes from a study by Henderson and Luke (2014). These authors investigated
whether saccade length and fixation duration are similar across tasks (i.e., domain
general) or task specific. A group of healthy adults completed one L1 reading
and three non-reading tasks and then repeated the same tasks two days later.
Henderson and Luke found that saccade length was stable within individuals
across time, but did not correlate between scene viewing and reading. Henderson
and Luke concluded that where individuals move their eyes to—and hence saccade
length—was task-specific, although within a given task type, their participants
(proficient L1 speakers) tended to reproduce the same type of viewing behavior.
Therefore, changes in saccade length within individuals over time could point
to changes in the cognitive processes that support their reading or other men-
tal activity. Specifically, an increase in L2 readers’ mean saccade length might be
indicative of their L2 reading development.
A noteworthy property of saccades is that the eye and the brain seemingly do not
take in any new visual information during a saccadic eye movement. This phe-
nomenon is known as saccadic suppression (Matin, 1974). When the eyes make
a saccade, the rapid motion (smearing) of the image across the retina causes dif-
ficulty with spatiotemporal integration. Consequently, the retinal image is blurred,
yet this is not what people perceive (i.e., our vision does not become blurred,
or at least we do not think it does, every time we move our eyes). The details of
the neural mechanisms of saccadic suppression are complex and remain under
investigation (e.g., Binda, Cicchini, Burr, & Morrone, 2009; Cicchini, Binda, Burr,
& Morrone, 2013; Panichi, Burr, Morrone, & Baldassi, 2012; Thiele, Henning,
Kubischik, & Hoffmann, 2002; Thilo, Santoro, Walsh, & Blakemore, 2004). One
visual factor that contributes to saccadic suppression is backward lateral mask-
ing (see Matin, 1974, for review and discussion). Backward lateral masking is the
process whereby the more stable visual input at the eyes’ new resting place (i.e.,
after the eye movement) overwrites the transient stimulation during the preced-
ing saccade. Specific brain areas further contribute to the reduced visual sensitivity
What Do I Need to Know about Eye Movements? 35
during saccadic eye movements (e.g., Binda et al., 2009; Thiele et al., 2002; Thilo
et al., 2004), such that saccadic suppression likely has both visual and neural causes.
The net result is that in spite of frequent eye movements, people do not perceive
a blur, but a stable visual world. Although visual intake is strongly limited dur-
ing saccadic eye movements, there is evidence to suggest that lexical processing
continues (Irwin, 1998; Yatabe, Pickering, & McDonald, 2009). Yatabe et al. (2009)
presented participants with short English sentences that were split into two parts.
To read the second part of the sentence, the participants had to make a long sac-
cade to the right as shown in Figure 2.8. Yatabe and colleagues designated the
last word of the first part as the target word and the first word of the second part
as the spillover word. The sentences contained a frequency manipulation, such that
the spillover word (e.g., remained) was preceded by either a high-frequency target
(e.g., prison) or a low-frequency target (e.g., hangar). Low-frequency words tend
to elicit longer fixations, which is known as a frequency effect (for further details,
see Section 2.5). When the frequency effect is attested on the following word, this
is called a spillover effect, because the frequency effect is said to spill over onto the
following word. Yatabe and colleagues were primarily interested in these spillover
effects. They found that the spillover effect on remained depended on the length of
the preceding saccade. Specifically, a low-frequency target word induced a larger
fixation-time increase on the spillover word when the two words were separated
by a 10°, rather than a 40°, saccade. Yatabe and colleagues took this to mean
that participants continued processing the low-frequency target word during the
next saccade. Because longer saccades take more time to complete, less processing
remained to be done after the eyes landed on the spillover word following a 40°
saccade.
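The claim that longer saccades take more time to complete can be made concrete with the so-called main sequence, the roughly linear relationship between saccade amplitude and duration reported in the oculomotor literature (an intercept of roughly 20-some milliseconds plus roughly 2 ms per degree of amplitude). The constants and function name below are illustrative approximations, not values from Yatabe et al.'s study:

```python
def approx_saccade_duration_ms(amplitude_deg: float) -> float:
    """Rough 'main sequence' estimate: saccade duration grows roughly
    linearly with amplitude (~21 ms intercept + ~2.2 ms per degree;
    illustrative constants, not measured values)."""
    return 21.0 + 2.2 * amplitude_deg

# A 40-degree saccade lasts considerably longer than a 10-degree one,
# leaving more in-flight time for processing of the previous word to
# finish before the eyes land on the spillover word.
short = approx_saccade_duration_ms(10)   # roughly 43 ms
long_ = approx_saccade_duration_ms(40)   # roughly 109 ms
print(round(long_ - short))
```

On this approximation, a 40° saccade affords roughly 65 ms more in-flight processing time than a 10° saccade, which is consistent with the interpretation above.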
Considering that cognitive processing continues during saccades (Irwin, 1998;
Yatabe et al., 2009), the question becomes whether saccade durations should be
added to fixation durations to obtain more accurate measures of processing time.
This proposal deviates from current practice because most algorithms calculate
processing time (i.e., fixation measures such as gaze duration or total time) based
on fixation times only (see Chapter 7). The question of saccade-inclusive process-
ing time is yet to be resolved. However, findings of one study by Vonk and Cozijn
(2003) suggested that saccade-exclusive and saccade-inclusive processing times
(the latter being fixation measures that also include the durations of the preceding
and following saccades) yield similar results. In Vonk and Cozijn’s study, including
saccade durations in first-pass reading times changed the size of the effect by only
6 ms in one analysis (from −48 ms to −54 ms) and by 2 ms in another (from 19 ms
to 21 ms). Neither change altered the outcome of the statistical analysis. Even so,
the authors maintained that saccade durations
“should be included in measures of reading time as soon as fixation durations are
accumulated, i.e., if more than one fixation is contained in the measure, because
[saccade durations] contribute to language processing time as well as fixations do”
(p. 307, original emphasis).
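The distinction between saccade-exclusive and saccade-inclusive processing time can be sketched in a few lines of code. The event-list format, durations, and function names below are invented for illustration; in practice these durations come from the eye-tracker's output (see Chapter 7):

```python
# Each tuple is (event_type, duration_ms) for consecutive events within
# an interest area: fixations interleaved with the saccades between them.
# The event format and numbers are invented for illustration.
trial = [("fix", 210), ("sac", 30), ("fix", 180), ("sac", 25), ("fix", 240)]

def saccade_exclusive_time(events):
    """Standard practice: sum fixation durations only."""
    return sum(d for kind, d in events if kind == "fix")

def saccade_inclusive_time(events):
    """Vonk and Cozijn's proposal: once more than one fixation is
    accumulated, also count the saccades between those fixations."""
    return sum(d for _, d in events)

print(saccade_exclusive_time(trial))  # 630
print(saccade_inclusive_time(trial))  # 685
```

In this toy trial, the two measures differ by the 55 ms spent in saccades, but as in Vonk and Cozijn's data, such a shift need not change the statistical outcome.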
Although fixations and saccades will be the primary characteristics that L2
researchers focus on in eye-movement data, these events make up only part of the
eye-movement repertoire. To support vision, especially vision in naturalistic set-
tings, other types of eye movements are also necessary. Specifically,
Vergence refers to the inward or outward (as opposed to parallel) rotation of the
eyes in order to focus on an object that is located at a different distance from the
two eyes (Krauzlis, 2013). The vestibulo-ocular reflex is a mechanism to com-
pensate for head movement, whereby the eyes move automatically in the direc-
tion opposite to the head movement (Krauzlis, 2013). Smooth pursuit is the
voluntary tracking of a moving visual target, such as a tennis ball flying through
the air or a cheetah running off in the wilderness. Compared to saccades, smooth
pursuit movements are slower, with reported peak velocities anywhere from
30°/s (Wright & Ward, 2008; Young & Sheena, 1975) to 90°/s (Meyer, Lasker,
& Robinson, 1985) or 100°/s (Holmqvist et al., 2011). Smooth pursuit consists
of a combination of smooth movements and catch-up saccades (Barnes, 2011;
FIGURE 2.9 Fixational eye movements during a three-second fixation. The bold line
strokes are microsaccades.
(Source: Reprinted from Engbert, R., 2006. Microsaccades: A microcosm for research on oculomotor
control, attention, and visual perception. Progress in Brain Research, 154, 177–192, with permission
from Elsevier).
Hafed & Krauzlis, 2010; Krauzlis, 2013). Current eye-tracking technology is not
yet capable of measuring smooth pursuit movements accurately.
Finally, within the class of eye fixations, there is a subcategory of miniature eye
movements, which are called fixational eye movements. What this tells us is that
the term eye fixation is a bit of a misnomer (Rayner, 1998), because what appears as
a fixation (i.e., a period of stillness) is in fact marked by miniature eye movements.
Figure 2.9 plots the data of one individual, who was asked to fixate three sec-
onds on a small marker that appeared in the center of the screen (represented in
the plot as the 0° crosshairs). This random pattern shows extensive slow, mean-
dering motions, known as drift, and very fast, tiny oscillations superimposed on
the drift, which are tremor (Engbert, 2006; Martinez-Conde & Macknik, 2007,
2011). Bolded in Figure 2.9 are three fast, linear eye movements, or microsac-
cades. Microsaccades are very short saccades, which typically span less than 1°
of visual angle and occur 1–2 times per second (Engbert, 2006). Involuntary and
unconscious, microsaccades carry an image across the retina to refresh the visual
input to the photoreceptor cells. This is necessary to counteract neural adapta-
tion; that is, without renewed visual stimulation, stationary images would quickly
fade from view (Engbert, 2006; Martinez-Conde & Macknik, 2007, 2011), much
like a frog cannot see a fly sitting still on a wall, but will spot and swallow the
insect as soon as it moves (Lettvin, Maturana, McCulloch, & Pitts, 1968). Although
visual reactivation is the primary function of microsaccades, Martinez-Conde and
Macknik (2011) noted that microsaccades can also serve to correct prior saccades
that landed slightly off-target. In their view, microsaccades and saccades form a
continuum that is underpinned by “the same neural mechanisms” (p. 105).
FIGURE 2.10 The visual field. The gray oval represents the perceptual span from which readers extract information when processing text.
FIGURE 2.11 The gaze-contingent moving window paradigm. The size of the rectangles (ten characters) represents the size of the window.
(Source: Adapted from Rayner et al., 2016).
Using the moving-window technique, Ikeda and Saida (1978) and Choi and Koh (2009) observed larger
spans for Japanese and Korean readers. Their participants were able to process six
characters (Ikeda & Saida, 1978) or six to seven characters (Choi & Koh, 2009) to
the right of fixation, respectively.
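To make the moving-window logic concrete, the sketch below masks all letters outside a window around the current fixation, one common implementation choice in this paradigm (the mask character, window sizes, and function name are illustrative):

```python
def moving_window(text: str, fixation_index: int, left: int, right: int) -> str:
    """Replace letters outside the window with 'x', preserving spaces,
    as in a gaze-contingent moving-window display (cf. Figure 2.11)."""
    out = []
    for i, ch in enumerate(text):
        inside = fixation_index - left <= i <= fixation_index + right
        out.append(ch if inside or ch == " " else "x")
    return "".join(out)

sentence = "The quick brown fox jumps over the lazy dog"
# Fixation on the 'b' of 'brown' with a window of 4 characters to the
# left and 10 to the right of fixation (sizes chosen for illustration).
print(moving_window(sentence, fixation_index=10, left=4, right=10))
```

In a real experiment, the display is updated on every saccade so that the window travels with the eyes; asymmetric windows like the one above are what reveal the rightward (or, in Urdu, leftward) skew of the perceptual span.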
In general, there is a tradeoff between information density in a language and
span size, such that the amount of information to be obtained from any single
fixation is about the same across languages (Rayner et al., 2012). Feng, Miller, Shu,
and Zhang (2009) noted that if words rather than letters or characters are used as
the basis of measurement, the perceptual span is the same in Chinese and English.
The perceptual span is an important construct in cross-linguistic research
because of how languages are read. Specifically, the direction of reading deter-
mines how the perceptual span extends spatially. Paterson and his colleagues
(2014) demonstrated this in a recent gaze-contingent moving-window study, in
which they measured the perceptual span of Urdu-English bilinguals. Urdu is
a Perso-Arabic alphabetic language read from right to left, which makes for an
interesting comparison with English. One notable feature of Paterson et al.’s study
is that letters outside the window were replaced with filtered, visually degraded
text (see Figure 2.12), rather than with different letters, to simulate a loss of visual
resolution outside the window. Using the moving-window technique, the authors
found that compared to normal reading, the rate of processing Urdu text was the lowest when the
window was symmetric and small (0.5 degrees of visual angle to both the right
and left of fixation; a .5°–.5° window). Reading in Urdu was the fastest when the
window was larger and asymmetric to the left of fixation (1.5°–.5° and 2.5°–.5°).
Conversely, the same readers processing English text also read the slowest with the
symmetric window but fastest when the window extended rightwards (.5°–2.5°).
FIGURE 2.12 Urdu and English sentences displayed in a gaze-contingent moving window paradigm with different window sizes.
(Source: Paterson et al., 2014).
The initial landing patterns on a word reflect the influence of lower-level, visual,
and oculomotor variables (see Textbox 2.2). Whereas the eyes land in the center of
short (5-letter) words, they tend to be shifted to the front in longer words. This is
because of the limits of the perceptual span (i.e., a lower-level visual constraint; see
Section 2.3). Specifically, when the next word is long, some letters may fall outside
the perceptual span and therefore will not contribute to landing site calculations
(Balota, Pollatsek, & Rayner, 1985; McClelland & O’Regan, 1981; Rayner, Well,
Pollatsek, & Bertera, 1982). Landing sites can further be explained by the location
of the preceding fixation (i.e., the launch site), due to the so-called center-of-
gravity assumption (Vitu, 1991). Vitu explained that the closer the launching site is
to the beginning of the following word, the more likely the eyes are to overshoot
the center of the next word or even skip it (see Textbox 2.3 and 2.4). This is again
because of the amount of information represented in the perceptual span, which
will be greater if the launching site is closer. Likewise, the eyes will undershoot the
center of a word when the distance between the launch site and the beginning
of the next word is large, given that fewer letters will be detected in parafoveal
vision (termed the periphery in Vitu’s work). Figure 2.13 illustrates this with two
examples from Siyanova-Chanturia, Conklin, and van Heuven (2011), where the
preceding fixation influences the landing location on the following word. When
the launching site from the word sciences is far from the following word arts, as in
the case of (A), the following fixation falls short of the center of the following
word. On the other hand, the second fixation (refixation) on sciences in (B) is close
to arts, which may then have caused the reader to overshoot his or her target.
The influence of the current viewing location on the next saccade destination
is one reason it is important to control for (i.e., keep the same) the preceding text.
For example, Siyanova-Chanturia and colleagues compared L1 and L2 reading
patterns for frequent and infrequent binomials such as arts and sciences (frequent)
and sciences and arts (infrequent). When extracting eye-movement measures, they
defined the whole binomial phrase as a single interest area (see Figure 2.13 for a
reconstructed example). This approach was preferable to analyzing fixation times
for each noun separately because that would have introduced differences in pre-
ceding context between the nouns in the two conditions. For example, arts is
preceded by sciences and in the low-frequency condition shown in Figure 2.13,
whereas it is preceded by across in the corresponding high-frequency condition.
There are cases where it may be more justifiable to conduct a word-based anal-
ysis of two-noun sequences, even when the same noun occurs in slightly different
places in the sentence. In an incidental vocabulary learning study, Godfroid, Boers,
and Housen (2013) investigated whether strong contextual cues to the meaning
of a new word (i.e., near-synonyms) aided L2 vocabulary learning beyond simply
reading the novel words in context, with no near-synonyms supplied. Participants
read short English texts embedded with novel words, which were the targets for
learning in the study. In one condition, a contextual cue preceded the target word
(e.g., boundaries or paniplines) whereas in another condition, the cue followed the
target word (e.g., paniplines or boundaries). Despite this slight variation in word
order, the results of a word-based analysis of just the cue (i.e., boundaries) con-
verged with findings from an analysis of the whole phrase, which I reported in
my dissertation (Godfroid, 2010), but omitted from the journal article for space
reasons.
FIGURE 2.13 Undershooting (A) and overshooting (B) of arts in the phrase sciences and arts given different launch sites in the word sciences.
(Source: Siyanova-Chanturia, Conklin, & van Heuven, 2011).
The two analyses showed that learners utilized the semantic cues only
when they followed the novel word, an effect that was found in both early and
late eye-movement measures.
(1) Jill looked back through the open window to see if the man was there.
(Rayner & Well, 1996)
Word frequency information was retrieved from the CELEX corpus and entered
as log frequency. Researchers commonly use log frequency, rather than raw fre-
quency values, in their analyses to account for the fact that the frequency dis-
tribution of words in natural language is positively skewed. Furthermore, the
relationship between frequency and reaction times, of which eye fixation dura-
tions are a special case, is non-linear. Small changes at the low end of the fre-
quency scale (e.g., from 1 to 10 occurrences per million words) have a similar
effect on reaction times as much larger changes at the high end of the scale (e.g.,
from 1000 to 10,000 occurrences per million words). To account for these facts,
Kliegl and colleagues divided the words in their corpus into five “logarithmic
frequency classes: class 1 (1–10 per million): 242 words; class 2 (11–100): 207
words; class 3 (101–1,000): 242 words; class 4 (1,001–10,000): 227 words; class 5
(10,001–max): 76 words” (p. 267).
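The log transformation and binning just quoted can be sketched as follows; the class boundaries follow Kliegl and colleagues' quoted scheme, while the example frequencies are invented:

```python
import math

def log_frequency(per_million: float) -> float:
    """Log-transform raw frequency (occurrences per million words) to
    linearize its relationship with reading times."""
    return math.log10(per_million)

def frequency_class(per_million: float) -> int:
    """Assign the logarithmic frequency classes quoted above:
    1-10 -> 1, 11-100 -> 2, 101-1,000 -> 3, 1,001-10,000 -> 4, above -> 5."""
    for cls, upper in enumerate([10, 100, 1_000, 10_000], start=1):
        if per_million <= upper:
            return cls
    return 5

print(frequency_class(5), frequency_class(250), frequency_class(20_000))
```

On a log scale, the jump from 1 to 10 occurrences per million is the same size as the jump from 1,000 to 10,000, which is exactly the equivalence the transformation is meant to capture.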
To understand how predictability, frequency, and word length affected reading
behavior, the authors conducted two types of analyses, one on the entire corpus
(the statistical control approach) and the other on a selected subset of target words
(the experimental control approach). The reason they ran two analyses is that
word length and frequency were correlated in the Potsdam Sentence Corpus
(r = -.64), as is normally the case in natural language: longer words tend to be less
frequent. Therefore, to disentangle the effects of length and frequency on read-
ing, the authors repeated the analysis on a subset of words for which length and
frequency were uncorrelated (r = -.01).
The results of the study revealed that, for the entire corpus, length and fre-
quency were related to fixation duration and fixation probability measures, while
predictability only influenced fixation probability. These findings are consistent
with previous studies that reported increases in fixation duration for longer words
and decreases for frequent words (see Table 2.3). Predictable words were skipped
more often, which also echoed findings from previous research. As for the target
word analyses, first-pass duration was influenced by all three variables. Target word
predictability, however, was strongly linked to second-pass duration; specifically,
less predictable words had longer rereading times. The comparison of effects for
corpus and target words indicated that word length and word frequency effects
generalized to all words (the regression coefficients for the predictors in both
analyses were similar). Findings for predictability generalized to fixation probabil-
ity but not fixation duration. Therefore, some caution is needed when extrapolat-
ing the findings from a tightly controlled experimental design to more naturalistic
sentence or text reading, as was observed in the corpus analysis.
The previous description provides some insights into the kinds of factors
researchers need to consider when designing their experimental materials. In so
doing, it lays the foundation for Chapters 3 and 4, which are devoted to study
design. Meanwhile, we turn to word familiarity and age of acquisition (AoA) as
two variables that are related to, yet distinct from, frequency.
In reading research, both AoA and word familiarity are often measured using
subjective ratings. AoA refers to the age at which a word was first encountered,
while word familiarity indicates the degree of familiarity with a word. Prior to the
main experiment, researchers may collect normative data of the tested variables.
Depending on whether AoA and word familiarity are the variables of interest
or are control variables, researchers aim to have either a wide range or very lit-
tle variability in the AoA and word familiarity ratings, respectively. This is what
the norming data are used for. For example, in a norming study for their second
experiment, Juhasz and Rayner (2006) measured AoA and word familiarity using
a 7-point Likert scale. Twenty undergraduate students rated words’ AoA from 1
to 7, with 1 indicating the word was acquired between the ages of 0 and 2, and 7 reflect-
ing the word was acquired after the age of 13. Another group of undergraduates
similarly rated the words’ familiarity, with higher numbers on the rating scale
signaling a greater familiarity with the word. In a series of two experiments,
Juhasz and Rayner investigated the influence of AoA and frequency (with famili-
arity, concreteness, imageability, and length controlled) on word recognition. They
employed an experimental control approach. In Experiment 2, the participants
were exposed to sentences in four conditions (High Frequency [HF]–early AoA;
HF–late AoA; Low Frequency [LF]–early AoA; LF–late AoA), where target nouns
in the sentences reflected the frequency and AoA of a given experimental condition.
TABLE 2.3 Variables influencing fixation duration
Given the high correlation between frequency and AoA (high-frequency
words are acquired relatively early in life), the criteria for low and high frequency
words in the study were more lenient than in previous research (mean frequencies
HF–late AoA, 75 per million; LF–early AoA, 6 per million) because the number
of words that qualified as HF–late AoA and LF–early AoA was limited. ANOVAs
revealed that there were main effects of frequency and AoA but no interaction
effects. Late-acquired words (average ratings of 4.9) were fixated longer than the
early-acquired words (average ratings of 2.75). These effects were independent
of frequency, which also influenced fixation duration, and familiarity, concrete-
ness, imageability, and word length, which were controlled in the present study.
These interesting findings notwithstanding, employing subjective ratings of AoA
suffered from one limitation: because recalling the exact age at which a word was
acquired is difficult, participants likely drew on other sources of information (e.g.,
the order in which words were acquired) when rating AoA.
More recently, Joseph et al. (2014) examined the effects of AoA, using order
of acquisition (OoA) as a “laboratory analog of AoA” (p. 245). Of interest was
whether OoA influenced novel-word processing and acquisition when total
exposure to the words was held constant. Although the participants in the study
were English native speakers reading in their L1, similar research is being con-
ducted in bilingualism and SLA (Elgort, Brysbaert, Stevens, & Van Assche, 2018;
Godfroid et al., 2018; Godfroid et al., 2013; Mohamed, 2018; Pellicer-Sánchez,
2016). Participants in this study were exposed to 16 non-existing words over
five laboratory sessions, with half the words introduced on day 1 (early OoA)
and the other half on day 2 (late OoA). At the end of the five-day experiment,
all the words had occurred in exactly 15 sentences, meaning the total frequency
of exposure was controlled for. Word length was also held constant. Results from
linear mixed models revealed both exposure and OoA effects. The reading time
decreased after each encounter with a novel word, indicating the novel words
became more familiar to the readers. Interestingly, OoA had an effect on total
reading time only in the testing phase, but not during exposure. This is surprising
given that the test sentences were presented immediately after the last exposure
phase (day 5) and, from the participants’ point of view, were indistinguishable from
the exposure sentences. What differentiated the two sentence sets, however, was
the amount of contextual support. Target words appeared in meaningful contexts
in the exposure phase, but in neutral sentences in the test phase. (The latter served
as a test of implicit learning: see Elgort et al. (2018) for an example with L2 read-
ers.) The increase in total time for the late words, therefore, suggests that process-
ing of these words still relied to a greater extent on the surrounding context, so
when that contextual support was removed in the test phase, reading times went
up. Joseph and colleagues concluded that “the early words in [their] experiment
gained a higher quality of lexical status than the late words” (p. 245).
FIGURE 2.14 Models of eye-movement control.4 Note: POC (primary oculomotor control) = models that assume low-level factors drive eye movements; PG (processing gradient) = models that assume attention is allocated in a parallel fashion; SAS (sequential attention shift) = models that assume attention is distributed serially, to one word at a time.
Models of eye-movement control were also the focus of a special issue of The Quarterly Journal of Experimental Psychology (Murray, Fischer, & Tatler, 2013)
devoted to serial vs. parallel processing in reading. The question of whether read-
ers process words serially or in parallel (explained in more detail following) applies
to cognitive control and oculomotor models alike. Although parallel processing
is perhaps associated more strongly with oculomotor models and serial process-
ing tends to be linked to cognitive-control models, serial-processing, oculomo-
tor models also exist (e.g., SERIF or the Competition/interaction model) and
parallel-processing, cognitive-control models exist as well (e.g., Mr. Chips) (Radach
& Kennedy, 2013). This is why Jacobs (2000), Radach, Schmitten, Glover,
and Huestegge (2009), and Radach and Kennedy (2013) proposed to reclassify
eye-movement models along two axes: (i) autonomous saccade generation vs.
cognitive control and (ii) serial or parallel attention. These distinctions are neces-
sary to come to a more fine-grained understanding of how models of eye-move-
ment control differ. In particular, they can explain why the two most prominent
models of eye-movement control—E-Z Reader and SWIFT—are so different
(i.e., they fall in opposite corners of the two-dimensional space), even though
both are considered to be cognitive models (compare Figures 2.14 and 2.15).
According to SWIFT ([Autonomous] Saccade-Generation With Inhibition
by Foveal Targets), the eyes move forward through a sentence at more-or-less
fixed time intervals (Engbert et al., 2005), much as if humans had an internal
metronome or timer that dictated when the next eye movement must occur.
FIGURE 2.15 A grid of E-Z Reader and SWIFT models of eye-movement control.
This feature accounts for the autonomous saccade generation part of the SWIFT
model. At the same time, the internal metronome is subject to random and sys-
tematic noise. Furthermore, it can also be delayed when the reader experiences
difficulty processing the currently fixated word. Delaying the next eye movement
means that the current eye fixation will be longer because the eyes are staying
in place. Therefore, processing difficulty, for instance as a result of seeing a low
frequency or unpredictable word, will increase eye fixation duration through a
process of inhibition. It is this principle of foveal inhibition that makes SWIFT a
cognitive model, even though the driving force behind the eye movements is
autonomous saccade generation rather than cognitive control.
Assuming the next saccade is programmed, where will the eyes go? Engbert and
colleagues (2005) posited that saccade targets are selected based on a competition
between words within “a dynamically changing activation field” (p. 778). The field
of activation is based on a complex mathematical model and represented visually
in Figure 2.16. It corresponds to the gray areas under the curve for the different
words and can be seen to change dynamically over time. In Figure 2.16, the thick
black line represents the eye making its way through the sentence in a sequence of
fixations (vertical lines) and saccades (horizontal lines). At most points in time more
than one word is activated and selection of the next saccade target (i.e., the end
point of the horizontal line) is determined through a competitive process between
all the currently activated words. For instance, 700 ms into the trial, vor (before, in),
Gericht (court), and nicht (not) are all activated and competing as fixation targets. Vor
eventually wins the competition when it is refixated around 800 ms post onset. An
important upshot of this theoretical view is that in SWIFT, attention is conceived as
a gradient (as opposed to a spotlight) that encompasses multiple words. The number
of co-activated words fluctuates over time, but Engbert and colleagues noted that
their results can be reproduced qualitatively by assuming concurrent activation of
three words: the currently fixated word n, the next word n + 1, and the word after
that, word n + 2 (p. 798). From a psychological viewpoint, this means that readers
are believed to attend to and process words in parallel, in a spatially distributed fash-
ion. Hence, SWIFT is a parallel-processing (or parallel-attention), cognitive
model with autonomous saccade generation.
Now what about E-Z Reader? Although E-Z Reader is also a cognitive model,
it differs from SWIFT with regard to both attention and saccade generation. In
E-Z Reader, the programming of a saccade begins, not after a set amount of time,
but when lexical access of the currently fixated word is imminent. This means that
the reader’s mental processor unconsciously makes an educated guess, based on the
information that has become available about the word form, that the retrieval of
word meaning will follow soon. Therefore, an early stage of lexical processing, called
familiarity check or L1, is the engine that moves the eyes through the text (see Figure
2.17; Reichle et al., 2013; Reichle, Pollatsek, & Rayner, 2006; Reichle, Rayner, &
Pollatsek, 1999, 2003; Reichle, Warren, & McConnell, 2009). In E-Z Reader, the
engine is cognitive (i.e., a part of word recognition and lexical processing), unlike
in SWIFT, where the engine operates largely autonomously. Furthermore, saccade
initiation is decoupled from full lexical access, or L2, because programming a saccade
takes about 150 ms to complete (Reichle et al., 2013; Reichle et al., 2012). Reading
will be more efficient if saccadic programming and lexical processing partly overlap.
Therefore, rather than proceeding serially, as was the case in the precursor Reader
model (Morrison, 1984), in the newer E-Z Reader model the lexical and oculo-
motor aspects of reading operate partly in parallel (see Figure 2.17). When lexical
access is complete, covert attention shifts to the next word (see Figure 2.17) and
overt attention (i.e., the eye gaze) normally follows soon after (for a discussion of
overt and covert attention, see Section 1.2). The time spent processing the next
word ahead of a direct eye fixation is known as a preview benefit (see Textbox 2.1).
Preview benefits result from parafoveal processing; that is, covert attention preceding
the eye gaze (see Section 1.2 and Figure 1.5a). It follows that attention in E-Z
Reader is word-based. It is allocated to individual words, one word at a time, and
it travels from one word to the next, like a spotlight or a beam, in a strictly serial
fashion, with lexical access as its guide. In short, E-Z Reader is a cognitive-control,
serial-attention model, in which two hypothesized stages of word recognition L1
and L2 trigger saccade initiation and attention shifts, respectively.
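The contrast between the two control mechanisms can be caricatured in code. The sketch below is a deliberately simplified cartoon, not the published SWIFT or E-Z Reader implementations: it illustrates only (i) an autonomous timer delayed by foveal inhibition versus (ii) a frequency-sensitive familiarity check that triggers a fixed-duration saccade program. All constants and function names are invented:

```python
def swift_like_fixation_ms(difficulty: float,
                           base_timer_ms: float = 225.0,
                           inhibition_ms: float = 80.0) -> float:
    """Cartoon SWIFT: an autonomous timer fires the next saccade, but
    foveal processing difficulty (0..1) delays (inhibits) it."""
    return base_timer_ms + inhibition_ms * difficulty

def ez_like_fixation_ms(log_frequency: float,
                        programming_ms: float = 150.0) -> float:
    """Cartoon E-Z Reader: the familiarity check (stage 'L1') finishes
    sooner for higher-frequency words, and its completion triggers
    saccade programming, which takes a further ~150 ms."""
    familiarity_check = 300.0 - 40.0 * log_frequency  # toy linear stage
    return familiarity_check + programming_ms

# In both cartoons, a harder or rarer word yields a longer fixation,
# but for different reasons: inhibition of an autonomous timer vs. a
# slower lexical trigger. (All numbers here are invented.)
print(swift_like_fixation_ms(0.0), swift_like_fixation_ms(1.0))
print(ez_like_fixation_ms(4.0), ez_like_fixation_ms(1.0))
```

The point of the cartoon is that both architectures predict frequency effects on fixation duration, which is why distinguishing them empirically requires the finer-grained phenomena (e.g., parafoveal-on-foveal effects) discussed below.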
The question of whether words are processed serially or in parallel has occupied
a central place in contemporary reading research (Engbert & Kliegl, 2011; Reichle,
2011). As is well established in SLA circles, the nature of attention is a thorny issue.
Conceptualizations of attention in reading range from attention as a “one-word
processing beam” (Radach, Reilly, & Inhoff, 2007, p. 240) in serial-attention models
to an attentional gradient or “field of activation” (p. 241) in parallel-attention mod-
els. These questions have important implications for how information is encoded
and also concern L2 researchers. This is because many eye-tracking researchers in
bilingualism and SLA record eye movements precisely in order to study these very
phenomena—attention and processing. Furthermore, eye-movement data in our
fields are usually interpreted in relation to the region of analysis, or interest area,
for which they were observed. For example, if the region of analysis is an ungram-
matical adjective, a recast, or a low-frequency noun, then eye-fixation durations
are commonly taken to reflect the processing of that adjective, recast, or noun.
Although there are exceptions, words or phrases are often the basic unit of analysis
and fixations tend to be interpreted in relation to the specific object on which
the eyes are fixated at any given time.5 This suggests most applied eye-tracking
researchers, including SLA researchers and bilingualism researchers, assume a tight
eye-mind link. In this regard, their work is probably aligned more closely with E-Z
Reader than SWIFT. At the same time, the preceding discussion has provided ample
evidence that the eyes and the mind do not always coincide (also see Figure 1.5).
There is “elasticity in the eye-mind assumption” (Murray et al., 2013, p. 417), such
that perhaps we ought to think about the eye-mind link as a “stretchy elastic band”
(Murray, 2000, p. 652). Understanding the amount of stretch, or the degree of eye-
mind decoupling, becomes very important: as Murray and colleagues (2013) argued,
given enough flexibility, any serial-attention model can reproduce seemingly
parallel effects such as parafoveal processing and word n + 2 preview effects (Radach,
Inhoff, Glover, & Vorstius, 2013; Textbox 2.5). By the same token, parallel-attention
models can mimic serial processing if the span of attention is sufficiently reduced.
Readers who wish to examine these questions in greater depth are referred to
the literature on parafoveal-on-foveal effects (e.g., Drieghe, Rayner, & Pollatsek,
2008; Kennedy, Pynte, & Ducrot, 2002; Pynte & Kennedy, 2006; White, 2008).
Essentially, parafoveal-on-foveal effects are cases where the properties of the
upcoming word, which is seen parafoveally, influence the duration of fixations on
the currently fixated word n. The existence of such effects is uncontroversial at the orthographic level but contested at the semantic level. If semantic parafoveal-on-foveal effects are confirmed, they would offer strong evidence for parallel processing.
Regardless of which theoretical view will ultimately prevail, good research
designs with appropriate control conditions will always be key to conduct-
ing valid and interpretable eye-tracking studies. As stated previously, most eye-
tracking studies in SLA and bilingualism are designed to compare eye gaze behav-
ior under different experimental conditions—ungrammatical vs. grammatical,
enhanced vs. unenhanced, ambiguous vs. unambiguous, and so on. Adequate
experimental control means these conditions differ only with regard to what the
researcher wants to study; there are no extraneous or confounding variables (see
Section 2.5 and Chapters 3 and 4). Given such a design, attentional allocation will
be similar under the different conditions. Attention may be serial in both condi-
tions or parallel in both conditions, but to a large extent these effects will cancel
each other out when drawing comparisons between conditions. What remains,
then, in the eye-movement record is primarily the effect, or signal, of the experi-
mental manipulation. Therefore, researchers can make claims about the processing
of a word or item that is currently in the reader’s eye gaze, although they cannot
rule out the possibility that neighboring words (word n + 1 and n + 2) are being
processed concurrently (Engbert et al., 2005).
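To make the logic of this signal extraction concrete, the following sketch computes a per-item condition difference in reading times, together with a simple paired t statistic. The data, variable names, and numbers are illustrative assumptions, not taken from any study discussed here:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical total reading times (ms) on the critical region, per item,
# in two counterbalanced conditions. All numbers are invented.
grammatical   = [412, 388, 455, 430, 401, 467, 395, 440]
ungrammatical = [498, 452, 530, 489, 470, 545, 463, 512]

# Because the conditions differ only in the manipulation, the per-item
# difference isolates the experimental signal; shared processing
# (whether serial or parallel) cancels out in the subtraction.
diffs = [u - g for g, u in zip(grammatical, ungrammatical)]
effect = mean(diffs)                            # mean slowdown in ms
t = effect / (stdev(diffs) / sqrt(len(diffs)))  # paired t statistic
```

The subtraction is the crux: whatever attentional regime governs both conditions contributes equally to both reading times, so the difference reflects the manipulation.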
2.7 Conclusion
The aim of this chapter was to provide the reader with a set of basic facts about
eye movements that are foundational to conducting research in language acquisi-
tion and processing. Many of these facts follow from the uneven layout of the
retina, with a small area of high-acuity vision in the center—called the fovea—that
is flanked by large areas of low visual resolution called the parafovea and periphery.
“The inhomogeneity of the retina and visual projections … [is] probably the
most fundamental feature of the architecture of the visual system” (Findlay &
Gilchrist, 2003, p. 2) and may be the only way a human-sized brain can combine
sharp vision with information intake from a large visual field (Findlay & Gilchrist,
2003). The inhomogeneity of the retina underlies some important phenomena,
including the perceptual span (also known as the functional field of view and the
perceptual lobe) and parafoveal processing. These phenomena were established early
on during the third wave of eye-tracking research (see Section 1.1.3), with the
development of the gaze-contingent moving-window paradigm (McConkie & Rayner,
58 What Do I Need to Know about Eye Movements?
Notes
1 Different authors provide somewhat different estimates of saccade characteristics.
Other proposed ranges of saccade velocity are 30–500°/s (Holmqvist et al., 2011),
130–750°/s (Duchowski, 2007), and, for peak velocity, 400–600°/s (Young & Sheena,
1975). Holmqvist et al. (2011) defined saccades as movements spanning 4–20° distance.
These authors recognized that smaller eye movements also occur, but categorized them
differently, namely as microsaccades and glissades.
2 The present discussion may remind the reader of the self-paced reading moving-win-
dow procedure, which was introduced in Section 1.1.2. Although moving-window
techniques in eye tracking and self-paced reading are similar, they differ in what causes
the window to move. As mentioned in Chapter 1, self-paced reading relies on read-
ers’ responses (button presses) for the display to change. In eye tracking, however, the
window moves along with the reader’s point of gaze on the screen.
3 As a reminder, the letter identity span is a subregion within the perceptual span where
the reader can identify specific letters.
4 ¹McDonald, Carpenter, and Shillcock (2005) for Serif; ²Feng (2006) for SHARE; ³Yang (2006) for Competition/Interaction; ⁴Engbert et al. (2005) for SWIFT; ⁵Reilly and Radach (2006) for Glenmore; ⁶Legge, Klitz, and Tjan (1997) for Mr. Chips; ⁷Reichle, Pollatsek, and Rayner (2006) for E-Z Reader; ⁸Salvucci (2001) for EMMA.
5 One exception is the analysis of reading time data for a spill-over region, defined as the
word or words that follow a critical region. Researchers commonly analyze spill-over
regions to test for spill-over effects; that is, delayed effects from the critical region (see
Figure 1.5d for an example).
3
WHAT TOPICS CAN BE STUDIED
USING TEXT-BASED EYE TRACKING?
A SYNTHETIC REVIEW
Researchers are, by nature, curious. They tend to want to answer questions. Eye-
tracking researchers are no different, in that the tool they use—eye-movement
recordings—is a means that helps them answer research questions. This chapter is
about what types of questions have been successfully addressed using eye-tracking
methodology in SLA and bilingualism. Although questions can differ in their
level of granularity (see Bachman, 2005), I have attempted to cast them in gen-
eral terms, the goal being to render the breadth and diversity of contemporary
eye-tracking research. This will serve as a springboard for readers to formulate
their own research questions and kickstart their own research projects with eye
tracking.
In Section 3.1, I offer general advice on how to find a research topic. Section
3.2 represents the bulk of this chapter. It is a synthetic review of L2 eye-tracking
research with text that is organized thematically, by research strand. (A similar
synthetic review for the visual world paradigm will be presented in Section 4.2.)
I survey five research strands within the body of eye-tracking research that are
primarily text-based. A different subsection is devoted to each strand (Sections
3.2.1–3.2.5) and each subsection concludes with a list of key questions. Thus, at
the end of this chapter, readers will have a much clearer view of how their work
could fit into the current landscape of L2 eye-tracking research.
about how you could use it. Perhaps one of your professors or colleagues has an
eye tracker or someone in charge of a financial account gave you the welcome
news they are getting one for your program (stranger things have happened!).
Perhaps you are reading this chapter because you are wondering if eye tracking
would be worth your time and money, a good addition to your research arsenal
(for more on practical considerations regarding eye trackers, see Section 9.2.1).
Regardless of how you became interested in eye-tracking methodology, it is a
good idea to remind yourself that eye-tracking research shares with most other
types of research activity a goal to advance our knowledge of the world. This
means eye-tracking researchers build on their own and other researchers’ work
to produce knowledge that is both reproducible and generalizable. To contribute
to the field in this way, it is important to know the existing literature, because
this will help you identify interesting questions and potential topics for research.
I want to be clear that previous studies do not have to be eye-tracking studies
to be of interest to eye-tracking researchers. Ideas can come from any type of
behavioral research (e.g., experimental studies that yield accuracy data, reaction
times, or some other type of quantitative variable) and potentially observational
studies. Eye tracking can also offer a valuable perspective on many of the questions
addressed in ERP research (see Section 1.1.4). Therefore, a productive approach
for first-time eye-tracking researchers is to read existing literature “through eye-
tracking goggles”. By this I mean it can be a productive strategy to ask yourself
what, if anything, a study would gain from a replication with eye tracking. Not
all studies will gain from having eye-tracking data, but some will. And when you
find a study that does, you have a topic for your first eye-tracking project. In
Section 9.3.1, I present ten ideas for research to help you on your way. Ideas #8
and #9 are two examples from assessment (Cubilo & Winke, 2013) and vocabu-
lary learning (Lee & Kalyuga, 2011), respectively. These original studies did not
include eye-movement recordings, but they would benefit from a replication with
eye tracking. Examples from other areas of SLA research are available as well, so
the message is to read widely and keep an open mindset (put your goggles on).
Reading existing literature in this manner will train you to think deeply about
how eye tracking could enrich your research program and what questions eye
tracking can address for you.
In the remainder of this chapter I will take an approach that is complementary
to what I have described so far. I will focus on existing eye-tracking studies, not
because these are the only sources of inspiration, but because they represent well-
established areas of research where eye-movement recordings have already proven
their value. By identifying recurring research themes, I will offer one answer to
the question of what topics researchers can study using eye-movement record-
ings. The goal is to provide a list of research strands that have generated a good
amount of eye-tracking research, with the understanding that it is certainly pos-
sible to venture outside these strands based on your own reading of the literature.
Furthermore, the research methodology in these published papers is a good gauge
of current practices in the field. Assuming eye tracking is like other areas of quan-
titative research, these practices will likely still evolve (e.g., Plonsky, 2013, 2014)
as the field of eye tracking in SLA and bilingualism continues to grow. Therefore,
I will draw on published literature in leading journals, but also offer my own
insights as I lay out methodological guidelines for doing eye-tracking research in
subsequent chapters.
involved watching a video with narrated audio (but no subtitles in Suvorov’s case).
Extensions of visual world experiments with printed words, rather than images,
on the screen were still considered visual world, in line with these studies’ self-
labeling (e.g., Tremblay, 2011). Two studies required further scrutiny. Bolger and
Zapata (2011) combined elements of text-based eye tracking and visual world
eye tracking in different parts of their vocabulary instruction research. Conversely,
Kaushanskaya and Marian’s (2007) study did not include any audio but showed all
the other characteristics of a visual world study. I categorized both studies as visual
world research (i) because of the similarities in visual display (a few large elements
on the screen) and (ii) because of the conceptual focus on interference effects (a
recurring theme in visual world studies). In sum, although the basic distinction
between text-based eye tracking and visual world eye tracking subsumed a variety
of studies in each category, it was possible to assign each study to a single category
using a relatively small set of decision criteria.
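As a rough illustration only (and emphatically not the author's actual coding scheme), the decision criteria described above can be thought of as a small decision rule:

```python
def classify_study(display: str, self_labeled_visual_world: bool = False) -> str:
    """Toy decision rule distilled from the categorization criteria above.

    `display` is a hypothetical label for the visual layout: "text" for
    passages or sentences read as text, "few_large_elements" for a handful
    of large images or printed words on screen.
    """
    # A few large elements on screen is treated as the visual world
    # signature, even without audio (cf. Kaushanskaya & Marian, 2007),
    # and self-labeling is honored (cf. Tremblay, 2011).
    if self_labeled_visual_world or display == "few_large_elements":
        return "visual world"
    if display == "text":
        return "text-based"
    return "needs further scrutiny"
```

In practice the categorization also weighed conceptual focus (e.g., interference effects), which a rule this small cannot capture; the sketch only shows that a compact set of criteria suffices to separate the two study types.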
The literature search revealed there are about twice as many text-based stud-
ies (k = 52) as visual world studies (k = 32) in SLA to date. As can be seen
in Figure 3.1, the journal representation is skewed. Most eye-tracking stud-
ies with L2 speakers or bilinguals have appeared in a handful of journals, and
primarily in Bilingualism: Language and Cognition and Studies in Second Language
Acquisition. Bilingualism: Language and Cognition accounts for almost half of all
the visual world studies published in the field to date. To some extent, this may
reflect a shared interest in the bilingual lexicon by visual world researchers and
the journal’s readership; that is, many of the visual world papers that appear in
Bilingualism: Language and Cognition deal with questions of joint lexical activa-
tion of a bilingual’s two or more languages. Eye-movement research has also
FIGURE 3.1
Distribution of eye-tracking studies across 16 SLA journals. Note:
SSLA = Studies in Second Language Acquisition; VWP = visual
world paradigm.
3.2.1 Grammar
The online search revealed a total of 19 text-based eye-tracking studies that
addressed topics related to the representation, processing, and acquisition of gram-
mar (for a summary table, see Table S3.1 in online supplementary materials). In
many cases, the authors of these studies took a formal-linguistic perspective on
L2 acquisition and processing, although psychologically inclined research studies
also exist. Together with vocabulary research, which is the second largest strand,
grammar studies account for most of the sentence-processing literature, because
trials in grammar studies tend to consist of single sentences or collections of just a
few sentences, rather than longer texts.2 There are at least four different approaches
to studying grammar. These approaches differ in what manipulation is embedded
in the critical sentences (i.e., the sentences that are of interest, as opposed to filler
sentences). (i) Studies in an anomaly detection or violation paradigm rely
on sentences that contain a grammatical, semantic, pragmatic, or discourse-level
anomaly (e.g., Clahsen et al., 2013; Ellis et al., 2014; Godfroid et al., 2015; Hopp &
León Arriaga, 2016; Keating, 2009; Lim & Christianson, 2015; Sagarra & Ellis, 2013;
Zufferey et al., 2015). (ii) Research in an ambiguity resolution paradigm uses
sentences that contain a syntactic ambiguity (e.g., Chamorro et al., 2016; Dussias
& Sagarra, 2007; Roberts et al., 2008). (iii) Researchers working in a depend-
ency paradigm create sentences that have long-distance syntactic dependen-
cies, for instance wh-questions or relative clauses in English (e.g., Boxell & Felser,
2017; Felser & Cunnings, 2012; Felser et al., 2009, 2012). Selective reviews of the
work in these paradigms can be found in Dussias (2010), Jegerski (2014), Keating
and Jegerski (2015), and Roberts and Siyanova-Chanturia (2013). (iv) A fourth
approach to studying grammar does not involve any of the above manipulations;
it is the non-violation paradigm (e.g., Godfroid & Uggen, 2013; Spinner et al.,
2013; Vainio et al., 2016). Incidentally, the non-violation paradigm is the default in
visual world research, where studies do not normally contain a grammatical anom-
aly, even when these studies aim to measure grammar knowledge (see Chapter 4).
So, what topics do researchers investigate using all these different paradigms? Not
surprisingly, there is quite a bit of variation. A major question is whether L2 speak-
ers have acquired a particular aspect of the grammar, whereby the focus is often
on morphosyntax (e.g., tense, person, number, gender, or case markings). In studies
that aim to measure learners’ internal grammar, acquisition (understood as whether
a given function is a part of the grammar or not) is operationalized in terms of
grammatical sensitivity (see Godfroid & Winke, 2015, for discussion). The assump-
tion is that when participants have knowledge of a grammatical function, they will
slow down (or react in some other manner) to forms that violate their internal
grammar. Consequently, questions of acquisition are typically addressed using an
anomaly paradigm, whereby the researcher compares processing of grammatical and
ungrammatical sentences.3 In an eye-tracking study, this means the researcher cre-
ates a grammatical and an ungrammatical version of the same item and compares
reading times and other types of eye-movement behavior for the two sentences. (In
most cases, participants will read only one of the two versions of the item, because
the design is counterbalanced; see Sections 5.2 and 6.3.1.1, for more information.)
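The counterbalancing logic can be illustrated with a generic two-condition Latin square, in which each list presents every item exactly once, in only one version. The item numbers and condition labels below are invented for the sketch:

```python
# Two-condition Latin square: each participant list sees every item
# exactly once, in only one version, and each version appears equally
# often across lists. Item numbering is illustrative.
items = list(range(1, 9))            # 8 experimental items
conditions = ["grammatical", "ungrammatical"]

def build_lists(items, conditions):
    """Rotate conditions across items to create counterbalanced lists."""
    n = len(conditions)
    lists = []
    for offset in range(n):
        lists.append([(item, conditions[(i + offset) % n])
                      for i, item in enumerate(items)])
    return lists

list_a, list_b = build_lists(items, conditions)
# Item 1 appears grammatical on list A and ungrammatical on list B,
# so across participants every item contributes to both conditions.
```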
Here are two examples from Hopp and León Arriaga (2016), who studied the pro-
cessing of case in L1 and L2 Spanish using an anomaly paradigm. In this and the
following examples, words in boldface represent the critical area(s) in the sentence.
Unless otherwise noted, participants read the sentences in regular print.
Example (1) is a sentence pair, or doublet, with a ditransitive verb, prometer, “to
promise”. Indirect objects in Spanish are marked with a (and a + el becomes al),
so the unmarked object el vecino (“the neighbor”) in 1b is ungrammatical.
A second line of inquiry concerns the nature of the mechanisms that interface
between people’s grammar knowledge and their overt linguistic behavior. The
technical term for this is parsing (also see Textbox 1.1). Juffs and Rodríguez
(2015) likened the relationship between the grammar and the parser to two states
of a combustion engine.
The grammar is the engine at rest, not driving the vehicle, but with the
potential to do so. Parsing is the engine in motion, subject to stresses and
possible breakdowns allowable by the system, and driving production or
comprehension in real time.
( Juffs & Rodríguez, 2015, p. 15)
In Example (5), the noun phrase which animals moved out of its canonical direct
object position to form an indirect question. According to formal-syntactic the-
ory, the noun phrase left behind two traces or gaps in its movement up the syn-
tactic tree. The first trace is at its base-generated position (the plan would protect the
animals) and the second trace is at an intermediary gap site inside the complex
subject (the plan to look after/that looked after the animals). For the sentence in (5) to
be grammatical, the complex subject must contain an infinitive rather than a finite
verb; that is, (5a) is grammatical but (5b) is not (Kurtzman, Crawford, & Nychis-
Florence, 1991; Phillips, 2006, as cited in Boxell & Felser, 2017).
The wh-phrase which animals is called a filler and the empty category at its base
extraction site is a gap. Of interest is whether L2 speakers will show a syntactic
reflex in their processing behavior known as filler-gap dependency processing.
Essentially, the question is whether they will slow down at gap sites (to integrate the
filler with its original location) and at the same time avoid positing gaps in places
where they are illicit. Consequently, researchers who work in a dependency para-
digm will compare reading times for sentences where a slowdown is expected
(e.g., [5a]) with reading times for sentences where a slowdown is not expected
(e.g., [5b]).
Dussias (2010) highlighted the importance of data collection method in study-
ing the processing of syntactic dependencies. Given that the sentences in depend-
ency research tend to be complex, having participants read them word-by-word
or segment-by-segment (as is the case in self-paced reading) may further increase
cognitive load. This may be especially hard for L2 speakers, for whom processing
is generally more effortful. Indeed, it is noteworthy that self-paced reading studies
have generally supported the shallow structure hypothesis (e.g., Felser, Roberts,
Marinis, & Gross, 2003; Marinis, Roberts, Felser, & Clahsen, 2005; Papadopoulou
& Clahsen, 2003), whereas eye-tracking studies have found that L2 readers do
show sensitivity to gaps, only slightly later than L1 controls (Boxell & Felser, 2017;
Felser et al., 2012). This raises the issue of lexical processing speed again (Hopp,
2014). Could subtle processing differences between L1 and L2 speakers be due
to lexical processing differences rather than syntactic ones? To conclude that L2
speakers truly have difficulties with syntactic processing, it seems important to
account for lexical influences first.
Finally, a growing number of studies on grammar acquisition and processing
have adopted general-cognitive frameworks of learning, such as associative learn-
ing theory (Ellis, 2006), the noticing hypothesis (Schmidt, 1990), and the tuning
hypothesis (Cuetos & Mitchell, 1988). Studies by Ellis et al. (2014), Sagarra and
Ellis (2013), and Godfroid and Uggen (2013) explicitly draw on the idea of the
eye gaze as a marker of overt attention to study reliance on lexical or morpho-
logical cues (Ellis et al., 2014; Sagarra & Ellis, 2013) or noticing of morphology
(Godfroid & Uggen, 2013) during processing. Dussias and Sagarra (2007) offered
a frequency-based account of bilinguals’ parsing preferences, suggesting even
the L1 parser is not impervious to environmental influences. Of these studies,
Godfroid and Uggen (2013) did not involve any ungrammatical, incongruent,
or ambiguous sentences. It will be discussed here as an example from the fourth
paradigm, the non-violation paradigm (see also Vainio et al., 2016).
Godfroid and Uggen investigated whether beginning learners of German dis-
tinguished between, or “noticed” (Schmidt, 1990), German verb stem variants
(i.e., allomorphs). In critical trials both stem variants of a verb appeared together
in two stacked sentences (see Figure 3.2). Godfroid and Uggen compared looks
to the marked verb stems (verb stems that had undergone a vowel change) with
looks to matched, unmarked verb stems that appeared in control trials. The
researchers wanted to know whether the participants, who had only been taught
how to conjugate the unmarked verb forms, would learn the verb allomorphs
from meaning-focused exposure. In this regard, they were especially interested
1. Have L2 speakers acquired a particular aspect of the grammar and can they put their
knowledge to use in real time? (violation paradigm)
2. How do L1 and L2 speakers of different languages and language pairings parse
sentences? Do L2 speakers rely on the same structure-based principles as L1 speakers
do? (ambiguity paradigm, dependency paradigm)
3. Do parsing routines transfer between a bilingual’s two (or more) languages? (ambiguity
paradigm, dependency paradigm)
4. What is the role of individual differences in syntactic processing? (any paradigm)
5. How do L2 learners engage with unfamiliar forms they encounter in the input? Is
their online processing behavior related to their learning of these forms? (non-violation
paradigm)
includes research on how lexemes (in theory of any size, though in practice single
words) are represented and accessed in the bilingual lexicon (Balling, 2013; Cop,
Dirix, Van Assche, Drieghe, & Duyck, 2017; Hoversten & Traxler, 2016; Miwa
et al., 2014; Philipp & Huestegge, 2015; Van Assche et al., 2013). What connects
these three approaches is a shared interest in bilinguals’ and L2 speakers’ lexi-
cons. Beyond this thematic overlap, researchers in these different areas have pur-
sued somewhat different research questions so that it makes sense to discuss the
areas one by one. (i) Studies that look into single-word processing have used
unfamiliar or low-frequency words, sometimes even pseudowords, to ensure that
participants have little or no prior knowledge of the target words. The main question has been whether L2 speakers can learn these words from reading—that is,
how processing (reading) and vocabulary acquisition (word learning) relate. (ii) In
research on multiword sequences, on the other hand, the idioms and colloca-
tions are typically known or familiar, although studies may also include unfamiliar,
non-idiomatic expressions in the control condition. The focus here is on how
multiword units are represented in L2 speakers’ lexicon. In other words, research-
ers with an interest in multiword units study learners’ mental representation of the
multiword sequences and use the learners’ processing data to do so. (iii) A focus on
representation also characterizes empirical research on the bilingual lexicon. In
many cases, researchers who specialize in the bilingual lexicon study the process-
ing of words that enjoy a special status in a bilingual’s two languages, for instance
cognates (animal in French and animal in English), homographs (pie in English
and pie, meaning “foot” in Spanish), and homophones (belebt in German, mean-
ing “lively”, and beleefd in Dutch, meaning “polite”). The overarching question is
whether bilinguals, whose two languages may share part of these words’ lexical
representation (as in the animal - animal example above), process the words differ-
ently than monolinguals, who by definition will know the word in one language
only, or differently than control words that do not have any cross-lingual overlap
(e.g., pantalon in French and trousers in English).
Eye-movement studies on (single-word) vocabulary acquisition are situated
within the theoretical framework of incidental vocabulary acquisition (some-
times also called contextual word learning; Elgort et al., 2018). It has long been
known that vocabulary gains may accrue incidentally, as a by-product of another
meaning-focused activity, such as reading, watching a movie, or talking with a friend
(for a review, see Schmitt, 2010). Such incidental exposure is an important source of
vocabulary growth. Eye-tracking researchers exploit the potential of the eye gaze as
a measure of overt attention to understand the finer details of how incidental word
learning might occur in real time. That is, unlike most incidental vocabulary acquisition research, where the focus is on offline test performance, eye-tracking researchers
who work in this area are primarily concerned with their learners’ online process-
ing behavior; that is, how the learners engage with new words in real time. These
real-time processing data are then triangulated with participants’ performance on
offline vocabulary tests (e.g., meaning recognition or meaning recall). For instance,
Godfroid et al. (2013) studied how L2 learners allocated their attention during the
reading of short texts embedded with novel words that served as target words for
learning (see Figure 9.26 for an example). The authors linked target-word process-
ing and target-word learning, which was a novel finding and a novel application of
eye-tracking methodology at the time. More attention to (longer eye fixations on) the
target words during reading resulted in better recognition of the target words on the
vocabulary post-test. This finding has since been replicated several times in reading
research (Godfroid et al., 2018; Mohamed, 2018; Pellicer-Sánchez, 2016) and has
been extended to the learning of words from watching captioned videos (Montero
et al., 2015) and the acquisition of grammar under different types of instruction
(Cintrón-Valentín & Ellis, 2015; Godfroid & Uggen, 2013; Indrarathne & Kormos,
2017). Meanwhile, the field of (single-word) vocabulary acquisition research has
moved progressively toward the use of longer reading materials, including short sto-
ries (Pellicer-Sánchez, 2016), graded readers (Mohamed, 2018), and chapters from
a novel (Godfroid et al., 2018) or a general-academic textbook (Elgort et al., 2018).
When the reading materials are longer, words will naturally occur more often and
so a new question in this line of research is how processing of target words changes
over time as readers encounter the words repeatedly in the text (Elgort et al., 2018;
Godfroid et al., 2018; Mohamed, 2018; Pellicer-Sánchez, 2016).
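A minimal sketch of how such encounter-by-encounter analyses aggregate reading times might look as follows; the pseudoword targets and millisecond values are invented for illustration:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fixation log: (target_word, encounter_number, total_time_ms).
# The pseudowords and values are invented for illustration only.
log = [
    ("blicket", 1, 780), ("blicket", 2, 640), ("blicket", 3, 510),
    ("florp",   1, 820), ("florp",   2, 700), ("florp",   3, 560),
]

# Average reading time per encounter, collapsed across target words.
by_encounter = defaultdict(list)
for word, encounter, ms in log:
    by_encounter[encounter].append(ms)

trend = {enc: mean(times) for enc, times in sorted(by_encounter.items())}
# A downward trend across encounters would indicate that processing
# eases as the word becomes familiar.
```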
Eye-tracking research on multiword units is grounded in the view that language
is formulaic (e.g., Boers & Lindstromberg, 2009; Nesselhauf, 2005; Robinson &
Ellis, 2008; Wray, 2002, 2008). Corpus researchers have shown that the language
people produce is full of statistical regularities (Pawley & Syder, 1983; Sinclair,
1991) in the sense that words like to keep company with some words (e.g., strong
coffee, heavy drinker, avid reader) more than others (e.g., thick coffee, large drinker,
zealous reader). These conventionalized word sequences make up a large part of
people’s linguistic repertoires and help relieve some of the burden of word selec-
tion. However, it is unclear to what extent L2 users benefit from knowing formulaic language as well, because "it is only when a sequence is deeply entrenched
in a language user’s long-term memory that it qualifies as truly formulaic for that
user” (Boers & Lindstromberg, 2012, p. 85). Eye tracking can provide insight into
L2 learners’ depth of knowledge by revealing how the learners process formulaic
language in real time and specifically, whether they show a processing advantage
(faster reading times) for formulaic sequences.
Many studies in this area have focused on the processing of idioms as a pro-
totypical example of formulaic language. This line of work uses “a threshold
approach” (Yi et al., 2017, p. 4), in that an expression is either an idiom (e.g., spill
the beans) or it is not (e.g., spill the chips) and when it is an idiom, the phrase is
believed to be stored and retrieved holistically from the lexicon (e.g., Wray, 2002,
2008). More graded approaches to studying multiword units are found in col-
location studies (e.g., fatal mistake > awful mistake > extreme mistake, from Sonbul,
2015), where the association between words is strong but not absolute. Yi and colleagues (2017) termed the latter "a continuous approach" (p. 5), reflecting the view
that “[multiword sequences] exist as a continuum in terms of frequency and other
statistical properties” (ibid.). Interestingly, researchers who have studied a wider
range of multiword sequences (Sonbul, 2015; Yi et al., 2017) have found that L2
learners are sensitive to the statistical properties of the target language, whereas the
results from idiom-processing studies have been more mixed (Carrol & Conklin,
2017; Carrol et al., 2016; Siyanova-Chanturia et al., 2011).
Carrol and Conklin (2017) studied the processing of English and translated
Chinese idioms presented in short English sentence contexts (also see Carrol et
al., 2016, for a similar study with English and Swedish). The participants were
English monolinguals and Chinese intermediate learners of English who read
sentences like (6a)–(6d) in the first experiment. The Chinese idiom 画蛇添足,
“draw a snake and add feet”, means “to ruin something by adding unnecessary detail”
(Carrol & Conklin, 2017, p. 300), but evidently this figurative meaning is only
available to people who know Chinese. Carrol and Conklin addressed the ques-
tion of whether Chinese-English bilinguals would activate their semantic or con-
ceptual knowledge of the Chinese idiom even when they were reading in English
in two eye-tracking experiments.
The researchers analyzed reading times for the final word in the idioms and
matched non-idiomatic control phrases. The assumption behind this is that the
final word is where most facilitation would occur, because readers are most likely
to have recognized an idiom by the time they encounter the final word. Carrol
and Conklin found that the English and Chinese native speakers showed com-
plementary processing patterns. The English speakers read the English idiom (6a)
faster than the non-idiom (6b), whereas the Chinese speakers showed a facilitation effect
on the Chinese idiom (6c) compared to the non-idiom (6d). Crucially, no dif-
ferences were found in the Chinese reading times for English idioms ([6a] versus
[6b]), which replicates Siyanova-Chanturia et al.’s (2011) earlier findings with
English speakers of mixed L1 backgrounds, but not Carrol et al.’s results (2016)
for L1 Swedes. Simply knowing the meaning of an English idiom, as the partici-
pants in all of these studies do, may therefore not be enough for participants to
enjoy the processing advantages that come from having deeply entrenched phrasal
knowledge (Boers & Lindstromberg, 2012).
In their studies, Carrol and colleagues also showed that the meanings of L1
Chinese or L1 Swedish idioms are activated during L2 reading. This aligns well
with contemporary views of how word retrieval in the bilingual lexicon works (i.e.,
in parallel for both languages), which is the third subarea of vocabulary research
to which we turn now. The dominant theoretical position about the bilingual
lexicon is that lexical access is nonselective with regard to language. This means
words from a bilingual’s two (or more) languages are jointly activated during
language processing, regardless of the words’ language membership and regardless
of the language of the task. In plain terms, bilinguals and L2 speakers can never
really “switch off” the language they are not using (also see Section 4.2.1). Much
of our understanding of the bilingual lexicon comes from single-word-processing
studies, for instance primed and unprimed lexical decision tasks, naming studies,
and ERP research (see Kroll & Bialystok, 2013; Kroll, Dussias, Bice, & Perrotti,
2015; Kroll & Ma, 2017; Van Hell & Tanner, 2012, for reviews). Eye tracking has
been used successfully in this context to monitor Japanese-English bilinguals’ eye
movements in an English lexical decision task (Miwa et al., 2014). Eye tracking
further offers the possibility of studying words in sentences (Hoversten & Traxler,
2016; Philipp & Huestegge, 2015; Van Assche et al., 2013) or longer texts (Balling,
2013; Cop et al., 2017), where contextual information might bias attention to the
language in use (i.e., away from parallel, cross-lingual activation). How, then, do
researchers measure language co-activation in seemingly monolingual contexts
such as unilingual sentences? Often, they compare the processing of words that
share a formal and/or semantic overlap between a bilingual’s two languages with
language unique words that are matched on properties such as frequency and
word length (see Sections 2.5 and 6.2.3). The former category includes cognates
(words that overlap in meaning, spelling, and pronunciation, e.g., animal in French
and English), homographs (overlap in spelling and potentially pronunciation
but not meaning, e.g., pie in English and Spanish), and homophones (overlap
in pronunciation and potentially spelling but not meaning, e.g., belebt in German
and beleefd in Dutch). Here is an example with homographs from Hoversten and
Traxler (2016), who studied bilingual lexical activation by comparing Spanish-
English bilingual and English monolingual sentence reading.
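The matching procedure described above can be sketched in code. The words, log-frequency values, and the 0.3 tolerance below are all assumptions for illustration, not the stimuli or criteria of any of the cited studies.

```python
# A minimal sketch of stimulus matching: for each cross-language word,
# find a language-unique control word of the same length whose log
# frequency falls within a tolerance. All values are invented.
ambiguous = {"animal": (4.2, 6), "pie": (3.8, 3)}   # word: (log frequency, length)
unique_pool = {"window": (4.1, 6), "cup": (3.9, 3), "dog": (4.5, 3)}

def find_match(properties, pool, max_freq_diff=0.3):
    """Return the first pool word matched on length and log frequency."""
    freq, length = properties
    for word, (f, n) in pool.items():
        if n == length and abs(f - freq) <= max_freq_diff:
            return word
    return None  # no acceptable control word found

matches = {word: find_match(props, unique_pool) for word, props in ambiguous.items()}
```

Any processing difference between a cognate or homograph and its matched control can then be attributed to cross-language overlap rather than to frequency or length.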
into how much the participants are learning, at least as measured on immediate,
discrete post-tests of grammar or vocabulary knowledge.
Parallel to the ongoing research on visual input enhancement (e.g., Issa &
Morgan-Short, 2019; Issa, Morgan-Short, Villegas, & Raney, 2015) researchers
are broadening the range of instructional conditions included in eye-movement
studies (Cintrón-Valentín & Ellis, 2015; Indrarathne & Kormos, 2017, 2018). By
expanding the domain of inquiry in this way, these researchers are able to con-
tribute important empirical data addressing the role of explicit instruction in L2
acquisition, which is known as the interface debate (also see Andringa & Curcic,
2015). As a case in point, Cintrón-Valentín and Ellis (2015, Experiments 1 and
2) compared three types of focus-on-form instruction to help L1 English speak-
ers overcome their attentional biases when learning a new, inflection-rich lan-
guage, namely Latin (also see Ellis et al., 2014; Sagarra & Ellis, 2013). The types of
instruction were verb pretraining (VP), verb grammar instruction (VG), and verb
salience with textual enhancement (VS). There was also a control group. The VP
and VG groups completed a pretraining phase, in which the VP group trained on
inflected verbs (as sole cues) and the VG group engaged in a brief grammar lesson
on verb tense morphology. After that, learners from all four groups marked the
temporal reference (past, present, future) of simple Latin sentences that consisted
of a temporal adverb and a verb form marked for tense (i.e., two cues to temporal
reference, of which the verb cue is the one that tends to be blocked, or ignored,
by L1 English speakers). Verb endings were bolded and printed in red for the VS
group, in an attempt to help them overcome their attentional biases toward the
adverb (see Figure 3.3).
FIGURE 3.3 An example trial from the exposure phase in a study on learned attention.
Both the adverb heri, “yesterday”, and the verb cogitavi, “I thought”,
denote past time.
(Source: Cintrón-Valentín & Ellis, 2016).
The results of the eye-tracking data for phase 2, which were collected for a
subset of participants, showed that all three treatment groups paid more,
sustained attention to the verb cues than the control group, who gradually
lost interest in the verbs as training progressed. Moreover, the proportion
of time participants fixated on either cue (i.e., the verb or the adverb) during
training correlated with the participants’ cue reliance during sentence interpreta-
tion and production. Although the three focus-on-form techniques were similarly
effective in refocusing learners’ attention in this study, Cintrón-Valentín and Ellis
noted that the optimum levels of explicitness and explanation in ISLA will vary
for different types of constructions.
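The correlation analysis just described can be illustrated schematically. The fixation times and cue-reliance scores below are invented, and a first-principles Pearson coefficient stands in for whatever statistics Cintrón-Valentín and Ellis actually computed.

```python
from math import sqrt

# Invented data: per learner, total fixation time (ms) on the verb and
# adverb cues during training, and the proportion of test responses in
# which the learner relied on the verb cue.
fixations = [(1200, 1800), (2500, 1500), (900, 2100), (3000, 1000)]
cue_reliance = [0.35, 0.70, 0.25, 0.80]

# Proportion of cue-directed fixation time spent on the verb.
verb_props = [verb / (verb + adverb) for verb, adverb in fixations]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(verb_props, cue_reliance)  # strongly positive in this toy data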
We have seen how the use of eye tracking in ISLA has spread from a direct,
visually oriented intervention (input enhancement) to other instructional tech-
niques. A further development in this line of research is the use of eye tracking
to validate features of instructional design. Within the field of task-based lan-
guage teaching, Révész (2014) advocated for the use of eye tracking and other
methodologies to validate task complexity manipulations, given the importance
of task characteristics, and specifically task complexity, in theoretical accounts of
task-based language teaching and learning (Robinson, 2001, 2011; Skehan, 1998,
2009). In a study on the acquisition of the English past counterfactual conditional,
Révész and her colleagues (2014) recorded participants’ eye movements as they
completed two oral production tasks (one simple, one complex). The tasks were
designed to differ in their reasoning demands; that is, how straightforward it was to
identify the likely cause, out of two, for a stated outcome (see Figure 3.4). Analyses
FIGURE 3.4 Sample trial in an ISLA study. Participants were tasked with describing
the causal relationship between two events based on a story they had
just read. The picture prompt on the right was hypothesized to be more
complex because both answer options were plausible.
(Source: Révész et al., 2014).
3.2.4 Subtitles
Humans are surrounded by rich, multimodal input. They commonly experience
visual (pictorial) and verbal information at the same time, for instance when drink-
ing coffee with a friend in a coffee shop (sound and image), viewing advertise-
ments (text and image), or watching an opera performance with the soundtrack
displayed above the stage (sound, text, and image). Not only do humans have the
ability to decode information obtained from multiple input streams at the same
time, they can also integrate these sources into coherent, multimodal representa-
tions of the world. Eye-tracking researchers in SLA have recently turned to a
specific type of bimodal input, namely foreign films with subtitles (written L1
translations of the aural input) or captions (written L2 renderings of the aural
input). Subtitled and captioned videos are prime examples of multimodal input
because they combine visuals, spoken verbal, and written verbal input, all in one
viewing experience.
SLA researchers use eye tracking to examine how bimodal input conditions
influence reading behavior (Muñoz, 2017; Winke et al., 2013) or L2 learn-
ing (Bisson et al., 2014; Montero Perez et al., 2015): see online supplementary
materials for summary tables, Table S3.4. Their work builds on 35 years of subti-
tles research that has shown, among other things, that captions are beneficial for
listening comprehension and L2 vocabulary learning (see Montero Perez, Van den
Noortgate, & Desmet, 2013, for a meta-analysis). Although some of these earlier
studies also used eye tracking (e.g., d’Ydewalle & Gielen, 1992; d’Ydewalle &
De Bruycker, 2007; d’Ydewalle, Praet, Verfaillie, & Van Rensbergen, 1991), most
previous subtitles research relied on offline performance measures such as mul-
tiple-choice questions or free recall. The value of eye tracking, therefore, lies in
illuminating, in fine temporal detail, how viewers allocate, shift, or divide their
attention between the image and text, when the image and text provide partially
overlapping information and processing is further constrained by the soundtrack.
The theoretical advantages of bimodal or multimodal input are fairly well
established—multimodal input reduces cognitive burden, which leads to bet-
ter processing and intake (Gass, 1997). Multimodal input also increases recall as
learners make use of both aural and visual working memory, thus expanding on
their limited capacity for storing information in either memory system (Baddeley,
1986). In SLA, the focus has been on whether the advantages of multimodal input
hold across different linguistic domains, starting with vocabulary (Bisson et al.,
2014; Montero Perez et al., 2015), and whether subtitles or captions are equally
effective for different learner profiles (Muñoz, 2017; Winke et al., 2013).
Of the four identified subtitles studies (see Table 3.4), two studies (Bisson et al.,
2014; Montero Perez et al., 2015) have linked bimodal input with vocabulary
acquisition. Montero Perez and her colleagues (2015) examined the role of cap-
tion types (full captions or keywords) on vocabulary acquisition in L2 French
learners who either were or were not informed they would be tested on vocabu-
lary afterward. The researchers used keyword captioning, as an alternative to full
captioning, in an attempt to increase the words’ visual salience and enhance learn-
ing. The amount of visual attention to the target words (i.e., viewers’ eye fixation
durations) showed an interesting relationship with word learning (test scores),
which differed for the keyword and full captioning groups and for the groups
that did, versus did not, receive a test announcement. Keyword captioning elic-
ited longer fixations on the target words than full captions if participants knew
about the upcoming test. Keyword captioning also led to higher form recognition
scores. However, a direct relationship between fixation duration and word recog-
nition could not be established for the keyword groups, only for the full-caption
group that received a test announcement. The relationship between attention and
learning, therefore, held in only one of the four conditions, although the results
generally confirmed the benefits of isolating key linguistic information in cap-
tions for better learning outcomes.
Researchers have also looked into how age and proficiency (Muñoz, 2017)
and content familiarity and target language (Winke et al., 2013) relate to caption-
reading behavior. Processing text paired with aural input and a changing visual
background is a complex task. This is particularly true for young learners, who are
still developing cognitively, and for beginning L2 learners (including young learn-
ers), whose language skills may hamper adequate comprehension (Vanderplank,
2016). In response to these concerns, Muñoz (2017) examined L2 learners’ read-
ing behaviors of two captioned Simpsons clips as a function of the participants’
age (children, adolescents, and adults) and L2 proficiency (beginner, intermedi-
ate, advanced). Because all the children were beginning learners of English, most
adolescents were at an intermediate level, and most adults were advanced, the
results for the two sets of analyses largely coincided. Muñoz found that beginners/
children experienced more processing difficulty reading L2 English captions than
L1 Spanish subtitles and that advanced speakers/adults tended to skip L1 subtitles
more often. The findings are interesting in light of a global trend toward starting
foreign language instruction at a younger age, often in primary school. As Muñoz
explained, captioned or subtitled video can be a valuable pedagogical tool for
child L2 learning, provided the children are able to read the captions or subti-
tles. Future researchers could disentangle proficiency and age-related factors more
clearly. Analyses could also focus on how the eye moves relative to what the ear
hears and how the line of sight travels between the text and the video image (see
Bisson et al., 2014, for analyses of the image area). Such analyses would capture the
multimodal experience of watching subtitled or captioned videos more fully and
advance our understanding of what makes reading captions or subtitles different
from reading static text. Table 3.4 summarizes some current and potential future
questions in eye-tracking research on multimodal input.
3.2.5 Assessment
Language assessment researchers have turned to eye-movement recordings as a
tool to provide insights into how test takers interact with test items. Research
to date has evaluated L2 reading assessment (Bax, 2013; McCray & Brunfaut,
2018), L2 listening assessment (Suvorov, 2015), and speaking assessment for L1
and L2 English-speaking children (Lee & Winke, 2018): see online supplementary
materials for summary tables, Table S3.5. Although the research questions in each
study differ, in part because of the focus on different language skills, assessment
researchers who employ eye tracking are generally interested in test validity.
The overarching question, therefore, is whether language tests assess what they are
intended to measure, which is language proficiency.
A way to investigate test validity is by examining online behaviors of differ-
ent groups of test takers (e.g., successful vs. unsuccessful, high vs. low proficiency,
native vs. non-native) responding to the same test items. Such comparisons can
reveal whether a test discriminates between test takers with different linguistic
profiles, which is usually considered a positive sign of the validity of a test. The
exception is the native–non-native speaker comparison in Lee and Winke’s (2018)
study, where the goal was to investigate whether child English language learners
(the non-native speaking population) felt psychologically safe taking the English
language test; there, native–non-native speaker differences in eye movements
were not considered a good thing. Other questions addressed in the
assessment strand of eye-tracking research include the role of visual information
in L2 listening assessment (Suvorov, 2015) and the distribution of global and local
reading processes in banked gap-fill items (McCray & Brunfaut, 2018). To better
understand the construct of banked gap-fill items, McCray and Brunfaut (2018)
selected 24 test items from the Pearson Test of English Academic (see Figure 3.5
for a publicly available example, not used in the study). The researchers wanted
to know to what extent test takers’ use of global (higher-level, text-based) read-
ing processing vs. their local (word-level) reading processes related to their test
performance, considering that this type of reading test is designed to measure the
whole spectrum of reading processes.
Based upon Khalifa and Weir’s (2009) cognitive processing model of reading,
the authors proposed seven hypotheses of how different aspects of lower- and
higher-level processing relate to test performance.They examined eye-movement
behavior in three categories of task performance: (i) overall processing of the
gap-fill items, (ii) text processing (e.g., reading the text), and (iii) task process-
ing (e.g., engaging with the word bank). The authors found higher-performance
test takers completed the test faster than the lower-performance group, whereas
lower-level performers visited the word banks more often. Spending more time
on the words surrounding the gap (a local reading strategy) was associated with
lower test scores. These findings show that less successful test takers evidenced
more lower-level text processing. It follows that successful engagement with gap-
fill tasks may require higher level reading skills, in line with the stated objectives
for banked gap-fill test items.
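The global-versus-local contrast lends itself to a simple interest-area analysis. The sketch below totals fixation time per interest area per test taker; the records and the area labels ("text", "gap_region", "word_bank") are invented for illustration and are not McCray and Brunfaut's actual coding scheme.

```python
# Hypothetical fixation records: (test_taker, interest_area, duration_ms).
fixations = [
    ("p1", "text", 220), ("p1", "word_bank", 180), ("p1", "gap_region", 260),
    ("p1", "word_bank", 150), ("p2", "text", 400), ("p2", "gap_region", 120),
]

def dwell_time(records):
    """Total fixation duration per (test taker, interest area) pair."""
    totals = {}
    for person, area, duration in records:
        totals[(person, area)] = totals.get((person, area), 0) + duration
    return totals

dwell = dwell_time(fixations)  # e.g., ("p1", "word_bank") -> 330 ms
```

Comparing such per-area totals across score groups is one way to quantify how much local, gap-adjacent processing a test taker engaged in.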
Another way to ensure test validity is by exploring the role of test-irrelevant
features and their impact on test performance. For a language proficiency test to be
valid, test scores should not reflect test-irrelevant features, or construct-irrelevant
variance, because this would muddle the construct being measured. Test validity
is of special concern for cognitively developing populations such as young test
takers, since these groups of test takers are unfamiliar with highly restricted test
environments (e.g., performing under time pressure) and are thus more likely to be
influenced by test conditions. A recent study conducted by Lee and Winke (2018)
attempted to address these concerns by exploring response behaviors of young test
takers of the TOEFL® Primary™ speaking test (sample task shown in Figure 3.6).
Lee and Winke recruited native and non-native test takers aged eight, nine, or ten
years old to find out if developmental differences or issues in task design might
account for performance accuracy and response patterns, as seen in the children’s
eye-movement data and spoken output.
In an effort to measure their young learners’ test experience comprehen-
sively, the authors triangulated multiple data sources in their analyses, namely
drawings, interview data, test performance scores, and eye-movement measures.
Lee and Winke found the English language learners scored lower on two dif-
ficult items than their native peers. Eye gaze behavior during speech aligned
with speaking performance on these items. In particular, the English language
learners had a stronger tendency to look at the timer on the screen (see Figure
3.6) and this was associated with poor speech production (e.g., hesitations or
silence). Although the causality of this finding is unknown (i.e., did a lower
proficiency cause fixations on the timer or did the timer interfere with less
proficient speakers’ test performance), the findings illustrate the importance of
understanding test takers, test conditions, and their characteristics. Table 3.5
summarizes some of the main questions in contemporary eye-tracking research
on language assessment.
FIGURE 3.6 Sample picture description task.
(Source: The TOEFL® Primary™ speaking test for young test takers used in Lee and Winke, 2018.
Copyright © 2013 Educational Testing Service. Used with permission).
1. What cognitive construct does a certain test or test item measure? Do test items
successfully discriminate test takers’ language abilities?
2. How do test takers derive the right answer on a test? Does test takers’ response
behavior solely reflect their language abilities?
3. How does video function in a multimedia listening test environment? How do audio-
only listening tests differ from video-based listening tests? Is it a good idea (from a test
validity perspective) to include visual support in listening assessment?
4. Are the features of a test (e.g., directions, prompts, time pressure) appropriate for more
vulnerable test populations (e.g., young children)?
5. How do raters interact with rubrics and rubric categories when rating speech samples
or essays? How do raters interact with essays or other writing samples they are
evaluating?
3.3 Conclusion
This chapter has given readers a tour of text-based eye-tracking research in SLA
and bilingualism. The itinerary was determined by a synthetic review of the eye-
tracking literature which encompassed 16 discipline-specific journals. That syn-
thetic review revealed a total of 82 eye-tracking publications published between
2003 and 2017. Fifty-two studies involved some type of written input, making
text-based eye tracking the largest category of eye-tracking research. The other 32
studies, which collectively make up visual world research and production studies,
will be reviewed in Chapter 4.
The growing body of eye-tracking research confirms the trend toward
increased use of real-time methodologies in SLA and bilingualism (see Chapter
1). Researchers across different paradigms are increasingly recognizing the value of
measuring language knowledge and processing as it occurs. At the same time, L2/
bilingual eye-tracking research still has a lot of potential for growth, as about half
of the surveyed journals had published at most one eye-tracking study
at the time the search was concluded. This situation is likely to change soon,
because the use of eye-tracking methodology in the field
is rapidly diversifying. Indeed, for the present synthetic review, I identified a total
of five major eye-tracking research strands, four of which emerged after
2010. The five strands are grammar (the oldest and largest strand), vocabulary and
the bilingual lexicon, instructed second language acquisition, captions and sub-
titles processing, and assessment. The bulk of work in each strand—and in some
cases all of it—has appeared since 2010. Given the goals of this methodology
book, my review focused on general questions and strand-specific approaches that
may inspire readers’ own research projects. I gave an overview of the breadth and
depth of contemporary L2 eye-tracking research. One thing that became clear
is that, despite some similarities between research strands, eye tracking cannot
offer a one-size-fits-all solution. As L2 and bilingualism researchers’ needs and
research interests differ, so will the applications of eye-tracking methodology. This
alludes to the importance, mentioned in the beginning of this chapter, of reading
widely in eye-tracking and non-eye-tracking literatures and staying informed of
cross-disciplinary trends in eye-tracking methodology. Together with one’s own
expertise, this confluence of ideas may spark creativity in the research process and
result in more robust and innovative studies.
Notes
1 Studies that were available online first in 2017 had a 2018 publication date.
2 When describing eye-tracking studies, I will often use the terms trial, item, and sentence.
A trial is a sequence of events that represents one basic unit in an experiment. For
instance, a trial can be one sentence followed by a comprehension question, a sentence
paired with four pictures, a prime word followed by a target word, one screen in a
video, or a question on a listening test. An item is the central element inside a trial. It is
This chapter and its companion, Chapter 3, present the results of a
synthetic review of eye-tracking studies in SLA and bilingualism. The present
chapter focuses on eye-tracking studies conducted with spoken language, using
the visual world paradigm, a paradigm for studying spoken language comprehen-
sion. Although visual world eye tracking and eye tracking with text differ in many
ways, researchers working in the two paradigms still pursue similar questions
regarding the processing, acquisition, and representation of language. The present
chapter shares with Chapter 3 a goal to survey the field of eye-tracking research in
all its breadth and diversity. By surveying what types of questions have been suc-
cessfully addressed using eye-tracking methodology, this overview can serve as a
springboard for readers to formulate their own research questions and to kickstart
their own research projects with eye tracking.
to visual world studies) because the spatial layout of print text does not neces-
sarily correspond to the order in which readers process the words (Tanenhaus &
Trueswell, 2006).1 Thus, the primary strength of processing load measures is that
they signal “transient changes in process complexity” (Tanenhaus & Trueswell,
2006, p. 874) which researchers can use to “make inferences about the underlying
processes and representations” (ibid.).
The visual world paradigm, on the other hand, presents researchers with a
set of strengths that complement eye tracking with text. A method for studying
spoken language processing, the visual world paradigm is founded on a link-
ing hypothesis that maps auditory-linguistic processing onto visual processing
(see following). Visual world researchers use eye movements as a representa-
tional measure (Tanenhaus & Trueswell, 2006). This means that eye fixations
in the visual world can reveal what linguistic representations become activated
in listeners’ minds at any given time. For example, eye fixations can reveal how
and when the phonology and meaning of a word are retrieved from the mental
lexicon during listening (e.g., Allopenna, Magnuson, & Tanenhaus, 1998; Dahan,
Magnuson, & Tanenhaus, 2001; Dahan, Swingley, Tanenhaus, & Magnuson, 2000;
Marian & Spivey, 2003a, 2003b; Spivey & Marian, 1999). Eye fixations also show
how personal and reflexive pronouns are interpreted in relation to their possible
linguistic antecedents, which are depicted on the screen (e.g., Cunnings, Fotiadou,
& Tsimpli, 2017; Kim, Montrul, & Yoon, 2015; Runner, Sussman, & Tanenhaus,
2003, 2006). Because eye movements are time-locked to the auditory signal (there
is no skipping or regressing during listening), visual world data provide fine-
grained information about the time course of processing; however, they do not,
in any straightforward manner, address questions of processing load or processing
difficulty (Tanenhaus & Trueswell, 2006).
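Concretely, visual world data are usually reduced to proportions of looks per time bin, time-locked to the onset of the spoken target word. The following is a minimal sketch of that reduction, with invented gaze samples rather than real tracker output.

```python
# Invented gaze samples: (ms from target-word onset, fixated interest area).
samples = [
    (0, "competitor"), (20, "competitor"), (40, "target"), (60, "target"),
    (80, "target"), (100, "target"), (120, "target"), (140, "target"),
]

def target_proportion_by_bin(samples, bin_ms=80):
    """Proportion of samples on the target picture within each time bin."""
    bins = {}
    for t, area in samples:
        bins.setdefault(t // bin_ms, []).append(area)
    return {b: areas.count("target") / len(areas)
            for b, areas in sorted(bins.items())}

curve = target_proportion_by_bin(samples)  # looks to the target rise over time
```

Plotting such proportion curves for targets and competitors over time is what yields the fine-grained time-course information described above.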
Eye tracking with text and visual world eye tracking thus appear to provide
complementary perspectives on language processing and representation. Given
the complementary nature of the two paradigms, eye tracking turns out to be a
useful research tool because eye movements are not only a processing load meas-
ure (as in text-based eye-tracking studies), but also a representational measure (as
in visual world studies), although such a distinction “is more of a heuristic than a
categorical” one (Tanenhaus & Trueswell, 2006, p. 874). This means that eye fixa-
tions, regardless of paradigm, are able to reveal what is represented and activated in
the mind because processing and representations are inextricably linked.
In this chapter, I will present the findings from part two of the synthetic review:
the visual world paradigm. Before this, however, it is crucial to understand the theo-
retical foundations of this body of work. Previously, I mentioned that visual world
research is founded on a linking hypothesis—a formal account of how auditory-
linguistic and visual processing come together and manifest themselves in a partici-
pant’s eye gaze. It is to different versions of this hypothesis that we now turn.
Early work using eye tracking and audiovisual materials uncovered a link
between audio materials, eye-fixation behavior, and visual information, even
90 Research Topics in the Visual World Paradigm
though the author, Roger Cooper, did not theorize about their relationship.
Cooper (1974) recorded eye movements of participants who listened to a story
while simultaneously looking at a 3 × 3 picture display (see Figure 4.1). He found
that participants looked at the pictures named in the story, suggestive of an audio-
eye-image link. He also found the link extended to semantically related items
(e.g., the word Africa elicited looks to pictures of a lion and a zebra). Although the
idea that language could direct attention in similar ways to pointing was thought
to be “unsurprising” (Altmann, 2011b, p. 979), Cooper’s findings revealed two
fundamental facts upon which contemporary visual world research is built: first,
when humans see an object, they activate the concept in memory and register the
object’s spatial location in the visual scene; and second, with the spatial location
registered, people look at what they hear (the linguistic input), as well as whatever
is phonologically or semantically related to what they hear.
Although very innovative, Cooper’s methodology did not attract much atten-
tion until researchers in the mid to late 1990s started applying it to the process-
ing of phonology (Allopenna et al., 1998; Eberhard, Spivey-Knowlton, Sedivy,
& Tanenhaus, 1995), semantics (Altmann & Kamide, 1999), syntax (Eberhard et
al., 1995; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995), and pragmatic
information (Eberhard et al., 1995). Michael Tanenhaus and his colleagues (1995)
initiated a new era of visual world research with a paper on the role of visual
context in syntactic processing, which was published in Science. Participants’ eye
movements were tracked as they performed simple tasks with real objects. The
researchers showed that visual context had an immediate effect on listeners’ sen-
tence interpretation. Specifically, whether participants saw one or two potential
referents (e.g., one apple or two apples) influenced how they initially interpreted
Put the apple on the towel … in the box: as a goal (put the apple on the towel,
when there was only one apple) or a noun modifier (the apple that is on the
towel, when there were two apples). As Tanenhaus and Trueswell (2006) noted,
the presence of concrete objects—a “visual world”—alongside spoken language
makes the paradigm particularly well suited for studying questions of referential
processing (i.e., how people relate language to external referents). Indeed, the
original Tanenhaus et al. (1995) study has now been extended to child L1 speak-
ers (Trueswell, Sekerina, Hill, & Logrip, 1999) and adult L2 speakers (Pozzan &
Trueswell, 2016) as well (see Section 4.2.3).
As researchers embarked on their new research programs on spoken language
processing, the deeper question of how and why eye movements might reflect
linguistic processing became key. The first to propose a simple linking hypothesis
were Paul Allopenna and his colleagues. Allopenna et al. (1998) used a visual world
paradigm to investigate spoken word recognition. Participants saw a visual display
(see Figure 4.2) and listened to instructions such as “Pick up the beaker; now put
it below the diamond” (p. 419). Here the target was beaker. The researchers found
that participants looked at images of phonological onset competitors (i.e., beetle)
and rhyme competitors (i.e., speaker) more than they looked at an unrelated control
object (i.e., carriage). The findings thus showed that words with similar names com-
peted with the target for word recognition, as seen in the listeners’ eye fixation data.
The use of competitors (e.g., phonologically, visually, or semantically similar
words) in visual displays has become a key technique for studying different kinds
of activation and competition effects in the visual world paradigm (see Textbox
6.1). Work with bilinguals has been particularly interesting in this regard (e.g.,
Blumenfeld & Marian, 2007; Marian & Spivey 2003a, 2003b; Mercier, Pivneva,
& Titone, 2014, 2016; Spivey & Marian, 1999), as researchers have shown that
competition effects are not language specific, but occur within and across a
person’s two or more languages. In the following, we will see several more exam-
ples of lexical competition effects with bilinguals (see Section 4.2.1).
A second, important finding of Allopenna et al.’s study was that the empirical
eye-movement data closely followed theoretical predictions of a spoken word
recognition model, TRACE (McClelland & Elman, 1986). Using computer sim-
ulations, Allopenna and his colleagues generated predicted activation levels for
the different word candidates (e.g., beaker, beetle, speaker, carriage) as the spoken
input unfolded over time. The model predictions showed a striking similarity to
the eye fixation data obtained from the participants (see Figure 4.3). This sup-
ported “a clear linking hypothesis between lexical activation and eye movements”
(Allopenna et al., 1998, p. 438), whereby the likelihood of a participant fixating
on a word corresponds to the word’s lexical activation level as predicted by the
model (also see Tanenhaus, Magnuson, Dahan, & Chambers, 2000). At the same
time, the authors already recognized that their linking hypothesis was a simple one.
FIGURE 4.3 Linking eye movement data and activation levels of lexical representations
as predicted by the TRACE model. The empirical data follow theoretical
predictions very closely and lend support to a simple linking hypothesis.
(Source: Reprinted from Tanenhaus, M. K. & Trueswell, J. C., 2006. Eye movements and spoken
language comprehension. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of Psycholinguistics
(2nd edition) (pp. 863–900). London: Academic Press, with permission from Elsevier. © 2006 Elsevier).
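The simple linking hypothesis can be stated compactly: candidate activations (e.g., from TRACE) are mapped onto predicted fixation probabilities with the Luce choice rule over exponentiated activations. The sketch below illustrates the mapping; the activation values and the scaling constant k are made up for illustration and do not reproduce Allopenna et al.'s simulations.

```python
import math

def predicted_fixation_probs(activations, k=7.0):
    """Luce choice rule over exponentiated activations:
    p(fixate word i) = exp(k * a_i) / sum_j exp(k * a_j)."""
    strengths = {w: math.exp(k * a) for w, a in activations.items()}
    total = sum(strengths.values())
    return {w: s / total for w, s in strengths.items()}

# Early in the spoken word "beaker": the onset competitor "beetle" is
# almost as active as the target, so predicted looks split between
# them, while "speaker" (rhyme) and "carriage" (unrelated) lag behind.
probs = predicted_fixation_probs(
    {"beaker": 0.60, "beetle": 0.55, "speaker": 0.20, "carriage": 0.05})
```

Feeding the model's activation levels through this rule at each time step produces predicted fixation curves that can be compared directly with the empirical curves in Figure 4.3.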
A key strength of the visual world paradigm, therefore, is that it can reveal L1
and L2 speakers’ linguistic knowledge representations during real-time spoken-
language processing, in the form of participants’ anticipatory eye movements (also
see Section 4.2.2.1).
Meanwhile, the new evidence for anticipatory eye movements suggested a
need to revise the linking hypothesis. Research had shown that eye movements in
the visual world could be either referential—coinciding with what was named
(e.g., Allopenna et al., 1998; Tanenhaus et al., 1995)—or anticipatory—ahead of
what would be named (e.g., Altmann & Kamide, 1999; Kamide et al., 2003). In
the latter case, an account in terms of lexical activation was clearly insufficient
because if an object hasn’t been named yet, its spoken form cannot activate the
word representation. A third kind of eye movements, which proved to be impor-
tant theoretically, was the occurrence of eye movements to objects in the display
that were neither named nor going to be named in the linguistic input. This was
the focus of a new study by Altmann and Kamide (2007), which involved a tense
manipulation (also see Section 9.3.1, research idea #6).
In Altmann and Kamide (2007), native English speakers listened to sentences
such as The man will drink the beer and The man has drunk the wine, while they
viewed a display with a full glass of beer and an empty wine glass (see Figure 4.5).
As the authors noted, visual displays are static but “events, like sentences, unfold
in time; they have a beginning and an end state” (p. 504). Thus, interpreting a
dynamically unfolding event such as The man has drunk the wine in the presence of
a static visual scene requires listeners to establish a temporal relationship between
the scene and the event described. Altmann and Kamide found that listeners did
exactly that: they looked to the empty wine glass more often in the past-tense
condition has drunk (in which case they interpreted the scene as showing the end
state of the drinking event) whereas they tended to look at the full beer glass more
often in the future-tense condition will drink (suggesting they now interpreted the
scene as showing the initial state of the event).3 These changing event represen-
tations highlight the role of the visual scene and specifically, the idea that one and
the same visual representation can receive different interpretations depending on
the accompanying linguistic input.
Together with other studies (Dahan & Tanenhaus, 2005; Huettig & Altmann,
2004, 2005, 2011), the findings from Altmann and Kamide’s tense study pro-
vided the foundation for a revised version of the linking hypothesis. According
to their hypothesis, the interpretation of the linguistic input takes place against
FIGURE 4.6 Distribution of eye-tracking studies across 16 SLA and bilingualism
journals. Note: SSLA = Studies in Second Language Acquisition; VWP =
visual world paradigm.
studies on language production as a part of the current review (i.e., Flecken, 2011;
Flecken, Carroll, Weimar, & Von Stutterheim, 2015; Kaushanskaya & Marian,
2007; Lee & Winke, 2018; McDonough, Crowther, Kielstra, & Trofimovich, 2015;
McDonough, Trofimovich, Dao, & Dion, 2017). Because of its focus on speaking
assessment, Lee and Winke (2018) was also discussed together with other assess-
ment research, in Section 3.2.5, and was therefore not included in the total tally
of 32 visual world studies.
The previously mentioned characteristics (bimodal input, simple display)
defined prototypical visual world research. A small number of studies did not fall
neatly into this category and thus required closer scrutiny. For one, I subsumed
subtitles and captions research under the broad umbrella of text-based research (see
Section 3.2.4) because the visual displays of captioned materials are much more
complex than what is common in visual world research. Furthermore, although
captioned or subtitled videos are multimodal, analyses to date have focused pri-
marily on how the captions or subtitles are read (i.e., text processing). Second,
Bolger and Zapata (2011), whose study was mentioned in Section 3.2, combined
elements of text-based eye tracking and visual world eye tracking in different parts
of their vocabulary instruction study. Even though the study was not bimodal, I
will review it as a part of visual world research (i) because of the similarities in
visual display (a few large elements on the screen) and (ii) because of the con-
ceptual focus on interference or competition effects (a recurring theme in visual
world studies). The outcome of the categorization process is shown in Figure 4.6,
which represents the distribution of text-based and visual world research across 16
SLA and bilingualism journals.
The literature search revealed there have been just over half as many visual
world studies (k = 32) as text-based studies (k = 52) in SLA to date. Dissemination
of eye-tracking research has been concentrated in a fairly small number of jour-
nals; this was true for text-based eye tracking and holds even more strongly for
visual world research. Bilingualism: Language and Cognition accounts for nearly half of
the visual world studies published up to 2017. The studies published in this jour-
nal show a large thematic overlap with studies published in psychology journals,
including the Journal of Memory and Language; Cognition; and Language, Cognition
and Neuroscience, with several authors actively engaging with both research
communities.5 Visual world research has also been reported in Second Language
Research, Studies in Second Language Acquisition, Applied Psycholinguistics, and
Language Learning. Some more recent developments, such as the use of the visual
world paradigm to study effects of instruction (Andringa & Curcic, 2015; Bolger
& Zapata, 2011; Hopp, 2016), implicit and explicit knowledge (Suzuki, 2017;
Suzuki & DeKeyser, 2017), and the proficiency correlates of prediction (Hopp,
2013, 2016) have appeared in these journals. Finally, production studies, by their
varied nature, have appeared in a range of different journals, including The Modern
Language Journal, reflecting their more theoretical or applied research objectives.
My prediction is that the newer applications of the visual world paradigm, as well
as production studies, will play an important role in introducing the visual world
paradigm to SLA more broadly, because of the new research avenues these appli-
cations open up for the field.
Within this sample of 32 visual world studies, I identified four broad research
strands with the help of a research assistant, who coded all the studies. These
strands are (1) word recognition (see Section 4.2.1), (2) prediction (see Section
4.2.2), (3) referential processing (see Section 4.2.3), and (4) production (see
Section 4.2.4). With 16 of the 32 studies, or half of the sample, the prediction
strand is by far the largest area of L2 and bilingual visual world research. To
capture the diversity within this strand and adequately represent the multiple
levels at which listeners can predict, I further divided prediction research by
level of the linguistic hierarchy: semantic prediction, morphosyntactic predic-
tion, and discourse-level prediction. Therefore, readers interested in grammar
research are invited to consult the section on morphosyntactic prediction (see
Section 4.2.2.3), as well as Sections 4.2.2.4 and 4.2.3. Instructed second lan-
guage acquisition researchers can find relevant research summarized under the
effects of instruction on prediction (see Section 4.2.2.5). Finally, vocabulary
researchers and researchers studying the bilingual lexicon may find the section
on word recognition particularly interesting (see Section 4.2.1). In what follows,
I will provide an overview of the types of questions investigated in each strand,
starting with the “lowest” level of the linguistic hierarchy, word recognition or
lexical processing, and gradually working my way toward the higher levels of
linguistic representation.
4.2.1 Word Recognition
The online search revealed a total of six visual world eye-tracking studies that
examined topics in word recognition (see online supplementary materials for
summary tables, Table S4.1). The majority of these studies deal with the nature
of (i.e., the structure of and access to) the bilingual lexicon (Marian & Spivey, 2003a,
2003b; Mercier et al., 2014, 2016). The overarching question about the bilingual
lexicon is whether words are organized and accessed in separate lexicons, as if the
bilingual had two mental dictionaries, one for each language, or whether words
are accessed all together, regardless of language, and form one mental dictionary
or integrated lexicon (also see Section 3.2.2). Empirical data to date favor the
view of an integrated lexicon. The visual world paradigm has also been used
to investigate word segmentation in L2 French connected speech (Tremblay,
2011), where a misalignment of syllable boundaries and word boundaries in liai-
son (e.g., fameux élan, “famous swing”) could theoretically make word recogni-
tion more difficult for L2 French learners. Lastly, recognition of words has been
a tool for studying individual differences in bilinguals (e.g., inhibitory control)
in a Stroop task (Singh & Mishra, 2012) and in prototypical visual world studies
with four-image displays (Mercier et al., 2014, 2016).
Visual world researchers who study word-level phenomena will often account
for their data in terms of lexical activation and competition effects. The idea
is that before the meaning of a word is retrieved, the incoming input activates
multiple word candidates (e.g., can, candle, candid, candy) which compete for rec-
ognition in the listener’s lexicon. We already saw an example of lexical competi-
tion effects in Allopenna et al.’s (1998) study (see Section 4.1). In this study (see
Figure 4.2), as the target word beaker unfolded, native English speakers looked at
the image of a beetle (onset competitor) and a speaker (rhyme competitor) more
than the image of a carriage (unrelated distractor). Hence, their eye movements
revealed transient activation and competition effects during listening. Generalizing
from this study, visual world eye tracking can uncover subtle competition effects
in L1 and L2 spoken word recognition, revealed in participants’ looks to nontar-
get, competitor images on screen that are displayed alongside the target (also see
Textbox 6.1, for a summary of the different roles of images in visual world studies).
By definition, L2 speakers and bilinguals know words in more than one lan-
guage. This makes them an interesting population in which to study lexical competition
effects. Will words from all languages compete for recognition even if the input
is solely in one language? This is a major question in visual world research on the
bilingual lexicon (also see Section 3.2.2). Like text-based eye-tracking research-
ers, visual world eye-tracking researchers have adduced important evidence that
bilingual speakers’ lexicons are, indeed, integrated and accessed non-selectively
(see Kroll & Bialystok, 2013; Kroll, Dussias, Bice, & Perrotti, 2015; Kroll & Ma,
2017; Van Hell & Tanner, 2012, for general reviews).
In two influential experiments, Marian and Spivey (2003a, 2003b) asked
Russian-English bilinguals to manipulate real objects that were laid out on a
workspace. Unbeknownst to the participants, the names of some objects over-
lapped phonologically in the language used in the experiment (e.g., English) or
between the bilinguals’ two languages (i.e., English and Russian). The display in
Figure 4.7 mirrors the real objects participants saw in the actual experiments.
Each display contained a target object (e.g., shovel) that participants were asked to
pick up. In the critical conditions, the display also contained a within-language
competitor (e.g., shark) and/or a between-language competitor (e.g., a balloon,
pronounced as /ʃarik/ in Russian). When the experiment was in English, the
researchers found that the balloon, like the shark, exerted an influence on partici-
pants’ eye movements. This suggested both languages were activated at the same
time during listening.
Marian and Spivey’s studies (also Spivey & Marian, 1999) marked the begin-
ning of visual world eye tracking in bilingualism. Since then, researchers have
attempted to uncover the factors (e.g., age of L2 acquisition, L2 proficiency, lan-
guage of the experiment, phonological overlap) that influence the strength of
between-language competition, given that these effects are sometimes weak or
even absent. In two studies, Mercier and her colleagues (2014, 2016) focused on
the role of inhibitory control as a potential factor (also see Blumenfeld & Marian,
then, that language users can proactively inhibit all the words in a language, not
unlike someone who speaks French in the workplace but a different language at
home would “switch off” all French when she gets home (cf. Mercier et al., 2016).
Looking at inhibitory control as a consequence, rather than a cause, Singh and
Mishra (2012) examined levels of inhibitory control in two groups of Hindi-English
bilingual speakers. The researchers used an oculomotor (eye-movement-based) ver-
sion of the Stroop task, a classic measure of inhibitory control (MacLeod, 1991;
Stroop, 1935), shown in Figure 4.8. They found that bilinguals with higher L2
English proficiency outperformed lower-proficiency bilinguals on the task. When
replicating this study, future researchers could include additional control variables
such as participants’ socioeconomic status, educational experience (schooling sys-
tem), language use during their leisure time, and nonverbal intelligence, to bolster
the case for the reported cognitive advantages of proficient bilinguals (for reviews,
see Bialystok, 2015; Paap, 2018; Valian, 2015).
FIGURE 4.8 Display used in the oculomotor Stroop task. Participants needed to make an
eye movement to the color patch that matched the ink color of the color
word (e.g., red), while ignoring the word’s meaning (e.g., hara means “green”).
(Source: Figure supplied by Dr. Ramesh Kumar Mishra, University of Hyderabad, India; Singh &
Mishra, 2012).
In sum, these studies illustrate how eye tracking can capture moment-to-
moment processing at sublexical levels. As bottom-up linguistic input becomes
available, activation spreads to the phonemic and lexical representations, where
lexical candidates compete for recognition. Importantly, these activation and com-
petition effects are reflected in participants’ eye movements to different images
on the screen. Hence, the fine-grained temporal information in eye-movement
records can reveal the subtleties of word recognition. Table 4.1 summarizes some
of the main questions that have guided this line of work.
TABLE 4.1 Questions in visual world eye tracking on lexical processing and word
recognition
1. How does knowing words in more than one language affect word representation and
processing in the bilingual lexicon?
1.1. To what extent does competition between languages depend on the bilingual’s
linguistic profile (e.g., language dominance, daily L1 and L2 use, L1 and L2
vocabulary size, L2 proficiency, age of L2 acquisition)?
1.2. To what extent does competition between languages depend on the bilingual’s
cognitive profile (e.g., inhibitory control, nonverbal intelligence)?
1.3. To what extent do task factors (e.g., language switch, language mode) and item-
level variables (e.g., degree of phonological overlap between words) influence
competition between languages in the bilingual lexicon?
2. To what extent does proficiency in multiple languages influence participants’
inhibitory control in verbal and nonverbal tasks?
3. How do L2 learners at different proficiency levels parse connected speech that
contains an ambiguous word boundary?
4.2.2 Prediction
4.2.2.1 What is Prediction?
The online search revealed a total of 16 out of 32 visual world studies that were
categorized as prediction research (see online supplementary materials for sum-
mary tables, Tables S4.1–S4.5). Prediction in language processing refers to the
“pre-activation/retrieval of linguistic input before it is encountered by the lan-
guage comprehender” (Huettig, 2015, p. 122, my emphasis). More generally, pre-
diction is the influence of the preceding context on the current state of the
language processing system (Kuperberg & Jaeger, 2016). In the visual world
paradigm, anticipatory eye movements (see Section 4.1) provide particularly
strong evidence for prediction: the behavioral response (the anticipatory look to
the target) happens before the predictable word appears in the input. The recording
of electrical brain activity in EEG/ERP research (see Section 1.1.4) can simi-
larly provide pure tests of anticipation (e.g., DeLong, Urbach, & Kutas, 2005;
Kuperberg & Jaeger, 2016; Wicha, Moreno, & Kutas, 2004). Many other methods,
however, such as reading and lexical decision, provide only indirect evidence for
prediction because the occurrence of prediction needs to be inferred from data
that are obtained during the processing of the predictable word.
Consider an idiom such as spill the beans, “to divulge a secret”, in which the
final word is fairly predictable (cf. Carrol & Conklin, 2017). Compelling evidence
for prediction in the visual world paradigm would come from looks to an image
of beans before the onset of the word “beans”. Likewise, reading researchers may
find that beans is processed faster than chips in spill the ____, because spill the chips
is not an idiom and, hence, the final word is less predictable (for reviews of pre-
dictability and idiom processing, see Section 2.5 and Section 3.2.2, respectively).
While the latter finding is still informative, the point is that faster reading times
are the consequences of prediction rather than prediction per se (Huettig, 2015).
The visual world paradigm, in contrast, can capture these true predictive effects
as they are happening. This ability, shared only with EEG/ERP
research (see Section 1.1.4), makes visual world eye tracking particularly well
suited to study prediction (for recent reviews, see Huettig, 2015; Kuperberg &
Jaeger, 2016).
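In coding terms, the distinction between direct and indirect evidence comes down to timing: a look to the target that begins before the target word's acoustic onset counts as anticipatory, whereas one that begins afterward is merely referential. A minimal sketch of this classification step, with an assumed fixation format:

```python
def classify_first_target_look(fixations, noun_onset_ms):
    """fixations: time-ordered (onset_ms, interest_area) pairs for one
    trial. Returns ("anticipatory" | "referential", onset_ms) for the
    first look to the target, or (None, None) if the target image was
    never fixated. The labels and tuple format are illustrative."""
    for onset, area in fixations:
        if area == "target":
            kind = "anticipatory" if onset < noun_onset_ms else "referential"
            return kind, onset
    return None, None
```

Only the "anticipatory" cases constitute the direct evidence for prediction discussed above; "referential" looks are the kind of post-onset data from which prediction can merely be inferred.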
The role of prediction in contemporary cognitive theory is difficult to over-
state. Clark (2013), in a highly influential review, argued that action-oriented
prediction may provide a “unified theory of the mind” (p. 200), in which percep-
tion, action, and attention are all linked in a single theoretical account. Prediction
is everywhere. In daily life, drivers can often predict the traffic light patterns they
encounter on their daily commute. Music lovers can generally tell when a song
is about to end. If you are a dancer like me, you will not only predict the song’s
ending but also try to align your dance moves with the predicted ending.
In SLA, prediction can help explain the concept of “noticing the gap”
(Gass, 1997; Schmidt & Frota, 1986), a mechanism for language learning. Second-
language learners are said to notice a gap when they consciously register a mis-
match between what their interlocutor says and how they themselves would have
said it (see Godfroid, 2010, for discussion). In other words, for such noticing to
occur, a listener must have predicted what she would say next in the current
sentence context before the speaker actually says it, so the listener can compare
the spoken form to the form she predicted would come next. When the listener
notices a gap, and she deems her interlocutor to be more proficient, this will trig-
ger an adjustment of her internal prediction mechanisms, which amounts to a
form of L2 learning (e.g., Altmann & Mirković, 2009; Huettig, 2015). Importantly,
on this view, the listener is also a speaker, in the sense that she actively draws on
production processes to predict the upcoming input during comprehension (cf.
Pickering & Garrod, 2013). In Pickering and Garrod’s (2013) words, “producing
and understanding are tightly interwoven, and this interweaving underlies peo-
ple’s ability to predict themselves and each other” (p. 329).
Although prediction is fundamental to how humans act in the world (Clark,
2013; Friston, 2010), the mechanisms that underlie prediction are not yet fully
understood. Huettig (2015) proposed that at least four different mechanisms
can underlie prediction, which he jointly referred to as PACS—Production-,
Association-, Combinatorial-, and Simulation-based prediction. Of
from a person’s two+ languages, and task-induced processes and strategies (e.g.,
priming effects) as possible sources of divergence in L1/L2 performance. A more
productive approach to studying prediction, therefore, may be to step away from
a strict L1-L2 dichotomy and focus on individual differences instead (for a recent
example with bilinguals, see Peters, Grüter, & Borovsky, 2018). By including cog-
nitive (e.g., working memory) and linguistic factors (e.g., receptive and productive
vocabulary size, overall proficiency), visual world researchers can come to under-
stand the extent to which listeners with different cognitive and linguistic profiles
generate expectations during real-time processing.
Likewise, testing bilinguals in both of their languages is a good way to evaluate
their overall linguistic abilities, including their prediction skills, in a non-deficit
approach (for examples, see Dijkgraaf et al., 2017, and Sekerina & Sauermann,
2015). If bilinguals are able to generate predictions in their L1 or dominant lan-
guage, this makes the important point that they are capable of making predictions
in principle. At the same time, the within-subject research, which involves testing
participants in all of their languages, will let researchers narrow down the factors
that contribute to a potential absence of predictive behavior in the L2 or non-
dominant language (also see Section 5.2).
4.2.2.2 Semantic Prediction
Work on L2 semantic prediction is only now beginning to appear, with two recent
publications leading the way (see online supplementary materials, Table S4.2).
Dijkgraaf et al. (2017) and Ito et al. (2018) have used semantic prediction as a
tool to study more general questions about predictive language processing, as
described in the previous section. In this work, the fact that prediction is based
on semantic cues appears secondary to the larger theoretical goal of uncovering
whether and to what extent L2 speakers engage in prediction. Both Dijkgraaf and
colleagues (2017) and Ito and colleagues (2018) reported evidence that L2 speak-
ers can and do generate predictions during L2 processing, consistent with Kaan’s
(2014) theoretical account (see Section 4.2.2.1). Ito and her colleagues further
examined the cognitive mechanisms that underlie prediction in native and non-
native speakers. Together, these studies contribute important empirical data to
understand prediction and its mediating factors in L2 listening.
Research on L2 semantic prediction extends a long line of semantic predic-
tion research with L1 speakers that originated in psychology with Altmann and
Kamide’s (1999) study (see Section 4.1). Similarly to Altmann and Kamide’s study,
L2 participants listen to simple S-V-O sentences (e.g., Mary reads/steals a letter,
The lady will fold/find the scarf), in which the second noun is either predicted or
not by the semantic information in the verb. Researchers want to know whether
L2 listeners, like L1 listeners, can utilize these semantic restrictions in real time to
anticipate the upcoming noun (i.e., letter or scarf, shown as images on the screen).
Results from Dijkgraaf et al. (2017) and Ito et al. (2018) converged in showing
that unbalanced bilinguals predicted thematic roles to the same extent across their
two languages and/or to a similar extent as monolingual L1 speakers.
Ito et al. (2018) further sought to establish a causal relationship between work-
ing memory resources and predictive processing. The authors showed that antici-
patory eye movements in L1 and L2 listening were similarly delayed when the
participants were concurrently performing a memory task (remembering a list
of words). The authors concluded that “predictive eye movements draw on some
of the cognitive resources that are used for remembering words” (p. 260). Hence,
making predictions—in either L1 or L2—is a process that demands cognitive
resources and may be most likely to occur when such resources are available.
Table 4.2 summarizes the main questions that have guided L2 semantic prediction
research with eye tracking.
4.2.2.3 Morphosyntactic Prediction
The previous section has shown that L2 speakers are capable of making lin-
guistic predictions in principle (cf. Kaan, 2014). While the meanings of words
may be shared across languages, which may facilitate L2 prediction, many
linguistic phenomena are specific to a given language. The question then
becomes what will happen if cues are not instantiated in a participant’s L1
or are instantiated differently in the L2 than in the L1, and therefore cannot
be transferred. This question has attracted considerable interest from L2 and
bilingualism researchers, who have often chosen morphosyntactic (grammar)
phenomena to examine it.
Work on morphosyntactic prediction is the largest substrand of L2 prediction
research and, indeed, of the entire visual world paradigm. A total of ten studies have
investigated L2 morphosyntactic prediction in its classic form (see online supple-
mentary materials, Table S4.3). Two additional studies have combined morpho-
syntactic prediction research with an instruction component; these studies will be
reviewed in what follows (see Section 4.2.2.5). Much research was conducted using
gender-based prediction as a test case (Dussias et al., 2013; Grüter, Lew-Williams,
& Fernald, 2012; Hopp, 2013; Hopp & Lemmerth, 2018; Morales, Paolieri, Dussias,
Valdés Kroff, Gerfen, & Teresa Bajo, 2016). Grammatical gender lends itself well
to studying prediction because it creates agreement relationships between nouns,
articles, and adjectives.7 Gender marking on the article (e.g., el zapato, “the[MASC]
shoe”) or adjective (e.g., ein grosser Wecker, “a big[MASC] alarm clock”) provides a cue
to the upcoming noun. If listeners have the grammatical knowledge and cognitive
resources to use this cue, they could anticipate the noun; that is, make a gender-
based prediction. Through different L1-L2 pairings, researchers can further investi-
gate what happens when grammatical gender is absent from speakers’ L1, as is the
case in English, or represented very similarly, as in Spanish and Italian (Dussias et al.,
2013; Morales et al., 2016). They can also examine whether L1 gender is activated
during L2 gender processing (Hopp & Lemmerth, 2018; Morales et al., 2016).
Research on gender-based prediction came to SLA through a series of three
studies conducted in Anne Fernald’s lab. Lew-Williams and Fernald (2007) showed
that Spanish native speakers, even at a young age, are able to use the grammati-
cal information in Spanish articles to predict upcoming nouns. Lew-Williams and
Fernald (2010) extended their study to adult classroom-based learners of Spanish.
They found no evidence of prediction with familiar nouns in their learner data.
Grüter et al. (2012) similarly found limited evidence of L2 prediction with familiar
article-noun combinations, this time in a group of highly advanced to near-native
L2 Spanish speakers. In all three studies, participants heard instructions containing
nouns preceded by gender-marked articles (e.g., ¿Dónde está la pelota? “Where is
the[FEM] ball?”). Of special interest were the trials where the display depicted two
objects of different genders (e.g., la pelota “the[FEM] ball” and el zapato “the[MASC] shoe”):
see Figure 4.9, for an example. These trials, which are referred to as different-
gender trials, allow for prediction, because the gender-marked article uniquely
identifies the following noun.
FIGURE 4.9 Display used in gender prediction experiments. Because el zapato,
“the[MASC] shoe”, and la pelota, “the[FEM] ball”, differ in grammatical gender,
the article el or la acts as a predictive cue for the following noun.
(Source: Figures supplied by Dr. Casey Lew-Williams, Princeton University).
If listeners are able to use grammatical gender as a cue,
they will look at the target image faster in different- compared to same-gender trials.
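The logic of this comparison can be reduced to a latency contrast. The sketch below, which assumes a simple list-of-latencies representation rather than any published analysis script, computes a gender-cue advantage as the difference in mean time to the first target look:

```python
def mean_latency(latencies_ms):
    """latencies_ms: per-trial times (ms) from article onset to the
    first fixation on the target image."""
    return sum(latencies_ms) / len(latencies_ms)

def gender_cue_advantage(different_gender, same_gender):
    """Positive values mean faster target looks on different-gender
    trials, i.e., listeners exploited the article's gender cue."""
    return mean_latency(same_gender) - mean_latency(different_gender)
```

On this logic, a listener group whose advantage is reliably above zero shows gender-based prediction, while an advantage near zero patterns with the less proficient groups described below.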
A similar approach—comparing different- with same-gender trials—underlies
other gender prediction studies as well (Dussias et al., 2013; Hopp, 2013, 2016;
Hopp & Lemmerth, 2018; Morales et al., 2016). Overall, results for gender-based
prediction have been mixed, with researchers generally reporting evidence of
prediction in highly proficient speakers (Dussias et al., 2013; Hopp, 2013; Hopp
& Lemmerth, 2018; but see Grüter et al., 2012) but not in less proficient speakers
(Dussias et al., 2013; Hopp, 2013; Lew-Williams & Fernald, 2010). Interestingly,
prediction does occur more consistently if learners are trained on the article–
noun combinations first, as if they were learning new vocabulary (Grüter et al.,
2012; Hopp, 2016; Lew-Williams & Fernald, 2010). Together, these results high-
light the role of L2 proficiency level in prediction, which itself may be related to
the amount and the type of input learners have received and the environment in
which learning takes place.
Because gender-based prediction provides such a neat paradigm, researchers
have adopted it to study the role of moderating variables. I already mentioned the
role of L2 proficiency (see previous paragraph). Other factors include the relation
between production and comprehension (Grüter et al., 2012; Hopp, 2013, 2016)
and L1 background (Dussias et al., 2013; Hopp & Lemmerth, 2018; Morales et
al., 2016). Regarding the comprehension–production relationship, Grüter et al.
(2012) triangulated data from three measures: offline comprehension (sentence
picture matching), online production (elicited imitation), and online compre-
hension (visual world eye tracking). They showed that highly advanced speakers’
occasional errors in production were mirrored in online comprehension in the
form of weaker prediction effects. This suggested it is the real-time retrieval of
gender information—whether productively or receptively—that is difficult. Also
focusing on the production–comprehension relationship, Hopp (2013) asked L1
110 Research Topics in the Visual World Paradigm
and L2 German speakers to prename the images (along with their determiner
or gender-marked adjective) that appeared in the visual world displays of the
subsequent experiment. Hopp found that only those L2 speakers who consist-
ently assigned the correct grammatical gender to the images engaged in predic-
tive processing in the visual world experiment (also see Hopp, 2016). Given that
most production errors in Grüter et al.’s study were also gender assignment errors
(e.g., *el pelota instead of la pelota, “theFEM. ball”), these two studies underscore the
importance of robust lexical knowledge for gender-based prediction.
Lastly, researchers have also investigated L1 transfer effects in L2 gender pre-
diction (Dussias et al., 2013; Hopp & Lemmerth, 2018; Morales et al., 2016).
Research by Morales et al. (2016) and Hopp and Lemmerth (2018) suggests a
role for gender congruency in anticipatory processing. For instance, Morales et al.
(2016), in a Spanish language experiment, found that Italian learners of Spanish
looked at the target object more when it had the same gender in participants’ L1
Italian as L2 Spanish, for example ilMASC. formaggio and elMASC. queso, “the cheese”
(also see Section 9.3.1, research idea #4). Hopp and Lemmerth (2018) reported
nativelike prediction for high-intermediate Russian learners of German, but only
when gender was marked syntactically in the same way in the two languages (i.e.,
on adjectives, not articles). Finally, the role of typological distance has yet to be
examined more systematically; however, results from Dussias et al. (2013) for L1
Italian–low proficiency L2 Spanish speakers suggest typological similarity could
aid in prediction.
Taken together, the different substrands of gender prediction research converge
in showing that gender-based predictive processing relies on the robust encoding
of grammatical gender on individual lexical items. Highly proficient learners and
learners who master gender assignment tend to demonstrate “nativelike” prediction.
For other learners, whose lexical representations are perhaps less stable, performance
is more subject to L1 influence. Depending on the L1-L2 relationship, these learn-
ers will either be helped or hindered in L2 predictive processing by their native
language.
In recent years, work on morphosyntactic prediction has expanded to other
target structures and languages. There is now also prediction research on case
marking (Mitsugi, 2017; Mitsugi & MacWhinney, 2016; Suzuki, 2017; Suzuki &
DeKeyser, 2017), classifiers (Suzuki, 2017; Suzuki & DeKeyser, 2017), and definite
and indefinite articles (Trenkic et al., 2014). This list is likely to keep growing in
the following years, as researchers identify new grammatical phenomena that lend
themselves to making predictions.
An interesting case in point is prediction in L2 Japanese, a verb-final lan-
guage (Mitsugi, 2017; Mitsugi & MacWhinney, 2016; Suzuki, 2017; Suzuki &
DeKeyser, 2017).The verb-final status of Japanese allows for studying prediction
based on cues other than the verb, unlike in English and other head-initial lan-
guages, where the verb assumes a central role in enabling predictions (for exam-
ples, see Section 4.2.2). For instance, in a replication of Kamide et al. (2003)
with L2 learners, Mitsugi and MacWhinney (2016) used Japanese sentences that
translate as (1) and (2):
Of interest was whether L1 and L2 Japanese speakers would use the case mark-
ers, which appear as postpositions in the noun phrases, to assign thematic roles
in real time. The two sentences were paired with the same four-image display
(see Figure 4.10, for a reconstruction); however, only the ditransitive sentence
enabled listeners to anticipate the theme (e.g., the exam paper) as the third
verbal argument in the sentence based on the agent-goal combination. Mitsugi
and MacWhinney (2016) and especially Mitsugi (2017) found that L1 Japanese
speakers used case markers incrementally and predictively (cf. Kamide et al.,
2003); that is, they did not wait until the verb to build a sentence structure.
Third- and fourth-year university students of Japanese were delayed in their
processing and did not generate predictions, perhaps because they did not have
the time to do so.
A recurring theme in the L2 prediction literature is that listeners need accurate
linguistic knowledge and fast processing skills to make predictions. Because the
visual world paradigm emphasizes real-time, meaning-focused processing, Suzuki
(2017) and Suzuki and DeKeyser (2017) argued that prediction is a reflection
of learners’ implicit knowledge (also see Andringa & Curcic, 2015; Godfroid &
Winke, 2015). Implicit knowledge can be deployed rapidly and without awareness
(e.g., Williams, 2009) and because of this, it is often regarded as key to communi-
cative competence. Using data from confirmatory factor analysis and individual
differences measures, Suzuki (2017) and Suzuki and DeKeyser (2017) showed patterns
of association between prediction in the visual world paradigm and other measures of
implicit knowledge and implicit learning aptitude. The researchers thus went one
step further than previous authors: they argued not only that prediction reflects
linguistic knowledge, but also that this knowledge is implicit (unconscious) in
nature (contrast with Huettig, 2015). Future researchers will need to confirm
or disconfirm Suzuki’s (2017) and Suzuki and DeKeyser’s (2017) evidence, for
instance by triangulating visual world data with verbal measures, to probe L2
speakers’ awareness and strategic processing more directly. No doubt the question
of what drives prediction in L2 processing will invite more research in the years
to come (also see Section 4.2.2.1). Table 4.2 summarizes the main questions that
have guided L2 morphosyntactic prediction research with eye tracking.
4.2.2.5 Effects of Instruction
A total of four studies have examined whether instruction can influence real-time,
predictive language processing and the retrieval of lexical knowledge (see online
supplementary materials, Table S4.5). Given that prediction reflects linguistic
knowledge (see Section 4.2.2.3), adding an instruction component to a prediction
study can reveal whether prediction can be trained, how prediction develops over
time with exposure to input, and whether explicit instruction can speed up the
process of learning to predict. Understanding the effects of instruction on real-
time language processing is both theoretically and practically important. Research
on instruction can inform the interface hypothesis—that is, how explicit instruc-
tion, explicit knowledge, and implicit knowledge are related (Andringa & Curcic,
2015)—the relationship between production and comprehension (Hopp, 2016),
and L2 vocabulary learning and teaching (Bolger & Zapata, 2011; Kohlstedt &
Mani, 2018).
Visual world studies that focus on instruction generally consist of a learning
phase and a testing phase. During training, the participants first learn the targeted
grammatical (Andringa & Curcic, 2015; Hopp, 2016) or lexical (Bolger & Zapata,
2011) knowledge. Then, they take part in a visual world experiment, which is
the testing phase and measures the outcomes of training on the participants’ real-
time language processing (also see Section 3.2.3, for similar eye-tracking research
with text). Taking a slightly different approach, Kohlstedt and Mani (2018) inte-
grated both the learning task and the outcome measure into the same visual world
experiment and, in doing so, they were able to obtain fine-grained information
about the participants’ learning trajectory.
Andringa and Curcic (2015) were interested in the extent to which metalin-
guistic information (i.e., the provision of a grammar rule) could enhance implicit
processing of a morphosyntactic structure in Esperanto, an artificial language. The
authors found that neither implicit instruction in Esperanto (listening to sen-
tences that contained the target structure) nor a combination of implicit and
explicit instruction (listening + rule provision) resulted in predictive processing,
although the explicit instruction group, who were taught the rule, did perform
better on a separate measure of explicit knowledge. In contrast, Hopp (2016)
reported that intermediate-level English learners of German were able to make
gender-based predictions after explicit vocabulary instruction (also see Grüter et
al., 2012; Lew-Williams & Fernald, 2010). The participants listened to, saw, and
repeated article-noun combinations three times (e.g., der Käse “theMASC. cheese”)
and then produced the article-noun combinations themselves before they took
part in a visual world post-test. Prediction at post-test correlated with accuracy in
the production task, which pointed to a close association between production and
online comprehension (also see Hopp, 2013).
One difference between Andringa and Curcic (2015) and Hopp (2016) is par-
ticipants’ prior familiarity with the target language and vocabulary. Different from
the German learners in Hopp (2016), participants in Andringa and Curcic (2015)
had no prior exposure to Esperanto. Simply comprehending the sentences may
thus have placed a heavy burden on their working memory, leaving no room
for morphosyntactic prediction, even if participants knew the rule. To test this
hypothesis, researchers could adopt Andringa and Curcic’s target structure (dif-
ferential object marking) and test it in Spanish, which has the same grammatical
structure (Andringa & Curcic, 2016).
Two studies investigated lexical processing or vocabulary learning. Bolger
and Zapata (2011) were interested in how presenting new words in semantically
related or semantically unrelated story contexts influences vocabulary learning.
The authors hypothesized, based on previous research, that grouping words in
semantic sets (e.g., all terms for food, types of animals, or colors) might inhibit
vocabulary learning due to overly strong connections between the words. This is
FIGURE 4.11
Display used in a vocabulary learning experiment. In biasing story
contexts, the prime word Opa, "grandfather", invited looks to the image
of the cane (Gehstock or the pseudoword Ausfrieb in German), which
was the target word in the story.
(Source: Kohlstedt and Mani, 2018).
In sum, instruction research in the visual world paradigm has shown the ben-
efits (Hopp, 2016; Kohlstedt & Mani, 2018) and limits (Andringa & Curcic, 2015;
Bolger & Zapata, 2011) of input and instruction on real-time language processing.
Because online processing in the visual world paradigm “does not readily allow
for the application of explicit knowledge” (Andringa & Curcic, 2015, p. 237), the
paradigm allows for the testing of theoretically interesting questions such as the
interface hypothesis. Findings may also have pedagogical implications, for instance
for L2 vocabulary instruction, by demonstrating the benefits of clustering vocabu-
lary thematically, rather than semantically, and embedding it in rich discourse
contexts. Research also benefits from the triangulation of online eye-tracking data
with offline measures, such as grammaticality judgment tests or vocabulary tests, as
participants’ performance may vary dramatically in these contexts. Table 4.2 sum-
marizes the main questions that have guided research on instruction in the visual
world paradigm and all other areas of prediction research.
4.2.3 Referential Processing
Four studies have investigated real-time sentence interpretation in ambiguous or
semantically complex sentences (see online supplementary materials, Table S4.6).
Compared to the prediction research reviewed previously (see Section 4.2.2), this
work represents somewhat of a conceptual shift, because the focus is no longer on
listeners’ anticipation of upcoming linguistic information. Instead, many research-
ers examine how listeners establish reference (linking language with the outside
world) when more than one potential referent is given. For example, the word frog
may refer to one of two frogs shown on the screen or the pronoun he may refer to
one of two male characters. To establish reference, listeners will normally need to
hear the critical word (e.g., frog, he) in the input first; that is, there is typically no
anticipation of the referent. The time windows used for analysis will be defined
accordingly: they will either align with or follow—but not typically precede—the
onset of the critical word (see Section 6.3.2.2). Secondly, work on referential pro-
cessing relies on additional tasks (see Section 5.4), such as moving an object (Kim
et al., 2015; Pozzan & Trueswell, 2016) or answering a comprehension question
(Cunnings et al., 2017; Sekerina & Sauermann, 2015). These tasks are a key com-
ponent of this research: by comparing participants’ eye movements with their final
decision (revealed in the additional task) researchers can determine the extent
to which the processes and product of sentence interpretation are in agreement.
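The windowing logic just described can be sketched in a few lines of code. This is a minimal illustration only: the 200 ms offset for saccade programming, the window length, and all field names are assumptions of mine, not details taken from the studies reviewed here (time-window selection is discussed in Section 6.3.2.2).

```python
# Minimal sketch: count gaze samples to a referent inside a time window
# defined relative to the onset of the critical word (e.g., "frog" or "he").
# Following common practice, the window starts 200 ms after word onset to
# allow for saccade programming; both values here are illustrative.

def in_window(sample_time_ms, word_onset_ms, window_ms=600, offset_ms=200):
    """True if a gaze sample falls in [onset + offset, onset + offset + window)."""
    start = word_onset_ms + offset_ms
    return start <= sample_time_ms < start + window_ms

# Gaze samples: (time in ms, area of interest currently fixated)
samples = [(900, "competitor"), (1250, "target"), (1400, "target"),
           (1700, "target"), (1900, "distractor")]
word_onset = 1000  # onset of the critical word in the speech stream

target_looks = sum(1 for t, aoi in samples
                   if in_window(t, word_onset) and aoi == "target")
window_looks = sum(1 for t, aoi in samples if in_window(t, word_onset))
print(target_looks / window_looks)  # 1.0
```

Because the window aligns with (or follows) the critical word, fixations driven by earlier material, such as the competitor look at 900 ms above, are excluded by design.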
As Tanenhaus and Trueswell (2006) noted, “introducing a referential world that
is co-present with the unfolding language, naturally highlights … questions about
reference” (p. 883). One such question is how listeners parse sentences in the
context of referential ambiguity, for instance when there are two potential refer-
ents (e.g., two frogs or two male characters) on the screen. This question inspired
pioneering research on syntactic ambiguity resolution with adult L1 speakers by
Tanenhaus et al. (1995), whose study was reviewed in Section 4.1. This work has
since been extended to child L1 speakers (Trueswell et al., 1999) and, of relevance
here, adult L2 speakers (Pozzan & Trueswell, 2016).
Pozzan and Trueswell (2016) compared L1 Italian–L2 English speakers’ parsing
skills to those of child L1 English speakers who participated in a previous study
by Trueswell et al. (1999). Both participant groups were language learners; how-
ever, only the adults in Pozzan and Trueswell’s study had fully developed execu-
tive skills, which could potentially help them recover from a syntactic ambiguity.
This is the hypothesis the researchers wanted to test. The participants acted out
spoken instructions such as Put the frog on the napkin onto the box. This sentence
is a garden-path sentence: listeners are led to believe that on the napkin is the goal of the action
until they hear the second prepositional phrase and revision is necessary (the
frog, which is on the napkin, goes on the box). When only one frog (one poten-
tial referent) was present on the screen, L2 English speakers selected the wrong
goal (i.e., the napkin) in nearly half of the trials, showing they had difficulty in
updating their initial interpretations, just like native children. Revision difficulties,
therefore, are partly a learning phenomenon (also see Cunnings et al., 2017) that
is attested across learners with differing levels of cognitive maturity.
Another question is how listeners link pronouns to their antecedents to estab-
lish co-reference within a sentence or between sentences. In a sentence such as
Before Lizzi drives to East Lansing, shei takes heri dog for a walk, the pronouns she and
her and the proper name Lizz are co-referential: they all refer to the same real-life
person. Kim et al. (2015) and Cunnings et al. (2017) used the visual world para-
digm to study co-reference in L2 speakers, with interesting results. Cunnings et
al. (2017) examined the effects of L1 background on subject pronoun resolution.
Participants in the study were L1 Greek–L2 English speakers, L1 English speak-
ers, and L1 Greek speakers, who listened to sentences in English or Greek such
as the following:
(3) (a) While Peter helped Mr Smith by the sink in the kitchen, he carefully
cleaned the big cup that was dirty.
(b) While Mr Smith helped Peter by the sink in the kitchen, he carefully
cleaned the big cup that was dirty.
In a null subject language such as Greek, the overt pronoun aftós, “he”, in the
main clause indicates a shift in topic (i.e., from subject to direct object); hence
aftós normally refers to Mr. Smith in (3a) and Peter in (3b). In English, the inter-
pretation is reversed, given that pronouns more commonly refer to the cur-
rent discourse topic (typically the subject). The question then becomes whether
L1-Greek L2-English speakers will process English sentences according to Greek
grammar, English grammar, or a hybrid of the two. Eye fixations to images
of Peter and Mr. Smith, the two possible antecedents depicted on the screen
(see Figure 4.12), revealed listeners’ evolving preferences for either a subject or
an object antecedent for the he/aftós pronoun. Like English monolinguals, and
FIGURE 4.12
Display used in a referential processing study. Participants heard
sentences such as While Peter helped Mr Smith by the sink in the kitchen,
he …, in which the referent of the personal pronoun he was ambiguous.
(Source: Image supplied by Dr. Ian Cunnings, University of Reading, UK; Cunnings et al., 2017).
Table 4.3 summarizes the main questions that have guided referential processing
research in the visual world paradigm.
4.2.4 Production
The synthetic review also yielded a total of six eye-tracking studies involving oral
production (see online supplementary materials, Table S4.7). Although these studies
focus on speech production and interaction, which sets them apart from the
comprehension studies reviewed previously, the visual world paradigm and eye-
tracking research on production have “obvious similarities” (Huettig et al., 2011,
p. 152). Therefore, following Huettig et al. (2011), I too shall conclude the present
review with an overview of production research.
Eye-tracking research on L1 production has revealed a “tight temporal link
between eye movements and speech planning” (Huettig et al., 2011, p. 165). In
most production studies, participants are asked to describe a scene or name pictures
on the screen. Their eye gaze provides a measure of visual attention (see Sections
1.2 and 2.6), which reveals how the speakers extract visual information from the
display to fulfill their goals. Looking at early, preverbal stages of speech produc-
tion (i.e., message generation), Flecken and colleagues examined how L2 speakers
and bilinguals conceptualize events before and as they verbalize them (Flecken,
2011; Flecken et al., 2015). Lee and Winke (2018), working in child language
assessment, examined where English language learners direct their eye gaze dur-
ing speech disfluencies such as pauses and hesitation phenomena.9 Kaushanskaya
and Marian (2007) extended reading and listening research on the bilingual lexi-
con (see Sections 3.2.2 and 4.2.1) to speech production and investigated how
L1 orthography and phonology might interfere with L2 picture naming. Finally,
McDonough and her team (McDonough et al., 2015, 2017) looked at joint atten-
tion in interaction as a potential language learning mechanism. Together, these
studies have extended the use of eye tracking from language comprehension to
production, highlighting close links between a participant’s eye gaze and their
productive language processing.
Flecken and her colleagues conducted two cross-linguistic comparisons of
event conceptualization, which is how speakers segment and select information
to comprehend and interpret an event (Flecken, 2011; Flecken et al., 2015). This
line of work can inform the debate on language and cognition (for reviews, see
Lupyan, 2016; Zlatev & Blomberg, 2015)—namely, whether and to what extent lan-
guage-specific properties shape how humans conceptualize experiences. Different
from previous cross-linguistic eye-tracking research (e.g., Papafragou, Hulbert, &
Trueswell, 2008), Flecken and her colleagues focused specifically on bilinguals and
L2 speakers, whose event conceptualization can potentially be influenced by more
than one language. The participants in both studies viewed short video clips of
everyday events, which they were asked to describe (see Figure 4.14, for an example of
a still image). The researchers analyzed participants' verbal productions, for instance
for use of the progressive aspect (Flecken, 2011) or different kinds of motion verbs
(Flecken et al., 2015). They then related participants’ linguistic choices to their
eye fixations on different areas on the screen. Flecken et al. (2015) found that L2
German speakers with an L1 French background inspected scenes similarly to L1
French monolinguals. This suggested that although the advanced L2 speakers had
FIGURE 4.14
Motion event used in an oral production study. Participants viewed
a video clip of a pedestrian walking toward a car and were asked to
describe the event in French or German. Note: The boxes represent
interest areas.
(Source: Reprinted from Flecken, M., Weimar, K., Carroll, M., & Von Stutterheim, C., 2015. Driving
along the road or heading for the village? Differences underlying motion event encoding in French,
German, and French-German L2 users. The Modern Language Journal, 99, 100–122, with permission
from Wiley. © 2015 The Modern Language Journal).
acquired most of the lexical means for describing motion events in L2 German,
their event conceptualization still showed a strong L1 influence.
Moving beyond monologic tasks, McDonough et al. (2015, 2017) were among
the first in SLA and bilingualism to use eye tracking in face-to-face interaction
(also see Gullberg & Holmqvist, 1999, 2006). In two studies, McDonough and
colleagues focused on joint attention during interaction, which they defined as
“the human capacity to coordinate attention with a social partner” (McDonough
et al., 2017, p. 853) using visual cues such as gesture and eye gaze. The researchers
measured three kinds of joint attention—the L2 speaker’s self-initiated eye gaze,
their interlocutor’s other-initiated eye gaze, and mutual eye gaze (i.e., shared eye
contact) between both speakers (see Figure 4.15). In both studies, the research-
ers found that the length of L2 participants’ self-initiated eye gaze predicted
the outcome variable: a greater likelihood of responding correctly to feedback
(McDonough et al., 2015) and more pattern learning (McDonough et al., 2017).
McDonough et al. (2015) also found a positive effect of mutual eye gaze.
These studies have thus begun to illuminate the role of a nonverbal cue in
language learning and strongly invite further research along the same lines, for
instance on gestures (see Section 9.3.1, research ideas #7 and #8). In light of the
naturalistic tasks and high ecological validity of this work, I see great potential
in the application of eye-tracking methodology to interaction research (also see
FIGURE 4.15
Eye-tracker set up in an oral production study. Second-language
learners were paired up with a research assistant to perform one-on-
one interactive tasks while eye-tracking cameras recorded their eye
movements.
(Source: Image supplied by Dr. Dustin Crowther, University of Hawai’i).
Brône & Oben, 2018). Table 4.4 summarizes the main questions that have guided
eye-tracking research on L2 production:
1. To what extent do the language(s) that people use shape their perception and
interpretation of events?
2. What are the visual markers of speech disfluencies in language assessment?
3. What are the consequences of having an integrated bilingual lexicon for L2 speech
production? Do the orthography and phonology of the not-in-use language (the L1)
interfere with speech production (picture naming) in the other language?
4. To what extent does interlocutors' eye gaze, as an index of joint attention, relate to
successful interaction and the initial stages of L2 grammar learning?
4.3 Conclusion
This chapter has offered a bird’s eye view of the 32 visual world studies pub-
lished in SLA and bilingualism between 2003 and 2017. Complementing the
review of the 52 text-based studies in Chapter 3, this chapter showcases how the
visual world paradigm can help advance different research areas. Visual world eye
tracking has grown into a full-fledged paradigm for studying spoken language
processing. This is a non-trivial matter given that many other measures of spo-
ken language processing are metalinguistic in nature, provide only a snapshot of
processing, rather than full time-course data, and may interrupt the speech input
(Tanenhaus & Trueswell, 2006). Through a thoughtful integration of visuals and
spoken language, researchers can study questions from the lowest, sublexical lev-
els, to word recognition, (morpho)syntax, and semantics, all the way up to the
discourse level. The paradigm places questions about the temporal and referential
aspects of spoken language processing front and center (see Section 4.1). It has
also provided key data on the rapidly expanding area of prediction in language
processing (see Sections 4.1 and 4.2.2.1). Bilingualism and SLA researchers have
embraced these methodological affordances in their research, with valuable results.
At this juncture, it is good to look back at the ground covered and look ahead
to what the paradigm may bring to SLA and bilingualism in the following years.
Compared with text-based research, the use of the visual world paradigm in
SLA and bilingualism research is a more recent development, and especially so
in SLA. The present synthetic review revealed a strong thematic overlap with
research in psychology, potentially reflecting the origins of the paradigm in
this neighboring discipline. Research questions addressed so far fall under four
rubrics—word recognition, prediction, referential processing, and production—all
of which are general themes in language and cognition (see Sections 4.2.1–4.2.4).
Bilinguals and L2 speakers are interesting study populations to include in these
Notes
1 Two common reading behaviors, skips and regressions, both cause readers to depart
from a strictly linear, word-by-word reading pattern.
2 In these experiments, the nouns that are underlined were depicted as images on the
screen. This enabled the researchers to contrast looks to the different objects (e.g., bread
versus man or motorbike versus carousel) as a result of the different verb types.
3 The difference in looks in the future-tense condition was only statistically significant
in a second, revised version of the experiment.
4 Studies that were available online first in 2017 had a 2018 publication date.
5 To assess the degree of overlap between the two research communities, I conducted
an additional search in three psychology-oriented journals: Journal of Memory and
Language; Cognition; and Language, Cognition and Neuroscience. This yielded 15 visual
world studies with bilinguals. The topics covered ranged from the bilingual lexicon, to
morphosyntactic prediction, and production. These strands appear to be similar to the
present synthetic review of SLA and bilingualism research and will be covered in the
remainder of this chapter.
6 Gender-based prediction refers to participants’ use of gender-marked cues (articles or
adjectives) in the input to anticipate which noun is likely to come next, for instance
looks at a shoe, zapatoMASC in Spanish, based on the preceding article elMASC (cf. Grüter,
Lew-Williams, & Fernald, 2012).
7 More formally, gender agreement occurs between a trigger (generally a noun) and mul-
tiple targets (e.g., articles, adjectives, pronouns, demonstratives, and past participles). In
visual world research, it is the article-noun and article-adjective-noun relations that
have been studied the most.
8 The abbreviations in subscript refer to case, expressed in Japanese through case mark-
ers after the noun. LOC = locative case, NOM = nominative case, DAT = dative case,
ACC = accusative case.
9 Readers interested in Lee and Winke’s (2018) study are referred to the assessment
strand in Section 3.2.5.
5
GENERAL PRINCIPLES OF
EXPERIMENTAL DESIGN
There are many roads that can lead a researcher to engage in eye tracking. Some
readers of this book will have conducted multiple studies before turning to eye-
tracking research. They will have extensive experience with experimental design
and research methodology but may be new to the particulars of eye tracking.
Other readers will be relatively new to the process of experimental research. If
this is you, you may find yourself needing to learn basic principles of experimen-
tal design, as well as information and guidelines specific to eye tracking. If that
is the case, this chapter is just right for you. In this chapter, I set the stage for the
eye-tracking-specific guidelines that will come next, in Chapter 6. Starting from
what an item is (see Section 5.1), I will describe the creation of item lists for within-
and between-subjects research designs (see Section 5.2), the different types of trials
within a study (see Section 5.3), the distinction between primary and secondary
tasks (see Section 5.4), and, finally, give guidelines for how many items per condi-
tion to include (see Section 5.5). Having a solid grasp of these different concepts
will make you a better quantitative researcher. It will also make you a better
eye-tracking researcher because good eye-tracking research builds upon general
principles of experimental design (see, e.g., Kerlinger & Lee, 2000, for a stand-
ard methodological reference). Therefore, let us take a methodological excursion
together and explore the basics of experimental design.
experiments, the materials can be sentences (for reviews, see Keating & Jegerski, 2015;
Roberts & Siyanova-Chanturia, 2013), images, videos, webpages, or other types of
visual displays, depending on the nature of the study. To a large extent, experimental
materials shape the experience participants will have during a study.
An important step, therefore, is designing sound and reliable materials for the
study. At first blush, this may seem to boil down to compiling or creating a long list
of materials. Although finding or creating materials is generally a part of the process,
in practice, a researcher’s job seldom ends here. Researchers usually create multiple
versions of the same materials, manipulating one or more aspects of the item to see
how this change affects the outcome (dependent variable). This process of creat-
ing multiple versions is referred to here as reduplication. Together, the different
versions of an item represent the independent variables that are of interest in a
study. Descriptions of how to create multiple versions apply specifically to categori-
cal variables; that is, variables with a limited set of possible values, such as input
enhancement (enhanced, unenhanced); word status (word, pseudoword); article (def-
inite, indefinite); feedback type (implicit, explicit, no feedback); or task complexity
(simple, complex). Therefore, reduplicating items, as described in this section, is most
common in ANOVA-type designs, in which researchers manipulate a limited num-
ber of categorical variables.The guidelines also apply to regression analysis and other
correlation-based designs, as long as the set of independent variables includes at least
one categorical variable. In the following, we will see many different examples of
how researchers map categorical variables onto different item versions. For a review
of the different variable types referred to in this chapter, see Textbox 5.1.
Confounding variable: any variable that is associated with both the inde-
pendent and the dependent variable and is not accounted for in the
research design or statistical analysis. Confounding variables bias experi-
mental results and can undermine the validity of a study, for instance
text difficulty (do the two texts differ in other ways than the presence or
absence of word spacing?).
128 General Principles of Experimental Design
Categorical variable: any variable that can take a limited number of pos-
sible options or values, such as feedback type. Each value represents a dis-
tinct category in the world that can be described qualitatively, for instance
recasts, prompts, and no feedback. Examples: text type (spaced vs. unspaced),
feedback (recasts vs. prompts vs. no feedback)
Continuous variable: any variable that can be measured quantitatively on
a continuum. There could, in theory, be any number of values. Examples:
reading time, reaction time.
Say you are interested in whether L2 Chinese learners can read spaced text faster
than unspaced text (Chinese text is normally unspaced). To investigate this, your
study should naturally contain sentences with spacing and sentences without.
Ideally, however, the spacing manipulation should be applied to the same set of
sentences, so other factors (confounding variables) such as the words used in
the sentence and the overall sentence difficulty do not play a role. Simply put,
each sentence should have both a spaced and an unspaced version. The number of
versions of a stimulus (in this case, a sentence) depends on the variables that are of
interest in the study. As a rule, there should be as many item versions as there are
levels in your categorical variable. For instance, in the spacing study, the categori-
cal variable (spacing) has two levels: spaced text and unspaced text. Hence, there
should be two versions of each sentence.
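To make the one-item, two-versions logic concrete, here is a minimal Python sketch (the romanized sentences are invented placeholders, not real stimuli) that derives both versions of each item from a single master list, so that the spaced and unspaced versions differ only in the manipulation:

```python
# Build doublets for the word-spacing study: each master sentence is stored
# once, and both versions of the item are derived from it.
master_sentences = [
    "wo xihuan kan shu",      # invented romanized placeholders
    "ta mingtian qu xuexiao", # for Chinese sentences
]

def make_doublet(sentence):
    """Return the spaced and unspaced versions of one item."""
    spaced = sentence                     # words separated by spaces
    unspaced = sentence.replace(" ", "")  # normal, unspaced Chinese text
    return {"spaced": spaced, "unspaced": unspaced}

items = [make_doublet(s) for s in master_sentences]
# Every item now has exactly two versions that differ only in spacing,
# so wording and sentence difficulty cannot act as confounding variables.
```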
Although there is no theoretical limit on how many levels a variable can
have, for practical reasons most variables tend to have two, three, or four levels.
Accordingly, the items used to measure them will be organized into doublets
(two levels), triplets (three levels), or quadruplets (four levels). Figure 5.1 provides
a graphical representation of the most commonly used item types. Doublets
are pairs of stimuli that represent a two-level categorical variable. Triplets are
groups of three of the same stimulus that represent a three-level categorical vari-
able. Lastly, quadruplets are four different versions of the same stimulus which,
together, represent a four-level categorical variable or the combination of two
independent variables (both categorical) with two levels each. To illustrate how
this works in an actual study, I will walk you through some examples modified
from existing text-based and visual world studies.
Godfroid, Ahn, Rebuschat, and Dienes (in prep.) wanted to investigate
the acquisition of L2 syntax by L1 English speakers. They conducted an eye-
tracking experiment based on Rebuschat’s (2008) semi-artificial language (also
see Rebuschat & Williams, 2012), in which English words have been rearranged
according to German syntax (e.g., Yesterday scribbled David a long letter to his family,
mirroring the German sentence Gestern kritzelte David einen langen Brief an seine
FIGURE 5.1 Common item types: doublets (two levels), triplets (three levels), and
quadruplets (four levels). An item should have as many versions as there
are levels in your independent variable.
Familie). In order to obtain baseline reading data, they also ran a control condition
in which participants read the same sentences with normal English word order
(e.g., Yesterday David scribbled a long letter to his family). The independent variable
in this study was word order. It had two levels: German word order and English
word order. Accordingly, the experimental items were doublets—sentences were
presented in either German syntax (verb-second word order, experimental group)
or English syntax (subject-verb word order, control group). A third condition
(not included in Godfroid et al.’s study) could examine the acquisition of verb-
final word order found in Subject-Object-Verb languages such as Korean and
Japanese (e.g., Yesterday David a long letter to his family scribbled). Doing so would
simply mean adding a branch to the item tree, so we now have triplets instead of
doublets.
A plausible follow-up to this study could look at the role of instruction. Might
bolding and underlining the verbs help learners acquire the structure? To address
this question, the researchers would introduce a new variable into the design,
namely input enhancement. We can represent this move by adding a new level to
the item tree (see Figure 5.2). Because input enhancement is normally operation-
alized as a two-level independent variable (enhancement, no enhancement), each
item version should branch two ways. Thus, doublets become quadruplets and
triplets become sextuplets (see Figure 5.2).
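The branching logic amounts to crossing the levels of all categorical variables. A minimal Python sketch (level labels taken from the running example) shows how doublets and triplets grow into quadruplets and sextuplets once enhancement is added:

```python
from itertools import product

# Levels of the two categorical variables in the follow-up design:
word_order = ["German", "English", "Korean/Japanese"]   # three levels
enhancement = ["enhanced", "unenhanced"]                # two levels

# Each combination of levels is one version of an item.
quadruplet = list(product(word_order[:2], enhancement))  # 2 x 2 = 4 versions
sextuplet = list(product(word_order, enhancement))       # 3 x 2 = 6 versions
```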
The principles of stimulus reduplication apply to text-based and visual-world
eye-tracking research alike (see Chapters 3 and 4, for reviews). They also apply to
the different strands of research within these broad paradigms, with the excep-
tion of observational research (e.g., Godfroid et al., 2018; Lee & Winke, 2018;
McCray & Brunfaut, 2018; McDonough, Crowther, Kielstra, & Trofimovich, 2015;
McDonough, Trofimovich, Dao, & Dion, 2017). Therefore, as long as researchers
are manipulating task or text properties, using ANOVA or regression with dummy
variables, they will benefit from applying the same manipulation to the same stimuli.
This appears straightforward for sentence-processing research, such as the grammar
study described previously. However, it is worth emphasizing that “cloning” stimuli
is not the prerogative of sentence-processing researchers. Researchers who work
with picture prompts, videos, or larger instructional materials can also benefit from
creating two (or more) versions of the same stimulus. Doing so will earn them
better experimental control and a study that has higher internal validity (i.e., the
study findings reflect what the researchers intended to study).
As seen in Chapter 4, researchers in the visual world paradigm utilize auditory
input in conjunction with images. This opens up additional possibilities for mate-
rials design besides the manipulation of linguistic input on which we have focused
thus far. Specifically, visual world researchers can manipulate linguistic-auditory
and visual input sources independently, which gives them more options when
creating materials. Even though there are more options theoretically, oftentimes
the research questions will guide visual world researchers in what to do. Broadly
speaking, there are three ways to go about creating items for a visual world exper-
iment: (1) manipulate linguistic-auditory stimuli while keeping the visuals con-
stant, (2) manipulate visual stimuli while keeping the linguistic-auditory stimuli
constant, and lastly (3) manipulate both the visual and the linguistic-auditory
stimuli. To illustrate these approaches, I will introduce three representative studies
from across the spectrum of visual world research (also see Section 6.3.1.1).
Kohlstedt and Mani (2018) examined L1 and L2 speakers’ ability to infer the
meaning of unknown words from a spoken story context (see Section 4.2.2.5).
The authors compared the processing of words and pseudo words in biasing
(semantically informative) and neutral contexts, for instance the target word
cane paired with a story about a grandfather (semantically biasing) or a closet
FIGURE 5.3 A quadruplet. Four different versions of the audio were presented together with the same display.
(Source: Kohlstedt and Mani, 2018).
trials, one of these three objects was an item of which the English name overlapped
with the target word (e.g., shovel - shark). Between-language competitor trials
included an object of which the Russian name overlapped with the target word
(e.g., shovel - xarik [SArik], “balloon”). Lastly, the simultaneous competition
condition included both Russian and English competitor objects. Thus, in all there
were four conditions. Hence, each target word was tested by means of a quadruplet.
The quadruplet was realized through the visual display, rather than the contents of
the spoken sentences as previously shown in the Kohlstedt and Mani (2018) exam-
ple. With such a design, Marian and Spivey (2003a, 2003b) were able to test, and
adduce evidence for, the co-activation of words in the bilingual lexicon, regardless
of whether the words belonged to the same language or a different language than
the spoken input.
Lastly, visual world researchers can combine the previous possibilities and
manipulate both visual and linguistic-auditory input when creating materials. This
is what Trenkic, Mirković, and Altmann (2014) did, in a study on L1 Chinese–L2
English speakers’ processing of definite (i.e., the) and indefinite (i.e., a) determin-
ers (also see Sections 4.2.2.3, 5.2, and 6.1.3.2). For the linguistic-auditory stimuli,
the researchers created two versions of each item, which makes for a doublet.
As for the visual-imagery stimuli, the authors used “semi-realistic scenes” (Huettig,
Rommers, & Meyers, 2011, p. 151) that contained a number of objects, including
the target object (see Figures 5.5 and 6.10). Each scene contained two identical
containers; however, the properties of the containers varied, so that in one ver-
sion, both objects were potential goal recipients (e.g. two open cans), while in the
other version, only one object was (e.g., one open and one closed can). Again, this
is a doublet because there were two versions of each image. Combined, these two
variables (definiteness and number of referents) resulted in a quadruplet, which
is shown in Figure 5.5. Trenkic et al.’s study will be returned to in Section 5.2, to
illustrate the construct of counterbalancing, and again in Section 6.3.1.1, when
we look at interest areas.
In sum, the process of creating materials is closely tied to how many categorical
variables there are in a study and how many levels each variable has. Doublets, triplets,
and quadruplets are a reflection of this. They are the product of how many values
each variable can take. Researchers can increase the level of experimental control in
their study through item reduplication, because all that differs between the different
versions of an item are the experimental conditions, and not some other properties.
Creating experimental materials, then, is part creativity and part labor.
FIGURE 5.4 A quadruplet. Four different versions of the display were presented alongside the same auditory input, “Pick up the shovel”. Note: displays recreated with images from the International Picture Naming Project.
(Source: Bates et al., 2003; Szekely et al., 2003; Marian and Spivey, 2003a, 2003b).
FIGURE 5.5 A quadruplet drawn from doublets of visual and linguistic-auditory stimuli.
(Source: Trenkic et al., 2014).
on the stimulus lists that participants receive, the statistical analysis, and ultimately, the
chances of detecting statistical effects that are present in the data. A distinct advantage
of within-subjects designs is that every participant serves as his or her own control.
For instance, a motivated participant is likely to apply herself to all items equally,
meaning you could get a homogeneous set of data for your different experimental
conditions. Likewise, a less proficient L2 speaker will have the same language profi-
ciency throughout the study and can therefore best be compared to herself. When
researchers use a within-subjects design, they can control for individual differences
in participant performance. I would argue that no matter the type of research you
do, it is good to think about whether your study lends itself to a within-subjects
design. Many L2 researchers naturally lean toward between-subjects designs, but
with a few small tweaks, it may be possible to convert a between-subjects study into
a more controlled and statistically more powerful within-subjects experiment.
To compare these two options, we will take Montero Perez, Peters, and Desmet’s
(2015) captions study as an example. Recall from Section 3.2.4 that Montero
Perez and her colleagues studied L2 French learners’ incidental vocabulary acqui-
sition from watching two captioned video clips, one on a LEGO© factory and
the other about a brewery in northern France. The authors wanted to know
whether keyword captioning (showing only keywords on the screen) would yield
higher vocabulary gains than the traditional, full captioning, given that keywords
are more salient. The study participants were randomly assigned to either the full
captioning or keyword captioning condition and viewed the two video clips with
the corresponding caption type. This is a between-subjects design. Although the
authors were able to demonstrate some benefits of keyword captioning in this way,
we do not know whether additional results would have emerged had the same
participants watched one video with keyword captions and the other video with
full captions. Using such a within-subjects design (see Figure 5.6), the researchers
could have controlled for their participants’ L2 proficiency level, listening com-
prehension, vocabulary size, language aptitude, motivation, stress levels and fatigue,
and any other individual differences that might influence the outcomes of the
study. Many studies use a between-subjects design like Montero Perez et al. (2015)
did (see online supplementary materials, Tables S5.1–S5.12). Therefore, the goal
in discussing this study is not to single this project out, but rather demonstrate
how, with a few small tweaks, we can take a good study and make it even better.
Lists and counterbalancing. To ensure that participants see only one of the
multiple item versions, researchers arrange their experimental materials in lists.
Each list contains one version of each item and each participant sees only one
of the various lists. This way, researchers can avoid repetition effects that would
come from watching the same video clip or reading the same sentence twice.3
The number of lists equals the number of item levels; in other words, doublets are
distributed across two lists and quadruplets require four lists. Going back to the
Montero Perez et al. (2015) example, the authors had two item lists as they had
two captioning conditions (Keyword and Full captions). For both between- and
items across different lists so every group of participants sees every condition and
every item exactly once and there is exactly one observation for every item ver-
sion is known as a Latin square design (see Textbox 5.3).
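The rotation logic of a Latin square can be sketched in a few lines (an illustrative Python sketch; the item and list counts are invented). List k receives version (i + k) mod v of item i, which guarantees that each list shows one version per item and that, across lists, every version of every item appears exactly once:

```python
def latin_square_lists(n_items, n_versions):
    """Assign item versions to lists so each list shows one version per item
    and, across lists, every (item, version) pair appears exactly once."""
    lists = []
    for k in range(n_versions):  # one list per condition
        assignment = [(item, (item + k) % n_versions)
                      for item in range(n_items)]
        lists.append(assignment)
    return lists

# Four lists for quadruplets (four versions per item), eight items in total:
lists = latin_square_lists(n_items=8, n_versions=4)
```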
The experiment began with a calibration of the eye-tracker using the par-
ticipants’ right eye. This initial calibration was followed by a practice session
of ten trials and by the main experiment. In each trial, the participants saw
When reading this description for the first time, you may find it helpful to create
a visual of the different events in the trial sequence (see Figure 5.8 and Section
6.3.1.5). This will help you keep track of the different parts in the study and figure
out their function. A good place to start is to locate the item, because, as stated
previously, this is the central piece of any study. In Tremblay’s experiment, the
items were four orthographic words displayed in the four quadrants of the screen.
The treatment phase consisted of 36 trials for both the experimental and
the control group. Each trial included three separate display screens. For the
experimental group, the first screen displayed a sentence in which the target
word and its vowel(s) were textually enhanced with three typographical
cues … . In contrast, for the control group the textual input enhancement
was absent, that is, the test item was not underlined and [it was] displayed in
the same font and color as the remaining words in the sentence. …
The second screen asked study participants to provide the meaning
of the target word in a multiple-choice exercise identical to that of the
pretest and the immediate and delayed posttests. This task was intended
not only to motivate our study participants to actually read the sentence,
but also to ascertain the learner’s knowledge of the meaning of the test
items. …
The third screen provided feedback to the study participants on their
choice of the correct meaning of the target word. Screens 2 and 3 were
identical for both the experimental and the control group.
(Alsadoon & Heift, 2015, p. 64)
This was an important step in the study because, by checking the learner’s
knowledge of the meaning of each test item, we ensured that the problems
in the learners’ intake of English vowels can be attributed to a lack of ortho-
graphic vowel knowledge as opposed to not knowing the word meaning.
(Alsadoon & Heift, 2015, p. 64)
Step 3, then, was a logical sequel to Step 2, as it let participants know whether
their response was correct. Taken together, the different steps in Alsadoon and
Heift’s trials all played an integral part in demonstrating the beneficial effects
of input enhancement and, specifically, helped the researchers isolate the role of
input enhancement in learning word form and meaning.
The two studies discussed in this section show how, in different research strands,
items are embedded in larger sequences of events. These events make up the tri-
als in a study. Another way to consider trial contents (discussed next) is in terms
of primary and secondary tasks that make up the trial. In what types of tasks do
participants engage in an eye-tracking study? It turns out there are many options.
In many cases, the implausible sentences will serve as filler trials (see Section 5.3)
because processing may change when participants encounter something unex-
pected. Thus, researchers may report overall accuracy rates for the plausibility
judgments, but focus only on the plausible sentences in their analysis of eye-
tracking data.
The final measure, translation, is slightly different from comprehension ques-
tions and plausibility judgments because participants need to produce language
on top of comprehending it. What translation measures is a bit controversial. Lim
and Christianson (2015) argued translation is a comprehension measure, which,
compared to comprehension questions, invites a deeper level of processing (also
see Lim & Christianson, 2013). In their study, Korean learners of English were
asked to read and then translate English sentences into Korean in each trial. The
participants were more sensitive to grammatical violations in this translation
experiment than in an experiment that included comprehension questions after
each sentence (also see Jackson & Bobb, 2009; Jackson & Dussias, 2009; Leeser,
Brandl, & Weissglass, 2011 for similar comparisons of grammaticality judgments
and comprehension questions). Although Lim and Christianson (2015) promoted
the use of translation as a comprehension measure, the authors acknowledged that
translation “draws the attention of even lower proficiency L2 learners to mor-
phosyntax” (p. 1288) and “the translation itself contains explicit evidence of how
attending to the meaning of a sentence. GJTs may undo some of these benefits
normally found in eye-tracking experiments. Therefore, unless the goal is to
study GJT-induced task effects (see Godfroid et al., 2015; Leeser et al., 2011),
other, more meaning-focused tasks may be more appropriate as secondary
tasks. Table 5.1 summarizes the main characteristics of the different secondary
tasks reviewed in this section.
As seen in Table 5.1, secondary tasks can fulfill different purposes in a study. The
primary purpose of the task will inform various methodological decisions, such as
how many trials need to have a secondary task after the primary task. Coverage
refers to the percentage of trials that include a secondary task. You may have assumed
that primary and secondary tasks always go together; however, 100% coverage is
only necessary if you are planning to do a detailed analysis of the secondary-task
data. Other purposes can be accomplished with lower coverage rates. If the goal
is to use the secondary-task scores as an inclusion criterion and exclude participants
who are performing poorly, it may be enough to insert secondary task items after
50% of all trials. To keep participants alert and engaged, a 25% to 30% coverage rate
may be enough. These figures are intended as guidelines. Contextual factors such
as lab availability and your participants’ profile will further shape what is possible
in your study. As a rule, however, you will want to add a secondary task after more
trials if the secondary-task data will help you answer your research questions or are
important to what you are studying.
A further question is whether researchers want to keep trials for which par-
ticipants answered the secondary task incorrectly. This question may arise when
researchers want to study meaning-focused processing. An incorrect response
to a comprehension question or plausibility judgment may signal a temporary
lapse in the participant’s focus on meaning. On this account, it may be safer
to remove these trials from the analysis. This is reasonable but strict. Another,
more liberal approach is to set an overall inclusion threshold (e.g., 70% or 80%
response accuracy on the whole experiment) and include all the data from par-
ticipants who meet that criterion. In this manner, both correct and incorrect
responses will be included for eligible participants. Researchers are likely to lose
less data this way, which can be a concern, especially when working with L2
speakers. The key here is to have a clear understanding of the purpose of your
secondary task.
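The liberal approach described above can be sketched in a few lines of Python (the participant IDs, accuracy values, and the 70% threshold are invented for illustration):

```python
# Liberal inclusion: keep ALL trials from any participant whose overall
# secondary-task accuracy meets the threshold (values invented).
accuracy = {"p01": 0.95, "p02": 0.65, "p03": 0.82}  # proportion correct
THRESHOLD = 0.70

eligible = {pid for pid, acc in accuracy.items() if acc >= THRESHOLD}
# p02 is excluded entirely; for p01 and p03, both correct- and
# incorrect-response trials stay in the eye-tracking analysis.
```

The stricter alternative would instead filter at the trial level, dropping every trial with an incorrect response regardless of a participant's overall accuracy.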
In sum, secondary tasks present researchers with a range of options to enrich
their studies and collect additional information about participants’ performance.
Although these tasks are secondary by nature, their implementation in a study
requires careful thought. Piloting will be helpful to confirm the robustness of your
task stimuli. Piloting can also help you make some of the finer methodological
decisions for which standard guidelines are still being developed. Tables S5.1–S5.12
in the online supplementary materials present primary and secondary-task
information for the different types of eye-tracking studies.
TABLE 5.1 Comparison of four secondary tasks
reaction time research, would be to calculate split-half reliability for the items
in each of the different experimental conditions, treating eye fixation times as a
special type of reaction times. The statistical software R has a package, splithalf,
that will let you do just that (https://cran.r-project.org/web/packages/splithalf/
index.html). If reliability becomes more mainstream in eye-tracking research, not
just as a theoretical construct but as something researchers calculate and report, it
will further help promote the notion that item numbers are important.
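The splithalf package implements this in R; the core computation, an odd/even split of items correlated across participants and stepped up with the Spearman-Brown formula, can be sketched in Python as follows (the fixation times are invented, and this simplified version omits the permutation-based resampling that splithalf performs):

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(data):
    """data: one list of item-level fixation times (ms) per participant.
    Correlate odd-item and even-item means across participants, then apply
    the Spearman-Brown correction for full test length."""
    odd_means = [sum(rts[0::2]) / len(rts[0::2]) for rts in data]
    even_means = [sum(rts[1::2]) / len(rts[1::2]) for rts in data]
    r = pearson(odd_means, even_means)
    return 2 * r / (1 + r)

# Invented fixation times: four participants, six items each.
fixations = [
    [310, 295, 330, 305, 320, 300],
    [450, 460, 440, 455, 470, 445],
    [280, 270, 290, 285, 275, 295],
    [390, 400, 380, 395, 410, 385],
]
reliability = split_half_reliability(fixations)
```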
Item numbers also matter because larger item sets will increase the researcher’s
chances of detecting effects (i.e., significant group differences or significant rela-
tionships between variables) in the data analysis. The concept here is that of sta-
tistical power. Statistical power refers to the likelihood a researcher will uncover
true effects based on the data that were collected. The concept of statistical power
can be likened to the workings of a microscope. Microscopes come in different
strengths. Which microscope is appropriate for your project will depend on what
you are trying to study. Flower pollen can be studied under a low-power micro-
scope, but seeing bacteria requires a better, more powerful device. In research, the
size of the phenomenon you seek to study is reflected in the effect size. Effect
sizes are traditionally categorized as small, medium, or large. To detect smaller
effects in a statistical analysis you need more statistical power, much like seeing
bacteria requires a more powerful microscope than seeing pollen.
Researchers can increase the statistical power of their analyses by collecting
a larger data set. The size of a data set is the total number of observations. It is a
product of both the number of participants and the number of items. Researchers
should pick their item and participant numbers accordingly, and aim for high lev-
els of statistical power in their studies. In SLA and bilingualism, statistical power
is typically set at .80. This means there is an 80% chance researchers will detect
existing effects accurately (e.g., find significant differences if groups or treatments
truly differ). With the desired statistical power level, the anticipated effect size,
and the significance level, researchers can perform an a priori power analysis (for
details, see Larson-Hall, 2016). The outcome of this analysis will be an estimate of
how large a data set is necessary to run a well-powered study.
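Larson-Hall (2016) covers the full procedure; as a simplified illustration (a normal-approximation sketch in Python, not the exact t-test calculation), the required group size for a two-group comparison falls sharply as the anticipated effect size grows:

```python
import math
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate participants per group for a two-sample comparison,
    using the standard normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Cohen's "medium" effect vs. Plonsky and Oswald's field-specific "medium":
n_cohen = n_per_group(d=0.50)  # 63 per group under this approximation
n_l2 = n_per_group(d=1.00)     # 16 per group
```

Because the approximation ignores the t-distribution's heavier tails, exact calculations give slightly larger numbers, but the qualitative point stands: doubling the effect size cuts the required sample to a quarter.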
So what are typical effect sizes in different strands of SLA and bilingualism
research? It turns out they are larger than in neighboring disciplines. Plonsky and
Oswald (2014) compiled over 400 effect sizes from L2 research (both primary
studies and meta-analyses) and plotted the effect sizes’ distribution. Using the
distribution as a guide, the authors proposed field-specific cutoff values for what
counts as a small effect (25th percentile of the distribution), a medium effect (50th
percentile), and a large effect (75th percentile). In so doing, they revised all cutoff
values for effect sizes upward compared to Cohen’s (1988) norms. For instance,
for mean differences between groups, Plonsky and Oswald proposed d = 0.60
(versus Cohen’s d = 0.20) for a small effect, d = 1.00 (versus Cohen’s d = 0.50)
for a medium effect, and d = 1.40 (versus Cohen’s d = 0.80) for a large effect.
As we will see in what follows, this has implications for sample-size calculations.
The authors attributed these differences in cutoff values, relative not just to Cohen’s
benchmarks but also to similar meta-syntheses in other fields (Hattie, 1992;
Lipsey & Wilson, 1993; Richard, Bond, & Stokes-Zoota, 2003; Tamim, Bernard,
Borokhovski, Abrami, & Schmid, 2011), to the relative youth of the field of L2
research, as well as to potential publication bias.
In the neighboring discipline of psychology, typical effect sizes are about d =
0.40 (Kühberger, Fritz, & Scherndl, 2014; Open Science Collaboration, 2015),
which are considered small effects. In such cases, Brysbaert and Stevens (2018) rec-
ommended that researchers conducting reaction time experiments aim for 1,600
observations per condition (e.g., 40 participants × 40 items; 20 participants × 80
items; or 80 participants × 20 items), in order to achieve a statistical power of .80.
As mentioned previously, effect sizes in L2 research tend to be larger. Therefore,
to bring back the microscope analogy, L2 researchers will not need quite as many
observations to achieve the same statistical power. To determine how many observations
are needed, researchers could run simulations based on pilot data for their
study. Brysbaert and Stevens (2018) demonstrated this procedure, using the simr
package developed for R (Green & MacLeod, 2016; Green, MacLeod, & Alday,
2016). The idea is to draw a number of random samples from an existing data set
and run the desired statistical analysis a number of times on each sample. For accu-
rate power estimates, it is better to run the simulations on one’s own data (these
could be real data or simulated data) or a dataset from a similar study (Brysbaert
& Stevens, 2018). Using your own data is better because variance in eye fixation
times is likely to be task- and population-specific. To calculate the statistical power
of a research design, simply count how often the simulated analyses return a
statistically significant result. The statistical power of a study is the proportion of all
tests that are significant. For instance, if you run a total of 2,000 statistical tests on
your data set and 1,200 tests are statistically significant, the estimated power of the
design is .60.
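simr does this for mixed-effects models in R; the counting logic itself can be illustrated with a deliberately simplified Python simulation of a two-group comparison (the effect size, group size, and 1.96 criterion are assumptions of this sketch, not values from Brysbaert and Stevens):

```python
import random
import statistics

def simulate_power(d=1.0, n=20, n_sims=2000, seed=1):
    """Estimate power by simulation: draw many samples under the assumed
    effect, test each one, and return the proportion of significant results."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treatment = [rng.gauss(d, 1.0) for _ in range(n)]
        diff = statistics.mean(treatment) - statistics.mean(control)
        se = ((statistics.variance(control)
               + statistics.variance(treatment)) / n) ** 0.5
        if abs(diff / se) > 1.96:  # normal-approximation criterion
            significant += 1
    return significant / n_sims

power = simulate_power()  # for d = 1.0 with 20 per group, roughly .85-.90
```

The 2,000-run example in the text maps directly onto n_sims=2000: if 1,200 of the simulated tests come out significant, the function returns .60.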
Once researchers have an initial estimate of the power of their test, they can
modify different parameters, such as the participant number, item number, and the
observed difference between conditions. To increase sample size, which is neces-
sary if the initial simulations indicated a lack of power, researchers could simply
copy their data set. This may not yield exact results, because no two sets of data
will ever be identical, but it will fit most researchers’ needs. The target number
of items and participants, then, is the size of data set (participant number × item
number) that will return positive test results on 80% of all test simulations.
The previous discussion emphasized the importance of both items and par-
ticipants for statistical power.4 Table 5.2 lists the number of items and participants
per condition I distilled from the synthetic review of text-based eye-tracking
studies (see Tables S5.1 to S5.6 online for detailed information). Table 5.3 does
the same for visual world studies (based on Tables S5.7 to S5.12 online). These
numbers provide an indication of the number of observations in contemporary
L2 eye-tracking research. In many strands, median cell size (n participants × k
items per condition) is close to 300, for instance 260 observations in grammar
TABLE 5.2 Number of items and participants per condition in text-based eye-tracking research

Research strands   Number of items per condition   Number of participants per condition (b)
                   Mean    Median   Min–Max        Mean    Median   Min–Max
Grammar            11.99   10       4–42           28.42   26       14–60
Vocabulary         33.70   14.25    4–225          26.24   26       15–42
ISLA               19.52   19       3–36           21.63   15.5     3–66
Subtitles (a)      184     254      18–280         25      25.5     9–40
Assessment         11.25   9        3–24           28.25   30.5     14–38

(a) For subtitles research, the relevant unit may be time, rather than the number of subtitles shown. Average clip length: 10 min; Median length: 5 min; Range: 4–25 min.
(b) In a within-subjects design, the number of participants per condition will be the total sample size. In a between-subjects design, it will be the total sample size, divided by the number of conditions.
TABLE 5.3 Number of items and participants per condition in visual world research

Research strands         Number of items per condition   Number of participants per condition (b)
                         Mean   Median   Min–Max         Mean   Median   Min–Max
Word recognition         21     20       5–48            41     39       14–70
Prediction               11     9        3–28            40     35       16–100
Referential processing   11     8        6–18            45     36       34–66
Production (a)           14     10       7–22            24     20       15–48

(a) Number of items in some studies was based on the amount of feedback given in response to the participant’s production.
(b) In a within-subjects design, the number of participants per condition will be the total sample size. In a between-subjects design, it will be the total sample size, divided by the number of conditions.
you start thinking about item lists. As we saw in Section 5.2, list assignment will
look somewhat different for studies with a between- or a within-subjects design
and for studies with or without counterbalancing. Thus, item production is where
all the different topics we covered in this chapter come together. This is where
the rubber hits the road and you finally have a chance to apply everything you
learned in this chapter.
To illustrate, let’s take another look at Montero Perez et al. (2015), a captions
study with L2 French university students. Recall from Section 5.2 that Montero
Perez and her colleagues wanted to compare the effects of full captioning and
keyword captions on vocabulary acquisition. Thirty-four participants (from an
initial sample of 51) were randomly assigned to watch two video clips in one of
the two captioning conditions, using a between-subjects design. Participants were
evenly split between the two conditions. Across the two video clips, they encoun-
tered a total of 18 target words (i.e., items). Thus, the number of observations per
cell (i.e., per captioning condition) was 17 participants × 18 items = 306.
Now imagine that the study had a within-subjects design. In this case, all 34
participants would experience both keyword captioning and full captions, but due
to the counterbalancing of the video clips, each participant would see only half
the target words in either captioning type, specifically 11 targets in the LEGO©
video (e.g., with full captions) and 7 targets in the brewery video (e.g., with keyword captions). Figure 5.6, which is reproduced as Figure 5.9 here, visualizes the difference between the two designs.
FIGURE 5.9 Between-subjects design and within-subjects design for studying the role of captions in vocabulary acquisition. In a counterbalanced, within-subjects design with two conditions, there will be twice as many participants but half the number of items per condition.
(Source: Based on Montero Perez et al., 2015).
156 General Principles of Experimental Design
This example shows how counterbalanced,
within-subjects designs make larger demands on materials creation (because par-
ticipants see fewer items per condition), whereas between-subjects designs require
larger participant numbers (because there are fewer participants per condition).
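The trade-off just described can be made concrete with a small calculation. The helper below is a hypothetical sketch, not a standard formula; it uses the Montero Perez et al. (2015) numbers from above (34 participants, 18 items, two conditions).

```python
def observations_per_cell(n_participants, n_items, n_conditions,
                          within_subjects, counterbalanced=True):
    """Number of observations per condition (cell).

    Between-subjects: participants are split across conditions.
    Within-subjects (counterbalanced): every participant contributes to every
    condition, but sees only 1/n_conditions of the items in each condition.
    """
    if within_subjects:
        items = n_items / n_conditions if counterbalanced else n_items
        return n_participants * items
    return (n_participants / n_conditions) * n_items

# Montero Perez et al. (2015): 34 participants, 18 items, 2 conditions.
between = observations_per_cell(34, 18, 2, within_subjects=False)  # 17 x 18
within = observations_per_cell(34, 18, 2, within_subjects=True)    # 34 x 9
```

Both designs yield the same number of observations per cell here; what differs is how those observations are composed — twice the participants but half the items per condition in the counterbalanced within-subjects version, and vice versa.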
Although it is probably not a good idea to have fewer than ten participants
or ten items in a given condition, the practicalities of a given research context
may push researchers toward a between- or a within-subjects design. Specifically,
some of the limitations of a small participant sample can be offset by including
more items and running the study as a within-subjects experiment. Conversely, if
length of experiment is a concern (e.g., when working with children), research-
ers could include fewer items but recruit more participants. At the end of the day,
when cell sizes are equal, a within-subjects design will be more powerful than a
between-subjects design, because every participant will serve as his or her own
control (see Section 5.2).
To sum up, the median cell size in contemporary L2 eye-tracking research
is close to 300, which corresponds to 15 participants × 20 items or 20 partici-
pants × 15 items per condition. These numbers do not by themselves, however,
guarantee that a study will have adequate statistical power, because power depends
on many other elements as well. When deciding what sample size you need for
your study, the key factors to consider are practicality, instrument reliability, typi-
cal effect sizes in a given subdiscipline, ease of recruiting participants, and type of
research design.
5.6 Conclusion
In this chapter I covered basic principles of experimental design with a focus on
the overall structure of a study. Eye-tracking researchers follow the same principles
of experimental design as other quantitative researchers, but add to that some new
concepts and rules or constraints that are specific to eye-tracking methodology
(see Chapter 6). From this perspective, fundamental knowledge of general design
principles is essential for creating a good eye-tracking study. Of many factors, I
selected five key elements deemed critical for a sound research study. The first
concepts were those of items and item versions (see Section 5.1). The different
versions of an item mirror the different levels of your independent variable(s) and
are thus a direct expression of the design of your study. When designing my own
research projects or advising students, I like to draw the different experimental
conditions on a piece of paper, similarly to the diagrams shown in Section 5.1.
I find this helpful to link the statistical and experimental aspects of the research.
With the concept of items in place, researchers need to consider whether to
implement their study as a between- or a within-subjects design (see Section 5.2).
Many studies will lend themselves to either design type. In that case, a within-
subjects design may be preferred, because it yields less noisy data (every partici-
pant serves as his or her own control) and hence, a more powerful research design.
The issue of statistical power came up again, in Section 5.5, as a guiding principle
for determining an adequate sample size. The smaller the effect you want to study,
the more participants and items you will need to run well-powered statistical
tests. While the question of sample size defies a simple answer, Tables 5.2 and 5.3
summarize the item and participant numbers that are commonly found in con-
temporary L2 eye-tracking research. Lastly, items are the nucleus of a larger unit,
known as a trial (see Section 5.3). Trials are composed of primary and secondary
tasks (see Section 5.4) and sometimes eye-tracking data will only be recorded
during the primary task. Even so, secondary tasks can provide a wealth of infor-
mation about participants’ attentiveness, their general L2 proficiency, and ability
to complete the task, while also giving the participants a purpose (real or pretend)
for doing the experiment.
With these concepts fully established, the time has come for some eye-tracking-
specific guidelines. In Chapter 6, we consider what makes eye-tracking method-
ology different from other behavioral research methods.
Notes
1 Critical or experimental trials are trials that will be included in the data analysis, see
Section 5.3.
2 This is true even for within-subjects designs, because items tend to be counterbalanced
(more on counterbalancing later in this section).
3 Creating lists is not necessary when researchers think participants can safely see the
same item more than once. In that case, the researchers can present all versions of all
items together in a within-subjects design with no counterbalancing (e.g., Godfroid et
al., 2015; Kaushanskaya & Marian, 2007). This is less common in L2 and bilingualism
eye-tracking research (because in many cases repetition is better avoided), but from a
design standpoint, it is the simpler thing to do.
4 This is true for linear mixed effects models, which were the focus of Brysbaert and
Stevens’ (2018) article, but it is also true for analyses that involve some type of data
averaging, such as t tests, ANOVA, and linear regression. When data are averaged, more
observations will give rise to more precise mean values (i.e., means with a smaller
standard deviation) and hence larger effect sizes (Brysbaert & Stevens, 2018).
6
DESIGNING AN EYE-TRACKING STUDY
The aim of this chapter is to prepare readers for their own eye-tracking stud-
ies by equipping them with the methodological know-how and skills to design
their own research. Building on general experimental guidelines (see Chapter 5),
I provide an overview of methodological considerations specific to eye-tracking
research. At the top of the list is the need to define interest areas (see Section
6.1), a key element that pertains to text-based eye tracking and the visual world
paradigm alike. The remaining parts are devoted to providing paradigm-specific
guidelines; that is, guidelines for text-based eye tracking and for visual world
eye tracking, which extend to other multimodal settings. Section 6.2 deals with
spatial, artistic, and linguistic factors in text-based design. Section 6.3 offers an
in-depth discussion of how to create auditory and visual materials for a visual
world experiment. Researchers working on multimedia learning may find the
information in Section 6.3 helpful too. Readers can draw from Section 6.2 and
Section 6.3 as the primary sources of information for their studies in these respec-
tive paradigms. However, some principles (e.g., screen layout) apply more broadly
and so a general understanding of how eye-tracking researchers do things in other
areas of the field can benefit your own research as well.
FIGURE 6.1 Interest areas in a vocabulary learning study. The authors analyzed fixation times on the novel (pseudo) word perchants as well as its apposition offspring. To study further context effects, a researcher could designate additional interest areas encompassing the whole pseudo word–English word pair.
(Source: Godfroid et al., 2013).
researchers are primarily interested in how the eye-movement measures for the
particular word relate to acquisition; they may choose to disregard the process-
ing of other parts of the text in their analysis. Now imagine the same research-
ers wanted to investigate the role of context in inferring word meanings. This
scenario would require more interest areas besides the target words to capture
the processing of the surrounding context. Additional interest areas could again
be word-based, like the area for the target word, or they could encompass larger
areas (e.g., everything up to the target word as a single interest area). Which route
researchers took would depend on what analysis they had in mind for their con-
text effects. They could also define interest areas both ways and then decide later,
in the analysis stage, which approach proved more informative.
Similar to vocabulary research, many sentence processing studies have word-
based interest areas. This time, the areas center around the critical grammatical
features, for instance the structures or forms whose acquisition researchers want
to study. Let’s consider Hopp and León Arriaga (2016) as an example. Recall from
Section 3.2.1 that Hopp and León Arriaga were interested in the processing of
case in Spanish. As a result, for sentences with ditransitive verbs, the primary inter-
est area was the indirect object and its article/case marker, which was either gram-
matical (al) or ungrammatical (el) in the study. In addition, the authors designated
the preceding verb, the ensuing direct object, and the sentence-final prepositional
phrase as separate interest areas (see Figure 6.2). By defining additional interest
areas in this manner, Hopp and León Arriaga were able to compare the process-
ing of grammatical and ungrammatical sentences more comprehensively and to
make a stronger case for their participants’ sensitivity (or lack of sensitivity) to case
marking violations.
In the preceding examples, word-based interest areas were defined for words
that occurred in sentences or longer stretches of text. While this is arguably the
most common scenario, other kinds of research designs may call for word-based
interest areas as well. These are studies that present words in isolation, for instance
in keyword captioning (Montero Perez et al., 2015) or in some psycholinguistic
research (De León Rodríguez et al., 2016; Miwa et al., 2014). Montero Perez et
al.’s study is a good example because it shows how flexibly word-based interest
areas can be used. Recall from Section 3.2.4 that these authors examined the
effects of different types of captioning—keyword captioning and full captioning—on learning vocabulary. To do so, the authors overlaid the same interest areas
on the target words in both captioning conditions; that is, they used the same
interest areas regardless of whether the target word appeared in a full captioning
line or as a keyword caption (see Figure 6.3). By manipulating the layout in this
way, the authors were able to focus on the effects of visual salience in multimodal
vocabulary learning (with keyword captioning hypothesized to be the more sali-
ent technique). They did not have to analyze viewing behavior across the whole
screen, or even across the whole captioning line, but could focus on the targets for
learning in the video captions instead.
Finally, some psycholinguistic experiments focus on single-word processing,
using reading aloud (De León Rodríguez et al., 2016) or lexical decision tasks
(Miwa et al., 2014). In such cases, the word or nonword may be all there is for
a participant to look at on the screen. This may make the use of interest areas
redundant. Participants have no motivation to move their eyes away from the
words (Miwa, personal communication, October 9, 2017). Conceptually, though,
researchers still want to know for how long and where in the word participants
were looking (see Figure 6.4). These examples highlight that interest areas are
FIGURE 6.3 Full captioning (left) and keyword captioning (right). The interest area
was drawn around the French target word figurines (“figurines”).
(Source: Montero Perez et al., 2015).
FIGURE 6.4 English lexical decision task with eye tracking. Circles represent fixations
inside four English words.
(Source: Miwa et al., 2014).
particularly useful when there are several pieces of information on the screen and
researchers want to analyze only a subset of all the available information.
watching a video or filling out a gapped text). A nice example of a study that com-
bined larger interest areas and word-based interest areas is McCray and Brunfaut’s
(2018) assessment research. As mentioned in Section 3.2.5, the authors examined
the processing profiles of higher- and lower-scoring test takers on gap-fill items on
a standardized reading test. The authors hypothesized that lower-performing test
takers would display local reading strategies more often than higher-performing
test takers. One of their measures of local reading was the time spent fixating on
words surrounding the gaps in the gap-fill task. To measure this, the authors drew
areas of interest around the three words—a number the authors picked them-
selves—prior to and following the gaps (see Figure 6.5). Regarding global pro-
cessing differences, the authors hypothesized that higher performers would spend
comparatively less time on task processing (the word bank), leaving more time for
the higher-level processing of the text (the whole text area). To test these hypoth-
eses, McCray and Brunfaut defined two large interest areas around the word bank
and the text as a whole, respectively. In all, the authors had three very distinct types
of interest area—text gaps, three-word phrases, and word bank vs. text—and each
interest area was argued to capture a different aspect of test takers’ behavior.
research (see Chapter 4 and Section 6.1.3.2), yet images also play a role in mul-
timodal studies with written text. For instance, Révész, Sachs, and Hama (2014),
whose study was reviewed in Section 3.2.3, sought to validate the cognitive
complexity of different tasks in ISLA. To measure fixation behavior, the authors
combined the two picture prompts in each task version into a single interest
area (see Figure 3.4, reproduced as Figure 6.6 below). In Lee and Winke’s (2018)
assessment research (see Section 3.2.5), the interest areas were different compo-
nents of the TOEFL Primary Speaking Test, shown in the right panel of Figure
6.7. And lastly, in Suvorov (2015), also a language assessment study, the author
selected the videos embedded in the online test interface as the basis for further
analysis (see Figure 6.7, left panel).
As in text-based research that focuses on more global text processing (see
Section 6.1.2), studies with pictorial interest areas probe into learners’ general
processing of language tasks, including language assessment tasks. These stud-
ies were not designed to address questions related to grammatical sensitivity or
the acquisition of certain linguistic features; instead they deal with issues of task
design. In Suvorov’s (2015) study, for instance, the question was whether the type
of video used in a listening assessment (content video or context video) would
influence ESL test takers’ viewing behavior and test performance. To this end, the
author extracted eye-movement measures for the videos embedded in the screen
(see interest area in Figure 6.7, left panel) and correlated these eye-movement
measures with participants’ test scores for the corresponding subtests.
FIGURE 6.8 Sample display with a target image (right) and a nontarget, distractor image (left).
(Source: Andringa and Curcic, 2015).
However, when fixations are far out, they will be omitted from data analysis. This
would be the case for the second fixation in Figure 6.9, which falls outside of
either box in the center panel. Finally, the most conservative option is to draw
free-form interest areas that follow the external boundaries of the object. In our
example, only the third fixation on the dog will be counted for analysis in the
lower panel of Figure 6.9. Because objects vary in pixel sizes, researchers need to
ensure their comparisons are valid when they use free-form interest areas. Larger
objects will naturally attract the eye gaze more. To account for these size differ-
ences, researchers can compare looks to the same objects in the same scene while
manipulating the audio (see Section 6.3.2, for more information). In sum, one and
the same set of eye-movement data may be processed differently, depending on
how researchers conceive of the interest areas in their study.
The distinction between target and nontarget images is foundational to vis-
ual world research. What the nontarget images are, however, and what their role
is in a study will vary. A study that included multiple types of nontarget images
was Trenkic, Mirković, and Altmann (2014). Recall from Section 4.2.2.3 that the
authors were interested in measuring participants’ knowledge of the English defi-
nite and indefinite articles. Participants were asked to put objects in containers
(e.g., Put the cube in the can). The authors varied the article preceding the container
(i.e., inside the can or inside a can), as well as the number of potential goal referents
in the display (i.e., one open can or two open cans). On top of a target, which was
the can in which participants put the cube, each scene included a competitor, a
distractor, and two fillers (see Figure 6.10). Here, the competitor was the other
can, which was closed or in which participants did not put the cube. More gener-
ally, a competitor is any object onscreen that shares some form-based, phonologi-
cal, semantic, or grammatical properties with the target object. The competitor
is hypothesized to compete with the target for visual attention because of their
shared properties. A different kind of container in Trenkic et al., namely a basket,
was the distractor. The role of the distractor is to give participants a genuine purpose for listening.
FIGURE 6.9 Three types of interest areas around discrete images in a visual world experiment.
(Source: Picture stimuli from Andringa & Curcic, 2015).
In Trenkic et al.’s study, there was more than one kind of
container the cube could potentially go into and so participants had to listen care-
fully to know what to do. Finally, the filler objects (pencil, rope) were unrelated
to the event description. They were there to make the display look more varied
and realistic. Textbox 6.1 summarizes the four possible types of images in a visual world experiment and their functions.
Trenkic et al. (2014) embedded objects in a visual scene, rather than presenting
them as distinct images. Using a scene will let researchers create a more coher-
ent context against which to interpret the auditory input. This may be important
for studies on discourse-level phenomena such as pronoun resolution (Cunnings,
Fotiadou, & Tsimpli, 2017; Sekerina & Sauermann, 2015) or when multiple agents
participate in the same event (Hopp, 2015). Other times, for instance in word- or
sentence-processing research, it may not matter much whether you use separate
objects or a scene with objects embedded in it (Altmann, personal communica-
tion, April 3, 2018). However, opting for a scene or discrete images can carry
implications for how you define your interest areas. When objects are presented in
a larger visual context, you may be more inclined to use free-form interest areas
that follow the object contours closely (see Figure 6.11, top panel). This is how
Trenkic and her colleagues (2014) designed their interest areas (Trenkic, personal
communication, May 31, 2018). Even with hand-drawn interest areas, you could
consider adding a buffer (extra white space around the object) to capture eye fixa-
tions that land slightly off target (see Figure 6.11, mid panel). Finally, researchers
can still draw boxes around the various objects, as they often do when working
with discrete images (see Figure 6.11, bottom panel). Interestingly, researchers
often do not report detailed information on the shape of their interest areas in
their articles, so readers are left to infer this part. In the spirit of research transpar-
ency, authors may wish to add this information in their research papers. Including
a visual, such as the figures presented in this section, that shows the interest areas
can be very helpful.
The same principles that apply to interest areas around objects in scenes also
apply to movies—researchers can track participants’ attention to different objects
in the movie by drawing interest areas around them. For instance, Flecken et al.
(2015) used rectangular interest areas to capture eye gaze data as participants were
watching motion events (see Figure 6.12). Two things made Flecken et al.’s study
special. First, it was a production experiment (see Section 4.2.4) and second, it
required the use of dynamic interest areas given that motion is an inherently
dynamic event. Specifically, one entity (e.g., the pedestrian) followed a trajectory
toward a potential endpoint (e.g., the car). To capture participant attention to the
moving entity, the interest area had to be moved along the same path as the entity
did. The authors defined the interest area on a frame-by-frame basis. Because
most movies are filmed at a rate of 24 frames per second, this meant they had
to update the interest area at least 24 times a second! That amounted to drawing
close to 3000 interest areas to code the 20 six-second video clips in the study.
Working with dynamic interest areas, then, is still a time-consuming enterprise
(also see Section 9.3.1, research idea #10, and Section 9.3.2.3), even though soft-
ware developers are working to automate the process at least partly.
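To give a feel for what frame-by-frame interest-area coding involves, here is a generic Python sketch that linearly interpolates a rectangular interest area between hand-coded start and end positions. It assumes a straight-line trajectory and invented pixel coordinates; Flecken et al.'s actual tooling may have worked quite differently.

```python
def interpolate_areas(start_rect, end_rect, n_frames):
    """Linearly interpolate a rectangular interest area (x, y, width, height)
    from its position on the first frame to its position on the last frame.
    Assumes n_frames >= 2 and a straight-line trajectory."""
    return [
        tuple(s + (e - s) * f / (n_frames - 1) for s, e in zip(start_rect, end_rect))
        for f in range(n_frames)
    ]

# A 6-second clip at 24 frames per second needs 6 * 24 = 144 rectangles;
# 20 such clips amount to 20 * 144 = 2880 interest areas, in line with the
# "close to 3000" figure mentioned above.
frames = interpolate_areas(start_rect=(100, 300, 80, 120),
                           end_rect=(500, 300, 80, 120),
                           n_frames=6 * 24)
```

Hand-coding is still needed wherever the trajectory bends, but interpolating between keyframes can replace much of the frame-by-frame drawing.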
FIGURE 6.12 Dynamic interest areas in a movie depicting a motion event. As the pedestrian walked toward the car, the corresponding interest area had to be redrawn on a frame-by-frame basis.
(Source: Reprinted from Flecken, M., Weimar, K., Carroll, M., & Von Stutterheim, C., 2015. Driving
along the road or heading for the village? Differences underlying motion event encoding in French,
German, and French-German L2 users. The Modern Language Journal, 99, 100–122, with permission
from Wiley. © 2015 The Modern Language Journal).
Designing an Eye-Tracking Study 171
6.2.1 Spatial Constraints
Eye-movement recordings tend to be less accurate when participants are looking
to the outer edges of the screen. The most extreme case is track loss, a temporary
interruption in recording due to the eye tracker’s inability to locate the eye gaze
(for more information on track loss, see Section 8.1.2). There are several things
researchers can do to prevent track loss and other system glitches from happening.
Here we concentrate on factors that contribute to a robust study design. Practical
tips for enhancing recording quality during data collection will be presented in
Section 9.3.2.2.
First, researchers can insert margins around the edge of the screen. These
margins (large blank borders) do not contain any information and therefore, few,
if any, eye fixations are expected to land in these regions. In research strands
where having lots of blank space may not be desirable (e.g., subtitles and captions
research), researchers could simply make sure they move their interest areas (e.g.,
subtitle regions) away from the edge. In my own work, I leave at least a 1.5 to 2
cm buffer. Second, text-based eye-tracking researchers should avoid placing
interest areas at the end or beginning of a line, regardless of whether their
studies include one or multiple lines of text. Researchers in ISLA, vocabulary
(studies with glossing), and assessment will also need to place their interest areas
further away from the left and right edge. This is because viewers have a tendency
to overlook or skip information in these areas. Rayner (2009) noted that the first
and last fixations of a text line are typically five to seven letter spaces from the
beginning and the end of the line, respectively. The majority of all fixations fall between these two extremes. When moving between lines of text (a long-distance
eye movement that is known as a return sweep), readers accumulate extra error.
During these transitions, additional, corrective saccades may be necessary for the
eyes to reach their intended location. All these factors point in the same direction:
it is better to keep interest areas out of the peripheral regions of the screen. Third,
it is important to note that interest areas should be large enough to reduce
the probability of skipping, which yields zero fixations on the region. (A dataset with many skips
can be more difficult to analyze, see Chapter 7.) Because the probability of fixa-
tion (i.e., non-skipping) increases with word length, longer words tend to offer
certain advantages for data analysis compared to shorter words. This is not to say
short words cannot be analyzed, but researchers may need to get more inventive,
for instance by focusing on decreases in skipping rates as well as increases in fixation
duration (for an example, see Drieghe, 2008). When interest areas are the size of
a seven-letter word, skipping rates will be as low as 10%, compared to 80% for
one-letter words (Vitu, O’Regan, Inhoff, & Topolski, 1995). In a similar vein,
ISLA or assessment researchers who work with image-based interest areas should
ensure their interest areas are large enough for the region to be fixated. In practice,
this tends to be less of a concern because image-based interest areas are gener-
ally quite large. Finally, double spacing the text is recommended to account
for the technical limitations of an eye tracker (i.e., bounds on its accuracy and
precision, see Section 9.1.3). Double spacing acts as a protective layer against ver-
tical drift—the systematic recording of eye fixations above or below their actual
location. Drift looks as if the eye fixation bubbles are floating above or below the
text (for an example, see Figure 8.5). However, to spot drift in a recording and
potentially correct it (see Section 8.1.3), text needs to be double- or triple-spaced.
Otherwise, readers may mistakenly appear to be reading the line above or below the one they were actually looking at.
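Because skips produce empty cells, it can help to estimate ahead of time how many fixated trials an interest area is likely to yield. The sketch below is a back-of-the-envelope calculation using the skipping rates cited above (Vitu et al., 1995); the function is illustrative, not a standard power-analysis tool.

```python
def expected_fixated_trials(n_participants, n_items, skip_rate):
    """Expected number of trials on which the interest area receives at least
    one fixation, assuming a constant skipping probability across trials."""
    return n_participants * n_items * (1 - skip_rate)

# With 20 participants and 15 items per condition (the median cell size of
# roughly 300 discussed in Chapter 5):
long_words = expected_fixated_trials(20, 15, skip_rate=0.10)   # roughly 270
short_words = expected_fixated_trials(20, 15, skip_rate=0.80)  # roughly 60
```

At an 80% skipping rate, a nominal cell of 300 observations shrinks to about 60 usable fixated trials, which is why short words call for either larger samples or analyses of skipping rates themselves.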
Let’s consider an example of how my colleagues and I applied the four principles above in an actual study. Figure 6.13 shows a screen display from
Godfroid et al. (2018) before (top-right) and after (bottom-right) the researchers’
intervention. This text was extracted from an authentic English-language novel
set in Afghanistan. One computer screen could fit about one quarter of a text page
(top-left). The researchers were interested in studying L1 and L2 readers’ incidental vocabulary acquisition of the Farsi-Dari words that occurred naturally in the text (e.g., jo and dishlemeh in the example).
FIGURE 6.13 A screen display before (top-right) and after (bottom-right) the researchers’ intervention.
(Source: Godfroid et al., 2018).
The first thing to notice is that both
texts are double spaced and have large, 2.5-cm margins on all four sides. The texts
contain two target words, jo, meaning “dear” or “auntie”, and dishlemeh, which is
“a sweet candy made mostly of sugar”. You will notice that we removed the italics
for dishlemeh because we did not want target words to be enhanced visually in the
text. The boxes around the target words represent interest areas, which are used
for analysis (see Section 6.1), but are not visible to participants during the experi-
ment. This brings us to some of the finer changes my colleagues and I introduced
in this text. In the original text (the top-right panel), the target word, dishlemeh,
occurred as the first word in a text line. As discussed previously, this is less than
ideal and so we inserted a hard return after something in the preceding line to
move dishlemeh closer to the center of the line. Second, we merged the interest
areas for Bibi and jo into a single, larger area of interest. To reduce the probability
of skipping the target, it made sense to consider Bibi jo as a single unit for analysis,
both from an eye-tracking perspective and a semantic point of view (recall that
Bibi jo means “dear Bibi” or “auntie Bibi”). However, merging the two interest
areas introduced a new problem because now Bibi jo was also the first word of the
sentence and the text line. A minor modification of the original text (Bibi jo too
always brought → Also Bibi jo always brought) took care of this issue.
6.2.2 Artistic Factors
Print is the medium of visual reading. Scholars who specialize in typographical
and vision research care about what features of print contribute to the ease of
reading (Legge & Bigelow, 2011). Eye-tracking researchers share this concern for
text legibility. If nothing else, eye-tracking researchers want to present text in
a way that does not impede readers from exhibiting fluent reading. Going one
step beyond this, many eye-tracking researchers see the value of an ecologically
valid study design. Ecological validity dictates that the text features in an eye-
tracking study should resemble those in natural reading as closely as possible,
within the constraints of contemporary eye-tracking technology (see Godfroid &
Spino, 2015). The question, then, becomes what font size to use in an eye-tracking
experiment so measurement is accurate and the reading is natural (for a critical
discussion, see Spinner, Gass, & Behney, 2013). This issue is particularly important
for researchers interested in word-level phenomena (e.g., grammar and vocabu-
lary researchers) because font type and size will determine what portion of the
visual field a word or words occupy (see Table 2.1).
Findings in vision research are encouraging in that the range of font sizes for fluent reading is quite wide: from 4 to 40 points at a viewing distance of 40 cm
(Legge & Bigelow, 2011). At a 40 cm viewing distance, which is somewhat less
than in most eye-tracking experiments, 4- to 40-point size letters subtend a visual
angle of 0.2° to 2° (for more information on visual angle, see Section 2.1). In
daily life, these more extreme font sizes are used for patient information leaflets
seen in the top row of Figure 6.14, the different characters in a proportional font
have a variable width. For example, uppercase “P” is much wider than lowercase
“i”. Because of this, word length cannot be mapped onto degrees of visual angle
and researchers cannot draw comparisons between studies.
Another reason monospaced fonts are preferred in eye-tracking research relates
to the accuracy of the eye tracker. Accuracy refers to how closely the eye gaze
position recorded by the eye tracker matches the true position of the eye (for
more information, see Section 9.1.3). As a rule of thumb, you should avoid defin-
ing interest areas that fall within an eye tracker’s error margin. For example, the
average accuracy rate of the EyeLink 1000, as reported in the manufacturer’s
manual, is 0.25–0.50° of visual angle. To give you an idea of how much that is,
hold your hand at arm’s length. One degree of visual angle corresponds approxi-
mately to the width of your pinky finger held out at this distance (for further
details, see Section 2.1). In a typical recording with an EyeLink 1000, the differ-
ence between the actual gaze and the recorded gaze tends to be half the size of
your pinky finger or less. Now let’s apply this to a text-based processing study,
using Lim and Christianson’s (2015) study as an example. Lim and Christianson
reported that a single character in their study spanned approximately 0.4° of visual
angle. Therefore, the researchers were fully equipped to examine word-level interest
areas, as they did, but they would have lacked the spatial resolution to run more
fine-grained analyses at the letter level. To perform a letter-based analysis, they
would have had to enlarge the font size until the corresponding degrees of visual
angle exceeded the eye tracker’s error margin (> 0.5°).
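The visual angle arithmetic behind this decision is easy to run yourself. Below is a minimal Python sketch; the character width and viewing distance are illustrative values, not measurements from Lim and Christianson's study:

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by an object of a given
    physical size viewed from a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# A character 0.4 cm wide viewed from 60 cm:
char_deg = visual_angle_deg(0.4, 60)

# Compare against a nominal 0.5 degree tracker error margin before
# committing to letter-level interest areas.
ok_for_letter_analysis = char_deg > 0.5
```

With these values the character subtends roughly 0.38 degrees, which falls inside a 0.5 degree error margin, so letter-level interest areas would be ruled out; enlarging the font (or moving the participant closer) raises the angle until letter-based analysis becomes feasible.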
More generally, the use of a monospaced font, as opposed to a proportional
font, helps researchers select a font size that is appropriate for their research ques-
tions and the eye-tracking equipment they have. Monospaced fonts afford a better
control over the visual input. Figure 6.15 lists different types of monospaced fonts
with the actual fonts shown in the second column.
Finally, because most eye-tracking experiments in our field take place on a
computer screen, it is good to pause and think about what background color
to use. In our lab, we prefer using a light gray, rather than a white, background
for the experiment, because this is less tiring for participants’ eyes. Background
colors can be chosen from a color wheel or entered as red, green, blue (rgb)
values in the programming software. For example, the rgb values of the light gray
background my colleagues and I use are 204, 204, 204. Once you have selected a
background color, you want to use the same color consistently across all screens.
Like many other types of research, eye-tracking experiments consist of differ-
ent stages, including camera set-up and calibration, instructions, practice trials,
the main experiment, and any potential secondary tasks. The background colors
should remain the same across all these different stages, because changes in hue
could cause changes in pupil size. This could be detrimental to data quality, given
that contemporary video-based eye tracking relies on the pupil for accurate meas-
urement. From a practical standpoint, maintaining the same background through-
out the experiment means you may need to change the default background color,
which is usually white, in multiple places in the programming software.
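A simple safeguard is to define the background color once and reference that single value for every stage of the experiment, rather than retyping the default in each screen. The sketch below is a schematic illustration only; the settings dictionary and screen names are hypothetical stand-ins for whatever experiment-building software you use:

```python
# Define the background once; reuse it for every stage of the experiment.
BACKGROUND_RGB = (204, 204, 204)  # the light gray used in the author's lab

# Hypothetical stages of an eye-tracking session
screens = ["calibration", "instructions", "practice", "main", "secondary_task"]

# Hypothetical per-screen configuration: the point is that all stages
# share one background value, so hue (and thus pupil size) stays constant.
settings = {screen: {"background": BACKGROUND_RGB} for screen in screens}
```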
6.2.3 Linguistic Constraints
The final set of guidelines for text-based experiments pertains to researchers who
are comparing word- or phrase-level interest areas that differ in their content or
lexical makeup (see Sections 6.1.1 and 6.1.2). This includes most studies within
the grammar and vocabulary strands (see Sections 3.2.1 and 3.2.2), but not, at this
time, ISLA, assessment, and subtitles or captions research (see Sections 3.2.3,
3.2.4, and 3.2.5). If you are not sure whether you should control the linguistic
properties of the written materials in your study, you can ask yourself the follow-
ing question: Am I comparing words or phrases that differ in their lexical com-
position? If the answer to this question is affirmative, then yes, you will want to
control the linguistic properties of your materials.
In Section 2.5, I introduced a host of linguistic variables that influence when
the eyes move. These variables should be considered when you are designing a
study. Imagine a hypothetical study on the role of cognates in L2 reading, which
could include an item like this: It was very kind/considerate of you to send me flowers.
(Considerate and considerado are English-Spanish cognates, kind and amable are not.
Therefore, they represent a cognate/noncognate pair.) The words kind and consid-
erate have several commonalities, including their part of speech and meaning, yet
they differ in much more than their cognate status alone. Kind is a shorter and
more frequent word than considerate, L1 English speakers typically start using kind
at a younger age, and people generally indicate they are more familiar with the
word kind than the word considerate. For all these reasons, it is not a good idea to
compare the eye-tracking data for these two target adjectives directly if the goal is
to study the role of cognate status in L2 reading. Many other linguistic variables
could be accounting for the differences in the eye-movement data.
In a carefully designed study, then, target words or phrases differ only with
regard to the variable the researcher wants to study (e.g., cognate status). All other
linguistic properties of the target words have been carefully controlled for. As we
saw in Section 2.5, the “big three” variables that influence eye fixation durations
are frequency, contextual constraint or predictability, and word length
(Kliegl, Nuthmann, & Engbert, 2006, p. 13). Therefore, these are the first variables
that should come to mind when designing a text-based study: are my word pairs
or phrases approximately equally frequent? Do they appear in the same context?
Do they have the same length? Additional variables to control for are age of acqui-
sition, part of speech, concreteness, and the location in a clause (see Section 2.5).
Questions here are: at what age do L1 speakers first start using the target words or
phrases? Do my word pairs or phrases have the same part of speech (e.g., all nouns
or all verb–noun collocations)? Are all items concrete or all items abstract? Do
my word pairs or phrases occur in a similar place in the sentence (e.g., not at the
end of a clause or sentence)? Table 2.3 in Chapter 2 provides a good overview of
these different variables. Importantly, it details what sources (e.g., corpus, normed
database) you can consult or what data you can collect to determine whether your
experimental conditions have been matched appropriately.
FIGURE 6.16 Modified verb list from Godfroid and Uggen (2013), with word length and
frequency information recorded for all the verbs in the study. Calculation of
frequency per million is illustrated in the formula bar.
and one-way ANOVAs on the regular versus irregular e → i versus irregular a → ä
verbs. When conditions do not differ significantly, as was the case for the example
in Figure 6.16, researchers conclude that their stimuli have been matched on the
dependent variable.2
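The comparison illustrated in Figure 6.16 can be reproduced with a few lines of code. The sketch below computes a one-way ANOVA F statistic in plain Python; the log frequency values are invented for illustration and are not Godfroid and Uggen's data:

```python
def one_way_anova_F(groups):
    """F statistic for a one-way ANOVA across k groups
    (between-groups mean square / within-groups mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical log frequencies per million for three verb conditions
regular      = [2.1, 1.8, 2.4, 2.0]
irregular_ei = [2.0, 1.9, 2.3, 2.2]
irregular_aa = [1.9, 2.2, 2.1, 2.0]

# A small, nonsignificant F is what you want here: it suggests the
# conditions are matched on the control variable.
F = one_way_anova_F([regular, irregular_ei, irregular_aa])
```

In practice most researchers would run this in SPSS, R, or Python's scipy.stats; the hand-rolled version simply makes the logic of the matching check explicit.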
The procedure described here, which involves matching items between
conditions manually, is termed experimental control (also see Section 2.5).
Experimental control is the most common way of controlling written materi-
als, especially when the materials consist of sentences, rather than longer texts.
Experimental control works best when researchers have identified specific interest
areas within sentences and focus their analysis on these interest areas.When all the
words in a sentence are of interest or researchers work with long texts, another
type of control may be necessary.
TABLE 6.1 Best-fitting linear mixed effects model for log first pass reading time
first-pass reading times, Sonbul calculated the collocation length and frequency
information as described in the previous example. She then entered these vari-
ables into a multiple regression analysis. Both variables were statistically significant,
suggesting it is a good thing Sonbul controlled for them statistically. Adding these
control variables to the model further strengthened the researcher’s claim that
L1 and L2 English readers were sensitive to collocation frequency and not some
other variable that correlated with collocation frequency (e.g., frequency of the
constituent words). Statistical control, then, can loosen the shackles of experimen-
tal control in research designs and give researchers more flexibility. Textbox 6.2
summarizes the main points about how to design an eye-tracking study with text.
1. Make sure you insert margins around the outer edges of the screen;
interest areas should be wide enough for fixations to be recorded (in
other words, wide enough to reduce the possibility of skipping); less
accurate eye trackers require larger interest areas; avoid placing the
interest areas at the beginning or end of a sentence or at the beginning
or end of a line of text; lastly, double space the text.
2. A monospaced font type is preferred to a proportional font type; keep
the font sizes within the range for fluent reading (Legge & Bigelow,
2011) and report what font type and size you used; the background
screen color should be consistent throughout the entire experiment.
3. Linguistic variables such as word length, frequency, and predictability in
context should be controlled for experimentally or statistically.
6.3.1 Selecting Images
6.3.1.1 Experimental Design
The general principle in creating visuals is that overall effects of visual processing
should be minimized and controlled in the study. In an ideal visual world
experiment, there are a number of objects on the screen (typically between two
and four, although it could be more) and the observer’s eye gaze wanders freely
and equally across the different objects. Only when the auditory input is pre-
sented does the participant orient more strongly toward one or more objects.
This shift in eye gaze, then, is attributed to the unfolding speech signal rather than
some inherent properties of the images.
For eye gaze behavior to be tied to the auditory input, images on the screen
should be comparable in terms of their visual salience. If certain visual proper-
ties of the display render an image more eye catching, this can influence looking
behavior and confound the results. As a researcher you should do your best to
select images that do not contain any visual, phonological, semantic, or linguistic
confounds (see Sections 6.3.1.2 and 6.3.1.3). An additional and necessary measure
is to design your study so any remaining extraneous influences are divided equally
across conditions and thus accounted for by the experimental design. Therefore, as
a researcher you want to exercise control at two levels—control over your materi-
als and experimental control.
Let’s take a look at Dijkgraaf, Hartsuiker, and Duyck’s (2017) study to under-
stand how experimental control works. Dijkgraaf and her colleagues extended a
seminal study by Altmann and Kamide (1999) with monolingual English speakers
to the field of bilingualism (for a review, see Section 4.2.2.2). The researchers
were interested in whether or not Dutch-English bilinguals were able to predict
aspects of upcoming speech in their L1 and their L2 based on the seman-
tic information in the verb. For example, upon hearing “Mary drives …”, can
the listener predict the semantic category of vehicles (e.g., a car, a van) in the
object position based on the constraining nature of the verb drive? Dijkgraaf
and colleagues used a display with four images, as shown in Figure 6.17. They
compared the probability of listeners fixating on the target car in a semantically
restrictive condition (“Mary drives …”) and a semantically neutral condition
(“Mary takes …”), respectively.
What is important for the present discussion is that the displays for the restric-
tive and neutral conditions stayed the same; only the auditory input was varied
(also see Kohlstedt and Mani, 2018, in Section 4.2.2.5). This is one approach to
experimental control. It is arguably the simplest one. An advantage of keeping
the visual context constant is that researchers can safely attribute any differences
in looks to differences in the auditory input. After all, the images do not change.
Therefore, any remaining imperfections in the images will cancel each other out,
leaving the audio as the only possible source of statistical effects. It’s that simple.
FIGURE 6.17 Display from Dijkgraaf et al.’s (2017) study. Note: Display recreated with
images from the picture-naming database by Severens et al., 2005.
Now, let us take this example one step further. As previously described, some
research questions do not allow researchers to keep the visual input constant (see
Section 5.1). We discussed Marian and Spivey (2003a, 2003b) and Trenkic et al.
(2014) as two such example studies. By the same token, it is easy to think of an
extension of Dijkgraaf et al. (2017) in which the displays are no longer identical
between conditions. One possibility would be to introduce a semantic competitor
into the display, for example a bike in a display with a car (see Section 6.1.3.2). To
test for potential effects of the semantic competitor, researchers would necessarily
need to compare displays with a semantic competitor and displays without (see
Figure 6.18). This means the two displays will no longer be identical.
When displays are different, researchers can no longer ignore what happens to
the other images on the screen. Perhaps participants find bikes inherently more
interesting and appealing than potatoes (I certainly do) and this changes their over-
all viewing behavior. To account for such differences, researchers need to express
looks to the target (i.e., the car) as a percentage of the overall looks to the screen.
This will give them an estimate of the baseline distribution of looks across the
screen in the control condition. Once they account for these baseline effects
(see Textbox 6.3), they may test how participants’ viewing preferences change (i)
in a semantically constraining context, and (ii) when there is a semantic competi-
tor on the screen.
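Expressing looks as proportions is mechanically simple. A toy sketch, with an invented fixation sequence rather than real data:

```python
from collections import Counter

def looks_proportion(fixated_images):
    """Proportion of looks to each image, out of all looks to the screen."""
    counts = Counter(fixated_images)
    total = sum(counts.values())
    return {img: n / total for img, n in counts.items()}

# Hypothetical sequence of fixated images from one control trial
trial = ["car", "bike", "car", "potato", "car", "hat", "bike", "car"]
baseline = looks_proportion(trial)
# In this toy sequence the car attracts 4 of 8 looks, i.e., a 0.5 baseline.
```

Comparing such baselines between the control and competitor displays is what lets you separate inherent image appeal from the effect of the auditory input.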
Related to this point, it is best to rotate the position of images within a screen
as well. This is to balance out any spatial biases participants may bring to the task.
Participants who read from top to bottom, left to right tend to show a bias for
the top-left quadrant of a screen, even in non-reading tasks. To account for this,
images (targets, competitors, distractors, fillers) should occur with equal frequency
in each position on the screen. This means that in a four-image display, the target
image should occur in each position in 25% of the trials. Textbox 6.4 summarizes
the main points related to designing a visual world experiment.
FIGURE 6.20 Three possible displays of a balloon, a shark, a shovel, and a hat. Note: Shark
in Display One reproduced under a Creative Commons Attribution-
ShareAlike 2.5 license, https://creativecommons.org/licenses/by-sa/2.5/
deed.en Note: Displays Two and Three recreated with images from the
International Picture Naming Project (Bates et al., 2003; Szekely et al., 2003).
(Source: Modeled after Marian and Spivey [2003a, 2003b]).
that I modeled after Marian and Spivey’s (2003a, 2003b) influential experiments
(for a review, see Section 4.2.1). The participants in Marian and Spivey’s studies
manipulated real objects (artifacts or toy replicas) placed on a white board; for
illustration purposes, I have replaced them with images here. All displays are meant
to depict a balloon, a shark, a shovel, and a hat. The first display was originally
created for a class. The instructor looked for images on the internet, using only his
intuition as a guide. The result is a mix of images with and without background
color, which look good enough for teaching but would probably not pass muster
with article reviewers because they are so diverse.
For one, the image background color does not add anything to the images. It is
generally better to select line drawings without a color background or, at a mini-
mum, be consistent across the different images in the display. Another observation
is that the images in the first display differ in how clear they are (cf. the hat and the
shark, which could be mistaken for another big fish). Because word recognition is
a key component of visual world research, the selection of clear and prototypical
images that can be readily recognized matters greatly. Normed databases, which
will be introduced in the next section, can help with this goal.
Building on these ideas, I adopted images from the International Picture
Naming Project (Bates et al., 2003; Szekely et al., 2003) for Displays Two and
Three instead. For the purposes of this book, I shaded the images in the second
display as a proxy for using color. Color can render images more lively, but at
the same time, color can introduce a new confound into the stimuli. Specifically,
imagine that in the second display, the balloon is colored red. The balloon may
then stand out against the other objects on the screen because it is more visu-
ally salient. Fire engines and other emergency vehicles use red for obvious rea-
sons: to attract attention. Using black-and-white images, or normed color images
(Rossion & Pourtois, 2004), will let you avoid these issues (also see Section
6.3.1.3). Consequently, the display on the bottom, which features the original
images from the International Picture Naming Project, will be the most suitable
for research purposes. There is no confound from color; the images are largely
clear and identifiable and have been normed extensively in a previous study (Bates
et al., 2003; Szekely et al., 2003, 2004, 2005). Finding suitable images, then, is an
integral part of designing materials. Always make sure you pilot your materials
with a similar group of participants before the main study. If eye fixations are
divided roughly equally across the images at the onset of each trial (before the
audio begins to play), you know you have done a good job.
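A quick way to check this in pilot data is to compare first-fixation counts against a uniform distribution. The sketch below computes a Pearson chi-square statistic by hand; the counts are invented pilot data:

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against a uniform distribution,
    e.g., counts of first fixations per image before audio onset."""
    expected = sum(counts) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

# Hypothetical pilot data: first-fixation counts for four images over 40 trials
stat = chi_square_uniform([11, 9, 10, 10])
# A statistic well below the critical value (7.81 for df = 3, alpha = .05)
# suggests looks are divided roughly equally at trial onset.
```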
Bates et al., 2003
  Norms provided:
    1. Name agreement (e.g., H statistics)
    2. Naming time
    3. Cross-language universality and disparity of name agreement
    4. Cross-language universality and disparity of reaction time
    5. Picture characteristics (e.g., conceptual complexity)
    6. Features of the dominant response and picture characteristics
       (e.g., length in syllables)
    7. Cross-language frequency and length measures
  Number of images: 520
  Image type: Black and white line drawing
  Availability: Free

Szekely et al., 2003, 2004, 2005
  Norms provided:
    1. Name agreement
    2. Naming time
    3. Features of the dominant response (e.g., length in syllables)
    4. Picture characteristics (e.g., objective visual complexity)
  Number of images: 421
  Image type: Black and white line drawing
  Availability: Free

Lotto et al., 2001
  Norms provided:
    1. Degree of categorical typicality of the concept
    2. Familiarity
    3. Naming latencies
    4. Name agreement
    5. Concept agreement
    6. Length in letters of the name
    7. Length in syllables of the name
    8. Frequency of the written name
    9. Age of acquisition of the concept
  Number of images: 266
  Image type: Black and white line drawing
  Availability: Free for authorized users

Rossion and Pourtois, 2004 (an update of Snodgrass & Vanderwart, 1980)
  Norms provided:
    1. Naming agreement
    2. Familiarity
    3. Complexity
    4. Imagery judgments, naming latencies
  Number of images: 260
  Image type: Gray-level texture with surface details and color
  Availability: Free

Severens, Van Lommel, Ratinckx, and Hartsuiker, 2005
  Norms provided:
    1. Number of names
    2. Name agreement
    3. Naming latency
  Number of images: 590
  Image type: Black and white line drawing
  Availability: Free, but must request access first

Snodgrass and Vanderwart, 1980
  Norms provided:
    1. Norms for name agreement
    2. Image agreement
    3. Familiarity
    4. Visual complexity
  Number of images: 260
  Image type: Black and white line drawing
  Availability: Requires a license
to preview the images (see Section 6.3.1.4), the fixation cross can come either
before or after preview. Most research studies in Tables S5.6–S5.12 (see online
supplementary materials) had a fixation cross before image preview, right at the
outset of each trial. An example is Dijkgraaf et al. (2017), shown in Figure 6.21,
who opted for a classic fixation cross – image preview – audio + image sequence.
When used in this manner, the fixation cross will capture participants’ attention;
however, because of the following preview, the eyes can be at any place on the
screen when the audio begins. In short, with the fixation cross before the preview
phase, initial fixation location and saccade length will no longer be controlled.
The other option is to insert a fixation cross in between the preview and the
audio phase. To signal the beginning of a new trial, researchers could use a beep
sound instead of a cross. Although this sequence is less common, it affords better
experimental control over participants’ eye gaze at a critical point in the trial (i.e.,
right before the audio begins to play). Perhaps, then, this is a better use of fixation
crosses in experimental design. Figure 6.22 depicts the corresponding experi-
mental stages, using Tremblay (2011) as an example. This study was previously
described in Section 5.3. From the researcher’s perspective, each trial consisted of
three stages: a word preview, a fixation cross on a blank screen, and then the words
and audio presented together. (For present purposes, the participant’s mouse click
at the end of the sequence is not depicted, but see Section 5.4.) Given such a
design, participants needed equal time to plan and execute a saccade to any of the
four candidate nouns upon hearing le fameux élan, "the infamous swing". Finally,
FIGURE 6.21 Three-stage trial: (1) fixation cross, (2) image preview, (3) audio + image
processing with eye-movement recording.
(Source: Dijkgraaf et al., 2017).
FIGURE 6.22 Three-stage trial: (1) word preview, (2) fixation cross, (3) audio + word
processing with eye-movement recording.
(Source: Tremblay, 2011).
in the work by Hopp, the fixation cross was presented together with the images,
rather than on a blank screen, and the display remained the same throughout the
trial (Hopp, 2013, 2016; Lemmerth & Hopp, 2018). Even though the screen did
not change, each trial still consisted of three functionally distinct stages, much like
Tremblay (2011). After preview, a beep sound indicated participants had to fixate
on the central cross and audio did not begin to play until participants were fixat-
ing on the cross. In effect, this amounted to an image preview - fixation cross -
audio + image sequence. Textbox 6.5 summarizes the main points regarding other
design features in visual world experiments.
research transparency and will improve your readers’ confidence in the quality of
your materials. Taking transparency up a level, researchers can also upload their
audio recordings to open repositories such as IRIS (https://www.iris-database.
org) or the Open Science Framework (https://osf.io/).
Once you have identified your speaker, the recording session is next. Ideally,
recording should be conducted in a soundproof or sound attenuating booth with
professional audio equipment. Having a dedicated recording space will let you
avoid echoes and unintended background noise. Because the speakers are generally
volunteers, not professional actors, it will be good if they can get some training
prior to the actual recording. First, the speaker should practice speaking into the
microphone. To record at a steady volume, the distance between the speaker and
the microphone should be kept constant as much as possible. It is very important
that the speaker speak with a neutral intonation. Prosody carries a lot of infor-
mation in speech, yet prosodic effects are seldom the topic of visual world research
(for an exception, see Sekerina & Trueswell, 2011). Therefore, to minimize the
influence of prosody, it is better that the speaker does not know much about the
experiment so he or she does not add unintended stress or intonation patterns to
the recording. The speech should also be appropriately paced. For L2 speak-
ers this may mean speaking at a somewhat slower rate. For instance, in Hopp and
Lemmerth (2018), sentences were recorded “at a slow-to-moderate pace with
neutral intonation” (p. 182). Similarly, Ito, Corley, and Pickering (2018) used a
slow speech rate of 1.3 syllables per second with pauses between the phrases in
their sentences in order to “create optimal conditions for predictive eye move-
ments” (p. 253). The benefits of a slower speech rate for prediction need to be
weighed against the risk of inducing strategies in your participants; that is, if the
speech becomes too slow, participants may begin to realize what the experiment
is about and adjust their behavior. Finding the right speech rate, then, is a balance
between what sounds natural and yet, leaves participants time to fully deploy their
predictive abilities, if they have any.
Before you start recording, allow sufficient time for the speaker to familiarize
him- or herself with the materials. Using mono mode (as opposed to stereo) for
the recording typically results in a clearer voice, which can be delivered through
both sides of the earphones equally. It is wise to keep a record of the recording
parameters, including the software used, the sample rate, and so on. This informa-
tion can be included later when you write up the methodology for publication.
For the recording itself, my best advice is to get it right the first time. Getting
it right is easier said than done of course, but if you succeed, you can avoid mix-
ing items from different recordings. Participants will be able to tell if you mix
items and so it might be better to redo the recording completely if you have to.
To increase their chances of success, many researchers will make three record-
ings of their sentences in a session. Consider recording the stimulus list from top
to bottom three times, rather than recording the same sentence three times in a
row. Stress patterns tend to carry over between successive recordings and so if you
cycle through the whole stimulus list first, there is a greater chance that the second
or third versions will not have the same issue as the first. Make sure you remind
your speaker periodically to keep talking at the same pace. Check the audio on
the spot during the recording session if you can. Finally, like your participants,
your speakers will get fatigued faster than you might think. Give them breaks and
treat them to a nice cup of coffee.
The final step in preparing the audio files is the editing. There is a lot of
audio editing software available for use free of charge. Two common software
programs are Audacity (Audacity Team, 2012) and Praat (Boersma & Weenink,
2018). Using such a program, you first want to normalize all the files, meaning
you adjust the volume of all the audio files to a similar level. Then, listen to each
file very carefully and pick the recording, for each sentence, that sounds clear
and neutral. Consider adjusting the pitch of certain parts of the audio if there is
any unintended stress. Length is another thing you could manipulate with the
software (see Section 6.3.2.2). Save the files in a format that is compatible with
the experimental software you will use. Make sure the file names are meaning-
ful to you. Using the item number is generally a good idea, and so is including
the critical word and/or the condition (e.g., 01_Constr_Reads_Book.wma or
06_graben_3sg.wav). Having this information in the file name will save you a lot
of time when you program your experiment.
time window for the time it takes to plan and execute a saccade. In that case, the
temporal region of interest will be shifted rightward by 200 ms—from the offset
of the predictive cue + 200 ms to the onset of the target referent + 200 ms—
because 200 ms is how long it takes to plan and launch a language-mediated eye
movement (e.g., Matin, Shao, & Boff, 1993; Saslow, 1967). In contrast to the single
prediction windows found in most prediction studies, some researchers who study
referential processing use multiple time windows, representing the different key
constituents in the sentence. For instance, Sekerina and Sauermann (2015), in a
study on the interpretation of every in Russian and English, had four time win-
dows: the sentence subject with every, the verb, a locative prepositional phrase, and
a silence at the end of the sentence (see Section 4.2.3).
Once you have determined appropriate time periods for your own project,
the next step is to determine when exactly these time points occur in the audio
files. This is where you go back to your speech editing software to listen to your
audio recordings again (see Section 6.3.2.2). Listen carefully to each sentence.
When do the beginning and end of each time period occur? You will want to
mark the onset and offset latencies (in ms) relative to the beginning of the trial.
Figure 6.23 demonstrates how this works for the sentence Every alligator lies in a
bathtub, from Sekerina and Sauermann (2015). As previously mentioned, Sekerina
and Sauermann analyzed eye fixations in four time periods: every alligator — lies —
in a bathtub — (silence). For illustration purposes, my research assistant made a
new recording of the English sentence. I have presented it as Figure 6.23 in what
follows. In the recording, the four time periods had the following onset and offset
times: every alligator (0–1290 ms), lies (1290–2010 ms), in a bathtub (2010–3084
ms), silence (3084–end of trial). Researchers could now add this information
(i.e., 1290, 2010, and 3084 ms, start and end of trial) as time stamps in the
programming software of their eye-tracking experiment. With the help of these
time stamps, they will be able to extract the eye-fixation data for each of the four
time windows separately and ignore all data that was recorded outside this time
window. This will greatly facilitate the analysis.
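Once the time stamps are in hand, filtering fixations by window is mechanical. The sketch below uses the onset and offset times from the sample recording above; the fixation records themselves are invented, and real analysis software will offer equivalent built-in functionality:

```python
def fixations_in_window(fixations, onset_ms, offset_ms, saccade_lag_ms=200):
    """Keep fixations whose start time falls inside a time window,
    shifted rightward by the time needed to plan and launch a saccade."""
    lo, hi = onset_ms + saccade_lag_ms, offset_ms + saccade_lag_ms
    return [f for f in fixations if lo <= f["start_ms"] < hi]

# Hypothetical fixation records (start time and fixated interest area)
fixations = [
    {"start_ms": 1100, "ia": "alligator"},
    {"start_ms": 1600, "ia": "bathtub"},
    {"start_ms": 2400, "ia": "bathtub"},
]

# Window for "lies" (1290-2010 ms in the sample recording), shifted by 200 ms
verb_window = fixations_in_window(fixations, 1290, 2010)
```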
Like many other researchers, Sekerina and Sauermann used a number of
different sentences to test their research hypotheses. Therefore, the research-
ers had to measure onset and offset times for the critical time periods in each
sentence separately. Other studies may have a carrier phrase, an introductory
phrase that is identical across all the experimental items. Examples of English
carrier phrases are Pick up the …, Click on the …, Look at the …, and Find the
… . These phrases derive their name from the fact that they carry, or provide
a frame for, the following object, which is the target referent. If the researchers
use the same carrier phrase throughout the experiment, they can control for its
acoustic properties and will need to measure the onset of the first time window
only once. Specifically, the onset of the first time window will coincide with the
end of the carrier phrase. An example of this approach is Hopp’s (2013) study on
gender-based prediction in German. Participants listened to sentences such as
Wo ist der / die / das gelbe [Noun], "Where is the[MASC./FEM./NEUT.] yellow [Noun]?"
and clicked on the corresponding objects on the screen. To align onset times
across different sentences, Hopp took the carrier phrase Wo ist, the gender-
marked article, the adjective, and the noun, and put them together in a newly
formed audio file. The carrier phrase was exactly 1,103 ms long in each file.
This technique, which is known as splicing, is a way to control for the length
of different parts in the sentence and make sure time periods start at the same
point across different trials. With user-friendly software, splicing should be as
easy as a copy-and-paste operation (see Figure 6.24); however, it is important
to check the naturalness of the spliced stimuli.
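At the sample level, splicing is just concatenation at a fixed cut point. In the toy sketch below, lists of numbers stand in for audio, and a 1 kHz "sample rate" is chosen purely so that one sample equals one millisecond:

```python
def ms_to_samples(ms, sample_rate):
    """Convert a time in milliseconds to a sample index."""
    return int(ms * sample_rate / 1000)

def splice(carrier, target, carrier_len_ms, sample_rate):
    """Concatenate a fixed-length carrier phrase with a target recording,
    so the critical region starts at the same time point on every trial."""
    cut = ms_to_samples(carrier_len_ms, sample_rate)
    return carrier[:cut] + target

sr = 1000  # toy rate: 1 sample per ms, for easy checking
carrier = [0.0] * 1500  # 'silence' standing in for the carrier phrase
noun = [0.5] * 300      # nonzero values standing in for the target noun

# A 1,103 ms carrier, as in Hopp (2013)
trial_audio = splice(carrier, noun, 1103, sample_rate=sr)
# The target noun now begins exactly at sample 1103 (= 1,103 ms here).
```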
Other researchers do not splice their sentences but hand-edit the length of the
different constituents. For instance, Morales et al. (2016), in a study on gender-
based prediction in L2 Spanish, hand-edited the length of the carrier and the
definite article so they would be the same across all trials. Specifically, the carrier
encuentra, “find”, was edited to 800 ms, the definite articles el and la were 147
ms, and they were followed by 50 ms of silence before the target noun onset.
With these precise time points, the researchers were able to link eye fixations at
any point during the trial to the exact input a participant heard at that time (see
Figure 6.25). That is, any eye fixations to the left of the Y axis (i.e., negative times,
not shown in Figure 6.25) would have been for the carrier phrase encuentra. This
is where baseline effects in picture preferences would occur, if there are any, so it
could be informative to plot that data as well (see Osterhout, McLaughlin, Kim,
Greenwald, & Inoue, 2004; Steinhauer & Drury, 2012, for similar arguments for
event-related potentials). The article el or la plus silence covered 0–197 ms, and
the target noun came right after that. Accordingly, the authors’ temporal region of
interest extended from 200 ms post article onset (to account for saccade latency)
to 900 ms, when participants clicked on the corresponding target noun.
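With anchor points like these, each fixation timestamp can be mapped back to the input the participant was hearing at that moment. A minimal sketch (the function name is hypothetical; timestamps are assumed to be expressed relative to article onset, as in Figure 6.25, using the timings reported by Morales et al., 2016):

```python
def input_segment(t_ms):
    """Map a fixation timestamp (ms, article onset = 0) onto the
    auditory input heard at that time."""
    if t_ms < 0:
        return "carrier"            # encuentra, 'find' (edited to 800 ms)
    if t_ms < 197:
        return "article + silence"  # el/la (147 ms) + 50 ms of silence
    return "target noun"            # noun onset at 197 ms
```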
FIGURE 6.24 Splicing the target noun onto the carrier phrase.
FIGURE 6.25 Eye fixation patterns plotted against time. The Y axis and the black
vertical line mark the onsets of the two critical time periods in the study.
Note: shaded area represents significant differences between conditions,
based on Morales et al.’s (2016) analyses.
(Source: Morales et al., 2016, graph modified with permission from the author).
202 Designing an Eye-Tracking Study
In sum, visual world researchers have a few different options to make their
audio files ready for use. Whether they edit their files by hand or choose to splice
them, ideally the different parts within a sentence (e.g., lead-in or carrier phrase—
predictive cue—target noun) should coincide across the different audio files. For
instance, if 300 ms marks the onset of a critical time window in a sentence, it
is best if it does so consistently across all sentences. Aligning your time periods
in this manner will produce cleaner data and facilitate data analysis. What those
eye-movement data are and how you can begin analyzing them will be the topic
of Chapters 7 and 8. Textbox 6.6 summarizes the key points for creating audio
materials in eye-tracking experiments.
1. Choose a speaker with a clear and pleasant voice. Have the speaker
practice talking at an appropriate, slow-to-moderate pace, with neutral
prosody. Use a soundproof or sound-attenuated booth and professional
recording equipment when possible. Jot down important parameters
(e.g., sample rate) for later reporting.
2. Edit your sound files using audio editing software. Normalize volume
across all files. Hand-edit the length of the different components if
necessary.
3. Set time periods in a manner that will let you address your research
questions. Visualize the speech stream in the audio editing software to
identify precise time points. Use these time points to align the segments
in your different sentences.
6.4 Conclusion
This chapter has provided detailed, practical guidelines for you as a researcher to
design your eye-tracking study. These guidelines should be read in tandem with
the general methodological guidelines provided in Chapter 5, as eye-tracking
research follows the same principles as other types of experimental research. A
key element in study design is setting interest areas for your research project (see
Section 6.1). Interest areas come in many shapes and forms, reflecting the diversity
of eye-tracking applications in our field. Text-based and visual-world eye-tracking
researchers have used four types of interest areas, namely word-based interest areas,
larger areas of text, images, and dynamic (moving) interest areas. To define interest
areas for your study, you will want to consider your research questions, your mate-
rials, and the spatial accuracy and precision of your eye-tracking system.
Next, we turned to the specifics of text-based and visual-world eye-tracking
design. Eye-tracker properties and the characteristics of the human eye will
impose spatial and artistic constraints on text presentation, such as how large
the text should be and what the minimal region of analysis should be (see
Sections 6.2.1 and 6.2.2). Likewise, linguistic factors related to how the human
mind processes language require experimental control. It is seldom a good
idea to put some material onscreen and simply look at what happens. A better
approach is to create different versions of the same material and relate dif-
ferences in eye movements to the changes you made. Factors that cannot be
controlled for in the experimental design should be controlled statistically (see
Section 6.2.3). The highlights of text-based eye-tracking design were summa-
rized in Textbox 6.2.
In visual world research, as in text-based eye tracking, a careful research design
is paramount to account for all the factors that could potentially influence a
participant’s eye gaze during the study (see Section 6.3.1.1). Careful experimen-
tal design enables researchers to control potential confounds resulting from vis-
ual (e.g., colors) and linguistic (e.g., image names) features of the materials (see
Sections 6.3.1.2 and 6.3.1.3). Normed databases are a useful resource for selecting
suitable visuals (see Table 6.2). For audio, the emphasis is on making clear, qual-
ity recordings that will be easy to comprehend (see Section 6.3.2.1). Audio files
come with their own regions of interest, called time periods or time windows (see
Section 6.3.2.2). Researchers may need to edit their recordings a bit to ensure
that time periods start at the same time across their different sentences. The key
considerations for creating images and audio were summarized in Textboxes 6.4,
6.5, and 6.6.
At the end of the day, designing a sound experiment is not unlike gardening.
Careful planning up front will ensure that you get the most fruitful results down
the road. Even if you find yourself down in the weeds at some point, it is impor-
tant to keep sight of the coming seasons in your research cycle. Quality research
data is the best harvest! For that, make sure you know the properties of your
materials and take the time to design them carefully. Always run a pilot first. In
a well-designed study, different design features will tie in nicely with the overall
goals and research questions of the study.
Notes
1 Exceptions include some kinds of writing research, face to face interaction, and areas of
computer-assisted language learning where participants are interacting with software
programs.
2 The current approach relies on null hypothesis significance testing (NHST), which is
the default statistical approach in SLA and bilingualism, but which poses some con-
ceptual difficulties for the interpretation of null results. Alternatives to NHST that can
also be used for matching purposes are equivalence tests (Godfroid & Spino, 2015) and
Bayesian statistics (Dienes, 2014).
3 This section has benefited from a long discussion about the visual world paradigm with
Gerry Altmann. I thank Professor Altmann for his input and suggestions.
4 Coarticulation is the change in articulation of one sound because of neighboring
sounds. For example, a word-initial vowel is typically linked to the offset consonant of
the preceding word.
7
EYE-TRACKING MEASURES
Eye-movement measures define how researchers will look at their data. They are
a way to carve up the large amount of information in an eye-movement record,
by allowing the researcher to focus on particular events, such as fixations, saccades,
or a combination of fixations and saccades, and measure particular properties of
these events. Eye-movement measures are typically extracted for specific regions
on the screen, termed interest areas (see Section 6.1) and may be categorized
in terms of their temporal properties (when the event happened relative to other
events in and out of the same interest area). Eye-movement measures function as
a dependent variable in most statistical analyses, but they have also been used as an
independent variable to study the relationship between online processing patterns
and learning (e.g., Cintrón-Valentín & Ellis, 2015; Godfroid et al., 2018; Godfroid,
Boers, & Housen, 2013; Godfroid & Uggen, 2013; Indrarathne & Kormos, 2017;
Mohamed, 2018; Montero Perez, Peters, & Desmet, 2015; Pellicer-Sánchez, 2016;
Winke, 2013).
Although eye-movement measures can be calculated by the eye-tracking soft-
ware only after the data are collected, it is a good idea to think ahead about what
measures you will use in your study. This chapter can prepare you to do just that.
It provides a comprehensive overview of the measures that have been used in
SLA and bilingualism research to date, drawing on the substantive review of eye-
tracking literature in Chapters 3 and 4. When deciding what measures to include,
it certainly helps to know the most commonly used eye-tracking measures in the
field. Additionally, researchers may wish to familiarize themselves with measures
that are perhaps less common but typical of their particular subfields. I hope that
the breadth of measures reviewed in this chapter will entice researchers to sample
widely from across the spectrum of eye-tracking measures. In fact, most research-
ers do already include multiple eye-movement measures in their analyses but they
206 Eye-Tracking Measures
FIGURE 7.3 Types of eye-tracking measures used in text-based and visual world studies.
7.2 Eye-Movement Measures
7.2.1 Fixations and Skips
7.2.1.1 Counts, Probabilities, and Proportions
To illustrate many of the primary eye-movement measures found in SLA and
bilingualism, I will use two data trials from a novel-reading study (Godfroid et al.,
2018). You will want to refer back to Figure 7.4 as you learn about each measure.
In Godfroid et al. (2018), L1 and L2 English speakers read five chapters of the
authentic English novel A Thousand Splendid Suns (Hosseini, 2007). The novel is
set in Afghanistan and contains a number of words in Dari (the variety of Farsi
spoken in Afghanistan) to convey the foreign setting of the novel. Figure 7.4 depicts two
participants’ responses to an unfamiliar word, tahamul, that occurred in the novel.
Figure 7.5 and Table 7.1 present a summary of the different count measures
in SLA and bilingualism research, illustrated with data from Figure 7.4. Fixation
FIGURE 7.4 Two different reading patterns for an unfamiliar word, tahamul, embedded
in context. Note: this sentence was part of an exchange between the
protagonist Mariam, a 15-year-old girl, her mother, and a tutor about
whether Mariam would be allowed to go to school. Mariam’s mother is
opposed to the idea. She previously uttered the sentence, “There is only
one, only one skill a woman like you and me needs in life, and they don’t
teach it in school. Look at me.” (Hosseini, 2007, p. 17). As seen in the
figure, that skill is tahamul—the Dari word for endure.
(Source: Godfroid et al., 2018).
FIGURE 7.5 Count measures in eye tracking in SLA and bilingualism (used in 20 out
of 52 studies with text).
counts and the measures we derive from them (i.e., probabilities and proportions)
have informed a variety of questions in our field. Count measures are useful
when the region of analysis is a larger area on the screen, as in reading assess-
ment (Bax, 2013; McCray & Brunfaut, 2018), subtitle processing research (Bisson,
Van Heuven, Conklin, & Tunney, 2014; Muñoz, 2017), and, importantly, in visual
world research. For example, Bisson et al. (2014) used consecutive fixation
counts in an attempt to distinguish participants’ reading behavior for different
types of subtitles (native and foreign language subtitles). They found no differ-
ences in the number of fixations across subtitle conditions; instead, they found
a rather regular reading pattern of the subtitles even when the subtitles were
in a language that the participants did not know (see Section 3.2.4). Analyses
of fixation counts are also common in various types of sentence-processing
research with lexical interest areas (Carrol & Conklin, 2015; Pellicer-Sánchez,
2016; Philipp & Huestegge, 2015; Siyanova-Chanturia, Conklin, & Schmitt,
2011; Sonbul, 2015) or grammatical interest areas (Philipp & Huestegge, 2015;
Winke, 2013). In this case, the analysis of fixation counts supplements the analysis
of fixation duration measures. Analyses of fixation counts and fixation durations
often provide converging evidence: effects tend to be present in both counts and
durations or absent from both (but see Philipp & Huestegge, 2015; Van Assche,
Duyck, & Brysbaert, 2013). The reason is that fixation counts correlate with fixation
durations (Godfroid, 2012). As people fixate more often in the same area, aggre-
gate duration measures for that area will increase.
Measuring fixations, and the lack thereof, is the bread and butter of visual world
researchers, who analyze fixation proportions more than any other dependent
variable (see Figure 7.3). Did the participant look at a picture at a given moment
in time, yes or no? This is what the majority of visual world researchers record and
analyze in a raw or aggregate form. The analysis of fixation proportions is common
in prediction research, from word-level processing (e.g., Marian & Spivey, 2003a,
2003b; Mercier, Pivneva, & Titone, 2014, 2016; Tremblay, 2011), to morphosyn-
tax (e.g., Mitsugi, 2017; Morales et al., 2016; Suzuki, 2017; Trenkic, Mirković, &
Altmann, 2014) and semantics (Dijkgraaf, Hartsuiker, & Duyck, 2017; Kohlstedt &
Mani, 2018), all the way up to research at the discourse-syntax-prosody interface
(Sekerina & Trueswell, 2011). Fixation proportion has also been used in a study
on the interpretation of overt subject pronouns (Cunnings, Fotiadou, & Tsimpli,
2017). Researchers want to know for what proportion of participants and trials
the eye tracker recorded a fixation on a given image. Many also want to know
how this proportion of looks changes over time, for different groups of partici-
pants, and different experimental conditions (see Section 8.5 for further details).
When a study includes both L1 and L2 (bilingual) speakers, the focus is often on
whether L2 speakers or bilinguals exhibit the same type of predictive processing
or interpretive preferences as L1 speakers do (e.g., Cunnings et al., 2017; Dijkgraaf
et al., 2017; Mitsugi, 2017; Sekerina & Trueswell, 2011; Tremblay, 2011; Trenkic
et al., 2014). For instance, Mitsugi (2017) compared L1 and L2 Japanese use of
case markings to anticipate the voice (i.e., active or passive) of the verb (Japanese
is a verb-final language). Participants saw two scenes on the screen, for instance a
woman hitting a man (active) and a woman hit by a man (passive). Mitsugi (2017)
found the L1 Japanese speakers increasingly looked to the correct scene in the
1200 ms following the case-marked nouns and before the verb, suggesting these
participants could use the case markings predictively. The college-level L2 learn-
ers, on the other hand, did not show similar anticipatory behavior (see Section
4.2.2.3). Lastly, the analysis of fixations also has a place in face-to-face interaction
research (McDonough, Crowther, Kielstra, & Trofimovich, 2015; McDonough,
Trofimovich, Dao, & Dion, 2017). McDonough et al. (2015), for instance, created
a new variable, which they termed mutual eye gaze. A mutual eye gaze occurred
during interaction when the L1 and L2 speaker looked at each other during
a feedback episode. The researchers found that the L2 speaker was more likely
to reformulate her initial utterance correctly when feedback coincided with a
mutual eye gaze (see Section 4.2.4).
The previous review highlights the many ways in which fixations can be ana-
lyzed, namely, as binary, yes-or-no events (e.g., McDonough et al., 2015, 2017),
counts (e.g., Bisson et al., 2014), proportions (e.g., Mitsugi, 2017), and probabili-
ties (e.g., Marian & Spivey, 2003a, 2003b). Binary fixation data and count data
are available directly from the eye tracker after the event detection algorithm
has processed the raw data (for further details on data preprocessing, see Sections
8.1 and 9.1.3). The binary data can be converted easily into proportions (0–1)
or probabilities (%). To convert binary fixation data into proportions, one simply
divides the number of trials where an event occurred by the overall number of
trials. This will give a proportion, which is a number between 0 and 1. To express
as a percentage, one multiplies the number by 100 (for a detailed description, see
Section 8.5.2.1). These variables are useful in visual world research, where many
studies revolve around the analysis of fixation proportions or probabilities. For
instance, in Marian and Spivey’s (2003a) study, Russian-English bilinguals looked
at the between-language competitor (e.g., spichki, “matches”, given the target
word speaker) in 15% of all trials. This percentage was derived from a total of 20
critical trials per participant and each trial was coded individually for whether or
not the participant looked at the between-language competitor.
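The binary-to-proportion conversion described above is a one-line computation; the following sketch is purely illustrative (the numbers are made up to mirror, not reproduce, the Marian and Spivey example, and the function name is hypothetical):

```python
def fixation_proportion(looked_per_trial):
    """Proportion (0-1) of trials with at least one fixation on the image.

    looked_per_trial holds one boolean per trial: did this participant
    fixate the competitor picture on that trial?"""
    return sum(looked_per_trial) / len(looked_per_trial)

# Illustrative only: looks on 3 of 20 critical trials
looks = [True] * 3 + [False] * 17
proportion = fixation_proportion(looks)  # 0.15
percentage = proportion * 100            # ~15%
```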
The previous discussion has focused on how to calculate proportions and
probabilities for eye fixations. The same principles apply to other binary events in
eye-tracking data, such as skips and regressions. Here, we focus on skips, which are
the opposite of fixations, or “zero fixations” (for an overview of regressions, see
Section 7.2.2). Looking at something or not looking at it (i.e., skipping it) are the
only ways in which participants can engage visually with information. Therefore,
skipping probabilities and fixation probabilities are each other’s inverse. If the
fixation probability for a given condition is, say, 68%, then the skipping prob-
ability is 32%. In a study on language switching, Philipp and Huestegge (2015)
reported skipping probabilities for the first word of a sentence in two-sentence
trials. The researchers found that L1 German–L2 English speakers were some-
what less likely to skip the first word in the sentence of a language switch trial
(i.e., German–English or English–German) than a language repetition trial (i.e.,
German–German or English–English). They argued this finding showed their
participants adopted a “careful reading strategy” (p. 662) following a language
switch, which was relatively short-lived and distinct from any longer-term effects
on sentence comprehension.
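Because fixating and skipping exhaust the possibilities, the inverse relation can be written down directly. A trivial sketch (illustrative only):

```python
def skipping_probability(fixation_probability):
    """Skips are 'zero fixations': P(skip) = 1 - P(fixation)."""
    assert 0.0 <= fixation_probability <= 1.0
    return 1.0 - fixation_probability

# e.g., a 68% fixation probability implies a 32% skipping probability
```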
In summary, count measures are eye-movement measures that tell us how often
something occurred. The eye-tracking software can count different things—fixa-
tions, skips, visits, and regressions. Researchers can report these measures as raw
counts, binary yes/no events, probabilities (%), or proportions (0–1). Following
Radach and Kennedy (2004), we can think of counts as spatial eye-movement
measures that pertain to a given area on the screen and are distinct from tem-
poral eye-movement measures, which are discussed next. Because fixation
counts subsume all activity over all visits or passes in a given area, they are a
late eye-movement measure (see Section 7.2.1.2.2, for more information on the
early/late distinction). Fixation counts are also an aggregate measure, similarly to
total reading time. Conversely, skipping probability is best conceived as an early
measure (Conklin & Pellicer-Sánchez, 2016) because the decision to skip a word
must be made when the word is still in parafoveal vision; that is, before the eyes
have landed on it.
7.2.1.2 Fixation Duration
Duration measures are the largest category of dependent variables in SLA and
bilingualism research. The synthetic review of eye-tracking literature from 15 SLA
journals and Language Testing (see Chapter 3) returned 12 different measures of
FIGURE 7.6 The ‘big four’ durational measures (upper panel) and other durational
measures (lower panel) in eye tracking in SLA and bilingualism (used in 49 out of
52 studies with text). Note: The numbers in the lower panel do not add up to 16%
due to rounding.
TABLE 7.2 Definitions and examples of duration measures (continued)

Total time: the total sum of all fixation durations recorded for an interest area.
Example: [6] + [10] + [11] + [12].

Total visit duration: the summed duration of all visits to a particular interest
area. Example: [6] + [10] + [10 → 11 saccade duration] + [11] + [11 → 12 saccade
duration] + [12].

Expected fixation duration: the expected time a participant will spend in a given
interest area if she distributes her attention evenly across all the interest areas
on the screen. Example (syllable-based): the sentence contains 12 syllables, of
which tahamul contributes 3 (25%), so the expected fixation duration is ([1] + [2]
+ [3] + [4] + [5] + [6] + [7] + [8] + [9] + [10] + [11] + [12] + [13]) ÷ 4.

Observed fixation duration: the fixation duration in an interest area as recorded
by the eye tracker. Example: [6] + [10] + [11] + [12].

Difference between observed and expected fixation duration (ΔOE): the extent to
which a participant’s processing time in a given interest area deviates from what
is expected under an equal-attention assumption. Example: ([6] + [10] + [11] +
[12]) − (([1] + [2] + [3] + [4] + [5] + [6] + [7] + [8] + [9] + [10] + [11] +
[12] + [13]) ÷ 4).

Note: Measures are represented as a period of time, typically in ms, that corresponds to the length of
the individual fixations.
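The expected-duration measures in Table 7.2 reduce to simple arithmetic once per-fixation durations are available. A minimal sketch (hypothetical function names and input format: lists of fixation durations in ms, with the attention share estimated from syllable counts, e.g., 3 of 12 for tahamul):

```python
def expected_fixation_duration(all_fixations_ms, target_syllables, total_syllables):
    """Expected time on the target area if attention were distributed
    evenly across the screen, in proportion to the area's syllable share."""
    return sum(all_fixations_ms) * target_syllables / total_syllables

def delta_oe(target_fixations_ms, all_fixations_ms,
             target_syllables, total_syllables):
    """Observed minus expected fixation duration (ΔOE)."""
    observed = sum(target_fixations_ms)
    expected = expected_fixation_duration(
        all_fixations_ms, target_syllables, total_syllables)
    return observed - expected
```

A positive ΔOE indicates that the participant spent more time on the area than an equal-attention baseline would predict.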
fixation duration (see Figure 7.6 and Table 7.2). Some of these, such as first fixa-
tion duration, gaze duration, regression path duration, and total time, are now
standard in the text-based eye-tracking literature (see Figure 7.6, top panel).
Other measures, such as refixation duration, second pass time, and rereading time,
are related but have not been reported quite as frequently, perhaps due to the
statistical properties of these variables (i.e., they are often zero). Finally, the area of
durational eye-movement measures has also enjoyed its share of innovation, with
individual researchers and research teams developing new measures (e.g., first sub-
gaze, last fixation duration, delta [Δ] total time) to respond to their research needs
(Hoversten & Traxler, 2016; Indrarathne & Kormos, 2017, 2018; Miwa, Dijkstra,
Bolger, & Baayen, 2014).
FIGURE 7.7 Two different reading patterns for an unfamiliar word, tahamul, embedded
in context.
(Source: Godfroid et al., 2018).
Early and late measures roughly coincide with the initial visit and any subsequent visits to
the region of interest, respectively. In Figure 7.1, early measures and late meas-
ures are represented as two separate branches in the fixation duration category.
The early measures (i.e., first fixation duration, single fixation duration, refixation
duration, gaze duration or first-pass reading time, first subgaze, and regression-
path duration or go-past time) may index “processes that occur in the initial
stages of sentence processing” (Clifton, Staub, & Rayner, 2007, p. 349), such as
word recognition or lexical access. The other measures (e.g., second pass time,
rereading time, last fixation duration, total time, total visit duration) reflect
comparatively late stages of processing and may signal an interruption to the normal
reading process. Although these characteristics hold true in general, exactly how
the distinction between early and late measures plays out in language research will
depend on the research area and the topic under investigation (see examples fol-
lowing). The coding work that went into the current synthetic review revealed that it
is much more common for an effect to show up in multiple measures (i.e., both
early and late). When this happens, researchers may feel more confident in their
findings because different measures essentially provide converging evidence that
an effect is real. Though it is rarer, an effect showing up in one (set of) temporal
measure(s) but not in the others could be theoretically more interesting, provided
the finding can be replicated. In what follows, I will present two exemplary studies
from different research areas to illustrate this point. In each case, the authors used
their findings to make specific theoretical claims—about the nature of L1 and L2
parsing (Felser, Cunnings, Batterham, & Clahsen, 2012) or about the time course
of lexical activation (Taylor & Perfetti, 2016).
Felser et al. (2012) investigated how L1 and L2 English speakers compute wh-
dependencies (filler-gap dependencies for relative clauses) during reading (also see
Section 3.2.1). Participants read complex English sentences with relative clauses.
In half the conditions, the sentences contained an additional relative clause (dou-
ble embedding), as in (1c) and (1d). These doubly embedded relative clauses are
“extraction islands” (p. 67), meaning the relative pronoun of the first relative clause
(e.g., thati) cannot originate from there (example from Experiment 2 in Felser et al.,
p. 87, with layout added; ei is the empty category signaling the base extraction site).
makes it unsuitable for parsing” (Clahsen & Felser, 2006a, p. 117, as cited in Felser
et al., 2012) and hence may give rise to delayed sensitivity to structural manipula-
tions as in (3b) versus (3a).
Second, Taylor and Perfetti (2016), in a word-learning experiment, looked at
the effects of individual differences and lexical knowledge on L1 reading behav-
ior. In their second experiment, 35 native English speakers were trained on 180
rare English words using one of the following combinations of orthographic (O),
phonological (P), and meaning (M) information: O, P, OP, OM, PM, and OPM.
Participants saw each word one, three, or five times. After completing the word-
training paradigm, the participants read sentences embedded with the new words.
This was the part for which eye movements were recorded. Of interest was how
the processing of words in sentences would differ as a function of partial word
knowledge (O, P, and/or M; number of exposures) that participants had obtained
from training.
Taylor and Perfetti found temporally distributed effects of different types of
partial word knowledge. In general, increasing orthographic exposures during
training had an effect on early processing measures. Meaning training, in interac-
tion with reading expertise, affected a late processing measure and phonological
training, in a three-way interaction with number of exposures and a participant’s
lexical knowledge, affected both early and late measures. Taylor and Perfetti’s
study is noteworthy for including seven eye-movement measures (plus another
two skipping measures in Experiment 1), covering both early and late durational
measures as well as probabilities. In so doing, the authors could uncover the tem-
poral dynamics of different aspects of word knowledge, with effects of meaning
crucially appearing after effects of form.
The previous examples show how eye-movement recordings enable a more
nuanced understanding of what types of information are used during language
processing. This includes, but is not limited to, semantic versus structural cues
(Felser et al., 2012) and form versus meaning (Taylor & Perfetti, 2016). Eye-
movement recordings thus offer the prospect of deconstructing general processes
such as word recognition, lexical access, or reanalysis into qualitatively dis-
tinct component processes, as was shown most convincingly in Taylor and Perfetti’s
word-learning experiment. Eye-tracking research is also emerging as a valuable
addition to research on implicit and explicit processing and knowledge (Andringa
& Curcic, 2015; Godfroid, Loewen, Jung, Park, Gass, & Ellis, 2015; Godfroid &
Winke, 2015; Suzuki, 2017; Suzuki & DeKeyser, 2017), where the focus is on
controlled versus automatic processing (Godfroid et al., 2015; Godfroid & Winke,
2015) or the real-time retrieval of linguistic knowledge (Andringa & Curcic,
2015; Suzuki, 2017; Suzuki & DeKeyser, 2017). As the number of applications of
eye tracking in L2 and bilingualism research continues to diversify, we can expect
to see more work that pursues the timing aspects of eye behavior.
We now turn to an overview of the different eye-movement measures that
have been identified in the present review of eye-tracking literature. This section
will reflect the early-late distinction and be structured accordingly: from early, to
intermediate, and finally late eye-movement measures.
where there was one and only one fixation. Because many words are fixated more
than once, analyzing single fixation duration entails a significant loss of data. This
is why first fixation duration is generally preferred as an early measure (Rayner,
1998). That being said, single fixation duration can be used to address the same
questions as first fixation duration, namely on word-level lexical processing, gram-
mar acquisition, and single-word reading. In SLA and bilingualism, only Cop,
Drieghe, and Duyck’s (2015) novel-reading study included single fixation duration
as a dependent variable. Unlike first fixation duration, which showed an effect of
orthographic overlap between words, single fixation duration did not reveal any
significant differences in this study.
Gaze duration is the sum of all the fixations made in an interest area until
the eyes leave the area (see Table 7.2 and Figure 7.7, for an example). In reading
research, the saccade that takes the eyes out of the area could be either forward
or back; it doesn’t matter for gaze duration. For interest areas that consist of more
than one word, for instance idioms or collocations, larger grammatical construc-
tions, or subtitles, the same measure is called first pass reading time. Gaze
duration and first pass reading time are therefore calculated in the same manner,
but gaze duration applies to single-word regions whereas first pass reading time
applies to larger areas.
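The definition above translates directly into a first-pass computation over the fixation sequence. A minimal sketch (hypothetical data format: the trial's fixations as (interest_area, duration_ms) tuples in chronological order); the same computation over a multiword area yields first pass reading time:

```python
def first_fixation_duration(fixations, area):
    """Duration (ms) of the first fixation on the given interest area."""
    return next(dur for ia, dur in fixations if ia == area)

def gaze_duration(fixations, area):
    """Sum of all first-pass fixations on the area: from first entering it
    until the eyes leave it, in either direction."""
    total = 0
    entered = False
    for ia, dur in fixations:
        if ia == area:
            total += dur
            entered = True
        elif entered:
            break  # the eyes have left the area; the first pass is over
    return total
```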
Of all the standard durational measures, gaze duration is perhaps the most
important and the most widely reported one. A large number of studies, all in
grammar or vocabulary research, report both gaze duration and first fixation duration
as early measures (e.g., Balling, 2013; Carrol & Conklin, 2015; Clahsen et al.,
2013; Felser & Cunnings, 2012; Godfroid et al., 2013). An equal number of stud-
ies, however, report only gaze duration and not first fixation duration. These are
predominantly grammar studies (e.g., Felser et al., 2012; Sagarra & Ellis, 2013;
Spinner, Gass, & Behney, 2013; Vainio, Pajunen, & Hyönä, 2016) but there are
examples from idiom and collocation processing (Siyanova-Chanturia, Conklin,
& Schmitt, 2011; Sonbul, 2015), caption processing (Montero Perez et al., 2015),
and input enhancement (Winke, 2013) as well.
When the region of interest contains multiple words, as is the case in some of
these studies, it is worth thinking about what type of information first fixation dura-
tion might provide and how valuable that is. Readers will often need to make more
than one fixation to take in larger areas of text and so the duration of (just) the first
fixation may not tell us all that much. For instance, Sonbul (2015) analyzed first
pass reading time (and not first fixation duration) for adjective-noun collocations
such as fatal mistake, citing previous work by Siyanova-Chanturia and colleagues
(Siyanova-Chanturia, Conklin, & Schmitt, 2011; Siyanova-Chanturia, Conklin, &
van Heuven, 2011) as a rationale for doing so. Vainio et al. (2016), in a study on
modifier–noun case agreement, also limited their choice of early measures to gaze
duration because, the authors reported, first fixation duration did not show an effect.
In many other studies, interest areas do consist of single words, yet research-
ers still include only gaze duration as an early measure. Oftentimes, they will
not explain why they did not analyze first fixation duration, but a few potential
reasons come to mind. One reason is that first fixation duration is subsumed in
gaze duration (gaze duration is the duration of the first fixation plus any other
first-pass fixations) and so the two measures are not independent. This has implications for statistical testing, which eye-tracking researchers had mostly ignored until recently (Von der Malsburg & Angele, 2017). Another, substantive reason is
that researchers may be keen on capturing early reading processes (see Section
7.2.1.2.2.1), as reflected in first fixation duration and gaze duration, because they
believe early processes are most likely to reflect automatic, non-strategic read-
ing or parsing procedures (Godfroid & Winke, 2015). Beyond this focus on early
processes, however, the finer distinction between the first fixation and any addi-
tional first-pass fixations may be less important for specific research questions. For
instance, Sagarra and Ellis (2013), in a study on temporal cues in sentence process-
ing, reported only gaze duration and second pass duration as summary measures
of early and late processing, respectively. These two measures combined capture
most viewing activity on a word.
In cases when researchers do wish to distinguish between the first fixation and
any additional first-pass fixations in an area, refixation duration offers that possibil-
ity. Refixation duration (not to be confused with rereading time, which is a late
measure) is the difference between gaze duration (or first-pass reading time) and
first fixation duration (see Table 7.2 and Figure 7.7, for an example). Refixation
duration is therefore independent of first fixation duration, unlike gaze duration,
which subsumes both. In SLA, only one study has reported refixation duration to
date. Alsadoon and Heift (2015) found Arabic ESL learners had longer refixation
durations (and also longer first fixations) on enhanced than unenhanced English
words when reading sentences.
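To make the arithmetic concrete, the relations among these early measures can be sketched in a few lines of Python. This is an illustrative sketch, not any analysis package's actual API: fixations are assumed to be chronologically ordered (interest_area, duration_ms) pairs, and the first visit to an area is treated as its first pass (areas that are skipped and visited only later are ignored for simplicity).

```python
def first_pass(fixations, area):
    """Fixations of the first pass through `area`: the consecutive
    fixations made once the eyes first enter the area, ending when
    they leave it."""
    durations, entered = [], False
    for a, dur in fixations:
        if a == area:
            entered = True
            durations.append(dur)
        elif entered:
            break
    return durations

def first_fixation_duration(fixations, area):
    fp = first_pass(fixations, area)
    return fp[0] if fp else 0

def gaze_duration(fixations, area):
    # the first fixation plus any other first-pass fixations
    return sum(first_pass(fixations, area))

def refixation_duration(fixations, area):
    # gaze duration minus first fixation duration, i.e., the
    # duration of first-pass refixations only
    return sum(first_pass(fixations, area)[1:])
```

On the sequence [("w1", 200), ("w2", 180), ("w2", 160), ("w3", 210), ("w2", 250)], gaze duration for "w2" is 340 ms, of which 180 ms is the first fixation and 160 ms the refixation; the later 250 ms fixation belongs to a second visit and is excluded from all three early measures.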
First subgaze is the duration of all fixations in an interest area before the par-
ticipant presses a button or makes some other type of overt response. First subgaze
is an example of a custom-made measure that a team of researchers developed
specifically for the purposes of their study. Miwa and colleagues (2014) designed
a lexical decision task with eye tracking to study L1 influence on L2 word pro-
cessing. Besides first fixation duration, their data analysis included two new meas-
ures, first subgaze duration and last fixation duration (described later), which
take into account the task-specific properties of lexical decision. In particular,
participants in a lexical decision task are asked to indicate by means of a button
press whether a string of letters presented on the screen is a word or not (yes/
no response). Because pressing a button is a type of conscious, overt behavior, the
researchers argued that first subgaze is “less contaminated by conscious lexical
decision response strategies than the last fixation, which was ended with a but-
ton press” (p. 452). More generally, Miwa and colleagues included eye tracking in
their lexical decision task in order to obtain more insight into the time course of
lexical processes that lead up to a lexical decision, which is hypothesized to be the
outcome of a series of events. The Japanese-English speakers’ eye-tracking data
7.2.1.2.2.2 Late Measures
Second pass time is the summed duration of all fixations made in an interest
area when the eyes visit the interest area a second time or after the eyes initially
skipped that area (see Table 7.2 and Figure 7.7, for an example). It is similar
to, but different from, rereading time, which includes any non-first-pass fixa-
tions (see below). Second pass time assumes an intermediate position in terms
of reporting frequency. It is a well-established measure in sentence- and text-
processing research, including grammar (Felser et al., 2009; Hopp & León Arriaga,
2016; Roberts, Gullberg, & Indefrey, 2008; Sagarra & Ellis, 2013) and vocabu-
lary research (Godfroid et al., 2013; Montero Perez et al., 2015), yet second pass
time is reported less frequently than first fixation duration, gaze duration, or total
time. Although researchers seldom explain why they did not include a particular
measure in their analyses, I suspect the many 0 values obtained for second pass
time play a role. Second pass time will be 0 when participants finish processing
a word in first pass or skip the area altogether. Many 0s in a variable will cause
that variable to be non-normally distributed (bimodal and skewed), which will
require researchers to transform the data and/or perform a different test.2 Even
though second pass time may require some additional data preparation, there are
good reasons for including this measure in the statistical analysis. Of note, second
pass time captures reanalysis following an initial processing difficulty and is a
pure late-processing measure, unlike total time and total visit duration (see later).
Roberts et al. (2008) compared L1 and L2 Dutch processing of personal pronouns
that referred either to a sentence-internal or a sentence-external antecedent. The
late measures, including second pass time, were more informative than the early
measures in this study because differential attachment preferences in the L2 Dutch
groups, compared to the L1 Dutch speakers, only surfaced later in the reading
process. Second pass time was also a useful measure in Sagarra and Ellis (2013),
who used it alongside gaze duration to obtain a full picture of the reading process
(gaze duration + second pass time ≈ total time). By using second pass time, rather
than total time, the authors were able to distinguish late from early processing
more clearly.
Rereading time (not to be confused with refixation duration, which is an
early measure) is the difference in reading time between total time and gaze
duration or first pass reading time (see Table 7.2 and Figure 7.7, for an example).
Therefore, rereading time is the sum of all fixations in an interest area except for
those fixations made during first-pass reading. Because visiting the same interest
area more than twice is relatively rare, rereading time and second pass time will
often yield the same values. Rereading time shares the same general properties as
second pass time; that is, a high occurrence of 0 values and a skewed distribution.3
Rereading time has been used in a handful of grammar studies (Boxell & Felser,
2017; Felser & Cunnings, 2012; Felser et al., 2012) and a study on the effects of
input enhancement on reducing vowel blindness in L1 Arabic–L2 English speak-
ers (Alsadoon & Heift, 2015). The work by Felser and her colleagues deals with
the resolution of long-distance dependencies, which are often studied using struc-
turally complex sentences (see example [1] in Section 7.2.1.2.1 for discussion, and
Section 3.2.1, for review). Like second pass time, rereading time is a suitable meas-
ure for capturing the amount of reanalysis in which L1 and L2 speakers engage
when trying to parse a sentence.
It should be noted that the distinction between rereading time and second
pass time is not always clear in empirical studies. Specifically, second pass time is
sometimes defined as any rereading or refixations of an interest area without fur-
ther reference to when the rereading or refixations occurred (i.e., during second
pass or beyond). Strictly speaking, this renders the measure an index of rereading
time, because second pass time refers to second-pass fixations only. Several studies
in the synthetic review had ambiguous definitions of second pass time like this,
which leads me to believe rereading time is a more widespread measure than
the numbers in Figure 7.6 suggest. To improve terminological precision, future
researchers should include clear definitions of their measures and, in the case of
second pass time versus rereading time, specify whether refixations beyond the
second pass are included.
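The distinction between second pass time and rereading time can be pinned down in the same toy representation used for the early measures: chronologically ordered (interest_area, duration_ms) pairs, with the first visit treated as the first pass. This is a sketch for illustration, not any vendor's implementation.

```python
def visits(fixations, area):
    """Group chronological (area, duration) fixations into visits:
    maximal runs of consecutive fixations inside `area`."""
    runs, current = [], []
    for a, dur in fixations:
        if a == area:
            current.append(dur)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

def total_time(fixations, area):
    return sum(d for a, d in fixations if a == area)

def second_pass_time(fixations, area):
    # fixations made during the second visit only
    v = visits(fixations, area)
    return sum(v[1]) if len(v) > 1 else 0

def rereading_time(fixations, area):
    # total time minus first-pass time: all non-first-pass fixations
    return sum(sum(run) for run in visits(fixations, area)[1:])
```

With three visits to an area of 180, 150, and 120 ms, second pass time is 150 ms while rereading time is 270 ms; with two visits or fewer, the two measures coincide, which is why they often yield the same values in practice.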
Last fixation duration is the duration of the last eye fixation before a par-
ticipant makes a response. This measure was introduced in the field of bilingual-
ism by Miwa et al. (2014) in a lexical decision study with eye tracking involving
Japanese-English bilinguals. Last fixation duration was the duration of the last eye
fixation on the letter string before the participants pressed a button to make their
lexical decision. Miwa and colleagues argued that last fixation duration is “more
dedicated to response planning and execution” (p. 455). It stands in contrast with
the early measure of first subgaze, described previously, which can reflect “lexi-
cal effects in the word identification system not affected by conscious response
strategies” (ibid.). Although last fixation duration is a new measure for the fields
of SLA and bilingualism, it could also be informative in other tasks that require
participants to make an overt response, including grammaticality judgment tests
(GJTs), translation tests, sentence-picture matching tasks, and written language
assessment tasks.
Total time is the sum of all fixations made in an interest area (see Table 7.2
and Figure 7.7, for an example). It is the most frequently reported eye-tracking
measure in SLA and bilingualism, represented across all five strands of text-based
(i) because [total time] is the variable that is of most pedagogical interest,
(ii) because it has yielded the strongest associations with learning in previ-
ous studies, and (iii) because it encapsulates first fixation duration and gaze
duration (see Von der Malsburg & Angele, 2017, for caveats on multiple
testing in eye-movement research).
(Godfroid et al., 2018, p. 568)
In other words, total time is the go-to measure whenever global effects are of pri-
mary interest, although, in general, this should not stop researchers from analyzing
additional measures as well.
Effects are likely to surface in total time, because it combines all the viewing
activity that took place in a given area. In L2 assessment, Bax (2013) investi-
gated the cognitive validity of two IELTS reading tasks (sentence completion
and matching) with a total of 11 test items. In five out of the 11 items, total time
differentiated between successful and unsuccessful test takers (i.e., those who did
and did not answer the item correctly). These findings were echoed in Bax’s other
dependent variables—namely, visit duration, visit count, and fixation count. In
instructed second language acquisition, Indrarathne and Kormos (2017, 2018)
compared the effectiveness of different instructional conditions for teaching and
learning the causative had construction (e.g., he had the house painted). The authors
analyzed mean total time for 21 different occurrences of causative had and related
this to L2 English learners’ gains on two separate pre- and post-tests (Indrarathne
& Kormos, 2017). Within the strand of subtitles research, Bisson et al. (2014)
asked L1 English speakers to watch four chapters of the SpongeBob Square Pants
movie with either English, Dutch, or no subtitles. The movie soundtrack also
varied between Dutch and English. Bisson and colleagues compared total time
in the subtitle region for these different conditions and normalized their measure
for the time each subtitle was shown on the screen. In grammar research, Ellis,
Hafeez, Martin, Chen, Boland, and Sagarra (2014) analyzed total time as the sole
indicator in a study on learned attentional biases in processing temporality in
L2 Latin. Finally, Godfroid and colleagues (2018), whose vocabulary study was
introduced at the beginning of this section, also chose to focus their analyses
on total time, given that the authors’ primary aim was to uncover associations
between overt attention and incidental vocabulary learning.
Although it is often described as a late measure, total time is actually a hybrid
measure that conflates both early and late stages of processing (i.e., gaze dura-
tion + rereading time). In that regard, total time bears some similarity to regres-
sion path duration (described above), which conflates early processing and the
time it takes to overcome a processing difficulty. The hybrid nature of total time
could be a reason some researchers choose not to include total time in their
analyses. As mentioned, total time aggregates several other measures and there-
fore these measures (e.g., gaze duration) tend to be correlated with total time
(for a visual representation, see Figure 7.20). When researchers analyze multiple,
correlated eye-movement measures for the same set of eye-tracking data, they
run several non-independent statistical comparisons. To control the Type I error
rate, Von der Malsburg and Angele (2017) suggested lowering the significance
level α by applying a Bonferroni correction or using a rule of thumb whereby
at least two eye-tracking measures need to be significant for an effect to be
considered reliable. A third possibility, suggested here, would be to analyze a
set of measures that are not correlated with each other, such as first fixation
duration, refixation duration, and rereading time (for a visual representation, see
Figure 7.21). In that case, no additional steps are necessary to ensure the validity
of one’s statistical results.
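Of the corrective options, the Bonferroni adjustment is straightforward to compute. The sketch below is generic Python rather than part of any statistics package, and the p-values are invented for illustration.

```python
def bonferroni_alpha(alpha, n_measures):
    """Per-test significance level after a Bonferroni correction
    for testing n_measures non-independent eye-tracking measures."""
    return alpha / n_measures

# Hypothetical p-values for three correlated duration measures
p_values = {"first fixation": 0.030, "gaze": 0.012, "total time": 0.004}
adjusted = bonferroni_alpha(0.05, len(p_values))   # 0.05 / 3, about 0.0167
reliable = {m: p < adjusted for m, p in p_values.items()}
# In this invented example, only gaze duration and total time
# survive the correction.
```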
Total visit duration is the summed duration of all visits to a particular inter-
est area (see Table 7.2 and Figure 7.7, for an example). A visit is defined as “the
time interval between the first fixation [in an interest area] and the end of the last
fixation within the same [interest area] when there have been no fixations out-
side the [interest area]” (Tobii Studio User’s Manual v. 3.4.5, p. 110). Therefore, a
visit is similar to the concept of a pass in reading, but the term visit is used more
broadly in other areas of eye-tracking research as well. Total visit duration is a less
frequently used measure, reported only in Bax (2013) so far. Total visit duration
is very similar to total time, discussed previously, and therefore, it may not be
clear what additional information total visit duration can provide.4 In a reading
assessment study, Bax (2013) sought to validate items from the IELTS reading
test as measures of careful local reading and expeditious (i.e., quick and selec-
tive) local reading, respectively. Bax analyzed total visit duration along with three
other measures to gauge how readers attend to specific portions of a text when
answering specific test items. Successful test takers (i.e., those who solved an item
correctly) differed in their total visit durations and other eye-tracking measures
from unsuccessful test takers on a subset of all test items in a manner Bax argued
supported the cognitive validity of the test.
Screen displays in assessment research like Bax’s study will often feature func-
tionally distinct regions on the screen, such as test prompts or questions, answer
options, and reading texts, images, or videos. This may warrant highlighting the
concept of a visit, and hence analyzing total visit duration, because a visit to a par-
ticular area is a functionally meaningful event in language assessment. Generalizing
a bit, total visit duration would seem a useful measure in any research study that
includes large interest areas on the screen (see Section 6.1.2). This includes subti-
tles research and some areas of instructed second language acquisition, in addition
to assessment research. Reporting both total visit duration and total time seems
like overkill, because the two measures are so similar. Researchers who are not
sure which measure to include can ask what the role of the different areas on the
screen is. When interest areas are all the same (e.g., all words rather than words and
images together), total time is the default option.
Expected fixation duration is the expected time participants will spend
in a given region on the screen if they distribute their attention evenly across
all the information on the screen (see Table 7.2 and Figure 7.7, for an example).
Expected fixation duration is usually compared with observed fixation duration,
which is the time participants actually spend in the region, as measured by the eye
tracker. Therefore, the difference between observed and expected fixation
duration, or ΔOE, indicates whether participants spent a proportionally greater
or smaller amount of their time in a given area than would be expected if they
processed all information with equal depth. This renders ΔOE the quantitative
equivalent of a color patch on a heatmap (see Section 7.2.3.1): positive values
indicate more attention (warmer colors in a heatmap) while negative values indi-
cate less attention (cooler colors).
Calculations of ΔOE can be letter-, syllable-, or word-based (for arguments
in favor of a syllable-based measure, see Indrarathne & Kormos, 2017, 2018). For
instance, say a participant reads a short, 80-syllable text in 8 s, or 8,000 ms. This corresponds to a mean reading time of 100 ms per syllable. If the target structure in
the text is five syllables long, the expected fixation duration for the structure is 500
ms. If the participant actually spent 625 ms reading the target structure, then ΔOE
is 125 ms. Indrarathne and Kormos (2017, 2018) calculated ΔOE in this manner to
quantify attentional processing (also see Godfroid & Uggen, 2013). They used total
time as a basis for their calculations, but the same formula could in principle be
applied to any fixation duration measure. When using ΔOE, attention is measured
in reference to the participant’s performance on the task itself, rather than vis-à-vis
a control condition, and so researchers may not need to worry about the compara-
bility of their experimental and control conditions (Indrarathne & Kormos, 2017).
That said, in Indrarathne and Kormos (2017, 2018), the results for ΔOE and total
time (a traditional measure) were highly similar so more research is needed to show
that calculating attention in this new manner makes a difference.
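The worked example above can be reproduced in a few lines. This is a sketch of the general formula only, not Indrarathne and Kormos's actual code; the unit of calculation (letters, syllables, or words) is whatever the researcher chooses.

```python
def delta_oe(observed_ms, region_units, total_units, total_time_ms):
    """Observed minus expected fixation duration (delta-OE) for a
    region, where the expected duration assumes attention is spread
    evenly across all units in the display."""
    expected_ms = total_time_ms * region_units / total_units
    return observed_ms - expected_ms

# An 80-syllable text read in 8,000 ms (100 ms/syllable); the
# 5-syllable target structure was actually read for 625 ms.
print(delta_oe(625, 5, 80, 8000))  # 625 - 500 = 125.0 ms
```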
At a conceptual level, ΔOE is an interesting measure when groups differ in
their time on task, as in Indrarathne and Kormos’s studies. In theory, ΔOE should
be sensitive to target-form-specific differences in attention after controlling for
differences in time on task. ΔOE could also be useful to quantify changes in
attention that result from repeating the same task. For instance, some empirical
studies on the Output Hypothesis (Swain, 1985) have used an input–output–input
sequence (i.e., reading–writing–reading) in their experimental designs (e.g., Izumi
& Bigelow, 2000; Izumi, Bigelow, Fujiwara, & Fearnow, 1999; Song & Suh, 2008).
In a replication with eye tracking (He & Li, 2018), ΔOE could show whether
attention to the target structure increases disproportionately during the second
reading task, as the noticing function of output would predict (Swain, 1985), or
whether readers just generally speed up or slow down in round two of the task.
7.2.1.3 Fixation Latency
First fixation latency, or time to first fixation, is the time it takes for a par-
ticipant to look at a particular interest area, as measured from a prespecified point
in the trial (see Table 7.3). First fixation latency differs from first fixation duration,
described previously, in that first fixation latency is not about how long the initial
fixation lasted, but rather, how long it took for the fixation to happen. This makes
first fixation latency a “one-point measure” (Andersson, Nyström, & Holmqvist,
2010, p. 3). Only the beginning of the first fixation matters for calculating first
fixation latency.
By nature, first fixation latency needs to be measured relative to some other
event in the trial. The simplest case is measuring first fixation latency from trial
onset (i.e., from the beginning of the trial). In other studies, including visual
world experiments, it may make more sense to start measuring from a later time
point, for instance the onset (beginning) or offset (end) of a linguistic cue that
is embedded within the spoken or written input (e.g., Encuentra la pelota, “Find
the ball” ), where the feminine article la cues the feminine noun pelota (Grüter,
Lew-Williams, & Fernald, 2012). When measurement is to begin later in the trial,
researchers need to mark the onset of measurement by inserting a time stamp
in the eye-tracking software. This procedure is demonstrated in Section 6.3.2.2,
using the auditory stimuli for a visual world study as an example. The idea is to
measure when, exactly, a critical word begins or ends and to enter that informa-
tion into the programming software. Conceptually, adding a time stamp is like
programming a stopwatch, so it starts timing the “race” for an eye fixation at the
right moment in the trial.
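Computationally, the measure is simple once fixations carry onset times and the time stamp is known. The following is a minimal sketch, assuming fixations are (area, start_ms, end_ms) triples in chronological order; the 480 ms cue offset below is an invented value standing in for a hand-placed time stamp.

```python
def first_fixation_latency(fixations, area, timestamp_ms=0):
    """Time from `timestamp_ms` (e.g., trial onset or the offset of
    a spoken cue) to the start of the first fixation in `area`.
    Only the beginning of that fixation matters; returns None if
    the area is never fixated after the time stamp."""
    for a, start_ms, end_ms in fixations:
        if a == area and start_ms >= timestamp_ms:
            return start_ms - timestamp_ms
    return None

fixations = [("distractor", 100, 430), ("target", 650, 900)]
print(first_fixation_latency(fixations, "target"))       # from trial onset: 650
print(first_fixation_latency(fixations, "target", 480))  # from cue offset: 170
```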
First fixation latency is the second most widely used measure in the visual world
paradigm, after fixation proportion and fixation probability (see Figure 7.3). Latency
measures have been reported in 25% of all visual world studies in the present review
(Dussias, Valdés Kroff, Guzzardo Tamargo, & Gerfen, 2013; Grüter et al., 2012; Hopp,
2013, 2016; Hopp & Lemmerth, 2018), including two language production stud-
ies (Flecken, 2011; Kaushanskaya & Marian, 2007). First fixation latency has also
proven useful when working with specific tasks such as picture-word interference
(Kaushanskaya & Marian, 2007) and a modified Stroop task (Singh & Mishra, 2012).
In comparison, only one print study (De León Rodríguez et al., 2016) included first
fixation latency in order to gauge the time it took participants to fixate on a word
in a single-word reading task (see Figure 7.8).
For their Stroop task, Singh and Mishra (2012) instructed participants to select
the ink color of the print word, while ignoring the word’s meaning, by looking
at one of four color patches on the edges of the screen (see Figure 7.9). This use
of first fixation latency in eye tracking is akin to measuring reaction times in a
button-press experiment (also see Section 4.2.1). In an elegant production study,
Flecken (2011) used first fixation latency to gain insight into how early bilinguals
conceptualize and describe events. She found that a higher use of progressive
aspect (aan het V in Dutch, V-ing in English) correlated with faster looks to the
action region in the video (e.g., the hands of a man folding a paper airplane),
which was the region that contained information about the ongoing nature of
the event. Flecken argued participants’ looks to the action region showed that
they were extracting information about the ongoing status of the event, which
in turn was linked to their use of the progressive aspect (see Section 4.2.4). First
fixation latency has also played an important role in the prediction strand of visual
world research (Dussias et al., 2013; Grüter et al., 2012; Hopp, 2013, 2016; Hopp
& Lemmerth, 2018). Here, latency has been analyzed to uncover anticipatory
processing; that is, looks to a referent on screen that is yet to be mentioned in the
auditory input (see Section 4.2.2). It is expected that listeners will look faster to
the target image on trials that contain an informative cue that allows for prediction, compared to trials that do not. First fixation latency can tell us whether
listeners do, in fact, look faster in the prediction trials.
To illustrate, first fixation latency has been the measure of choice in a series of
visual world experiments on grammatical gender by Holger Hopp and colleagues
(Hopp, 2013, 2016; Hopp & Lemmerth, 2018). In the 2016 study, Hopp examined
FIGURE 7.8 First fixation latency in single-word reading. First fixation latency was
the time it took participants to make an eye movement from the left-
cross (stage B) to the word or pseudoword on the right (stage C).
(Source: De León Rodríguez et al., 2016).
FIGURE 7.9 First fixation latency in an oculomotor Stroop task. The participants saw
a color term in their L1 (e.g., hara, “green”) and needed to make an eye
movement to the color patch that matched the ink color (here, red)
while ignoring the meaning of the word.
(Source: Singh & Mishra, 2012).
7.2.1.4 Fixation Location
First fixation location represents the landing position of the eye, expressed as
a percentage of the total length of the interest area (see Table 7.4). For instance,
if a participant initially lands on the fourth letter s in the eight-letter word
amassale (see Figure 7.10), his or her first fixation location will be 50%. Fixation
location is still a new measure in L2 and bilingualism research. So far, only De
León Rodríguez et al. (2016) used this measure (alongside first fixation duration
and first fixation latency) to uncover crosslinguistic influences on reading strate-
gies. De León Rodríguez and his colleagues recruited balanced French-German
bilinguals, who read out loud French or German words and pseudowords
(see Figure 7.10). The researchers wanted to know if differences in the two
languages’ orthographies, French being a more opaque language than German,
would influence the bilinguals’ first fixations. The researchers found no effects
of language on first fixation latency, discussed previously, but an effect on first
fixation location, which they attributed to differences in French and German
orthography. First fixation location, then, can help researchers understand the
fine, sublexical details of the reading process. Future researchers may find in first
fixation location a useful measure to study the eyes’ landing site (see Section
2.4) and how this landing site may shift during reading development.
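The percentage itself is a simple ratio. A sketch reproducing the amassale example, assuming a 1-based letter index for the landing position:

```python
def first_fixation_location(landing_letter, word_length):
    """Landing position of the first fixation, expressed as a
    percentage of the interest area's length (letters counted
    from 1)."""
    return 100 * landing_letter / word_length

# Landing on the fourth letter of the eight-letter word "amassale"
print(first_fixation_location(4, 8))  # 50.0
```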
7.2.2 Regressions
Regressions are eye movements that transport the eye opposite to the reading
direction, for instance right-to-left eye movements in English and left-to-right
eye movements in Arabic. Inherent in regressions is the notion of a task order;
that is, a clear sequence for the eyes to follow, from interest area 1, to interest
area 2, to interest area 3, and so on. When the task sequence is interrupted and
the eyes move back to an earlier interest area on the screen, the movement is
defined as a regression. Regressions are most meaningful in reading and closely
related tasks, precisely because there is a default manner to complete the task
Perhaps surprisingly, the researchers found that the regression measure patterned
with first fixation duration and showed a rapid decrease over the first five encounters with unfamiliar words in a text (see Section 3.2.2). There is no consensus yet, then, as to whether the landing sites of regressions reflect a strong linguistic influence or are mostly spatially determined.
Just as what goes up must come down, what regresses in must regress out.
Regressions out represent movements from a given interest area that take the
eyes against the direction of reading (see Table 7.5 and Figure 7.12, for an example).
While regressions in are delayed by nature (you must first move past a word in order
to be able to return to it later), regressions out can occur at different stages, or passes,
in the reading process. A first-pass regression out is a regression that is launched
upon the initial visit of an interest area. Example studies that have looked at first-
pass regressions are Keating (2009), Lim and Christianson (2015), and Mohamed
(2018). Delayed regressions out are regressions that ensue following a revisit of
an interest area (temporally delayed regressions) or regressions that are launched
from a word after the primary interest area, such as the spillover region (spatially
delayed regressions). First-pass and delayed regressions combined make up total
regressions out. Researchers have generally analyzed a single regression measure,
alongside one or more temporal eye-movement measures (i.e., measures of eye
fixation duration, see Section 7.2.1.2). A notable exception is Keating (2009), who
differentiated between the different types of regressions out (i.e., first pass, spatially
delayed, and total) in his analyses. Keating found that only advanced L2 Spanish
speakers picked up on ungrammatical noun-adjective agreement during natural
FIGURE 7.12 Two different reading patterns for an unfamiliar word, tahamul, embedded
in context.
(Source: Godfroid et al., 2018).
reading and even then, only when the adjective immediately followed the noun
(see Section 3.2.1). The results were strongest for total regressions (more regressions
when there was a noun-adjective gender mismatch), but trended in the same way
when first-pass and delayed regressions were analyzed separately.
Because saccades are so fast (see Section 2.2), researchers care more about
whether or not a regressive eye movement took place rather than how long the
regression lasted. Accordingly, analyses of regressions focus on regression counts
and measures derived from counts; that is, regression proportions (expressed as a
number between 0 and 1) and regression probabilities (expressed as a percent-
age, ranging from 0 to 100). Readers who need a refresher on these concepts are
referred to Section 7.2.1, which dealt with counts, probabilities, and proportions
in the context of fixations and skips. Regression rates is a more general term
researchers use to refer to either the proportion or probability of regression. In
sum, a regression is seldom simply a regression (see Table 7.5). As readers, we must
figure out what types of regressions are meant (in or out), for which region and
which time window in the analysis (initial, delayed, total), as well as how the little
mavericks were measured (counts, proportions, or probabilities).
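These count-based quantities can be pinned down with a toy representation. The sketch below is illustrative, not any package's API: each trial is a chronological list of fixated interest-area indices, and a regression out of area i is a saccade launched from i that lands on an earlier area.

```python
def regressions_out(trial, area_index):
    """Number of saccades launched from `area_index` that land on an
    earlier interest area (i.e., against the reading direction)."""
    return sum(1 for src, dst in zip(trial, trial[1:])
               if src == area_index and dst < area_index)

def regression_out_proportion(trials, area_index):
    """Proportion (0-1) of trials with at least one regression out
    of `area_index`; multiply by 100 for a probability."""
    hits = sum(1 for trial in trials
               if regressions_out(trial, area_index) > 0)
    return hits / len(trials)
```

For the trial [1, 2, 3, 2, 3, 4], one regression (3 to 2) is launched from area 3; across the two trials [1, 2, 3, 4] and [1, 2, 3, 2, 4], the regression-out proportion for area 3 is 0.5, or a 50% regression probability.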
viewing activity. Depending on the type of heatmap it is, this could mean the
region attracted either longer fixations or more fixations; in practice, these two
will often correlate. To create a heatmap, researchers first need to select the participants and trials they want to include. Because heatmaps capture the distribution
of eye fixations in space, it is usually better not to aggregate data across different
trials unless the trials have the exact same spatial layout (i.e., the same background
image or sentence). Heatmaps can be produced for a single participant (e.g., Bax,
2013) or groups of participants, such as native and non-native English-speaking
children completing the same task (e.g., Lee & Winke, 2018). Figure 7.14 repre-
sents two heatmaps of group-level data from Lee and Winke (2018) (for a color
version of these images, see Lee and Winke’s article).
Once you have selected the data subset to include in the heatmap, the eye-
tracking software will plot all the data samples (i.e., individual data points recorded
by the eye tracker) against the background image. Following a scaling process, in
which fixations are compared against the full range of values in the data set,
every eye fixation will receive a color that reflects its weight. Areas with more or
longer eye fixations will be painted in warmer hues (see Holmqvist et al., 2011,
for technical details). The color legend, which is now standard in major eye-
tracking software, summarizes the outcome of this process (see Figure 7.15a).
Older software versions did not provide a legend, which prevented readers from
accurately interpreting heatmaps included in published studies. Because colors
are relative to the data collected, the color legend should always be included as
a part of research articles. Using the software settings, researchers can further
customize the scale of their heatmaps by lowering or raising the cutoff for the
maximum value (e.g., what counts as red). This is recommended when the goal
is to compare participant groups that differ in their overall viewing activity (e.g.,
native and non-native speakers) because without such an adjustment, the colors
in the two heatmaps will not mean the same. Once the algorithm has worked its
magic, the software will produce a smooth fixation landscape, where all fixation
activity in the display is rendered on a color scale, from warm (most activity) to
cold (least activity).
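The weighting and scaling steps can be made concrete with a short sketch. The following Python function and the fixation data are hypothetical (eye-tracking packages implement this internally, with additional smoothing): it accumulates fixation durations into a grid and scales every cell relative to the observed maximum, or to a user-set cutoff (i.e., what counts as red).

```python
def duration_heatmap(fixations, width, height, max_cutoff=None):
    """Accumulate fixation durations into a 2D grid and scale to [0, 1].

    fixations: iterable of (x, y, duration_ms) tuples.
    max_cutoff: optional ceiling for the warmest color; defaults to the
    observed maximum, so colors are relative to the data collected.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y, dur in fixations:
        grid[int(y)][int(x)] += dur            # duration-weighted, not count-weighted
    ceiling = max_cutoff or max(max(row) for row in grid)
    return [[min(cell / ceiling, 1.0) for cell in row] for row in grid]

# Two fixations on one location (500 ms total) and one elsewhere (150 ms)
fix = [(10, 5, 200), (10, 5, 300), (40, 20, 150)]
hm = duration_heatmap(fix, width=50, height=30)
print(hm[5][10], hm[20][40])  # 1.0 0.3
```

Passing a fixed `max_cutoff` to both groups is the programmatic equivalent of equating the color scales of two heatmaps before comparing participant groups.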
The first thing to decide when creating a heatmap is whether you would like
to use fixation counts or fixation durations as a dependent variable for the heat-
map (see Table 7.6). Make sure you report this information in a publication to
help readers interpret the figures accurately. In a fixation count heatmap, every
fixation will be assigned the same weight, regardless of its duration. Fixation
duration heatmaps, on the other hand, will weigh fixations differently depend-
ing on their length (longer fixations will be painted in warmer colors). Two major
eye-tracking manufacturers further provide the option of creating heatmaps for
relative, rather than absolute, fixation behavior. Relative-duration and relative-
count heatmaps are useful for multi-participant and/or multi-trial heatmaps
in which the recordings for the participating subjects or trials differ in length.
Without adjusting for trial length from individual participants or trials, the weight
FIGURE 7.14
Heatmaps of fixation behavior during an English speaking test: L1
English children (top) and English language learners (bottom).
(Source: Reprinted from Lee, S., & Winke, P., 2018. Young learners’ response processes when taking
computerized tasks for speaking assessment. Language Testing, 35(2), 239–269, with permission from
Sage. © 2017 The Authors © 2013 Educational Testing Service (ETS). Sample task from TOEFL
Primary® Speaking Test reprinted by permission of ETS the copyright owner).
FIGURE 7.15
Two visual representations of essay-rating data: (a) heatmap and (b)
luminance map. Note: the figures represent eye-fixation durations from
ten raters using an analytic rubric to rate essays. Data are shown for the
entire trial period (36 minutes) using the default scale.
(Source: Data supplied by Dr. Laura Ballard, ETS, and Dr. Paula Winke, Michigan State University;
Ballard, 2017).
Eye-Tracking Measures 241
TABLE 7.6
Fixation duration heatmap: Color-coded representation of the duration of individual
fixations on a background image. Longer fixations are represented with warmer colors.
Fixation duration heatmap (relative): A fixation duration heatmap that expresses
fixation duration as a proportion of trial length (useful when there are multiple
trials and trials differ in length).
Fixation count heatmap: Color-coded representation of the number of fixations on a
background image. A higher fixation density is represented with warmer colors.
Fixation count heatmap (relative): A fixation count heatmap that expresses fixation
count as a proportion of the total number of fixations in a trial (useful when there
are multiple trials and trials differ in length).
In gaze plots, every fixation is plotted as a separate dot. The size of the dots may or may
not be proportional to fixation duration: when there are size differences, larger
dots will signify longer fixations. The dots are connected by lines (saccades) and
together, they represent the fixation sequence, or scanpath, for a given task. Thus,
gaze plots are one type of visual representation of a scanpath. They are used for
descriptive purposes only. In the next section, we will consider how scanpaths can
be quantified and submitted to statistical analysis, in what is known as a scanpath
analysis.
When plotting individual fixations separately, as is the case in a gaze plot,
it may be better to limit the amount of data to a short time period (e.g., eye
movements from one participant or one short trial) lest displays get crowded
and become difficult to interpret (see Bax, 2013). In some cases, it may be more
meaningful to zoom in on a particular time segment within a trial. For instance,
Lee and Winke (2018) juxtaposed a native and a non-native English-speaking
child’s gaze plots at a similar point in a timed speaking task (17–19 seconds left).
They showed how the two children interacted differently with the onscreen
timer and, following extensive data triangulation, recommended that the timer
be removed or made more child-friendly in a revision of the speaking test (see
Section 3.2.5). The gaze plots, then, were one of multiple sources of information
on which the researchers based their claims. This underscores the point that gaze
plots should bring clear added value to a study to warrant their inclusion along-
side other data sources in a research paper and will, in general, require triangula-
tion with other measures.
7.2.3.2 Scanpaths
Scanpaths are visual or numeric representations of eye-movement patterns that
show a sequence of fixations and saccades in time and space. Compared to, say,
eye fixation durations or regressions, scanpaths capture eye-movement behavior
over a larger time window and a greater area of space. This earns scanpaths their
status as an integrated measure. In eye-tracking software, scanpaths are commonly
represented as gaze plots (for an example, see Figure 7.16) and in this form,
they can be used for descriptive purposes. Thus, one function of scanpaths is
descriptive analysis based on scanpath visualizations (e.g., gaze plots). Scanpaths
can also be used to check recording quality, especially when working with text-
based stimuli (see Section 8.1.2). When a scanpath representation of reading data
floats systematically above or below a line of text (for an example, see Figure 8.5),
this indicates there was a vertical offset in the data recording and action may be
required. Lastly, scanpaths distinguish themselves from other integrated measures,
such as heatmaps and gaze plots, in that they can be subjected to statistical
analysis (see Godfroid et al., 2015, for an example). Thus, when used appropriately,
scanpaths may combine the appeal of a more holistic, integrated measure with the
rigor of a statistical data analysis.
FIGURE 7.16 Gaze plots during an English speaking test: L1 English child participant
(top) and English language learner (bottom).
(Source: Reprinted from Lee, S., & Winke, P. 2018. Young learners’ response processes when taking
computerized tasks for speaking assessment. Language Testing, 35(2), 239–269, with permission from
Sage. © 2017 The Authors © 2013 Educational Testing Service (ETS). Sample task from TOEFL
Primary® Speaking Test reprinted by permission of ETS the copyright owner).
• What global reading strategies do adult L1 readers use when reading expository
text? (Hyönä, Lorch, & Kaakinen, 2002).
• How does background sound influence the gaze scanpath of people watching a
film clip? (Vilaró et al., 2012).
• How do expert and novice school teachers from different cultural backgrounds
use their eye gaze in real-world classrooms? (McIntyre & Foulsham, 2018).
• Can participants meaningfully interpret their own and other people’s static
and dynamic gaze displays? (Van Wermeskerken, Litchfield, & Van Gog, 2018).
FIGURE 7.17
Sequence of teacher scanpaths. The teacher’s eye gaze alternated
between a student, teacher material, and the classroom at large.
(Source: Reproduced from McIntyre and Foulsham (2018) under a Creative Commons Attribution
4.0 International License © 2018 The Authors, http://creativecommons.org/licenses/by/4.0/).
subtitle area could be another region. Or, in a modified version of the same pro-
ject, the image area could be one region and the subtitle area could be subdivided
into several, smaller regions. It all depends on what the researcher wants to study.
In a study on written GJTs, Godfroid et al. (2015) discerned four functional
regions in their test sentences, which centered around the grammatical violation
(see Figure 7.18). The error in the ungrammatical version of the sentence was
labeled the primary interest area (B), because this is where participants are
first expected to slow down (see Section 3.2.1). The primary interest area was
preceded by a sentence-initial region (A) and followed by a spillover region
(C) and a sentence-final region (D). In essence, functional regions like A–D
are interest areas (see Section 6.1) that are custom-drawn by the researcher for
his or her analysis. Because this entails a level of subjectivity, it is a good idea to
cross-check your partitioning of space with that of a colleague. If done well, the
resulting segmentation will be a simpler representation of the stimulus (i.e., a
visual display or text) that retains the information that is important for answer-
ing the research questions.
Once the segmentation is in place, researchers can represent the observed eye-
movement patterns by means of symbol strings. To do so, each region is denoted
by a symbol (see Figure 7.18), and each eye fixation or visit to the region is rep-
resented with the corresponding symbol. Thus, the fixation sequence shown in
Figure 7.19 can be represented as AAAAAABBCDDD, at the level of individual
FIGURE 7.19
Sentence reading pattern with different functional regions
superimposed. This eye-fixation sequence can be represented as
AAAAAABBCDDD, at the level of individual fixations, or ABCD, at
the visit level.
fixations, or it can be further condensed into ABCD, if only visits are of interest.
This approach provides a common metric for describing eye-movement patterns
on otherwise distinct sentences. Godfroid and colleagues (2015) found non-native
speakers produced fewer scanpaths with regression when performing a GJT with
time pressure than without. The researchers argued that the drop in scanpaths
with regressions in the L2 group signaled a reduction in controlled processing
(e.g., Clifton et al., 2007; Reichle et al., 2009) as a result of the time restriction.
Put differently, adding time pressure to a GJT may make it more difficult for
L2 speakers to engage in the controlled processing necessary to access their explicit
knowledge and may force them to rely more on implicit (Ellis, 2005) or automa-
tized explicit (Suzuki & DeKeyser, 2017) knowledge instead. These findings, then,
underscore the importance of timing in the measurement of implicit, automatized
explicit, and explicit knowledge. At a methodological level,
Godfroid et al.’s study demonstrated how researchers can use scanpaths to under-
stand the impact of certain task conditions (e.g., timed vs. untimed test conditions)
on participants’ task performance.
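The region-coding scheme described above is straightforward to implement. The following Python sketch uses hypothetical helper names (`to_visits`, `has_regression`); the regression check assumes linearly ordered text regions such as A–D, where a revisited region implies a backward eye movement occurred somewhere in the sequence.

```python
from itertools import groupby

def to_visits(fixation_string):
    """Collapse consecutive fixations on the same region into one visit,
    e.g., AAAAAABBCDDD -> ABCD."""
    return "".join(symbol for symbol, _ in groupby(fixation_string))

def has_regression(visit_string):
    """For linearly ordered regions (A before B before C before D), a region
    that reappears after the eyes have moved on signals a regression."""
    seen = set()
    for symbol in visit_string:
        if symbol in seen:
            return True
        seen.add(symbol)
    return False

print(to_visits("AAAAAABBCDDD"))                # ABCD
print(has_regression(to_visits("AABBAACCDD")))  # True: the reader returned to A
```

Counting `True` values across trials would reproduce the kind of "scanpaths with regression" tally reported by Godfroid et al. (2015).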
Future researchers may wish to expand on this approach by comparing the
scanpath similarity of different trials (symbol strings) directly. Scanpaths are
more similar if they require fewer edits (e.g., fewer insertions, deletions, or sub-
stitutions of symbols) to be matched and, hence, carry a lower transformational
cost to be made equal. String-edit methods such as the Levenshtein metric
(Levenshtein, 1966) can be used to match strings automatically or semi-auto-
matically. The outcome of a string-edit comparison will be a large number of
pairwise (string-string) similarity values, which represent the cost involved in
matching two strings. These similarity values can then be analyzed statistically.
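For illustration, here is a minimal Python implementation of the Levenshtein metric applied to two hypothetical scanpath strings; dedicated scanpath toolkits offer weighted and normalized variants, but the core computation is this dynamic-programming recurrence.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions, or substitutions needed
    to transform string s into string t."""
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        curr = [i]
        for j, tc in enumerate(t, start=1):
            cost = 0 if sc == tc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Two scanpaths that differ by one regression back to region B
print(levenshtein("ABCD", "ABCBD"))  # 1
```

Dividing the edit distance by the length of the longer string yields a normalized similarity value suitable for pairwise (string-string) comparison matrices.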
Exemplifying this procedure, Von der Malsburg and Vasishth (2011) per-
formed a cluster analysis on the output of their own scanpath similarity algo-
rithm, which they applied to existing reading data from Meseguer, Carreiras,
and Clifton (2002). Using their novel procedure, Von der Malsburg and Vasishth
observed three representative “scanpath signatures” (p. 109) for the reading
of temporarily ambiguous sentences. Interestingly, only one of these scan-
path signatures (i.e., a regression to the beginning of the sentence followed by
FIGURE 7.20
Overlap (non-independence) between three common durational
measures. First fixation duration is a part of gaze duration and gaze
duration is a part of total time. A fourth measure, regression path duration
(not shown here), also correlates with gaze duration.
FIGURE 7.21
Alternative decomposition of a viewing episode into statistically
independent, durational measures. The values for first fixation duration,
refixation duration, and rereading time are unrelated (top panel). First
fixation duration and refixation duration can be subsumed under gaze
duration (bottom panel) if the “early” vs. “late” processing distinction is
of primary interest.
multiple measures in their analyses, but they still rely heavily on eye fixation dura-
tions. Researchers could explore the value of skips, visits, and regression measures
for their own projects. Heatmaps, gaze plots, and scanpaths may also be informative
in specific research contexts when properly used. A recurring theme in this chap-
ter is that the various measures provide different, and complementary, information
on participants’ processing behavior. For example, proportion of skips could tell
you whether or not focal attention is paid to an interest area at all (see Section
7.2.1.1). A regression count could reveal additional processing or difficulty in
processing (see Section 7.2.2). A mix of commonly used measures and some less
commonly used ones may be a good starting point.
Notes
1 It is possible to compute regression path duration when readers do not regress out of
an interest area upon first pass, but in that case, regression path duration will be the
same as gaze duration.
2 The most common approach to normalizing second pass data is to do a logarithmic
transformation (see Section 8.2.2). In some cases, the high number of 0s in the original
data set remains a concern, even after transformation, and special regression models
such as negative binomial regression (Godfroid et al., 2018), zero-inflated regression, or
gamma regression (Mohamed, 2018) can offer a solution.
3 The most common approach to normalizing rereading times is to do a logarithmic
transformation (see Section 8.2.2). In some cases, the high number of 0s in the original
data set remains a concern, even after transformation, and special regression models
such as negative binomial regression (Godfroid et al., 2018), zero-inflated regression, or
gamma regression (Mohamed, 2018) can offer a solution.
4 Compared to total time, total visit duration additionally includes the durations of any
saccades that were made in between fixations; however, given that saccades are so short,
this typically does not increase values much.
8
DATA CLEANING AND ANALYSIS
This chapter covers the steps between data collection and the reporting of results.
Most eye-tracking researchers in SLA and bilingualism perform inferential sta-
tistical analyses on their data, but to do so, some preparation is necessary. We will
consider data cleaning (Section 8.1) and outlier treatment (Section 8.2) as two
necessary steps in preparing data for analysis. Next, I will provide an overview of
common statistical practices in current eye-tracking research (see Section 8.3).
This overview will set the stage for the remainder of this chapter—an introduc-
tion to two fairly new inferential statistical techniques.
In Section 8.4, I introduce linear mixed-effects models, the fastest growing ana-
lytical technique in L2 and bilingual eye-tracking research. Section 8.5 is devoted
to the time course analysis of eye-tracking data. It includes an extensive introduc-
tion to growth curve analysis (Mirman, 2014). Sections 8.4 and 8.5 will be most
helpful to you if you are already familiar with multiple regression (for general
introductions, see Field, 2018; Gries, 2013; Jeon, 2015; Larson-Hall, 2016; Plonsky
& Ghanbar, 2018; Plonsky & Oswald, 2017). Even so, the general ideas in these
sections are meant to be useful and accessible to all readers. Each analysis section
ends with a concrete example analysis of real eye-tracking data (see Sections 8.4.4
and 8.5.2.5) and a model for how to report the results (see Sections 8.4.5 and
8.5.2.6). To conclude, a roadmap is provided to guide eye-tracking researchers in
their choice of a statistical method (see Section 8.6), taking into account the dif-
ferent possible types of eye-tracking measures.
8.1 Data Cleaning
Data cleaning refers to the steps researchers take in between data collection and
analysis. Data are not clean (i.e., they are “messy” or “noisy”) when they reflect the
252 Data Cleaning and Analysis
influence of outside factors. As a researcher, you can reduce the noise in your data
by creating a well-designed experiment (see Chapters 5 and 6) and by following
best practices for data collection (see Chapter 9). You should always do your best
to collect the best possible, cleanest data you can. Clean data are good data or,
to put it more precisely, data quality is a strong indicator of overall study quality.
Even so, some degree of noise will be inevitable in behavioral research, including
eye tracking. This noise will come from various sources, including technical error
(i.e., from the eye tracker or other equipment) and human error (participant error,
your own error). So given that you cannot get rid of noise completely, how do
you deal with it?
To perform the data cleaning procedure, researchers will likely use one or
more software programs (see Section 8.1.1). With the help of this special soft-
ware, researchers typically (i) inspect individual participant records and trials (see
Section 8.1.2) and (ii) correct them for drift (optional) (see Section 8.1.3), before
they start inspecting their data set for outliers (see Section 8.2). It is important to
keep a master copy of your data throughout this process; that is, keep the original
recordings intact and make sure you work on a copy of your data set.
in DataViewer, or export timestamp data in Tobii Pro Studio or gaze data (rather
than event data) in BeGaze.
If you are proficient in one program (e.g., R), you may well be able to clean
your raw data entirely with it. For many researchers, however, it will make more
sense to use a combination of programs (e.g., dedicated software and R, SPSS,
or Excel) to conduct different aspects of the data handling and analysis. The key
is to know all the steps that are necessary and determine how you can do them
most efficiently given your experience with the different software programs. In
this section, I intend to provide you with such an overview so you can tackle data
cleaning with confidence.
FIGURE 8.2 Temporal graph of the raw data in Figure 8.1 (0–4.8 sec). The eye-
movement recording shows two continuous traces of position
information in screen pixels that are uninterrupted by blinks (compare
with Figure 8.4).
was simply skimming the text. Small amounts of skimming may be a part of natu-
ral reading, especially when reading longer texts. Therefore, whether this trial (or
participant) ought to be excluded will depend on the goals of the study and how
pervasive the skimming behavior was.
A different picture emerges from Figures 8.3 and 8.4, which show eye-tracking
data in a “noisy” trial. The data in Figure 8.3 show a lot of downward movement,
which is an atypical reading behavior. These vertical lines could be blinks, track
loss, or the eye tracker going wild because it detects two corneas (i.e., split cor-
nea, see Holmqvist et al. [2011]). Again, we need to look at the raw eye-tracking
data in a spreadsheet or a temporal graph to understand what is going on. Both
blinks and track loss will result in missing values for position information. A split
cornea, on the other hand, will produce inconsistent position values, but no miss-
ing data. The data visualization in Figure 8.4 strongly favors an account in terms
of blinks or track loss. In the first 4.8 seconds of the trial alone, there are four large
vertical bars, and this pattern will repeat itself throughout the trial (compare with
the previous Figure 8.2). The vertical bars are time intervals when the position
information was not available. Because the interruptions are fairly short (< 100 ms),
these were likely blinks and not track loss, although from a technical perspective,
the distinction does not matter because both result in missing information.
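The gap-length heuristic just described can be sketched in Python. The function name and data are hypothetical; the 100 ms threshold follows the rule of thumb above (short interruptions are likely blinks, long ones track loss), not a fixed standard.

```python
import math

def classify_gaps(samples, blink_max_ms=100):
    """Scan (timestamp_ms, x, y) samples; runs of missing position data
    (NaN) shorter than blink_max_ms are labeled blinks, longer runs
    track loss. Returns a list of (label, duration_ms) tuples."""
    gaps, start = [], None
    for t, x, y in samples:
        missing = math.isnan(x) or math.isnan(y)
        if missing and start is None:
            start = t                          # gap begins
        elif not missing and start is not None:
            dur = t - start                    # gap ends; measure its length
            gaps.append(("blink" if dur < blink_max_ms else "track loss", dur))
            start = None
    return gaps

# Simulated 500 Hz recording (one sample every 2 ms) with two gaps
nan = float("nan")
samples = ([(t, 100.0, 200.0) for t in range(0, 20, 2)] +
           [(t, nan, nan) for t in range(20, 60, 2)] +      # 40 ms gap
           [(t, 110.0, 200.0) for t in range(60, 80, 2)] +
           [(t, nan, nan) for t in range(80, 380, 2)] +     # 300 ms gap
           [(t, 120.0, 205.0) for t in range(380, 400, 2)])
print(classify_gaps(samples))  # [('blink', 40), ('track loss', 300)]
```

Summing the durations of all gaps in a trial also gives the amount of missing data to report alongside participant exclusions.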
Unlike in some studies with event-related potentials, participants in eye-
tracking experiments are not commonly instructed to suppress their blinks.
FIGURE 8.4 Temporal graph of the raw data in Figure 8.3 (0–4.8 sec). The eye-
movement recording shows two traces of eye position information in
screen pixels that are frequently interrupted by blinks (compare with
Figure 8.2).
Asking participants to suppress blinks during reading may actually have the oppo-
site effect and cause participants to start blinking more. In any event, blinks are an
artifact in eye-movement data; their impact on data quality merits careful assess-
ment (see Section 9.3.2.2 for tips to minimize blinking during data collection). If
there are many blinks, as in the current sample trial, it is better to discard that trial.
In other cases, blinks can be deleted from the record. The good news is that blinks
will not affect your calculation of fixation duration measures (see Section 7.2.1.2)
because blinks are contained in two artificial saccades (the downward and upward
lines in Figure 8.3). Therefore, the manufacturer software will automatically filter
the blinks out of any calculations of eye fixation duration.
In sum, researchers can perform different checks to ensure eye data qual-
ity (Mulvey et al., 2018), including checking for track loss. Proper training and
practice operating the eye tracker (see Section 9.3.2.2.2) will go a long way in
reducing track loss. Similarly, track loss can be preempted by proper experimental
design. Mixed, computer- and paper-based designs, in which participants need
to look away from the screen, for instance to read or write something on paper,
are not recommended, because these designs will inherently induce track loss.
Researchers can further take ownership of their data quality by inspecting their
raw data, as illustrated in this section. Holmqvist et al. (2011) reported a typical
data loss of 2–5% with trained eye-tracker operators for an average population of
Europeans who were not prescreened. These levels will vary as a result of techni-
cal, human (participant and operator), and task design factors (for further discus-
sion, see Section 9.3.2.2).
For greater transparency, the amount of missing data and participant exclusions
ought to be reported in research articles, and this information should be broken
down by participant group and condition. To maintain adequate statistical power
(see Section 5.5), researchers will need to recruit new participants to replace any
excluded individuals’ data. As these practices become ingrained in our field, the
eye-tracking community will be able to evaluate what are typical and acceptable
levels of track loss in SLA and bilingualism. Most importantly, researchers and read-
ers will be more assured that the collected eye-tracking data are valid and complete.
coordinates. In the current example, the average position is above the text line;
therefore, in Figure 8.5d I brought them down further by hand. With manual
correction, all fixations are typically corrected the same amount, so that small
fluctuations in pixel height may persist, but the researcher can determine how far
fixations are moved up or down.
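A manual correction of this kind amounts to adding a constant to every fixation's vertical coordinate. The following sketch uses hypothetical data and function names; positive offsets move fixations down in screen coordinates.

```python
def correct_vertical_drift(fixations, offset_px):
    """Shift every (x, y, duration) fixation by a constant vertical offset,
    mirroring a manual drift correction in which all fixations in a trial
    are moved by the same amount."""
    return [(x, y + offset_px, dur) for x, y, dur in fixations]

# Fixations hovering ~10-14 px above a text line at y = 302
fix = [(100, 290, 210), (160, 288, 250), (230, 292, 180)]
print(correct_vertical_drift(fix, 12))
```

Note that, as in manual correction, small pixel-height fluctuations between fixations persist after the shift; only the systematic offset is removed.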
For a more efficient approach that does not involve human judgment, Cohen
(2013) wrote an R function, Fix_Align.R, that can do the data cleaning for you.
If you have large amounts of data to clean, this program could be a life-saver.
Fix_Align.R uses linear regression to assign individual fixations to a text line
(in a multi-line experiment) and removes or flags outliers and ambiguous fixa-
tions based on the regression analysis. The best-fitting regression line, and hence
the program’s cleaning solution, is the one that maximizes the likelihood of the
regression line, given the recorded eye fixation locations. Cohen (2013) reported
near-perfect classification agreement (99.78% agreement) between the software
and an experienced eye-tracking researcher who cleaned the same data manually.
To evaluate the quality of the automated cleaning procedure for their own data,
researchers can use the trial_plots argument in R.
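To convey the intuition behind automated line assignment, here is a much-simplified nearest-line scheme in Python. It is not Cohen's (2013) regression-based algorithm; the function name, threshold, and data are all hypothetical.

```python
def snap_to_lines(fixations, line_ys, max_offset_px=30):
    """Assign each (x, y, duration) fixation to the nearest known text line;
    fixations farther than max_offset_px from any line are flagged as
    ambiguous (line index None) for removal or manual inspection."""
    cleaned = []
    for x, y, dur in fixations:
        nearest = min(range(len(line_ys)), key=lambda i: abs(y - line_ys[i]))
        if abs(y - line_ys[nearest]) <= max_offset_px:
            cleaned.append((x, line_ys[nearest], dur, nearest))
        else:
            cleaned.append((x, y, dur, None))   # outlier fixation
    return cleaned

lines = [100, 160, 220]                         # y-coordinates of three text lines
fix = [(50, 92, 210), (120, 158, 240), (300, 400, 180)]
print(snap_to_lines(fix, lines))
```

Cohen's actual procedure instead fits regression lines to whole fixation sequences, which lets it recover from gradual drift across a line rather than snapping each fixation independently.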
To minimize the need for post-hoc adjustments, it is good practice to make
your interest areas large enough so they can absorb small amounts of drift (see
Section 6.1). Specifically, by including an extra buffer in interest areas, you will
be able to account for human and technical (eye tracker) error in eye-move-
ment registration. Buffers around images or objects in images are a case in point:
see Figures 6.9, 6.11, and 6.12, for examples from visual world and production
research. In text-based research, work on glosses (i.e., translations or paraphrases of
difficult words in a text) offers a particularly salient example of why interest areas
matter. Marginal glosses are a clear, stand-alone target during reading. As can be
seen in Figure 8.6, the glosses are reached from the text via long-distance saccades,
which tend to be more error prone. Therefore, any saccades directed at the general
area of the gloss are likely intended for the gloss, even though the eye tracker may
not register them as actually landing on the gloss. These various sources of error,
then, can be accounted for in the design or analysis of the study, by making the
interest areas around the glosses a bit larger.
In sum, the availability of partially and fully automatic data cleaning procedures
has made this step of the research process more engaging, or even fun. Researchers
can now choose to correct drift manually, by moving fixations up or down, or
automatically, with the help of special software functions or code. Open source
software may offer an alternative for researchers whose default software program
lacks a drift correction function. Information on drift correction—whether and
how you fixed small amounts of drift in your data—should be a part of your
research article.
L2 speakers’ advanced proficiency level, differences between the native and non-
native speakers are expected to be smaller. In contrast, participants in Godfroid and
Uggen (2013) were beginning learners of German, who had had only 3.5 weeks
of college instruction (see Section 3.2.1). All groups engaged in natural reading
of level-appropriate materials—an English-language novel for the advanced and
native English speakers and simple sentences for the beginning German learners.
Figure 8.8 presents three histograms for these respective data sets.
What stands out in these histograms is the similarity in distributions. Mean
fixation duration increases somewhat as proficiency level goes down, as would be
expected, and the curves become slightly flatter. However, overall the shape remains
very similar across the three groups. Of note, the right tail of the distribution does
not reveal a noticeable increase in the number of overly long fixations among L2
speakers. Fixations longer than 800 ms accounted for 0.1% of all native speakers’
fixations, 0.2% of all advanced non-native speakers’ data, and 1% of the begin-
ning learners’ data. These numbers align with Rayner’s (1998) previous estimates.
Therefore, the 800 ms cutoff value for normal viewing behavior may extend
to L2 reading research including participants at different L2 proficiency levels. To
assess the proposed cutoff in the context of their own studies, and to determine a
good lower duration threshold as well, researchers could plot the data from their
own studies in a similar manner.
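Computing the proportion of overly long fixations is easy to script. This hypothetical Python sketch applies the 800 ms upper bound for normal viewing discussed above to a small, invented sample of durations.

```python
def long_fixation_rate(durations_ms, cutoff=800):
    """Proportion of fixations longer than the cutoff (here, the 800 ms
    upper bound for normal viewing behavior)."""
    longs = sum(1 for d in durations_ms if d > cutoff)
    return longs / len(durations_ms)

# Hypothetical sample: nine ordinary fixations and one 900 ms fixation
durations = [180, 220, 240, 210, 260, 205, 230, 190, 900, 215]
print(f"{long_fixation_rate(durations):.1%}")  # 10.0%
```

Plotting a histogram of `durations` alongside this proportion, per participant group, replicates the kind of distributional check illustrated in Figure 8.8.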
8.2.2 Data Transformation
Like the general class of reaction time (RT) data, eye-fixation durations and laten-
cies tend to be skewed. They are not normally distributed, but tend to have a long
tail on the right (for examples, see Figure 8.8), due to a small number of obser-
vations with relatively large values. As Whelan (2008) noted, “using hypothesis
tests on data that are skewed, contain outliers, are heteroscedastic [have unequal
variances], or have a combination of these characteristics … reduces the power
of these tests” (p. 477, my addition). Therefore, eye-tracking researchers, like RT
researchers, need to address skew in their data in order to satisfy the normality
assumption of parametric tests and safeguard statistical power.
A common way to address the skewness problem is by performing a logarith-
mic transformation on the data; that is, to create a new variable X’ that equals
the logarithm of the original variable: X’ = log(X). A logarithmic transformation
will reduce high values more strongly than lower values: log_b(x) = y ⇔ b^y = x. And
this is exactly what is needed with right-skewed data. Thus, the new, log-trans-
formed variable will approximate normality more closely and it is this variable,
log(X), that should be used for statistical analysis.2
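The compression of the right tail is easy to see with a small, invented sample of fixation durations:

```python
import math

# A right-skewed sample of fixation durations (ms): mostly short, one long tail value
durations = [180, 210, 240, 300, 450, 1200]

# X' = log(X): the transformed variable is the one used in the analysis
log_durations = [math.log(d) for d in durations]

# The log compresses high values more than low ones: the 1200 ms value sits
# 750 ms above its neighbor on the raw scale, but < 1 log unit above it
print(round(log_durations[-1] - log_durations[-2], 2))  # 0.98
```

After transformation, the distances between observations in the right tail shrink toward those in the body of the distribution, which is why log-transformed durations approximate normality more closely.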
Although the normality assumption (i.e., the assumption that the depend-
ent variable is normally distributed) is central to parametric statistics, it has not
always been checked consistently in eye-tracking research. To examine this issue
in more detail, I inspected all the studies that included eye fixation duration or
latency measures in my sample. Specifically, a research assistant and I coded (i)
whether the researchers reported transforming their variables and, if not, (ii)
whether they confirmed a data transformation was not necessary because the data
were normally distributed. Perhaps reassuringly, no authors reported their dura-
tion or latency measures were normally distributed (as mentioned, eye-tracking
data are generally skewed). In spite of the apparent non-normality of the data,
however, only 26% of researchers reported performing a logarithmic or other
transformation. It is possible that some of the remaining 74% of researchers actu-
ally transformed their data, but failed to report it, which would be less than ideal
in terms of research transparency and could potentially make interpretation of
findings more difficult. Even so, the present numbers suggest that violations of
regression, and linear mixed-effects models (LMMs) provide researchers with tools
to identify outliers after a statistical model has been fit. This approach is known
as model criticism (Baayen & Milin, 2010). It has the advantage of potentially
leaving a larger portion of the data set intact, while still improving model fit. I will
discuss both options here.
Before attempting any outlier detection, researchers need to ensure that their
data are in the distribution they wish to use for their data analysis (see Section
8.2.2). An outlier could well be a “normal citizen” (Baayen & Milin, 2010, p.
16) after data transformation and this would make any further steps unnecessary.
Once any appropriate data transformations (e.g., logarithmic transformation)
have been performed, researchers who engage in a priori data cleaning can
choose from a range of different options to identify outliers (see Textbox 8.1).
These approaches differ in how many data points are affected and how the trim-
ming will affect the power of the statistical analysis (see Ratcliff, 1993). Common
practice in L2 and bilingualism research is for researchers to set a threshold—
typically 2, 2.5, or 3 standard deviations (SDs) above or below the mean—past
which an observation is considered an outlier. Because reading speed is highly
individual, and will differ both between individuals and between items, mean
values and SDs should be calculated at the level of individual participants and
items, rather than at the group level (Lachaud & Renaud, 2011). This, of course,
presupposes there are enough observations per participant and per item to calcu-
late a meaningful SD (see Section 5.5). If SDs are computed for the grand mean
(i.e., for all participants and items combined), data from slow individuals and
difficult items will be truncated disproportionately, while outliers in compara-
tively fast individuals and easier items will go undetected more often. Therefore,
it is better to avoid using the grand mean. An intermediary solution (if SDs for
individual participants and individual items are prohibitively large) is to use the
mean per condition instead. Lastly, it is worth noting that L2 eye-tracking data
are inherently more variable (i.e., L2 users tend to have larger SDs). Therefore,
even with cutoff values at mean ± 2 SD, a wide range of values will still be con-
sidered “normal”.
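In code, the per-participant version of this cutoff amounts to computing each participant's own mean and SD and flagging observations accordingly. Here is a minimal sketch in Python (the data, participant labels, and the 2 SD cutoff are invented for illustration; the chapter's own analyses are run in R, and in practice the same means and SDs would also be computed per item):

```python
import statistics

# Hypothetical reading times (ms) for two participants; one long outlier each.
rts = {
    "p1": [310, 295, 330, 305, 2400, 315, 300, 320],
    "p2": [620, 650, 610, 640, 635, 660, 645, 4000],
}

def flag_outliers(values, n_sd=2):
    """Flag observations more than n_sd SDs from this participant's own mean."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - m) > n_sd * sd for v in values]

flags = {p: flag_outliers(v) for p, v in rts.items()}
# Each participant's single long observation (2400 ms, 4000 ms) exceeds that
# participant's own mean +/- 2 SD; all other observations are retained.
```

Because p1 is a fast reader and p2 a slow one, the participant-level computation keeps each reader's "normal" range intact, which is exactly what a grand-mean cutoff cannot do.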
When outliers have been identified, researchers can either trim (delete, trun-
cate, eliminate) these observations or replace them by their corresponding cut-
off value. The process of replacing outliers is known as winsorization (Barnett
& Lewis, 1994; Wilcox, 2012). Compared to outlier deletion, which results in
a reduced data set, winsorization preserves more information. Researchers can
again set the window over which they want to winsorize their data, similarly to
what they would do for outlier deletion. For instance, in a 0.10 winsorization, all
values below the fifth percentile or above the 95th percentile are replaced by the
value at the fifth percentile or the 95th percentile, respectively.
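The percentile replacement can be sketched as follows (Python, standard library only; the nearest-rank percentile rule used here is one of several conventions, so treat the exact boundary values as an illustrative choice):

```python
import math

def winsorize(values, lower_pct=5, upper_pct=95):
    """Replace values below/above the given percentiles with the percentile
    values themselves (a 0.10 winsorization clips 5% in each tail).
    Percentiles follow a simple nearest-rank rule, an illustrative choice."""
    s = sorted(values)
    n = len(s)
    lo = s[max(0, math.ceil(n * lower_pct / 100) - 1)]
    hi = s[min(n - 1, math.ceil(n * upper_pct / 100) - 1)]
    return [min(max(v, lo), hi) for v in values]

w = winsorize(list(range(1, 101)))  # values 1..100
# 1-4 are raised to 5 (the 5th percentile); 96-100 are lowered to 95.
```

Unlike trimming, every observation is still present after winsorization; only the extreme values have been pulled in to the cutoff.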
case, trimming or winsorizing was not necessary. Hence, model criticism dictates
that only the observations with large residuals should be trimmed.
Although Baayen and Milin (2010) discussed model criticism in the context
of LMMs, this approach is not specific to mixed-effects modeling. Researchers
can also save residuals in ANOVA or linear regression and then follow the same
steps. To engage in model criticism, researchers will first fit their statistical model
(e.g., ANOVA, regression, LMM) to the data and save the standardized residuals
(for more information, see Field, 2018). With this information, researchers can
identify outliers as those data points
with absolute standardized residuals exceeding 2 SD (2 < |z|), 2.5 SD (2.5 <
|z|), or 3 SD (3 < |z|); it is up to the individual researcher to decide how strict
or lenient she wants to be (see Textbox 8.1). It is assumed that the residuals of a
parametric statistical analysis will be normally distributed. Therefore, a suitable
model will have about 5% standardized residuals with absolute values larger than
2, 1.2% residuals with absolute values larger than 2.5, and 0.27% residuals with
absolute values larger than 3. Standardized residuals outside the proposed range
should be removed from analysis. For example, in the residual scatterplot in
Figure 8.10, any data points outside ± 2.5 SD will be deleted before running the
analysis again.
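The procedure (fit, standardize the residuals, trim, refit) can be sketched with a simple least-squares line standing in for the full model. This is a Python illustration on invented data; in practice the residuals would come from the actual ANOVA, regression, or LMM:

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, a stand-in for the real model."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def model_criticism(xs, ys, cutoff=2.5):
    """Fit, drop observations with |standardized residual| > cutoff, refit."""
    a, b = fit_line(xs, ys)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = statistics.stdev(resid)
    keep = [abs(r / sd) <= cutoff for r in resid]
    kept_xs = [x for x, k in zip(xs, keep) if k]
    kept_ys = [y for y, k in zip(ys, keep) if k]
    return fit_line(kept_xs, kept_ys), len(xs) - len(kept_xs)

xs = list(range(10))
ys = [2 * x for x in xs]
ys[5] += 100                      # one aberrant observation
(a, b), n_removed = model_criticism(xs, ys)
# Only the aberrant point exceeds 2.5 SD; refitting recovers the slope of 2.
```

Note that the cutoff is applied to the residuals, not to the raw observations, so a slow-but-consistent participant is not penalized.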
After researchers remove the outliers, they will rerun the same analysis on the
trimmed data set and compare the results with the original analysis. This part
(comparing the two analyses) is shared by all researchers—those who clean their
data a priori and those who engage in model criticism. The comparison is called a
sensitivity analysis (Lachaud & Renaud, 2011) because it is designed to test to
what extent the results of the statistical analysis are sensitive to the chosen clean-
ing procedure (Ratcliff, 1993).
The sensitivity analysis can reveal that results (i) remain the same, (ii) gain, or
(iii) lose statistical significance after data cleaning. Baayen and Milin (2010) argued
that in all three cases, the results from the model post criticism are the more reli-
able ones. In their words, only the final analysis reveals an effect (or the lack of an
effect) “that is actually supported by the majority of data points” (p. 26).
Ratcliff (1993), in a seminal overview of a priori cleaning procedures, con-
cluded that a result should be replicable across a range of different cutoff values.
When the results do not converge, further investigation of the original dataset is
necessary to understand what may be causing the observed differences (Lachaud
& Renaud, 2011). One possibility is that outlier treatment has truncated some
distributions (e.g., by condition) in the data. Specifically, if an experimental effect
is observed only in the longer fixation times (i.e., the “tail” of the distribution),
then outlier treatment may cause a true effect to disappear in the statistical analy-
sis (Ratcliff, 1993; Whelan, 2008). Again, these concerns apply only to a priori
cleaning, which is performed on the raw observations, rather than the residuals.
The takeaway point, then, is that researchers need to report their findings for the
sensitivity analysis and, for a priori approaches, describe potential actions taken to
handle discrepancies that may arise from different cleaning procedures (e.g., trim-
ming or winsorizing at 2, 2.5, or 3 SD).
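As a toy illustration of such a sensitivity check (Python; the data are invented), the same summary statistic can be computed under several cutoffs and the results compared:

```python
import statistics

# Invented fixation durations (ms) with one long observation.
data = [250, 260, 270, 280, 290, 300, 310, 320, 330, 900]

def trimmed_mean(values, n_sd):
    """Mean after removing observations beyond n_sd SDs from the mean."""
    m, sd = statistics.mean(values), statistics.stdev(values)
    return statistics.mean([v for v in values if abs(v - m) <= n_sd * sd])

results = {n_sd: trimmed_mean(data, n_sd) for n_sd in (2, 2.5, 3)}
# The 900 ms point survives a 3 SD cutoff but not 2 or 2.5 SD, so the
# estimate shifts with the cutoff: exactly the discrepancy to report.
```

A real sensitivity analysis would compare full model results rather than a mean, but the logic is the same: if the conclusions change across reasonable cutoffs, that instability must be reported.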
To illustrate, I will demonstrate the impact of four outlier treatment strategies
on the statistical modeling of some previously published data (Godfroid & Uggen,
2013). I applied four broad strategies—no trimming, mild trimming of biologi-
cally implausible values only, mild trimming combined with model criticism,
and aggressive a priori trimming—to the original dataset. To perform a sensitivity
analysis, I evaluated each strategy, first, in terms of the number of observations that
needed to be removed and, second, overall model fit (R2 value, which represents
the amount of variance explained). A third index, the significance levels for the
independent variables in the model, will be presented in Table 8.4 that follows
after the general introduction to LMMs. Together, these three indices (number of
observations, model fit, and significance levels) enable us to assess the impact of
the different cleaning procedures.
The original dataset (i.e., no trimming) contained a total of 946 observa-
tions. When I fit a mixed-effects model (for detailed discussion of the model
specifications, see Section 8.4), the conditional R2 value was 0.086. These two
numbers will serve as the baseline for the present sensitivity analysis. Note that
the dependent variable for this analysis was a difference score (i.e., Total Time
those (see Figure 8.12). This will then set the stage for the larger and more varied
body of eye-tracking studies with text (see Figures 8.14 and 8.15).
Over half of the visual world studies published to date have relied on ANOVA
for data analysis. This predilection for ANOVA holds true, regardless of the type
of outcome measure used. Studies with durational measures, latency measures, and
counts, proportions, and probabilities are all highly ANOVA-based, even though
this is now rapidly changing (see the following). In fact, the use of ANOVA for all
these measurement types is not without potential concern. As detailed in Section
8.4.1, counts, probabilities, and proportion data should not be subjected to ANOVA
(Jaeger, 2008). Alternative techniques such as logistic and quasi-logistic regression
are better suited for capturing the properties of binary or proportional data. These
techniques (logistic and quasi-logistic regression) will be described in Section
8.5.2.3. On the other hand, fixation duration and fixation latency measures are
compatible with ANOVA, provided the data are normally distributed. This will
usually require doing a logarithmic data transformation first to normalize the data
(see Section 8.2.2), but unfortunately many researchers either omit information
about data transformation and assumption checking from their papers or fail to
transform their data altogether. Use of ANOVA in eye-tracking research, therefore,
requires knowing the properties of your data first and making sure the data
fulfill ANOVA’s statistical assumptions.
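A quick way to see what the transformation buys you is to compare skewness before and after taking logs. This Python sketch uses sample data constructed to be right-skewed, as reading times typically are (the multiplier and spread are arbitrary):

```python
import math
import statistics

def skewness(values):
    """Sample skewness: mean of cubed standardized deviations."""
    m, sd = statistics.mean(values), statistics.stdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

# Right-skewed "reading times": symmetric on the log scale by construction.
base = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
rts = [300 * math.exp(0.4 * b) for b in base]

raw_skew = skewness(rts)                            # positive: long right tail
log_skew = skewness([math.log(rt) for rt in rts])   # ~0: symmetric
```

If the skewness drops to near zero after the log transformation, the transformed durations are a far better candidate for an analysis that assumes normality.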
The centrality of ANOVA notwithstanding, recent years have seen a sharp
increase in the use of linear mixed-effects models (LMM) and generalized
linear mixed-effects models (GLMM): see Figure 8.13. LMMs and GLMMs are
both extensions of the linear model (LM) that lies at the basis of linear regression and
ANOVA. LMMs and GLMMs owe their name—mixed-effects model—to the fact
that they can accommodate a mix of fixed and random variables (see Section 8.4.2).
If you are familiar with multiple regression (see Field, 2018; Jeon, 2015; Larson-Hall,
2016; Plonsky & Ghanbar, 2018; Plonsky & Oswald, 2017, for good introductions),
LMMs may be the logical next step in your statistics journey (see Section 8.4).
LMMs inherit all of the LM’s advantages, such as the simultaneous handling of
multiple independent variables, both continuous and categorical (see Section 8.4.2).
The main novelty lies in determining what random effects structure to use and
how to interpret these random effects (see Section 8.4.3). We will review different
approaches for fitting random effects structures in Section 8.4.3 and consider why it
may be worth investing in this new technique in Sections 8.4.1 and 8.4.2.
FIGURE 8.16 Three ways of laying out the same data set: for a mixed-effects regression
analysis, an F1 by-subject ANOVA, and an F2 by-item ANOVA. The arrows
represent the units over which the data are averaged and demonstrate
the amount of information (variance) that is lost by averaging.
rapidly becoming the gold standard in L2 and bilingual eye-tracking research (see
Figure 8.13) and may increasingly replace ANOVAs in the next decade, follow-
ing similar developments in psychology (e.g., Barr, Levy, Scheepers, & Tily, 2013;
Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017).
LMMs offer the same flexibility as multivariate regression (Jeon, 2015; Plonsky
& Ghanbar, 2018; Plonsky & Oswald, 2017), but with the possibility of filtering
out additional sources of variance. In LMMs, each observation is treated as a sepa-
rate data point, represented on a separate line in a spreadsheet (see Figure 8.16, full
dataset). This enables the model to take full advantage of the information in the
data. To ensure robust and generalizable findings, the participant and item base in
a study will need to be sufficiently large for patterns to emerge as significant fixed
effects (Bell et al., 2010; Snijders & Bosker, 2012). Lastly, LMMs have drastically
changed the way “outliers” in one’s data set are addressed (for more information,
see Section 8.2.3). Therefore, opting for a LMM approach will influence every-
thing from study planning, to how the data set is organized, and whether and how
data are trimmed.
LMMs are an extension of LMs such as linear regression and ANOVA. They
derive their name from the fact that they can model a mix of fixed and ran-
dom variables, similarly to repeated-measures ANOVA.3 The fixed effects are
what is normally reported as the results of a statistical analysis in a research paper.
Random effects are variables in a study (e.g., participants, items) that exert an
influence on the outcome, without typically being the researcher’s primary focus
(for more information on fixed and random effects, see Textbox 8.2).
Random effects are the independent variables that result from random
sampling from a population. They are likely to change from one experiment
to another because, for example, researchers might recruit different partici-
pants for a new experiment.
Examples: participants is a random effect because participants are randomly
sampled from the population every time you run an experiment. Likewise, the
items in a study represent a random effect because they are a random subset
of all possible expressions of the targeted linguistic phenomenon.
8.4.4 Worked Example
The aim of this section is to illustrate what LMMs look like when applied to L2
eye-tracking data. Specifically, I will adopt a backward model selection approach,
following current best practices in model selection (Matuschek et al., 2017), and
apply it to an L2 learning experiment (see Figure 8.17). The data for this demon-
stration are from Godfroid and Uggen (2013).4 The present example builds on and,
in some ways, precedes the discussion on outlier identification in Section 8.2.3.
Indeed, the aim of the current analyses is to find the best-fitting statistical model.
The residuals (error terms) of this model can then be subject to model criticism
to identify extreme observations as a part of outlier treatment (see Section 8.2.3).
Godfroid and Uggen (2013) investigated to what extent beginning-level learn-
ers of German notice, or pay extra attention to, unfamiliar, irregular German verbs
during reading (for more information, see Section 3.2.1). The participants read 24
sets of critical sentences that contained either a regular verb, an irregular verb with
an e → i(e) vowel change, or an irregular verb with an a → ä vowel change.

FIGURE 8.17 Fitting a LMM for a fixed set of independent variables (fixed effects).

Thus, there was one (fixed) independent variable, Condition. It had three levels: regular
verbs, irregular e → i(e) verbs, and irregular a → ä verbs. Following Matuschek
et al.’s (2017) guidelines, I started off with a maximally specified random effects
structure. I used the lmer() function in the lme4 package (Version 1.1–17) in R
(Version 3.4.2) with restricted maximum likelihood (REML) as the estimation
method. Degrees of freedom for the t test were calculated using Satterthwaite’s
method. Here is the formula used in the R code:
In this case, the maximal structure included a by-item random intercept (1|Verb), a
by-subject random intercept (1|Subject), a random slope for Condition by subject
You may wonder about trimming down the model even further, for instance to
a by-subject intercept-only model or a by-item intercept-only model. Default
practice in psychology nowadays is to leave both intercepts in (Barr et al., 2013),
on the assumption that items, like participants, will differ among themselves in
the response behavior that they elicit. In line with this practice, Matuschek and
colleagues (2017) did not trim their models down beyond by-subject and by-item
intercept models and in the current example, I will follow the same approach.
Therefore, Model 2, which has a by-item and a by-subject random intercept, is
the most parsimonious candidate for model selection.
Table 8.2 presents the goodness-of-fit statistics for the two models. Both models
also contained a fixed effect for Condition; however, we are not looking at the
results for Condition yet, because we need to find the best-fitting model first. To
select the best-fitting model, I relied mostly on the likelihood ratio test5 (LRT),
combined with the AIC criterion, an index of model fit adjusted for model com-
plexity. Smaller AIC values indicate a better model fit (for model-fitting approaches
in exploratory research, see Gries, 2015). The LRT is a comparison of two mod-
els’ deviances (see Field, 2018, for details). Following Matuschek et al. (2017), I
adopted a significance level α of .20 for the LRT. Using this criterion, p < .20 indi-
cates that the simpler model fits the data significantly less well than the more com-
plex model—that is, the more complex model wins. If p ≥ .20, the simpler model
is preferred because it does not differ statistically from the more complex model in
how well it fits the data. In the current example, both model indices point toward
the simpler model being preferred: it has the smaller AIC value (Model 1: 15021,
Model 2: 15019) and the LRT is non-significant (χ2 = 6.64, df = 5, p = .25).
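The LRT decision for this example can be reproduced numerically. The sketch below (Python) integrates the chi-square density with Simpson's rule rather than calling a statistics library; in R, the same comparison is a one-line anova() call on the two fitted models:

```python
import math

def chi2_sf(x, df, upper=200.0, steps=20000):
    """Upper-tail probability of a chi-square distribution, via Simpson's
    rule on the density (a stdlib stand-in for scipy.stats.chi2.sf)."""
    def pdf(t):
        return (t ** (df / 2 - 1) * math.exp(-t / 2)
                / (2 ** (df / 2) * math.gamma(df / 2)))
    h = (upper - x) / steps
    s = pdf(x) + pdf(upper)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(x + i * h)
    return s * h / 3

# LRT from the text: chi-square = 6.64 on df = 5 (Model 1 vs. Model 2)
p = chi2_sf(6.64, 5)        # ~.25, matching the reported p-value
prefer_simpler = p >= .20   # Matuschek et al.'s (2017) alpha = .20 criterion
```

Because p = .25 is at or above the .20 threshold, the more complex random effects structure is not buying a significantly better fit, and the simpler model is retained.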
TABLE 8.2 Backward model selection: The maximal model and a more parsimonious competitor model

Once the best-fitting model has been identified, the researcher will focus her
reporting efforts solely on that model. When reporting the results, it is recommended
to include the random effects structure, because this serves as a check
on model fit6 and can contain useful information about individual differences in
participants and items (Cunnings, 2012; Linck & Cunnings, 2015; Matuschek et
al., 2017). Table 8.3 presents the full results for Model 2. We can see that the by-
subject random intercept accounted for about three times as much variance as the
by-item intercept. This is a common finding, in that participants usually represent
the largest source of random variance in a study.
Now, at last, the time has come to take a first look at the results for the fixed
effect of Condition. Recall that these are the coefficients that will provide the
answer to the research question (whether beginning-level learners noticed the
unfamiliar, irregular verbs). Coefficient b1, for a → ä verbs, and coefficient b2, for
e → i(e) verbs, both signal the change in reading time relative to the regular verbs
(intercept coefficient b0). The noticing hypothesis (Schmidt, 1990) would predict
an increase in reading time (and, hence, positive regression coefficients) for unfa-
miliar verb types that are in the process of being acquired (Godfroid, Boers, &
Housen, 2013). As seen in Table 8.3, participants paid significantly more attention
to a → ä irregular verbs, but not e → i(e) irregular verbs. We also found, in a differ-
ent analysis, that amount of attention to the verb forms during reading correlated
with gains on a verb production post-test, confirming the beneficial effects of
increased visual attention during reading for L2 learning.
Finally, the results of this analysis need to be confirmed by means of a sensitiv-
ity analysis, designed to test the possible influence of outliers (see Section 8.2.3). I
subjected Model 2 to model criticism, whereby I saved and plotted the standard-
ized residuals, as shown in Figure 8.10. I removed the observations (k = 23) with
large residuals from the analysis (see Table 8.1) and then reran the model. Table 8.4
presents the final results, as would be reported for publication. The previous con-
clusion, that learners noticed changes in a → ä verbs, is maintained. The pattern
in the e → i(e) verbs is now more consistent with this finding, although it did not
quite reach significance. Thus, I found evidence for noticing, though not across
both verb types, probably because there was not enough power after the inclusion
of both subject and item random effects (compare with Godfroid and Uggen,
2013). If any researchers would like to replicate this study, they should consider
increasing the number of items in the two irregular verb conditions. For now, the
results demonstrate how increased attention and noticing (Schmidt, 1990) can
manifest themselves in eye-movement records and, more generally, illustrate the
potential of eye-tracking methodology to study L2 learning processes (Godfroid, 2019).

TABLE 8.3 Full results for Model 2, the best-fitting model

Fixed effects B SE t p
Intercept (regular) −102.79 62.63 −1.64 .11
a → ä irregular 210.11 90.33 2.33 .03
e → i irregular 115.34 90.11 1.28 .21

Random effects Variance SD
(1|subject) 48930 221
(1|verb) 16026 127
Residual 634768 797

R2 marginal/R2 conditional 0.011/0.10
AIC 15021
(Source: Based on Godfroid and Uggen, 2013).

TABLE 8.4 Final model after model criticism. Observations with large residuals (2.5 < |z|)
were removed from the analysis

Fixed effects B SE t p
Intercept (regular) −90.81 64.08 −1.42 .16
a → ä irregular 176.34 80.75 2.18 .04
e → i irregular 135.90 80.60 1.69 .11

Random effects Variance SD
(1|subject) 78550 280
(1|verb) 12471 112
Residual 503912 710

R2 marginal / R2 conditional 0.010 / 0.16
AIC 14425
(Source: Based on Godfroid and Uggen, 2013).
To analyze the data, linear mixed-effects models were fit using the lmer()
function in the lme4 package (Version 1.1–17) in R (Version 3.4.2).
Restricted maximum likelihood (REML) was used as the estimation
method for model fitting. The outcome variable was Delta Total Reading
Time, which was a baseline-corrected measure of total reading time on the
critical verb form. Verb type was the main independent variable of inter-
est. It was represented as a three-level categorical variable, Condition, with
the regular verbs specified as the reference category using treatment cod-
ing. Degrees of freedom for the t test were calculated using Satterthwaite’s
method. The random-effects structure was determined by model compari-
sons, using a backward model selection approach. I began with the maxi-
mal random effects structure and progressively trimmed any random effects
that did not contribute significantly to model fit (Matuschek et al., 2017).
Changes in model fit were assessed through log-likelihood ratio tests (LRT)
with the α level set at .20 (Matuschek et al., 2017) and through a com-
parison of absolute AIC values. The final random-effects structure included
random intercepts by participant and by item.
Results for the best-fitting model are presented in Table 8.4. Detailed
model comparisons are reported in the Appendix [i.e., Table 8.2]. Results
showed that Condition exerted a significant influence on reading times for
irregular verbs with an a → ä vowel change (b = 176.34, SE = 80.75, t = 2.18,
p = .04), indicating that participants spent on average 176 ms longer on verbs
with this vowel change than on matched, regular verbs. For verbs with
an e → i(e) change, Condition was not significant (b = 135.90, SE = 80.60,
t = 1.69, p = .11), suggesting L2 learners’ increase in reading times on this
particular verb type was not statistically reliable.
are collapsed into larger time bins (see Figure 8.19), enabling researchers to cal-
culate aggregate measures. The goal of data preprocessing, then, is to condense
information both spatially and temporally so the researcher can detect changes in
eye gaze patterns over time.
Figure 8.19 is a simplified representation of the binning process. In DataViewer,
the data analysis software from SR Research (see Section 8.1.1), the time course
binning report will do most of the data preprocessing for you. As a researcher,
you only need to select a bin size (e.g., 20 ms, 40 ms, or 50 ms) and one or
more eye-tracking measures (e.g., fixation count) for which you would like data
aggregation to occur. In Tobii Pro Studio, Tobii’s comprehensive software pro-
gram (see Section 8.1.1), researchers will typically export the raw eye gaze data
(timestamp data and gaze tracking data) first and then do the data preprocessing
using another program such as R. The eyetrackingR package (Dink & Ferguson,
2015) provides step-by-step instructions for converting raw eye-tracking data
into a format suitable for data analysis. Here, I illustrate the three major steps in
data preprocessing, using the raw eye-tracking data from a visual world study by
Chepyshko (2018).
Table 8.5 contains the eye-tracking data in a raw, unprocessed form. Because
Chepyshko recorded data with a 300 Hz eye tracker, every row represents one
snapshot of the eye taken at a 3.33 ms interval (for details on sampling speed, see
Section 9.1.3). The time stamp indicates when, on the eye tracker’s internal
clock, the measurement was made. These, then, are the raw eye-tracking data.
They will still require some data wrangling (either manual or automatic) before
they can be graphed or used for statistical analysis.
First, some small amount of binning will be necessary (compare with Section
8.5.1) to downsample the large amount of raw (unprocessed) eye-tracking data
into more manageable units. Binning is useful because
the [eye tracker’s] sampling frequency might be much faster than behavioral
changes (for example, an eye-tracker might record eye position every 2 ms,
but planning and executing an eye movement typically takes about 200 ms),
which can produce many identical observations and lead to false positive
results.
(Mirman, 2014, pp. 18–19)
Table 8.6 shows the outcome of the binning process. The researcher collapsed the
data into 50 ms bins, with 15 data samples making up one time bin (Chepyshko,
2018). The columns Look at Target and Look at Distractor represent the total
number of fixations in the target area and distractor area, respectively, as tallied by
the software. Different from the separate bin-by-bin analyses (see Section 8.5.1),
researchers interested in doing a growth curve analysis will retain the tempo-
ral information for their time-course data, even after binning. To do so, the bin
TABLE 8.6 Binned eye-tracking data for three trials (trial excerpts) from one participant.
Every time bin represents the aggregate data from 15 raw data samples
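In code, the collapsing step amounts to summing binary gaze samples in groups of 15. The eyetrackingR package automates this in R; the Python sketch below shows the bare logic (1 = sample in the target interest area, 0 = elsewhere; the dictionary keys are invented column names):

```python
def bin_samples(samples, bin_size=15):
    """Collapse raw gaze samples into consecutive bins of bin_size samples,
    keeping the bin index so the time course is preserved.
    At 300 Hz (one sample per ~3.33 ms), 15 samples span ~50 ms."""
    return [
        {"bin": i // bin_size,
         "sum_fix": sum(samples[i:i + bin_size]),
         "n": len(samples[i:i + bin_size])}
        for i in range(0, len(samples), bin_size)
    ]

# Two 50 ms bins: ten target samples, then twenty samples elsewhere.
bins = bin_samples([1] * 10 + [0] * 20)
```

The same tally would be run once per interest area (Target, Distractor), yielding one row per area per bin, as in Table 8.6.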
8.5.2.2 Data Visualization
An appealing feature of visual world research is the detailed visual representations
of how participants’ eye fixation behavior unfolds over time. These graphs are
called growth curves; they are data-rich visuals that conform to best practices in
data visualization (Larson-Hall, 2017). I really like them.

TABLE 8.7 Binned eye-tracking data with dependent variables used for plotting and analysis: fixation proportion (FixProp)
and empirical logit (elog)

SubjectID TrialID TimeBin Item Condition Interest Area SumFix N FixProp elog wts
107 1 1 pour Content Target 0 15 0.00 −3.43 2.06
107 1 1 pour Content Distractor 5 15 0.33 −0.65 0.28
...
107 2 3 spill Content Target 10 15 0.67 0.65 0.28
107 2 3 spill Content Distractor 0 15 0.00 −3.43 2.06
...
107 6 8 fill Container Target 15 15 1.00 3.43 2.06
107 6 8 fill Container Distractor 0 15 0.00 −3.43 2.06

Note: wts = weights, used in combination with elog in quasi-logistic regression.

Growth curves lie at the
basis of growth curve analysis. Specifically, the aim of a growth curve analysis
is to describe the shape of one or more growth curves.
Growth curves can be generated for the aggregate measures described in
Section 8.5.2.1—proportions and odds—and for measures derived from odds—
log odds (logit) and empirical logit (elog) (more information following in
Section 8.5.2.3). Growth curves depict participant looks at different images on the
screen, averaged across all participants and all items. As a researcher, you need to
decide for which images or entities on the screen you want to plot the data and for
what time period. These are non-trivial matters, as choosing what to plot and for
what period will have a large influence on your subsequent data analysis.
Consider the following example, from Chepyshko (2018). Chepyshko wanted
to test whether L1 and L2 English speakers of differing proficiency levels can
use verb semantics to predict the upcoming argument of a locative verb. His
displays contained an agent (e.g., cook), a content (e.g., coffee) and a container
(e.g., t-shirt): see Figure 8.20. Of interest was whether listeners would anticipate
different objects following a content verb such as spill (i.e., the content object cof-
fee) and a container verb such as stain (i.e., the container object t-shirt). To answer
this research question, the author could draw a number of potential comparisons,
all of which would provide related but slightly different information (also see
Sections 6.1.3.2 and 6.3.1.1):
Each comparison has a different odds formula (recall that odds = looks/non-looks),
because the numerator and the denominator are not the same. The odds will
be your dependent variable. Each of your predictors will indicate how the odds
change as a result of your independent variables. Therefore, you want to define the
odds (i.e., pick your numerator and denominator) in a way that makes most sense
for your design and research questions. For instance, in comparison (i), looks at
Content will be the focus regardless of whether participants hear a content verb
or a container verb. In that case, the researcher expects high odds in the Content
Verb condition (where the content image is the target) but lower odds in the
Container Verb condition (where the container image is the target). In compari-
sons (ii) and (iii), the image of interest will change along with the verb, so the
analyst is always modeling looks at Target. Therefore, the odds are always expected
to be larger than 1 (favoring the Target). In this case, the independent variable Verb
Type indicates if looks to the Target differ between the two verb types.

FIGURE 8.20 Sample display from a verb argument prediction study. Each display
consisted of an agent, a content, and a container, and was paired with an
audio description including either a content verb (“The cook spilled the coffee
on the t-shirt.”) or a container verb (“The cook stained the t-shirt with coffee.”).
Note: boxes represent interest areas and were not seen by the participants.
(Source: Reproduced from Chepyshko, 2018).
For the denominator, Barr (2008) recommended using all non-target looks
as the reference, in line with option (iii) outlined previously. In this approach,
looks directed at empty portions of the screen will be included in the analysis
as well (also see Dussias, Valdés Kroff, Guzzardo Tamargo, & Gerfen, 2013). By
including outside looks in the analysis, researchers can rule out potential con-
founding effects of condition on looking behavior. For example, participants
may look at empty space more in cognitively or attentionally demanding condi-
tions due to increased cognitive load. This information will be lost, and may be
confounding results, if only looks at target and distractor images are considered.
Therefore, whichever region is of interest is best compared against everything else
on the screen, for instance looks(Target)/looks(Other) or looks(Competitor)/looks(Other).
When graphing, however, researchers will often omit these outside looks from
their data visualizations (i.e., they will not plot looks at Other). This could be
misleading because what is shown in the graph may not be exactly what was ana-
lyzed statistically, putting the onus on readers to figure out exactly what the odds
formula looked like. To align data visualization and statistical analysis, research-
ers could consider plotting these outside looks as an additional line in their
graphs, especially when they plan to include these data later, as Barr (2008) and
I recommend. In all cases, the researcher should clearly state which regions on
the screen were included for analysis so readers are able to retrace this step and
interpret the statistical results correctly.
With these considerations in mind, Figure 8.21 shows two sets of growth
curves for Chepyshko’s (2018) data. The data are for the 0–850 ms time win-
dow, which corresponded to the verb segment (The cook spilled/stained … ).7 The
graphs show looks at the Target and the Distractor, which are of primary inter-
est. Proportions do not add up to 1 because participants were also looking else-
where on the screen. Because this was an early prediction window, and the verb
followed the agent, fixation proportions start off low (many outside looks), but
they increase over time. Importantly, we can detect a widening gap between the
Target and Distractor fixations in the Content Verbs (top panel) 600–850 ms post
verb onset. If confirmed statistically, these diverging lines could indicate anticipa-
tory processing of the verb argument in the Content Verb condition. Specifically,
native English speakers may be able to anticipate, and look at, the argument of
content verbs (e.g., spill the coffee) but not container verbs (e.g., stain the t-shirt) as
soon as they hear the verb form.
Visual inspection of the data, as exemplified here, is an important first step
in the analysis. By looking at graphical data representations carefully, you can
spot trends that suggest anticipation or competition. These patterns can later be
confirmed or disconfirmed statistically, using a growth curve analysis (see Section
8.5.2.5). You could inspect proportion data, as shown here, or use the log odds or
empirical logit data you plan to use for analysis.8 What matters most is that you
look at your data first because this can really help you understand what is going
on in your study. Likewise, when reading other people’s work, it is a good habit to
lay the results of the statistical analysis and the graphs side by side and check if you
can see the statistical results mirrored in the visual data representations.
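To make the visual-inspection step concrete, the sketch below computes fixation proportions per time bin from raw samples, the quantity plotted in time course graphs such as Figure 8.21. The record format and region labels are invented for illustration; real eye-tracker exports differ by vendor.

```python
# Hypothetical per-bin proportion computation for visual inspection of
# visual-world data. Each sample is a (time_ms, region) pair; the region
# labels ("Target", "Distractor", ...) are illustrative.

def bin_proportions(samples, bin_ms=50, window_ms=850):
    """Return, for each time bin, the proportion of samples in each region."""
    n_bins = window_ms // bin_ms
    counts = [{} for _ in range(n_bins)]
    totals = [0] * n_bins
    for t, region in samples:
        b = int(t // bin_ms)
        if 0 <= b < n_bins:
            counts[b][region] = counts[b].get(region, 0) + 1
            totals[b] += 1
    return [
        {r: c / totals[b] for r, c in counts[b].items()} if totals[b] else {}
        for b in range(n_bins)
    ]

# Toy data: 4 samples in the first 50 ms bin, 3 on Target, 1 on Distractor.
demo = [(0, "Target"), (10, "Target"), (20, "Distractor"), (30, "Target")]
props = bin_proportions(demo)
print(props[0])  # {'Target': 0.75, 'Distractor': 0.25}
```

Note that the proportions within a bin need not sum to 1 once looks elsewhere on the screen are included, just as in Figure 8.21.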
FIGURE 8.21 Time course graph for the L1 English group. Lines show the proportion
of fixations to the Target (circles) and Distractor (triangles) during
auditory presentation of Content (top panel) and Container (bottom
panel) locative verbs.
(Source: Chepyshko, 2018).
of the distribution (p = .50) than toward the endpoints. This violates ANOVA’s
assumption of equality of variance and is a second reason ANOVA should not be
used for analyzing proportion data (Jaeger, 2008). In sum, fixation proportions
may be used for data visualization purposes, but they should not be treated as a
dependent variable for analysis.
Odds (looks/non-looks) are another measure of likelihood (see Sections 8.5.2.1 and
8.5.2.2) that is more advantageous for statistical analysis. Odds lend themselves to
known analytical solutions, such as the LM (i.e., regression), because odds can be
transformed into unbounded data. To do so, a two-step link function is applied.
First, the software calculates the odds based on the binary fixation data. By defini-
tion, odds range from 0 to ∞. They have no upper bound. Second, the software
takes the natural logarithm of the odds, called the log odds or logit. This will
remove the lower bound: log-transformed data range from -∞ to ∞. Together,
these two steps turn your original binary variable into a potential outcome vari-
able for a linear regression! Therefore, conceptually, logistic regression is nothing
more than linear regression in log-odds space.
Knowing a few basic facts about the relationship between log odds, odds, and
proportions can help you interpret the regression coefficients in logistic regres-
sion (see Section 8.5.2.5 for a worked example). A .50 proportion corresponds to
an odds of .50/.50 = 1 and a log odds of ln(.50/.50) = ln(1) = 0. For higher proportions,
the odds are larger than 1 and the log odds are larger than 0 (i.e., positive), whereas
for lower proportions the odds are smaller than 1 and the log odds are smaller
than 0 (i.e., negative): see Figure 8.22 and Textbox 8.4.9 It is thus possible to move
between the log odds, the odds, and the proportion scale, which comes in handy
when reporting the results.
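These relationships can be written out as a small sketch (the helper names are mine, not from any statistics package):

```python
import math

# Moving between the proportion, odds, and log-odds (logit) scales.
# p = .50 -> odds of 1 -> logit of 0; higher p -> positive logit.

def odds(p):
    return p / (1 - p)            # undefined at p = 1

def logit(p):
    return math.log(odds(p))      # undefined at p = 0 and p = 1

def inv_logit(x):
    return 1 / (1 + math.exp(-x))  # back-transform to the proportion scale

print(logit(0.50))                  # 0.0
print(round(logit(0.80), 3))        # 1.386  (odds of 4, "four-to-one")
print(round(inv_logit(1.386), 2))   # 0.8
```

The `inv_logit` back-transformation is what allows results estimated in log-odds space to be reported on the more intuitive proportion scale.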
The logit link function preserves the original directionality of the effect. This
means that a positive (negative) regression coefficient on the log-odds scale corresponds to an increase (decrease) in fixation likelihood on the odds and proportion scales as well.
Unlike with logistic regression, where the software takes care of the logit
transformation, researchers need to do the elog transformation themselves. To
do so, one can simply add a 0.5 adjustment factor to both the numerator and the
denominator of the odds formula. Then take the natural logarithm as one would
for the logit. This will yield the elog value that is used for analysis. A handful of
L2 and bilingualism researchers have successfully applied this approach to eye-
tracking data: see Mitsugi and MacWhinney (2016) for an example growth curve
analysis and Cunnings et al. (2017), Dijkgraaf et al. (2017), and Ito et al. (2018) for
sample elog analyses on separate time bins.
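The two-step recipe described above (add 0.5 to the numerator and denominator, then take the natural log) amounts to the following one-line function; the toy bin counts are invented:

```python
import math

# Empirical logit (elog) transform: add 0.5 to both parts of the odds
# before taking the natural log (cf. Barr, 2008). Unlike the true logit,
# it is defined even when a time bin contains all looks or no looks.

def elog(looks, non_looks):
    return math.log((looks + 0.5) / (non_looks + 0.5))

# A bin with 30 samples, all on target: the true logit, ln(30/0), is undefined.
print(round(elog(30, 0), 3))   # 4.111
# A half-and-half bin stays at 0, like the true logit.
print(round(elog(15, 15), 3))  # 0.0
```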
Once you have your elog variable, you can use general linear mixed-effects
modeling to analyze the data. These are the same models that were described
previously in Section 8.4. In contrast, when using true odds for logistic regression,
researchers should opt for a generalized linear mixed-effects model with a logistic
(binomial) link function; that is, a logistic mixed-effects model. Doing so will
cause the software to apply the logit link function (so you do not need to calcu-
late the log odds yourself), but it will also increase processing time if the model
contains random variables (see Mirman, 2014, for technical details). One practical
consideration, therefore, might be that LMMs based on elog take less
time to compute. As you will see in the example that follows (see Section 8.5.2.5),
the conclusion should be very similar, if not the same, regardless of which out-
come variable (logit transformed or elog transformed) researchers select. Table 8.8
summarizes the main differences between the two approaches.
FIGURE 8.23
Visual inspection of behavioral data. (a) A straight line does not fit
the data well, but (b) a curved line, or (c) a curvilinear line does. This
suggests a need for higher-order polynomials (Time2 or Time3) in the
statistical analysis.
(Source: Chepyshko, 2018).
the observations will fall close to the line, in a random pattern). In the case of
Chepyshko’s data, as in many visual world studies, a straight line actually misses
a fair number of data points. There are systematic deviations in the data pat-
tern, which come from an upward shift in observations (especially for the Target)
toward the end of the time segment. If you see this type of curvature in the data,
a curvy line with one or more bends may be better suited to account for your
data. Indeed, the lines in the mid and bottom panel capture the shape of the data
much better. One option to implement this pattern statistically is by introduc-
ing higher-order Time terms (e.g., quadratic Time or cubic Time) in the analysis.
These terms are sometimes referred to as higher-order polynomials, which is
basically a fancy term for a predictor that has been raised to a power higher than 1.
The basic idea is that the polynomial order of your predictor (e.g., Time1,
Time2, Time3) should equal the number of bends in the line + 1. Recognizing
what kind of curve you have may require some practice at first, but as you gain
experience looking at graphs, you will find it easier to spot the turns or inflec-
tion points in the data. Straight lines have no bend. Therefore, they can be mod-
eled with a linear Time predictor: Time1 = Time. When the line has one bend
(one point of inflection), a quadratic term can be considered: Time2. For lines
that change direction twice (i.e., S-shaped curves), a cubic term could be appro-
priate: Time3. In theory, one could keep adding higher-order terms to the model
until the fitted line and the observed line perfectly match. This would be overfit-
ting the data, however, and would limit the generalizability of findings.12 To avoid
overfitting and difficulty in interpretation, it is recommended that researchers
do not go beyond a third-order (Barr, 2008) or a fourth-order (Mirman, 2014)
Time term.
When entering more than one Time term into a statistical analysis, predic-
tors will be highly correlated: as linear Time increases (Time = 1, 2, 3, 4, …),
quadratic Time (Time2 = 1, 4, 9, 16, …) and cubic Time (Time3 = 1, 8, 27,
64, …) increase as well. Thus, one shortcoming of natural polynomials is that
they violate the non-collinearity assumption in regression analysis (Field, 2018;
Larson-Hall, 2016). When predictors are highly correlated, the overall model is
still valid but the results for individual predictors can no longer be interpreted
independently. To address this issue, researchers can use orthogonal (uncor-
related) polynomials instead. With orthogonal polynomials, the Time variable
has been rescaled and centered around the mean (Mirman, 2014) such that all the
polynomials are on the same scale and, more importantly, the resulting time values
are independent (see Figure 8.24). Researchers are then able to inspect the dif-
ferent components of a growth curve independently—the steepness of the slope
(regression coefficient for Time) and the sharpness of the curvature (regression
coefficient for Time2 and/or Time3). Thus, for many research designs, orthogonal
polynomials will be preferred (Mirman, 2014).
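The collinearity problem and its orthogonal remedy can be demonstrated numerically. The sketch below uses a bare-bones Gram-Schmidt step to remove the linear component from the quadratic term; statistical software (e.g., R's poly()) does this with additional scaling, so this is only the principle, not the exact procedure.

```python
import math

# Natural polynomial Time terms are highly correlated;
# orthogonalized terms are not.

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def corr(a, b):
    ca, cb = center(a), center(b)
    return dot(ca, cb) / math.sqrt(dot(ca, ca) * dot(cb, cb))

t = list(range(1, 18))        # 17 time bins, e.g. 50 ms bins over 0-850 ms
t2 = [x * x for x in t]
print(round(corr(t, t2), 2))  # 0.97 : severe collinearity

# Orthogonalize: subtract the linear component from the quadratic term.
t1o = center(t)
t2c = center(t2)
proj = dot(t2c, t1o) / dot(t1o, t1o)
t2o = [q - proj * l for q, l in zip(t2c, t1o)]
print(abs(dot(t1o, t2o)) < 1e-8)  # True : the two predictors are orthogonal
```

With the orthogonalized terms, the slope and curvature coefficients of a growth curve model can be interpreted independently, as described above.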
Orthogonalizing polynomials does affect the interpretation of the intercept
b0. In a visual-world eye-tracking experiment with natural polynomials, the
intercept corresponds to the likelihood of looks at the target image (or any
other chosen image) at the outset of the time window, or Time 0. The intercept value could
be informative for identifying anticipatory baseline effects (Barr, 2008; Barr,
Gann, & Pierce, 2011); that is, visual biases in participants’ looking behavior before
they have heard the critical part of the input (for more information, see Sections
6.3.1.1, 6.3.1.2, and 6.3.2.2). Such effects can then be distinguished statistically
from rate effects (the coefficients for the different Time predictors), which
reflect the influence of the linguistic signal proper (Barr, 2008; Barr et al., 2011).
In models with orthogonal polynomials, on the other hand, the intercept signifies
differences between the conditions’ overall means (due to the centering) rather
than the differences between the experimental conditions at Time 0. Therefore,
when it is important to test for differences at Time 0 (e.g., to rule out preexisting
differences between groups or conditions), natural polynomials can be employed;
otherwise, orthogonal polynomials may prove a more informative and more fine-
grained approach.
8.5.2.5 Worked Example
In this section, we have unpacked growth curve analysis into its different compo-
nents. To recapitulate, growth curve analysis of visual-world eye-tracking data is a
mixed-model (Section 8.4), logistic or quasi-logistic regression (Section 8.5.2.3)
that can accommodate nonlinear growth over time (Section 8.5.2.4). Now, the
time has come to combine these different building blocks and apply them to the
analysis of a real-world example. We will re-analyze fixation data from Chepyshko
(2018) for the 0–850 ms time window (see Section 8.5.2.2).
Recall that Chepyshko was interested in whether L1 and L2 English speak-
ers can use the semantics of locative verbs (e.g., to fill, to pour, to spill, to stain) to
predict the verb argument. The present analysis focuses on the L1 speakers’ data
for two verb types: Content-oriented Verbs, which take a Content as a direct
object (e.g., to spill the coffee [Content] on the t-shirt [Container]) and Container-oriented Verbs,
which go with a Container as the direct object (e.g., to stain the t-shirt [Container] with
coffee [Content]). Thus, when “spill” or “stain” are presented together with images of a
coffee and a t-shirt (see Figure 8.20), each verb has a clear, distinct Target. The
goal of the present analysis is to test whether native English speakers are able to
anticipate that Target early on during processing, as reflected in their looks to the
Target image during the verb segment.
For the present analysis, I started off with a base model that included only
the linear and quadratic Time terms (orthogonalized to remove collinearity, see
Section 8.5.2.4). I opted for a U-shaped (quadratic) time curve based on the
results of the visual inspection shown in Figure 8.23. Both the quadratic and the
cubic curves captured the data well; therefore, I went with the simpler solution.
Note that the base model, and the statistical significance of the Time terms in it,
indicates whether, averaged across the two verb types, participants started looking
at the Target more over time. With three images on the screen (see Figure 8.20),
a .33 proportion of on-target fixations reflects chance performance. Furthermore,
because all the sentences followed the same structure (The Agent will Verb the
Object 1 Preposition Object 2), more looks could be directed to the Agent (e.g.,
the cook) in the early parts of the sentence.
Next, I added fixed effects for Verb Type in three steps. In Model 0, Verb Type
was added as a main effect. In Model 1, Verb Type was allowed to interact with
Time linear. And lastly, in Model 2, Verb Type could also interact with Time
quadratic (see Tables 8.9 and 8.10). This forward, stepwise approach is similar to
Mirman’s (2014) sample analysis with nonlinear Time terms. It combines a given
set of Time terms that are selected a priori based on visual inspection with model
comparison.
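As a rough illustration of the arithmetic behind such a comparison, the sketch below computes a likelihood-ratio test and AIC values from two log-likelihoods. The numbers are invented for illustration; in practice they come from the fitted mixed models, and the closed-form p value shown holds only for a 1-df comparison.

```python
import math

# Likelihood-ratio test between two nested models (1 df difference),
# plus AIC, as used in forward model selection.

def lrt(loglik_reduced, loglik_full, df_diff):
    """LRT statistic and p value; p value shown only for df_diff = 1,
    using the chi-square(1) survival function P(X > x) = erfc(sqrt(x/2))."""
    stat = 2 * (loglik_full - loglik_reduced)
    if df_diff != 1:
        raise NotImplementedError("closed form shown for 1 df only")
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik   # smaller is better

# Hypothetical: adding one interaction term raises the log-likelihood by 8.
stat, p = lrt(-56964.0, -56956.0, df_diff=1)
print(round(stat, 1), f"{p:.5f}")  # 16.0 0.00006
print(aic(-56956.0, 10) < aic(-56964.0, 9))  # True: better fit outweighs the extra parameter
```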
Tables 8.9 and 8.10 summarize the findings from the model comparison for
the logistic and quasi-logistic regression. Results for the two types of analysis
converged: in both cases, adding interaction terms to the model (i.e., allowing
fixations to develop differently over time for the two types of verb) signifi-
cantly improved model fit. Whereas AIC values were similar for the base model,
Model 0 and Model 1, they went down substantially for the Model 2 logistic
regression. (Recall that smaller AIC values are better.) Furthermore, the LRT
returned a significant p value for Model 2 in both the logistic and the quasi-
logistic regression analysis. Thus, the most complex model (Model 2) was also
the best one.
In light of the similarities between the two analyses, we will focus on the results
for the logistic regression for the remainder of this section (but bear in mind that
the empirical logit regression produced very similar results). Table 8.11 presents
the detailed results for Model 2. The significant interaction terms indicated that
listeners responded differently to Content and Container verbs. To understand the
nature of this interaction, I ran two follow-up analyses, in which I analyzed gaze
patterns for each verb type separately. Results showed that listeners were faster to
orient to Content objects while hearing Content verbs than when they looked at
Container objects while hearing Container verbs.13
Figure 8.25 plots the predicted values for the two types of verb, transformed
back onto the proportion scale, for ease of interpretation. We see that in this early
time window, only Content verbs exerted an influence on participants’ gaze pat-
terns. Chepyshko (2018) attributed the processing differences for Content and
Container verbs to inherent differences in the verbs’ conceptual representations.
Specifically, he posited that the perceptual and motor correlates of actions such
as spill and stain align more closely with the verb argument structure of Content
verbs in that they assign a central role to the content being spilled or causing
a stain. This, in turn, may facilitate processing of content verbs more so than
container verbs (Chepyshko, 2018). Although the present analysis focused on L1
listeners’ data only, the L2 English speakers in Chepyshko’s study showed a similar
viewing pattern.
TABLE 8.9 Forward model selection in logistic regression: A base model and three competitor models
Note: Outcome variable = ln(looks(Target)/looks(Other)); Time2 = quadratic Time term.
TABLE 8.10 Forward model selection in quasi-logistic regression: A base model and three competitor models
TABLE 8.11 Detailed results for Model 2
Fixed effects B SE z p
Intercept –1.36 0.21 –6.58 <.001
Time (linear) 2.00 0.50 4.01 <.001
Time (quadratic) 0.06 0.05 1.32 .19
VerbType –0.14 0.16 –0.84 .40
Time (linear) × VerbType –1.30 0.48 –2.70 .007
Time (quadratic) × VerbType –0.42 0.09 –4.74 <.001
Random effects Variance SD
(1|subject) 1.79 1.34
(Time|subject) 9.39 3.06
(1|verb) 0.08 0.28
(Time|verb) 0.66 0.82
Residual 12.87 3.59
R2 marginal/R2 conditional 0.03/0.43
AIC 113912
(Source: Based on Chepyshko, 2018).
FIGURE 8.25 Fitted (predicted) values for looks to target in the early time window.
Note: the horizontal dotted line represents chance-level looks at the
Target image. Target looks rose above chance from about 625 ms post
verb onset, but only for the Content Verbs.
(Source: Based on Chepyshko, 2018).
Notes
1 Track loss can be calculated at other levels as well: for instance, track loss at the trial
level or the item level. To determine the amount of track loss, researchers calculate the
percentage of raw eye data samples with missing position information; that is, the per-
centage of rows in the spreadsheet with missing values for the x, y screen coordinates.
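The track-loss computation described in this note can be sketched as follows; the sample rows are invented, and real exports mark lost samples with tracker-specific codes (empty cells or sentinel values) rather than None:

```python
# Track loss: the percentage of raw samples with missing x, y coordinates.

def track_loss(rows):
    """rows: list of (x, y) gaze coordinates; None marks a lost sample."""
    missing = sum(1 for x, y in rows if x is None or y is None)
    return 100 * missing / len(rows)

demo = [(512, 384), (None, None), (520, 390), (None, None), (515, 380)]
print(track_loss(demo))  # 40.0
```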
2 Transforming a variable will not affect the significance of results (beyond increasing
power or reducing Type I error), but it will have implications for the interpretation of
findings. Specifically, any results are to be interpreted in relation to the transformed
variable, for instance log duration or log latency. To interpret key results on the original
scale, researchers can backtransform their estimates, using an exponential function in
the case of the log transformation.
3 In repeated-measures ANOVA, the within-subject variable (e.g., Time [Pretest, Posttest,
Delayed Posttest]) is partitioned into a fixed effect and a random effect. The fixed effect
is what is normally reported in research articles; however, the random effect is also
estimated as a part of the statistical output (Field, 2018). Specifically, the random effects
component indicates how the repeated measurements affected individual participants
or items differently (random slope) and how the participants or items differed at Time 1
(random intercept).
4 Godfroid and Uggen were early adopters of LMMs in L2 research, but like many
researchers at the time, they analyzed their data with by-subject random intercepts
only. Here, I will show what the results look like when a more complex random effects
structure is used.
5 When performing a LRT in R using the anova() function, the default is to refit models
with the maximum likelihood algorithm. Since the present demonstration focuses on
how to compare models with different random effects structures, I disabled the refit
functionality in the code. Interested readers could refer to Cunnings and Finlayson
(2015) for a more detailed discussion.
6 One thing to look out for in the random effects structure is a perfect intercept-slope
correlation (r = 1 or r = -1). Perfect correlations indicate that there were too many
parameters in the model; in other words, the model was overparameterized and the
random effects structure needs to be simplified (Bates, Kliegl,Vasishth, & Baayen, 2015;
Matuschek et al., 2017).
7 The exact length of the verb window differed from trial to trial, but in all cases the
window was defined so it included only the verb. For additional examples and discus-
sion of how to set time windows, see Section 6.3.2.2.
8 Because proportions and odds are non-linearly related, some differences in patterns
will occur for proportions smaller than .30 or larger than .70.
9 For instance, a .80 fixation likelihood corresponds to an odds of .80/.20 = 4/1 = 4
(often read as “four-to-one odds”) and a log odds of ln(.80/.20) = ln(4) = 1.386. A .20
fixation likelihood equals a ¼ (one-to-four) odds and a -1.386 log odds.
10 Both ln(0) (the logit for p = 0) and ln(∞) (the logit for p = 1) are undefined.
11 Note that because we are moving into the realm of inferential statistics, I used the
elog data, rather than proportion data in the graph, in preparation for the subsequent
empirical logit analysis.
12 To test the appropriateness of the chosen solution, researchers can engage in model
selection (see Sections 8.4.3 and 8.5.2.5). In a model-selection approach, a model with
a higher-order term will be retained only if the model yields a significantly better fit
than a subset model that does not include the same term.
13 This finding was supported by the larger and consistently positive Time effects for
Content Verbs, which signalled a stronger growth in on-target fixations: Time linear (b =
3.09, SE = 0.69, z = 4.47, p < .001) and Time quadratic (b = 0.17, SE = 0.07, z =
2.32, p = .02). For Container Verbs, the growth captured in the linear Time was not as
strong (b = 1.60, SE = 0.78, z = 2.04, p = .04) and the quadratic Time term revealed a
negative growth (b = -0.45, SE = 0.06, z = -7.5, p < .001), suggesting on-target fixa-
tions levelled off over time.
9
SETTING UP AN EYE-TRACKING LAB
This chapter provides you with information to get started on your own eye-
tracking research. It focuses on the central piece of equipment, the eye tracker (see
Section 9.1), and the physical and social space of the eye-tracking lab (see Section
9.2). To jumpstart the research process, I also present ten ideas for research, based
on actual studies with and without eye tracking (see Section 9.3.1). A list of tips
and tricks summarizes the key points from previous chapters and concludes the
book together with some additional, hands-on advice (see Section 9.3.2).
FIGURE 9.1
The Dodge photochronograph. Corneal reflection was recorded on
a slowly falling photographic plate positioned behind a 1.5 m long
enlarging camera.
(Source: Reprinted from Diefendorf, A. R., & Dodge, R., 1908. An experimental study of the ocular
reactions of the insane from photographic records, Brain, 31(3), 451–489, with permission from
Oxford University Press).
FIGURE 9.2 (a) (left) Drawing based on one healthy participant’s photographic record.
Each trial began at the bottom of the record. Vertical lines represent eye
fixations and horizontal lines represent eye movements. One dash equals
10 ms. (b) (right) Eye-movement trace for the sentence “Sometimes
victims do not tell the complete truth in court”. The trial begins at the
top of the record (Time = 0 ms). Vertical stretches of the dark black line
are fixations and horizontal stretches are saccades.
(Source: (a) Reprinted from Diefendorf, A. R., & Dodge, R., (1908). An experimental study of the
ocular reactions of the insane from photographic records, Brain, 31(3), 451–489, with permission from
Oxford University Press. (b) Engbert, R., Nuthmann, A., Richter, E. M., & Kliegl, R., 2005. SWIFT:
A dynamical model of saccade generation during reading. Psychological Review, 112(4), 777–813, APA,
reprinted with permission).
FIGURE 9.3 A scleral contact lens. Torsion coil inserted on the eye with the thin wire
exiting nasally.
(Source: Chronos Vision. Reprinted with permission).
314 Setting up an Eye-Tracking Lab
eye rotations (Duchowski, 2007; Eggert, 2007; Young & Sheena, 1975). It uses a
set of electrodes placed around the eyes to do so (see Figure 9.4). Holmqvist et
al. (2011) noted that electrooculography is “a low-cost variety of eye tracking” (p.
10), but it is not as accurate and precise as other eye-tracking systems (Wade &
Tatler, 2005).
Finally, video-based eye trackers have a wide range of applications and under-
lie most, if not all, of the language-processing research. Video-based eye tracking
has also become more affordable since it first became available in the mid-1970s.
The principle behind video-based eye trackers is to detect one or more landmarks
on the eye (e.g., pupil, limbus, iris, light reflection in the cornea; see Figure 9.5)
FIGURE 9.4
Electrooculography. Four input channels around the eye record
information about horizontal and vertical eye movements. Reprinted
with permission.
(Source: Metrovision).
FIGURE 9.6 Relative positions of the pupil and the corneal reflection for different
points of regard.
(Source: Reprinted from Duchowski, A. T., 2007. Eye tracking methodology: Theory and practice. London:
Springer, with permission of Springer Nature).
and infer the point of gaze using geometrical principles. The most commonly used
method combines pupil and corneal-reflection tracking, which is similar to
how Dodge’s photochronograph worked. While the pupil is easily recognizable
on film as the dark area in the center of the eye, the corneal reflection is caused
by a (near) infrared light beamed from the eye tracker. The light will reflect from
the front of an individual’s cornea, producing the so-called corneal reflection or
glint (see Figure 9.5), also known as the first Purkinje image (see Textbox 9.1).
Because the cornea has a higher curvature than the eyeball, the relative positions
of the corneal reflection and the pupil center will change with eye position (see
Figure 9.6). Point of gaze estimation, then, is based on a vector of the angle between
the corneal reflection and the pupil center, and other geometrical calculations.
This explains how pupil and corneal-reflection eye trackers work as a family. We
now consider the different hardware set-ups available for these eye trackers, which
are important for understanding the properties of the system (Holmqvist et al., 2011).
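The core idea, that the pupil-minus-corneal-reflection vector changes with eye rotation and a calibration maps it onto screen coordinates, can be illustrated with a deliberately simplified one-axis linear sketch. Real systems fit nonlinear models from many calibration points; every number here is invented.

```python
# Toy pupil-corneal-reflection gaze estimation: fit a linear mapping from
# the pupil-CR offset (camera units) to a screen coordinate (pixels),
# using two calibration fixations. Illustration only, not a vendor algorithm.

def fit_axis(v1, s1, v2, s2):
    """Fit screen = a * vector + b from two calibration points."""
    a = (s2 - s1) / (v2 - v1)
    b = s1 - a * v1
    return a, b

# Calibration: pupil-CR x offsets of -2.0 and +2.0 were observed while the
# participant fixated screen x = 100 and x = 900 pixels, respectively.
a, b = fit_axis(-2.0, 100, 2.0, 900)

# A new sample with a pupil-CR offset of 0.5 maps to:
print(a * 0.5 + b)  # 600.0
```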
FIGURE 9.7 A remote eye tracker with the camera inside the display monitor.
(Source: Tobii Pro TX300).
FIGURE 9.8
A remote eye tracker with the camera on the table in front of the
participant.
(Source: SR research Ltd. Eyelink 1000).
FIGURE 9.9
A remote eye tracker with the camera on the table in front of the
participant.
(Source: Applied Science Laboratories EYE-TRAC 7).
FIGURE 9.10 A head-mounted eye tracker.
(Source: SR research Ltd. Eyelink II. Photography supplied courtesy of The Center for Comparative
Psycholinguistics at University of Alberta).
FIGURE 9.13 (a) A remote eye tracker mounted in a tower above the participant’s head.
This is the same eye tracker as in Figure 9.8, but mounted differently. (b)
A hi-speed tower-mount eye tracker.
(Source: (a) SR research Ltd. Eyelink 1000. (b) Picture reproduced with kind permission from
SensoMotoric Instruments GmbH).
among the pupil and corneal-reflection eye trackers because they gently restrict
head movement (Holmqvist et al., 2011). In sum, there appears to be a tradeoff
between flexibility of use and freedom of movement, on the one hand, and data
accuracy and precision, on the other.
Some head-mounted eye trackers and some head-free, remote eye track-
ers include head tracking (i.e., tracking of head movements) as a standard or
optional feature of the equipment. Head tracking makes it possible to compensate
for small head movements when computing eye gaze position so researchers can
still obtain high-quality data without paying the cost of restricting participants’
head movements. Head trackers use either optical reflectors or magnetic sensors
to measure the position of the head in space. The optical reflector is an infrared
reflector (marker) that is attached on the participant’s forehead to measure precise
head motions (see Figure 9.14).
A magnetic head-tracker (see Figure 9.15) is composed of a magnetic field
generator and a head sensor, usually mounted on the cap. The two combined yield
absolute locations and movements of the head (Stephane, 2011). When this head
position data is added to the gaze position data extracted from an eye tracker, the
generated head-eye gaze vector will allow automatized data analysis that accounts
for head movements (Holmqvist et al., 2011). However, despite such options for
rigorous data acquisition, many users prefer risking low data quality over placing
sensors or markers on participants, a trend reflected in the low market share of head-tracking systems.
With such a variety in eye-tracking models, what type of eye tracker should you
choose? This will depend largely on your research needs, your participant popula-
tion, and your budget. Here we focus on research needs and participants, but some
tips for start-up packages and grant applications can be found in Section 9.2.1 and
in Sanz, Morales-Front, Zalbidea, and Zárate-Sández (2016). In choosing an eye
tracker, it is important to think ahead to how you are planning to use the machine:
will you conduct research on reading and writing, task-based language teaching
(TBLT), interaction and feedback, the bilingual lexicon, language assessment, com-
puter-assisted language learning (CALL), or translation processes (see Chapters 3
and 4)? Will you have small interest areas, at the word level or below, or will your
regions of analysis be larger (see Section 6.1)? Depending on your answers, the eye
tracker you need will have to be more or less sophisticated and flexible in what it
can do. Of course, both sophistication and flexibility are desirable features in an eye
tracker and manufacturers are working hard to develop machines that combine the
two, yet most present-day machines compromise on one of these two dimensions.
Mobile eye trackers are typically head-free, which makes them more suitable for
more natural tasks and research with children and clinical populations, but they
tend to be slower and less accurate and precise. As a result, we will often use them
to address more coarse-grained research questions. Static eye trackers (either head-
free or with head stabilization) provide a higher degree of accuracy and precision in
data collection and are often faster, which allows them to address more fine-grained
research questions and questions about word and sublexical processing, yet they
offer less ecological validity. Figure 9.16 represents some of the questions about
eye-tracker use that can guide researchers toward either a mobile or a static model.
In this respect, one thing to bear in mind is that by choosing an eye tracker
that suits one’s needs, researchers can prevent substantial data loss and have greater
confidence in their findings. Using an eye tracker with head stabilization even
in contexts where eye-movement data are often recorded head-free (e.g., visual
world paradigm) will not have negative consequences for your research, but there
are contexts where head-free eye-movement recordings are inappropriate and
will produce data that are unusable. In SLA and applied linguistics, research that
has typically been conducted using head-free equipment includes, but is not lim-
ited to, studies with children and clinical populations; visual world studies; testing
research; research on TBLT; reading, writing, and translation research with larger
text units; CALL studies; and research on interaction and feedback (see Chapters
3 and 4, for a synthetic review). Conversely, a static eye tracker with head stabili-
zation is appropriate, and may even be necessary, for any type of research that has
interest areas at the word or morpheme level, which typically includes reading,
writing, and translation research and sentence-processing studies, but also research
in other areas (testing, TBLT, CALL, feedback), depending on the question under
investigation.
FIGURE 9.17
Eye-movement data before and after event detection. Figure 9.17a
shows the raw data samples that were collected with a 1,000 Hz eye
tracker. Figure 9.17b shows the data output for the same trial after the
computer algorithm identified fixations (circles) and saccades (lines).
(Source: Data from Godfroid et al., 2015).
represent individual snapshots of eye location; that is, all of the times the eye was
filmed. The pegs of a 500 Hz eye tracker follow each other closely, at 2 ms inter-
vals, whereas the pegs of a 50 Hz eye tracker are more dispersed, at 20 ms inter-
vals. The 250 Hz eye tracker is somewhere in between, with pegs spaced at 4 ms
intervals. The three eye trackers are recording a participant’s eye behavior, who
at some point looks at an object. The eye fixation that the participant makes has
a beginning (onset), an end (offset), and a duration, which we would like the eye
trackers to measure as accurately as possible. However, as indicated by the squares
in Figure 9.18, there is some measurement error. The squares represent the differ-
ence between the fixation as recorded by the eye tracker and the fixation that the
participant actually made. This difference is known as the temporal sampling error.
Measurement of fixation onset is delayed until the next sample (peg), by 1 ms, 3 ms,
and 15 ms for the 500 Hz, the 250 Hz, and the 50 Hz eye tracker, respectively.
Similarly, the eye trackers do not register the fixation has ended until the next
sample, which is recorded 1 ms after the true end of fixation for the 500 Hz and
250 Hz eye trackers and 9 ms later for the 50 Hz eye tracker. Therefore, all the eye
trackers overestimate both the beginning and end of the participant's fixation, but
the errors are much larger for the 50 Hz eye tracker.
Because fixations have a beginning and an end, both of which are measured by
the eye tracker, fixation duration is a so-called two-point measure (Andersson
et al., 2010). Put differently, fixation durations are bound by two points in time
and both points are filmed by the eye tracker (see Section 7.2.1.2). For another
class of measures, which are known as one-point measures (Andersson et al.,
2010), only one point is measured, usually the time of onset. One-point measures
are latency measures such as fixation latency or time until first visit (see Section
7.2.1.3), saccade latency (how long it takes to initiate a saccade), and time before
timeout. Perhaps somewhat counterintuitively, the average measurement error for
a two-point measure will be smaller than that for a one-point measure. This is
because in two-point measures, both the onset and the offset of the event are
overestimated and so the estimate of the time interval between them will be more
accurate.
FIGURE 9.18 Temporal sampling frequencies of three eye trackers. Each peg represents
one data sample. Squares indicate the temporal sampling error.
(Source: Modified from Andersson et al., 2010).
326 Setting up an Eye-Tracking Lab
In contrast, nothing compensates for the sampling error associated with
one-point measures. As can be seen in Figure 9.18, latencies
that are measured relative to the beginning of the trial (e.g., time until first visit,
saccade latency) will be overestimated (onset of measurement is delayed until the
next snapshot of the eye), whereas any measures that reference the end of a trial
(e.g., time until timeout) will be underestimated (Andersson et al., 2010).
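A quick simulation can make this asymmetry concrete. The sketch below is my own construction (not from Andersson et al., 2010): it draws random fixation onsets for a 50 Hz tracker and compares the error in the measured duration (a two-point measure, where onset and offset delays largely cancel) with the error in the measured onset (a one-point measure, which is always delayed).

```python
import random

def next_sample(t, interval):
    """Time of the first eye-tracker sample at or after time t."""
    return -(-t // interval) * interval  # ceiling to the next sample tick

random.seed(1)
interval = 20.0                # 50 Hz tracker: one sample every 20 ms
onset_errors, duration_errors = [], []
for _ in range(100_000):
    onset = random.uniform(0, 1000)        # true fixation onset (ms)
    offset = onset + 250                   # true fixation duration: 250 ms
    m_on = next_sample(onset, interval)    # measured onset (delayed)
    m_off = next_sample(offset, interval)  # measured offset (also delayed)
    onset_errors.append(m_on - onset)                 # one-point error
    duration_errors.append((m_off - m_on) - 250)      # two-point error

mean_onset = sum(onset_errors) / len(onset_errors)           # ~10 ms (half a sample)
mean_duration = sum(duration_errors) / len(duration_errors)  # ~0 ms
```

With enough trials, the average one-point error converges on half a sampling interval, whereas the two-point errors average out to roughly zero, exactly the central-tendency result described in the next paragraph.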
Using the central limit theorem, Andersson et al. (2010) demonstrated that
the measurement error for two-point durational measures will average to 0 and
the error for one-point latency measures will average to half an eye-tracker sam-
ple (e.g., 10 ms for a 50 Hz eye tracker) given sufficient data. This means that if
you have a large data set, your durational measures will be error-free on average and your
latency measures can be calculated accurately by subtracting or adding half an eye-
tracker sample. Of course, these guidelines only apply if you have a large data set. If
that is not the case, the sampling error is random and may obscure any true effects
in the data, especially if the effects are small. One solution is to use a faster eye
tracker. As was previously discussed, a faster eye tracker will have a smaller window
of error, and therefore fewer data will be needed to bring the error to its central
tendency (i.e., 0 for durational measures and half a sample for latency measures).
Doubling the sampling speed will bring about a fourfold reduction in data points
needed to maintain the same sampling error (Andersson et al., 2010). For example,
if 100 data points (e.g., 10 trials × 10 participants in the same experimental condi-
tion) are necessary to maintain a < 1 ms temporal sampling error with a 250 Hz
eye tracker, only 25 data points will be necessary with a 500 Hz eye tracker, but
as many as 1,600 observations will be necessary with a 60 Hz eye tracker. As a
general rule, the temporal noise in the data set must be well below the magnitude
of the effect you hope to find. Therefore, small effects (e.g., a 15 ms difference
between experimental conditions) require fast eye trackers or very large amounts
of data. For large effects (> 80 ms) eye-tracking speed is not as important.
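The trade-off between sampling speed and sample size follows from this quadratic relationship. The helper below is a sketch of that rule of thumb (the function name is my own); it reproduces the 250 Hz to 500 Hz example from the text.

```python
def data_points_needed(n_ref, rate_ref_hz, rate_new_hz):
    """Observations needed at a new sampling rate to keep the same average
    temporal sampling error as n_ref observations at a reference rate.
    The error scales with the sampling interval, so the required count
    scales with the squared ratio of the rates (Andersson et al., 2010)."""
    return n_ref * (rate_ref_hz / rate_new_hz) ** 2

data_points_needed(100, 250, 500)  # 25.0: doubling the speed quarters the data needed
```

For slower trackers the required counts grow steeply, which is why small effects call for fast equipment or very large data sets.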
To sum up, Textbox 9.2 lists the main reasons that speed matters. Eye-tracking
speed is important—it is the property that eye-tracking manufacturers advertise
most prominently for their equipment (Andersson et al., 2010; Holmqvist et al., 2011;
Wang, Mulvey, Pelz, & Holmqvist, 2017) and it influences the temporal accuracy
and precision of measurement. Nevertheless, other properties that are perhaps less
well advertised also matter a great deal for data quality. These properties include
spatial precision and accuracy and will be discussed next.
Accuracy and precision are two major indices of data quality.2 When research-
ers measure a participant’s point of gaze with an eye tracker, they hope (and often
assume) that the measurement is accurate and precise, so that they can
be confident in the data and the results. Although the terms accuracy and precision
are often used interchangeably in everyday life, within the field of eye tracking and
other areas of measurement the two mean something different. Accuracy (or off-
set) refers to the difference between a participant’s true eye gaze position and the
eye gaze position as measured by the eye tracker. It follows from this definition
that in order to assess measurement accuracy, we must know where a participant
is really looking in addition to where the eye tracker registers the participant’s eye
gaze. One way to do this is to ask your participant; however, in practice, researchers
and manufacturers usually instruct participants (or robotic stand-ins in the form of
artificial eyes) to look at very simple visual targets on the screen such as calibration
points (i.e., dots in different corners of the screen). Precision, which is better known
as reliability in our field, refers to how consistently the same (steady) eye fixation
can be measured in one and the same position. Precision does not take into account
how far the observed data samples are from the true fixation location; in other words,
precision and accuracy are independent of each other. As shown in Figure 9.19b, it is
possible to have an instrument that is precise but inaccurate. By the same token, we
can have accurate but imprecise measures (see Figure 9.19a). This is the case when
the center of a fixation aligns closely with the visual target, but the individual data
samples are dispersed around the fixation center. The ideal scenario is to obtain data
that are both accurate and precise (see Figure 9.19c), because this will lead to the
correct identification of fixations and saccades and will safeguard the internal validity
of the study. Accurate and precise eye-movement data, then, are
foundational to any data analyses that researchers perform in their study.
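Both notions can be operationalized directly on raw gaze samples. One common convention (assumed here for illustration, not prescribed by the studies cited in this section) reports accuracy as the distance between the mean gaze position and the known target, and precision as the root mean square of successive sample-to-sample distances:

```python
import math

def accuracy_and_precision(samples, target):
    """Accuracy: offset between the mean gaze position and the known target.
    Precision: RMS of successive sample-to-sample distances (one common,
    but not the only, operationalization). samples is a list of (x, y)."""
    mean_x = sum(x for x, _ in samples) / len(samples)
    mean_y = sum(y for _, y in samples) / len(samples)
    accuracy = math.dist((mean_x, mean_y), target)
    steps = [math.dist(a, b) ** 2 for a, b in zip(samples, samples[1:])]
    precision = math.sqrt(sum(steps) / len(steps))
    return accuracy, precision

# Precise but inaccurate (cf. Figure 9.19b): a tight cluster offset from the target
acc, prec = accuracy_and_precision([(5.0, 5.0)] * 10, target=(0.0, 0.0))
# acc is a large offset (~7.07 units); prec == 0.0 (perfectly consistent samples)
```

The example mirrors Figure 9.19b: the two indices are computed independently, so an instrument can score well on one and poorly on the other.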
Although both accuracy and precision are important, they impact a research
study differently. The precision of measurement affects event detection; that is,
the parsing of the eye-movement recording—a huge collection of raw data sam-
ples—into a sequence of fixations and saccades (e.g., Figure 9.17). As discussed
previously in this section, an eye tracker records a very large number of individual
data points (x, y coordinates), which serve as input for an event detection algo-
rithm. If the data points are scattered because of low precision, any calculations
of the algorithm will be off and the resulting fixations and saccades may not be
accurately defined either. For example, a dispersion-based algorithm may identify
a single eye fixation as two shorter fixations because some data samples fell outside
a certain radius. A lack of precision, therefore, can be detrimental to the valid-
ity of your findings because it can change the dependent variables (fixations and
saccades) in the analysis. Conversely, the accuracy of an eye tracker is a concern
when interest areas are small, such as in reading research (Nyström, Andersson,
Holmqvist, & Van De Weijer, 2013). An example is work on different scripts,
where researchers have begun to chart the eyes’ landing position (optimal view-
ing position and preferred viewing location, see Section 2.4) in non-alphabetic
languages (e.g., for Chinese: Li, Liu, & Rayner, 2011; Tsai & McConkie, 2003; Yan,
Kliegl, Richter, Nuthmann, & Shu, 2010; Yang & McConkie, 1999; Zang, Liang,
Bai, Yan, & Liversedge, 2013). Such analyses require that words be sliced into
smaller units for analysis, such as the first and second half of a Chinese character or
the individual characters in a Japanese word. To ensure valid conclusions, accuracy
of measurement is crucial, especially along the horizontal dimension, because any
offsets will qualitatively alter researchers’ conclusions about where the eyes land
when reading text. Interestingly, eye-tracker accuracy also became a focus in the
debate on parafoveal-on-foveal effects, which centers on whether the properties
of the word to the right of fixation (word n + 1) can influence processing of the
currently fixated word (see Section 2.6). Because an offset of a few letter spaces
can make the difference between a fixation on word n and a fixation on word n
+ 1, Kliegl, Nuthmann, and Engbert (2006) included only those observations in
their analyses for which the binocular eye-movement recording assigned fixations
from both eyes to the same word. Accordingly, the authors removed 23% of cases
where the two eyes landed on different words (i.e., cases of binocular disparity).
In so doing, they hoped to preempt the criticism that “any erroneous assignment
of fixations to neighboring words due to limits of spatial resolution of the eye
tracker … could generate parafoveal-on-foveal effects” (p. 18). Because parafo-
veal-on-foveal effects are important to understand how attention is allocated dur-
ing reading (see Section 2.6), this example shows how seemingly technical details
such as the eye tracker’s spatial accuracy can have far-reaching empirical and
theoretical consequences (also see Rayner, Pollatsek, Drieghe, Slattery, & Reichle,
2007; Rayner, Warren, Juhasz, & Liversedge, 2004).
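The dispersion-based parsing mentioned earlier can be sketched with a minimal dispersion-threshold (I-DT-style) algorithm. The implementation below is my own simplification, not the algorithm used by any particular tracker; thresholds would normally be chosen in degrees of visual angle and converted to pixels.

```python
def dispersion(window):
    """Dispersion of a window of (x, y) samples: x-range plus y-range."""
    xs = [x for x, _ in window]
    ys = [y for _, y in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, max_dispersion, min_samples):
    """Minimal dispersion-threshold (I-DT-style) fixation detection.
    Returns (start_index, end_index) windows; the stretches in between
    would be treated as saccades. Note how noisy (imprecise) samples
    inflate dispersion and can split one true fixation into two."""
    fixations, i = [], 0
    while i + min_samples <= len(samples):
        j = i + min_samples
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the samples stay within the threshold
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            fixations.append((i, j - 1))
            i = j
        else:
            i += 1
    return fixations
```

Running this on a recording of two stable gaze positions separated by a large jump yields two fixation windows with a saccade in between; widely scattered samples, by contrast, would fail the dispersion test and fragment the output, which is precisely how low precision distorts the dependent variables.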
In text studies such as the two preceding examples, accuracy tends to be a
greater concern along the horizontal axis. This is because the text line or lines on
the screen make it easier to spot and correct systematic vertical offsets in the data
(see Section 8.1), but similar reference points for the horizontal dimension are
missing. In other areas of research, external referents with which to align the eye
gaze data may be less clear and so using larger interest areas may be a safer option
if accuracy is a concern. In either case, it is a good idea to check beforehand
whether your eye-tracking software enables you to make manual adjustments
to the recording after the data have been collected. Always bear in mind that by
cleaning the data, you are essentially changing the original recording and so great
care is needed (see Section 8.1, for further details on data cleaning).
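In text studies, manual adjustment of the kind described here often amounts to snapping vertically drifted fixations back onto the known line positions. The function below is an illustrative sketch (the name and threshold are my own, not a feature of any specific package):

```python
def snap_to_lines(fixations, line_ys, max_offset):
    """Re-align each fixation's y coordinate with the nearest text line,
    but only when the vertical offset is small enough to be plausible
    drift; larger offsets are left untouched for manual inspection."""
    corrected = []
    for x, y in fixations:
        nearest = min(line_ys, key=lambda line_y: abs(line_y - y))
        if abs(nearest - y) <= max_offset:
            corrected.append((x, nearest))
        else:
            corrected.append((x, y))
    return corrected
```

Because this rewrites the recording, the uncorrected data should always be preserved alongside the corrected version (see Section 8.1).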
A myriad of factors has the potential to influence data quality. Among them,
it is possible to distinguish factors that are specific to the eye-tracking hardware
and software, factors that relate to individual participant characteristics, and envi-
ronmental factors (Holmqvist et al., 2011; Nyström et al., 2013; Wang et al., 2017).
Holmqvist et al. (2011) tested the precision of 20 different eye trackers from
several manufacturers. They reported large effects of the eye camera on preci-
sion: overall camera quality and camera resolution (i.e., number of pixels used
to capture the eye image), as well as the position of the eye in the camera image
(something the researcher can adjust). Similar findings emerged from Nyström
et al.’s (2013) study, which focused specifically on the accuracy and precision of
one high-end, tower-mount eye tracker, the SMI HiSpeed 500 Hz. Nyström
and colleagues concluded that “data quality is directly related to the quality of
the eye image and to how robustly features can be extracted from it” (p. 285).
Any factors that interfere with the eye image, such as glasses, contact lenses, pupil
diameter, eye color, and downward pointing eyelashes can be detrimental to the
accuracy and/or precision of a recording (Holmqvist et al., 2011; Nyström et al.,
2013). More generally, it is important to keep in mind that the estimates provided
in eye tracker manuals and on manufacturers’ websites represent the upper end
of what accuracy and precision can be expected with a given device. These fig-
ures are obtained under optimal recording conditions, often using artificial eyes
rather than human participants and very short and simple visual tasks (Wang et al.,
2017). In an effort to promote independent comparisons of different eye tracker
models, the Eye Movements Researchers' Association (EMRA) and the academic
network COGAIN launched the Eye Data Quality Standardization Project in
2012 (COGAIN, n.d.; Eye Movements Researchers’ Association, 2012). The goal
of this large-scale project is to obtain independent evidence for the accuracy and
precision of a large number of commercially available eye trackers. In a recent
comparison of 12 eye trackers that was conducted as a part of this initiative, Wang
et al. (2017) found that the Dual Purkinje Image eye tracker, the EyeLink 1000,
and the SMI HS240 yielded the most precise data with human eyes. As more
research findings are published (Holmqvist & Zemblys, 2016; Holmqvist et al.,
2015), researchers will be able to make informed decisions on what eye trackers
to purchase based on independent and unbiased quality metrics.
lifespan, and cost. For example, some companies offer free installation and on-site
training, whereas others will charge for training. Yet other companies offer free
training at their facilities, in which case the researcher only needs to cover his or
her own travel expenses. It is important that training sessions be tailored to the
specific kind of research for which the equipment is being purchased; that is, they
should be at the right level of specificity and should include hands-on experience
with programming and data collection. Consider contacting the training team in
advance to request personalized sessions (i.e., based on research topics, specific
research designs, kind of data, etc.).
As with any major purchase, it is recommended to check the duration of the
product warranty, which will vary greatly between eye trackers. Although computer
hardware will likely require updates once every four or five years, the life of the eye
tracker could (but will not necessarily) exceed this timespan. Aspiring researchers
are advised to ask existing users about any repairs or technical expenses they have
made for their eye trackers. It is worth noting that customer support and responsive-
ness beyond initial training sessions vary across companies. Some third-party provid-
ers will terminate their involvement after installation and/or training sessions except
for warranty services. Other companies offer lifelong technical support, which is
either included or contingent upon having a current software license. If a special
software license is required, the licensing fee, which may be several thousand dollars,
needs to be added to the total sales price as a recurrent cost. In terms of the level
and depth of support, there is again considerable variation. Support can range from
helpline operators answering general questions about the equipment to staff solv-
ing the specific problems that you as a researcher encounter in your experiment.
In short, it is preferable to inquire about the eye tracker’s life expectancy and the
company’s support line before making a purchase. Satisfied customers are the best
publicity; hence, novel eye-tracking researchers are encouraged to get in touch with
existing users to learn about their experiences.
Another consideration when buying an eye tracker is transportability.
Transportability is an attractive feature of eye trackers, because it opens up many
possibilities for data collection. By and large, video-based eye trackers seem to fall
into one of three categories, depending on how the eye tracker is set up. Many
eye trackers come with two computers—a participant PC and a host PC—which
make transportation more cumbersome. Laptop PCs are easier to transport than
desktop PCs, but even so, mounting the two computers, and potentially a third
stand-alone unit with the eye-tracking camera, will take time. Therefore, it mostly
makes sense to transport a two-computer set-up if one is setting up a temporary
lab in a new location, rather than using the eye tracker for a one-time data collec-
tion. Eye-tracking equipment of this kind is transported in special hard shell cases,
which are available for purchase from the manufacturer. When traveling, check if
the warranty covers (foreign) travel or special insurance is needed.
In contrast, eye-tracking cameras that attach to a laptop or are mounted on
a tripod can be transported with minimal effort, including in a purse or hand
luggage. These are the third and most transportable types of eye trackers, but they
are often also the slowest and least accurate and precise equipment (Holmqvist
et al., 2011). Thus, when considering what eye tracker to purchase, the reader
needs to contemplate both the type of research to be conducted as well as the
level of detail required from the analyses. The smaller the area to be analyzed (e.g.,
an interlocutor’s face vs. their mouth or eyes; a text paragraph vs. a single word;
an image or an object in the image), the more precise and accurate the equip-
ment needs to be. By the same token, some research designs (e.g., studies that
use gaze-contingent display changes [see Section 2.3] and studies that focus on
saccadic properties) require fast eye trackers for saccades to be detected and the
experiment to work. On the other hand, users of mobile eye tracking may trade
in some precision and accuracy for greater flexibility. Mobile eye trackers have the
advantage that data can be collected outside the university, for example in schools
or hospitals, in people’s homes, and even in communities at home or abroad that
are far from a university campus or research facility. In many cases,
researchers will be able to reach large groups of participants who are currently
underrepresented in the research literature and who are often more heterogene-
ous than university students.
As a recent example of in-situ research, Indrarathne and Kormos (2017, 2018)
collected data from EFL students in Sri Lanka using a 60 Hz portable eye tracker
(see Figure 9.20) which the first author carried with her on the plane (Indrarathne,
personal communication, May 19, 2016). Lew-Williams (in preparation) is col-
lecting eye-movement data in people’s homes in Chicago, using a laptop and a
regular video camera mounted on a small tripod (see Figure 9.21). The data are
hand coded afterwards, frame by frame, using custom software (also see Godfroid
& Spino, 2015). McDonough, Crowther, Kielstra, and Trofimovich (2015) tracked
interlocutors’ eye gaze as they were completing communicative tasks, using a
combination of four eye-tracking cameras (two for each speaker, one for each eye)
and two webcams that served as scene cameras (see Figure 9.22).
On-site eye-tracking research may present its own set of challenges, but with
careful planning and familiarization with the research context, it can also be very
fruitful. First, a quiet, distraction-free space with control over temperature and
lighting is desirable (also see Section 9.2.2) and may require negotiation with local
contacts. From a data quality standpoint, it is preferable to collect all the data in
the same space, rather than move the eye tracker around, because this will ensure
similar recording conditions for each participant. Even so, there may be cases
where researchers need to settle for less than ideal solutions. A second considera-
tion is that on-site technical support may be unavailable or limited to general
IT questions and that the internet may not be as reliable. Thus, researchers must
have a plan for when they hit a technical roadblock and be prepared to handle
unexpected events. As a case in point, researchers may experience power cuts or
electricity surges in rural or isolated areas, which necessitate the use of a power
generator and electricity guard. By familiarizing themselves with the context of
their projects, researchers can prepare for success and contribute data from more
diverse populations to the field of SLA.
FIGURE 9.23 (a) (left) A two-PC configuration with Host PC and Display PC. (b)
(right) Two-room eye-tracking lab.
(Source: Modified after EyeLink II specifications).
camera set-up. An angular set-up, as seen in Figure 9.23a, is probably ideal for this
purpose. However, many times the host PC and display PC will be set up against
the same wall (i.e., aligned), as seen in Figure 9.24. In our experience, this can be
distracting for participants, because they may be tempted to look at the host PC
and thus disengage with their own task. As a simple solution to this problem, we
have placed a panel in between the two PCs in one of our labs (see Figure 9.24).
Another possibility is to set up the two PCs in two adjacent rooms, preferably
with a one-way mirror window in between, so the researcher can monitor the
experiment at all times (see Figure 9.23b).
Desk-mounted eye trackers should be placed on a sturdy table to avoid any
kind of vibrations. Rather than placing response devices such as button boxes,
keyboards, or mice on the table, participants should hold them in their hand or
on their lap, to avoid vibrations in the eye-movement record. No other electronic
devices such as cell phones, fans, or radios should be placed or operated near the
eye tracker; in other words, it is best for the table to be completely clear. The table
should be deep enough to allow for the recommended distance from the person’s
eyes to the eye tracker. In our two labs, that distance is 20–22 inches (50–55 cm)
for one eye tracker and 27 inches (65 cm) for the other. Check your manual
for the measures that apply to your eye tracker. The thickness of the tabletop
is another consideration if your eye-tracking system comes with head support.
Users should confirm that their table is thick enough, but not too thick, to mount
a chin- and/or forehead rest. Another crucial piece of furniture is a height-adjust-
able, stationary chair (no casters). A stationary chair will help participants remain
in position and within range of the eye-tracking cameras as they are engaging in
their task. This is especially important when working with patients or children,
with whom researchers typically do not use a chinrest or other form of head sta-
bilization. Height-adjustable, stationary chairs are not as easy to find as one might
expect and usually need to be custom made. Make sure the chair is comfortable,
because participants may be in the same position for an extended period of time.
The ideal eye-tracking lab will not have windows so the researcher can control
the lighting and viewing conditions. Incoming daylight can reflect on the com-
puter screen and cause changes in a participant’s pupil size, both of which can be
detrimental to data quality. If a windowless lab is not feasible, a few windows are
fine as long as one avoids direct sunlight near the eye tracker. This will mean using
shades and not placing the eye tracker in front of a window. Make sure to lower
the shades in your lab consistently, for each data collection, so similar lighting
conditions apply to all the sessions. The idea is to create the same lighting condi-
tions for all participants; therefore, proper illumination inside the lab is important.
Holmqvist et al. (2011) recommend using fluorescent lighting, such as neon lights,
because they produce less infrared light and do not vibrate as much as incandes-
cent light bulbs. Halogen lamps are the least recommended. A light wall paint
color will further help to make the most of the ambient light in the room. Finally,
any existing lighting timers should be deactivated, so as to prevent the light from
going out automatically during longer data collection sessions.
To produce quality data, participants need to be able to perform the experi-
ment in a quiet space free from environmental distractions. Sound-attenuated
rooms are ideal for this purpose. Another strategy is to put up Quiet Please signs
on the lab door and in the hallway to reduce noise levels and prevent people from
entering the lab when the experiment is in progress.
often limited, lab managers tend to make less money than they could in industry
positions. Therefore, retention can be a problem. What makes the job attractive,
then, is the opportunity to participate actively in research without carrying the
responsibilities that come with having an academic career.
In many programs, including our own, a graduate student takes on the role
of lab manager. While students need to go through a learning curve to master
the different aspects of programming and data collection, they tend to be highly
motivated and regard the opportunity as a gateway to a career as an eye-tracking
researcher. As a lab manager, students gain hands-on experience with all stages
of the research process—from planning and designing a study to publishing the
results—which looks great on their CV and prepares them for a future career in
the academy. Student lab managers are also a valuable resource for their fellow
students, because they can train others to conduct their own experiments or
participate in existing projects. This becomes especially important as the lab man-
ager prepares to graduate and leave the program, as passing on their accumulated
knowledge and expertise is essential to maintaining a healthy eye-tracking lab.
The lab manager position ultimately depends on the funding available at the
institution and the size of the lab. For small labs, having a full-time technician may
be challenging due to the small amount of work that he or she may have over
some periods when eye-tracking experiments are not in progress. In comparison,
a full-time lab manager position may offer more stability in the long term, assum-
ing there is research funding or financial support from within the university.
9.3 Getting Started
9.3.1 Ideas for Research
9.3.1.1 Research Idea 1: Entry-Level: Create a
Sentence-Processing Experiment
Sample study: Lim, J. H., & Christianson, K. (2015). Second language sen-
sitivity to agreement errors: Evidence from eye movements during compre-
hension and translation. Applied Psycholinguistics, 36(6), 1283–1315.
sentences for comprehension (see Section 5.4). The researcher may also decide
to use the comprehension questions as a criterion for participant inclusion in the
data analysis. For example, Lim and Christianson (2015) excluded four partici-
pants who answered less than 85% of all comprehension questions correctly from
their data analysis.
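Applying a comprehension-accuracy criterion like Lim and Christianson's is a simple filter in analysis code. The 85% cutoff below mirrors their study, while the function name and data structure are assumed for illustration:

```python
def eligible_participants(accuracy_by_id, cutoff=0.85):
    """Keep only participants whose proportion of correctly answered
    comprehension questions meets the preregistered cutoff."""
    return sorted(pid for pid, acc in accuracy_by_id.items() if acc >= cutoff)

eligible_participants({"P01": 0.92, "P02": 0.81, "P03": 0.88})  # ['P01', 'P03']
```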
For a sample experiment with the EyeLink 1000, refer to “TextLine with
Comprehension Questions” in the Experiment Builder usage discussion forum
(https://www.sr-support.com/forums/forumdisplay.php?f=7).
Sample study: Godfroid, A., Boers, F., & Housen, A. (2013). An eye for words:
Gauging the role of attention in incidental L2 vocabulary acquisition by means
of eye- tracking. Studies in Second Language Acquisition, 35(3), 483–517.
Researchers who seek to study eye movements under more naturalistic conditions
may choose to present readers with longer stretches of connected text. Such text
could range from paragraphs (Balling, 2013; Bolger & Zapata, 2011; Godfroid,
Boers, & Housen, 2013) to short stories (Pellicer-Sánchez, 2016) to multiple book
chapters (Elgort, Brysbaert, Stevens, & Van Assche, 2018; Godfroid et al., 2018),
or even a whole novel (Cop, Drieghe, & Duyck, 2015; Cop, Keuleers, Drieghe, &
Duyck, 2015). An important consideration when designing a text reading study is
how to lay out the text on the screen; that is, what font size and interline spacing
to use (see Figure 9.26, Section 6.2.1, and Figure 6.14).
Text-reading experiments are useful to compare reading patterns for different
groups, for example bilinguals and monolinguals, non-native and native speakers,
or L2 learners of different proficiency levels. This type of research design also lends
itself to studying whether readers can pick up new vocabulary incidentally from
reading longer texts. When the focus is on global reading patterns, sentence-level
analyses can be informative. Sentence-level measures differ from word-based meas-
ures in that they are computed based on all the words in the sentence. Thus, a
sentence-level analysis may involve sentence reading times and fixation counts for
the whole sentence, as well as aggregate eye-movement measures such as average
fixation duration, average saccade length, probability of regression, and probability
of word skipping (Cop, Keuleers, et al., 2015). When the focus is more local, includ-
ing an analysis of individual target words, a word-level analysis makes more sense.
In this case, the researcher extracts eye-movement measures for predefined target
words only and ignores the other words in the sentence. The “big four” dependent
variables in word-level analyses are first fixation duration, gaze duration, regression
path duration, and total time (see Section 7.2.1.2), which can be supplemented
with other measures such as fixation count (see Section 7.2.1.1) and regression-in
(see Section 7.2.2). Researchers typically include both early and late eye-movement
measures to capture the time course of word processing (see Section 7.2.1.2.2).
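The "big four" can be computed from a chronological list of fixations coded by word position. The sketch below follows the standard definitions in simplified form; it is my own illustration and ignores edge cases (e.g., within-word landing positions) that commercial software handles:

```python
def word_level_measures(fixations, target):
    """First fixation duration, gaze duration, regression path (go-past)
    duration, and total time for one target word. fixations is a
    chronological list of (word_index, duration_ms) tuples."""
    first = next((i for i, (w, _) in enumerate(fixations) if w == target), None)
    if first is None:
        return None  # the target word was skipped entirely

    first_fixation = fixations[first][1]

    # Gaze duration: consecutive first-pass fixations on the target
    gaze, i = 0, first
    while i < len(fixations) and fixations[i][0] == target:
        gaze += fixations[i][1]
        i += 1

    # Regression path: everything from first entering the target until a
    # word to its right is fixated (includes regressions to earlier words)
    go_past, j = 0, first
    while j < len(fixations) and fixations[j][0] <= target:
        go_past += fixations[j][1]
        j += 1

    total = sum(d for w, d in fixations if w == target)
    return {"first_fixation": first_fixation, "gaze": gaze,
            "go_past": go_past, "total": total}

# A reader fixates word 3, regresses to word 2, returns, then moves on:
word_level_measures([(1, 200), (2, 250), (3, 300), (2, 180), (3, 120), (4, 210)],
                    target=3)
# {'first_fixation': 300, 'gaze': 300, 'go_past': 600, 'total': 420}
```

The example shows why the four measures diverge: the regression inflates go-past time relative to gaze duration, and the second visit inflates total time, which is precisely how early and late measures index the time course of processing.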
For a sample template to run a natural reading experiment with the EyeLink
1000, refer to the “TextPage” template supplied by Experiment Builder (v.1.10)
under File -> Examples (https://www.sr-research.com/experiment-builder/).
Sample study: Feng, G., Miller, K., Shu, H., & Zhang, H. (2009).
Orthography and the development of reading processes: An eye-movement
study of Chinese and English. Child Development, 80(3), 720–735.
One of the big questions in reading research is how perceptual (lower-level) and
cognitive (higher-level) factors interact during the reading process (see Sections
2.4 and 2.5). To examine lower-level factors in reading, researchers can compare
reading of languages with different scripts. Given two groups of fluent readers,
the role of higher-level comprehension processes will be minimized (both groups
will be able to read without comprehension difficulty); however, the visual input
will be different, making lower-level, orthographic factors the likely source of
any differences in reading patterns. To examine higher-level factors, research-
ers can compare groups with different reading skills reading in the same lan-
guage. This includes comparing children at different grade levels with adults (e.g.,
Blythe, Liversedge, Joseph, White, & Rayner, 2009; Häikiö, Bertram, Hyönä, &
Niemi, 2009; Rayner, 1986), comparing less and more proficient L2 readers (an
understudied research area), and comparing L2 adult with L1 adult readers (Cop,
Drieghe, & Duyck, 2015). The rationale behind these approaches is that child L1
readers and most L2 readers are less efficient at higher-level processing (e.g., word
recognition, sentence parsing) than fluent L1 readers, but if they are all reading the
same input in the same script, lower-level factors will be controlled for. As a result,
any differences between these groups’ eye-movement records are commonly
taken to reflect higher-level effects. Although most reading research focuses on
only one dimension of the reading process (either lower- or higher-level processing),
here I present an ingenious study that looked at both factors combined, namely
Feng, Miller, Shu, and Zhang (2009).
Feng and colleagues examined the eye movements of English and Chinese chil-
dren and adults reading a mix of culture-specific and culturally unbiased texts in
their L1. Analyses of eye movements (fixations and saccades) revealed both higher-
level developmental and lower-level orthographic influences on reading. As for
developmental effects, it was found that adults processed text faster than children,
with shorter fixations, fewer refixations, and longer left-to-right eye movements.
Consistent with their hypotheses, the authors also found that orthographic effects
were more evident in younger than in older readers. Interestingly, English children
showed stronger orthographical effects than their age-matched Chinese peers, as
reflected primarily in saccade-related measures (English third- and fifth-graders
made shorter forward saccades than their Chinese peers). The authors concluded
that “larger developmental and orthographic differences are observed in saccade-
related measures than in fixation duration” (p. 732). What this means is that where
the eyes move, as seen in saccades, is more susceptible to external influences than
when the eyes move, as reflected in fixation durations (also see Sections 2.4 and 2.5).
Because children’s reading skills are not fully automatic yet, their reading behavior
offers a clearer window into the complex interplay of reading-related variables than
adult data (Feng et al., 2009).
Feng et al.’s study exemplifies the value of cross-linguistic reading research.
When comparing reading patterns across different languages, a few points deserve
special attention. First is the matching of reading materials. To level the playing field
between participant groups, it is necessary to make the reading materials for the dif-
ferent languages as similar as possible. Feng and his colleagues did this by using the
Chinese and English versions of two texts from a previous cross-cultural reading
study (Stevenson et al., 1990). These texts were assumed to be free from cultural bias; to
ensure a familiar reading experience, they were supplemented with culture-specific
stories as well (see what follows). If parallel texts for different languages are unavail-
able, researchers may need to create their own translations. In that case, variables
that influence fixation duration need to be controlled between the two versions
of the text. These include average word frequency and, for languages that share the
same script, average word length (see Section 2.5). Cop, Drieghe, & Duyck (2015),
in a comparison of the English and Dutch versions of an Agatha Christie novel, also
matched sentences on “information density” (p. 9); that is, content word frequency
and the number of words, content words, and characters per sentence.
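For researchers creating their own parallel texts, these matching variables can be computed with a short script. The sketch below is illustrative only (the mini frequency list is invented; real studies would use corpus-based frequency norms): it computes mean word length and mean log frequency for one text version, which can then be compared across the two languages.

```python
import math

# Sketch: computing matching variables for one version of a text.
# The frequency list is invented; real studies would use corpus norms.
def text_stats(text, freq_per_million):
    words = text.lower().split()
    mean_length = sum(len(w) for w in words) / len(words)
    # Frequencies are log-transformed, as is standard in reading research;
    # words missing from the list default to 1 per million.
    mean_log_freq = sum(math.log10(freq_per_million.get(w, 1))
                        for w in words) / len(words)
    return mean_length, mean_log_freq

freq = {"the": 60000, "cat": 20, "sat": 15, "on": 30000, "mat": 5}
mean_length, mean_log_freq = text_stats("the cat sat on the mat", freq)
```

Running the same function on both language versions and comparing the outputs gives a quick check that the two texts are matched on these variables.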
As stated previously, Feng et al. (2009) supplemented the two common texts
from Stevenson et al.'s (1990) study with three culture-specific stories from US
and Chinese reading series, respectively. Third- and fifth-graders read different
stories (so a total of five stories each) and adults read all eight stories. When working with children, it is important that the topic and level of linguistic complexity
of the readings be age appropriate. To ensure this was the case, Feng et al. asked
teachers of each age group to evaluate the appropriateness of the materials for
their learners. Finally, when comparing different text versions, the display of text
also matters. Feng and his colleagues adjusted the font sizes in both languages
(24 × 24 pixel Song font in Chinese; 7.3 pixel average letter width in English)
to match the number of text lines in both languages. As a result, an average six-
letter word in English corresponded to 1.5 Chinese characters (Feng et al., 2009).
Linking the two writing systems in this way enabled the researchers to compare
eye-movement measures across both languages. For example, saccade length was
measured in pixels, not letters or characters, and for Chinese and English adults
this yielded closely overlapping distributions in saccade length. This supported the
authors’ decision to opt for a pixel-based measure. In short, working with different
languages requires a careful selection and visual presentation of materials, as well as
a healthy dose of cultural awareness. The reward is that it can improve our under-
standing of the universal and language-specific aspects of reading considerably.
Sample study: Morales, L., Paolieri, D., Dussias, P. E., Kroff, J. R. V., Gerfen,
C., & Bajo, M. T. (2016). The gender congruency effect during bilingual spo-
ken-word recognition. Bilingualism: Language and Cognition, 19(2), 294–310.
Italian and Spanish are typologically related languages; among their many similarities, both have a two-gender system. Even so, translation equivalents (i.e., word pairs with the same
meaning in Italian and Spanish) do not always have the same grammatical gender:
compare “the cheese”, el(MASC) queso in Spanish and il(MASC) formaggio in Italian,
with “the monkey”, el(MASC) mono in Spanish but la(FEM) scimmia in Italian (see
Figure 9.27). Participants only listened to sentences in Spanish, the language they
acquired later in life. Italian was never spoken or mentioned. Furthermore, the
displays of critical trials always contained two pictures of which the referents
had the same grammatical gender in Spanish (i.e., two el nouns or two la nouns).
Therefore, Spanish grammatical gender was not an informative grammatical cue,
because it did not distinguish between the two pictures. However, in some trials,
the gender of the Spanish-Italian translation pair was incongruent, for instance
el(MASC) mono and la(FEM) scimmia, “the monkey” (see Figure 9.27, right panel).
Recall that participants never heard the Italian translations; even so, Morales and
colleagues found that the bilinguals automatically activated the Italian translations
and their genders during listening. Specifically, bilinguals looked less at the target
objects in gender-incongruent trials (where the Italian gender was causing com-
petition) than in gender-congruent trials (where gender cues in both languages
converged). Spanish monolinguals, who do not have any knowledge of the Italian
gender system, responded to both types of trials similarly.
Researchers who would like to replicate a visual world experiment could check
the IRIS database (https://www.iris-database.org) for pictures and audio stimuli
from published studies. At the time of writing this book, the materials from Trenkic,
Mirković, and Altmann (2014) and Andringa and Curcic (2015) could be down-
loaded directly from the database. These authors studied L2 learners’ knowledge
FIGURE 9.27
Example sentence pairs. The sentence always had the form Encuentra +
definite article el(MASC) or la(FEM) + target noun, “Find the [target noun]”. The
participant’s task was to click on the picture corresponding to the noun.
(Source: Morales et al., 2016).
similar-proficiency readers who speak the same L1. Individual differences will be
more pronounced in mixed-proficiency, mixed-L1 groups. Individual differences
research can inform models of eye-movement control (Henderson & Luke, 2014)
and L1/L2 reading development. Researchers can study differences in word-, sen-
tence-, and text-level processing, thus adding to our understanding of micro- and
macro-level processes in reading and how they vary between individuals (Hyönä
& Nurminen, 2006). In L2 reading, changes within the same individual over time
(i.e., with increasing proficiency) would be another avenue to explore.
Individual differences research on eye movements in L2 reading is rare, but the
L1 reading literature has several good examples of what such a study could look
like. For example, Hyönä and Nurminen (2006) had 44 students at the University of
Turku, Finland, read a text in L1 Finnish on endangered species. The students’ task
was to summarize the text, akin to what English for Academic Purposes students
might be asked to do in their English courses. Hyönä and Nurminen performed a
cluster analysis of the eye-movement data to group their participants into clusters
of similarly reading individuals. Their cluster analysis revealed three reader groups,
who had also emerged in an earlier study (Hyönä et al., 2002). The three groups
were slow linear readers, fast linear readers, and topic structure processors. (Topic
structure processors were readers who looked back at the subtitles and topic sen-
tences of the different sections in the text.) Finally, the participants also rated their
own reading behavior in a post-task questionnaire, which enabled the researchers to
investigate to what extent readers were conscious of their own reading style.
Submitting participants’ eye-movement data to a cluster analysis is an inter-
esting application of this statistical technique that other individual differences
researchers could follow. As is the case with most advanced statistical techniques,
cluster analysis requires a fairly large sample size. The number of participants in
the study will determine what is feasible in the analysis. Larger samples (n > 100)
make it easier to identify small groups (i.e., clusters with few members; Hair,
Black, Babin, & Anderson, 2010). Conversely, smaller sample sizes, as in Hyönä
and Nurminen’s study, are better suited to identifying relatively large clusters.
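To make the clustering step concrete, here is a minimal k-means sketch in Python. This is not the authors' analysis (their exact clustering procedure may differ), and the reader profiles below are invented: each participant is summarized by a few global eye-movement features, and readers with similar profiles end up in the same cluster.

```python
# Illustrative k-means sketch for grouping readers by eye-movement profile.
# In practice, features on different scales should be standardized first.
def kmeans(points, k, iters=50):
    # Deterministic initialization: spread initial centers across sorted data.
    pts = sorted(points)
    centers = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each reader to the nearest center (squared distance).
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        new_centers = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl
                       else centers[i] for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Each reader: (mean fixation duration in ms, mean forward saccade length
# in letters, proportion of look-backs) -- hypothetical values.
readers = [
    (260, 6.5, 0.10), (255, 6.8, 0.12),   # "fast linear" profiles
    (340, 5.0, 0.15), (350, 4.8, 0.14),   # "slow linear" profiles
    (280, 6.0, 0.35), (290, 5.9, 0.40),   # "topic structure" profiles
]
centers, clusters = kmeans(readers, k=3)
```

With these toy data, the three invented reader types fall into three separate clusters.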
Even when researchers do not cluster analyze their eye-tracking data, they may
still want to collect data from many people to ensure their tests have good statisti-
cal power (see Section 5.5). Recall that statistical power is the likelihood one can
detect an effect or association in the data if it exists. For example, to detect a small
to moderate correlation, which is typically what you find in individual differ-
ences research, a sample size of 85 is needed (Unsworth, personal communication,
August 9, 2016). Larger samples are also more likely to adequately represent the
wide range of individual differences that exist in the population. Therefore, study
findings based on a large sample are more likely to generalize to other studies and
research contexts. Finally, the choice of reading materials, in individual differences
research as in other reading studies, requires careful consideration. Some factors to
consider are text genre (e.g., novel, newspaper article, academic article, expository
text), content familiarity, text length, and purpose for reading (e.g., reading for
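The sample size of 85 cited above can be reproduced with a standard a priori power calculation based on the Fisher z transformation, here assuming a correlation of r = .30, 80% power, and a two-tailed alpha of .05:

```python
import math

# A priori power analysis for detecting a correlation, via the
# Fisher z approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
z_alpha = 1.959964  # two-tailed critical z for alpha = .05
z_beta = 0.841621   # z corresponding to 80% power
r = 0.30            # assumed small-to-moderate correlation
n = ((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3
print(round(n))  # 85
```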
Goal: To study whether L2 listeners and bilinguals can use verb tense to
predict what will come next in the sentence
Sample study: Altmann, G. T., & Kamide, Y. (2007). The real-time media-
tion of visual attention by language and world knowledge: Linking anticipa-
tory (and other) eye movements to linguistic processing. Journal of Memory
and Language, 57(4), 502–518.
Sample study: McDonough, K., Crowther, D., Kielstra, P., & Trofimovich, P.
(2015). Exploring the potential relationship between eye gaze and English L2
speakers’ responses to recasts. Second Language Research, 31(4), 563–575.
In other words, if one speaker looks at the other speaker’s face during a feedback
episode or if both speakers look at each other, will this increase the chances that
the L2 learner makes a correct reformulation? The researchers found this to be the
case. Both L2 speaker eye gaze and mutual eye gaze with the interlocutor predicted
target-like responses to corrective feedback (see Section 4.2.4).
When examining eye-movement behavior during interaction, it is important
to maintain a natural, ecologically valid setting in which participants can
engage in activities without their movements being tightly constrained. To do so, future
researchers should consider the following elements. First is the setup of the eye
trackers and scene trackers. One of the biggest differences between interaction
studies and reading- or listening-based studies is the visual display; that is, the
interlocutors will be looking at each other in interaction studies whereas read-
ing and listening studies normally take place in front of a computer screen. Such
naturalistic set-ups require the use of an additional camera, which is known as
a scene camera, to determine where in the field of vision a given participant is
looking. (In reading and listening studies, the computer screen is defined as the
visual scene.) Researchers can place the scene camera behind the interlocutors
(McDonough et al., 2015) or next to the eye trackers (McDonough et al., 2017).
Special care is needed to maintain the location of both scene cameras—one for
each interlocutor—and the distance between interlocutors and the eye tracker.
This can be done by fixing the chairs to a designated location to prevent partici-
pants from moving and also fixing the location of the scene camera. In addition,
care should be given to the selection of communicative activities. Given that
researchers have less control over the movements of participants on the spot (e.g.,
looking down), activities can be selected that require fewer head movements
(e.g., looking at a poster rather than reading from a paper).
Goal: To examine the role of visual support during listening on test takers’
performance in an integrated writing task
Sample study: Cubilo, J., & Winke, P. (2013). Redefining the L2 listen-
ing construct within an integrated writing task: Considering the impacts of
visual-cue interpretation and note-taking. Language Assessment Quarterly,
10(4), 371–397.
When people listen, they also rely on paralinguistic cues (e.g., gestures, facial expressions,
scenes) that can be encoded visually. As such, assessment researchers ask the ques-
tion of whether it is appropriate to include visuals (e.g., pictures or videos) in tests
that are designed to assess L2 listening ability.
Cubilo and Winke (2013) pursued this question in the context of an inte-
grated writing task; that is, “a task that requires test takers to read, listen, and
then write in response to what they have read and heard” (Educational Testing
Service, 2005). Cubilo and Winke varied the amount of visual support test takers
received for the listening portion of the test. In the audio/still-picture condition,
participants listened to a two- or three-minute lecture while they saw a still picture
on the screen, whereas in the video condition, they viewed and listened to a video
recording of a different lecture. The authors found no differences on overall writ-
ing scores. (Scores on one component of the rubric, language use, however, were
significantly higher for video-based lectures.) Test takers took more notes when
they listened to audio-based lectures than when they viewed video-based lectures,
and the majority of test takers indicated on an exit questionnaire that the video
lecture was more helpful for content comprehension.
Cubilo and Winke suggested a follow-up study with eye tracking to obtain
more robust and detailed information about where and when test takers look
during video-based listening. Eye tracking could be used to quantify what per-
centage of the time test takers were looking at the screen (vs. looking down to
take notes or listening to the lecture with their eyes closed). Eye-movement data
might also help researchers understand what elements in the video cued test tak-
ers to take notes. One interesting possibility is that the lecturer’s paralinguistic
behavior (e.g., their gestures and facial expressions) and visual aids (e.g., graphs or
figures) add emphasis to the propositional content of the video and might there-
fore trigger the viewers to take more notes.
When replicating this study with eye tracking, the selection of video-based lec-
tures is important. Ideally, the researchers may want a video of a lecturer who dis-
plays a range of paralinguistic behaviors and a combination of content-related and
context-related elements in the video (see Suvorov, 2015). The researchers would
need to take into consideration that gestures may also be viewed and processed in
the parafovea and not looked at directly (see Section 2.1). Therefore, although it is
safe to state that test takers processed a speaker’s gestures when they looked at the
gestures directly, the opposite is not true. Viewers might still acquire some gestural
information from the corner of their eyes. To estimate the amount of parafoveal
viewing, researchers could calculate the size of a participant’s visual field based
on his or her seating distance from the screen (see Section 2.1). It is worth bear-
ing in mind that the quality of visual information degrades quickly away from
center vision, thus making the parafoveal processing of gestures increasingly less
likely. Alternatively, the focus of the study could be on gestures and eye-movement
behavior. In that case, the gestures themselves could be controlled to investigate
whether they trigger eye movements and how this might influence comprehension.
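The visual-field calculation mentioned above amounts to converting degrees of visual angle into on-screen pixels, given the viewing distance and the screen's physical size and resolution. A sketch, with hypothetical setup values:

```python
import math

def degrees_to_pixels(degrees, distance_cm, screen_width_cm, screen_width_px):
    # Physical size on the screen subtending the given visual angle
    # at the given viewing distance.
    size_cm = 2 * distance_cm * math.tan(math.radians(degrees / 2))
    return size_cm * (screen_width_px / screen_width_cm)

# Parafoveal vision extends to roughly 5 degrees on either side of fixation
# (see Section 2.1), i.e., a 10-degree window centered on the point of gaze.
# The viewing distance and screen dimensions below are invented examples.
window_px = degrees_to_pixels(10, distance_cm=70,
                              screen_width_cm=53, screen_width_px=1920)
```

Under these example values, the parafoveal window spans roughly 440 pixels, which can then be compared against the on-screen distance between the speaker's gestures and the participant's fixation location.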
Learning to read and write in Chinese comes with the acquisition of a new script, a
large and time-consuming endeavor, and one of the reasons Chinese ranks among
the most difficult (Category IV) foreign languages in the United States (Defense
Language Institute Foreign Language Center, n.d.). College students at American
universities are typically expected to learn 3,000 characters in a four-year Chinese
program (Shen, 2014), although actual learning rates are lower and would seem
to call for more realistic learning goals (Shen, 2014). The learning burden for
Chinese characters is heavy because of the need to memorize many new forms
and because the pronunciation of most characters cannot be derived from the
written form (Liu, Wang, & Perfetti, 2007; Wang, Perfetti, & Liu, 2003). Both fac-
tors distinguish logographic from alphabetic writing systems. As a result, learners
of Chinese must acquire three interlinked constituents of character knowledge—
their shape (orthography), sound (phonology), and meaning (semantics) (Perfetti,
Liu, & Tan, 2005). Adding to the learning burden is that each constituent may
require focused training to be internalized (Guan, Liu, Chan, Ye, & Perfetti, 2011).
Lee and Kalyuga (2011) focused on how best to organize the three sources of
word information in a vocabulary learning task: the character, the phonetic transliteration known as pinyin, and the English translation. The authors found that different presentation formats affected the learning outcomes. Australian high school
students (mostly heritage learners) who saw the character, pinyin, and translation
displayed from top to bottom learned more words than those who received the
horizontal format, with the character, pinyin, and translation arranged from left
to right (see Figure 9.28). Lee and Kalyuga attributed the higher learning rates in
the vertical format to a reduction in extraneous load, a type of cognitive load
that is detrimental to learning (see van Merriënboer & Sweller, 2005 for a review
of Cognitive Load Theory). Specifically, in the vertical format the two syllables in
pinyin were placed directly below the corresponding characters (see Figure 9.28,
b), which enabled a direct comparison of each character and its pronunciation.
However, in the horizontal format, learners had to search and match the pinyin
syllables with the characters (see Figure 9.28, a), which the authors hypothesized
led to a split attention effect (Paas, Renkl, & Sweller, 2004).
Lee and Kalyuga (2011) measured the learners’ cognitive load on a nine-
point Likert scale. By adding an eye-tracking component to the study, research-
ers could obtain concurrent evidence of attention and verify the central claim
of the original article that the horizontal presentation format led to split atten-
tion. Saccadic transition patterns, which reflect eye movements between differ-
ent interest areas on the screen, would shed light on this issue (see Figure 9.29).
Furthermore, a new, adjacent format, where the characters are placed above the
FIGURE 9.29 An example of a transition matrix for wèizhi,“location”, with the Chinese
characters, pinyin, and English translation arranged vertically. Numbers
represent probabilities of where a learner’s eyes will move next.
(Source: Figure supplied by Xuehong (Stella) He, Michigan State University).
pinyin and to the left of the translation (see Figure 9.28c), might prove even
more efficient than the horizontal format (Lee & Kalyuga, 2011), because in this
case the pinyin will still be matched to the characters but now the translation
will also be close.
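A transition matrix like the one in Figure 9.29 can be estimated by counting, for each interest area, where the next fixation lands. A minimal sketch with an invented fixation sequence:

```python
from collections import Counter, defaultdict

# Sketch: estimating a saccadic transition matrix from a sequence of
# fixated interest areas. Each entry gives the probability that a fixation
# on one area is followed by a fixation on another.
def transition_matrix(fixation_sequence):
    counts = defaultdict(Counter)
    for a, b in zip(fixation_sequence, fixation_sequence[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

# Hypothetical fixation sequence over the three interest areas.
seq = ["character", "pinyin", "character", "pinyin", "translation",
       "character", "translation", "pinyin"]
matrix = transition_matrix(seq)
```

With this toy sequence, fixations on the character are followed by the pinyin two thirds of the time, mirroring the kind of probabilities shown in the figure.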
To adapt the original study to an eye-tracking experiment, I recommend
removing the spoken component from the learning session, so as not to bias
participants to look at the pinyin. The fixation point prior to each learning
trial should be fixed (e.g., on the first character of the word) so researchers can
compare transition patterns between the different elements on the screen (i.e.,
characters, pinyin, and translation) for the three conditions. It is also a good idea
to control the distance between these elements. One way to do this is to create
image files for each trial, which one could do in Adobe Acrobat or Microsoft
PowerPoint. Next, the researcher could create interest area templates and apply
or copy the templates to the different word trials. Finally, to ensure robust and
reliable measurement of word learning, it is a good idea to expand the vocabulary
posttest with more test items. For eye tracking as for other types of experimental
research, test reliability is a very important consideration (see Section 5.5).
Sample study: Michel, M., & Smith, B. (2019). Measuring lexical alignment
during L2 chat interaction: An eye-tracking study. In S. Gass, P. Spinner, &
J. Behney (Eds.), Salience in Second Language Acquisition (pp. 244–268).
New York: Routledge.
chat partners’ tendency to reproduce each other’s phrases, a notion called lexical
alignment, and to what extent heightened, overt attention is associated with this
phenomenon.
Michel and Smith investigated L2 learners’ alignment during interactive writ-
ing tasks that involved composing an academic abstract. The authors employed
two eye trackers (one in the UK and one in the US) to measure their participants’
attention to multi-word expressions in the chat log (e.g., the last part, oral cmc and ftf
[oral computer mediated communication and face to face], L2 vocabulary learning)
that they later reproduced in their writing. The authors examined whether such
instances involved higher levels of overt attention (as measured by phrase-length-
corrected total time and fixation counts) or whether lexical alignment happened
more implicitly or whether the repeating occurrences were perhaps a coincidence
(i.e., not a true case of lexical alignment).
Six participants studying at either a British or an American university were
paired up to reconstruct an academic abstract based on bullet-pointed informa-
tion. The participants interacted via written SCMC over six sessions while eye
trackers in the two countries measured their eye movements on the computer
screen. (Each participant was sitting in front of a different computer screen, which
yielded two distinct sets of eye-movement data.) Using the programming language
R, Michel and Smith identified all instances where both interlocutors produced
the same three- to ten-word unit (e.g., of the study). If these instances received at
least one fixation, they defined them as interest areas for the eye-movement analy-
sis (58 cases in an 8,759-word text log). Next, they compared eye movements for
these possible sources of lexical alignment with baseline data, which were turns in the
conversation that did not show any lexical overlap and were also viewed at least
once (135 turns in total). The authors found that 16 of the 58 possible sources
of lexical alignment were fixated on more and for longer than the baseline texts.
They termed these identified sources for lexical alignment. The authors concluded
that although chat partners do align their lexical productions with each other (as
shown by the 16 cases of lexical overlap + heightened attention), the amount of
conscious, strategic alignment may be lower than what was previously estimated.
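Michel and Smith used R for this step; a rough Python equivalent of the overlap detection (with invented chat turns) is sketched below. It extracts all three- to ten-word sequences from each speaker's turns and intersects them:

```python
# Sketch (not Michel and Smith's actual code): finding word sequences of
# three to ten words that both chat partners produced, as candidate
# sources of lexical alignment. The example turns are invented.
def ngrams(words, n):
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_units(turns_a, turns_b, min_n=3, max_n=10):
    words_a = " ".join(turns_a).lower().split()
    words_b = " ".join(turns_b).lower().split()
    shared = set()
    for n in range(min_n, max_n + 1):
        shared |= ngrams(words_a, n) & ngrams(words_b, n)
    return shared

turns_a = ["the aim of the study was to measure alignment"]
turns_b = ["yes the results of the study support that"]
units = shared_units(turns_a, turns_b)  # {("of", "the", "study")}
```

Each shared unit that also received at least one fixation would then be treated as a possible source of alignment and defined as an interest area.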
Michel and Smith’s innovative use of eye-tracking technology in a chat environ-
ment can inform future researchers who seek to use eye tracking in dynamically
changing contexts such as a chat log or a video. In such environments, the most
challenging and time-consuming task is to draw interest areas around the moving
targets (also see Section 6.1.3.2). This is because the location of utterances in a chat
interface changes each time participants type in more text, which earned SCMC
the alternative definition of “Spontaneously Created and Moving Constantly”
(Michel & Smith, 2017, p. 461). Likewise, target objects in a video, including cap-
tions or subtitles, will move as the video plays and the story unfolds. Current prac-
tice is to manually adjust interest areas each time the target moves on the screen;
however, this is very time-consuming. For example, let’s imagine you identified
“of the study” to be a possible source of alignment between two interlocutors and
drew an interest area around this phrase. As soon as one of the chat partners entered
new text, the location of “of the study” would move up one line in the chat log. In
Michel and Smith’s study, there were about 30 turns per screen, divided roughly
equally between the two speakers (Michel, personal communication, November
10, 2016). As a result, there were about 15 changes in location before a turn dis-
appeared from the screen,4 hence about 14 manual moves of the interest area. To
do so, the researchers deactivated the original interest area around a target phrase
and drew a new interest area at the new location each time the target phrase
shifted. Coding one baseline conversation of approximately 30 turns in this way
was about two days’ worth of work (Michel, personal communication, November
10, 2016). In the future, eye-tracking research with moving interest areas will likely
become less labor intensive as new interest area detection programs such as EyeAnt
are developed (Anthony & Michel, 2016). Indeed, the eye-tracking manufacturer
SensoMotoric Instruments already has a function in their data analysis software
that enables the automatic tracking of moving targets in videos.
FIGURE 9.30 An example of the Porta test of eye dominance, where the thumb is
aligned with a stop sign. The person closes one eye at a time to see if the
sign “jumps” to the side or stays put.
object seems to “jump” to the side with one eye, but stays in place with the
other. The eye that can stay open while keeping the object aligned with your
thumb is your “dominant eye”.
Two thirds of the population are right-eye dominant (Eser, Durrie,
Schwendeman, & Stahl, 2008) but a sizable minority have a dominant
left eye (a small segment of the population have equally dominant eyes, a
quality highly sought after in sports such as archery where aim is crucial!).
Monocular recordings tend to be more accurate than binocular record-
ings, so if tracking of only one eye is an option on your machine, do that and
record the dominant eye only. If you change the eye being tracked (e.g., left
instead of right), you may need to adjust the camera position, so it captures a
good, central image of the eye. Monocular eye tracking is also a solution for
participants who have a lazy eye (amblyopia).
•• Camera set-up and calibration are key. If calibration fails, do not proceed
with data collection. Make some adjustments and recalibrate. Some partici-
pants are very eager to participate in your study and you may feel some pres-
sure to go ahead and collect data from them anyway, but the data will not
be usable. So take a deep breath, try making some of the adjustments listed
below and calibrate again.
•• Consider using the following tricks for improving calibration accuracy.
Many of these boil down to changing the angle at which the video cameras
are filming the eye. For starters, check your participant’s distance from the
screen. Make sure they are seated in front of the center of the screen and
adjust the seat height, if necessary, so the participant’s eyes align with the top
quarter of the screen. If you are recording head-free, ask participants to limit
their head movements. With head-stabilized systems, you can try adjusting
the height of the chinrest. Some chinrests also move forward and backward,
which adds more degrees of freedom to the set-up.
•• If calibration fails, it is possible that the eye tracker is mistaking something
else for the pupil. Common sources of confusion are eye makeup, down-
ward pointing eye lashes, and dark eyebrows. Some manufacturers provide
an image from the eye camera, showing the areas that were detected as
the pupil and the corneal reflection. This will make it easier to diagnose the
problem. If you do not have access to the eye image from the camera, you can
still locate the problem by ruling out the following potential culprits:
•• Mascara or eyeliner? Remove with makeup remover wipes.
•• Long and/or downward pointing eyelashes? Keep an eyelash curler in
the lab.
•• Dark eyebrows? Define the search area for the eye so it excludes the
eyebrows (option available on some eye trackers only). Alternatively,
ask your participant if they would agree to cover their eyebrows with
white tape.
•• How to deal with glasses. Most companies will declare that glasses do not
represent a problem for calibration, but reality can be different. If you do
have trouble calibrating participants with glasses, check that the glasses did
not slide. Glasses should sit up on the bridge of the nose and be clean. Then ask
the participant to tilt their glasses slightly so that the infrared light from the
eye tracker hits the glasses at a different angle. This is to prevent glare (the
reflection of the light beam from the glasses, rather than the eye). Participants
who normally wear bifocal glasses should bring another pair to the lab, if they
have one, because bifocals cannot be calibrated. Dark glasses are also difficult
as the dark frames may look like pupils to the camera. Adjust the thresholds
for pupil detection manually, if your eye tracker has this function, or ask your
participant to wear their spare pair of glasses or contact lenses. Alternatively,
you may cover the frames with light tape, but only with the participant’s
permission of course.
•• Eyes differ in shape. Some shapes are easier to calibrate than others. If you
are recording eye movements from participants who have a single eyelid, as
is common in some parts of the world, calibration may be more challenging
because the eyelid may partially cover the pupil under certain angles, such as
when the participant looks down. The same can occur with participants who
have droopy eyelids (ptosis), which is found more often in older people. Your
best chance of getting a successful calibration is to change the angle at which
the camera is filming, for instance by moving the camera a bit closer so that
it films from below.
•• To use or not to use a chinrest. Stabilizing the head during an eye-move-
ment recording will typically result in higher-quality data but comes at the
cost of reduced ecological validity. My view is that it is best to use a chinrest
if your study participants are comfortable with it. If your eye tracker does not
include a chinrest, you can buy one secondhand on the internet or have it
custom made.
•• Set an a priori quality standard for your data recordings (% track loss) and
automatically discard any participant files that do not meet the standard. All
participant files that pass muster should additionally be screened for quality
on a trial by trial basis (see Section 8.1.2).
•• Do not delay eyeballing your data. With data collection still fresh in your
mind and your logbook by your side, it will be easier to detect any anomalies in
the recording, such as offsets, excessive artifacts, or track loss (see Section 8.1.2).
When faced with a noisy recording, your only safe option is to exclude the data
from further analysis. Data adjustments (i.e., moving the eye gaze data or the
interest areas manually) only make sense when there is a clear external referent
(e.g., a line of text) with which the eye gaze data can be aligned (see Section
8.1.3). Even then, cleaning should be used sparingly and with care, because you
are essentially making changes to your original recording.
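The kind of adjustment meant here can be sketched as follows: each fixation is snapped to the nearest line of text (the external referent), but only when the vertical offset is small; larger deviations are left alone rather than silently rewritten. The coordinates, the 20-pixel tolerance, and the function name are illustrative assumptions, not values from any analysis package.

```python
def snap_to_lines(fixations, line_ys, max_offset=20):
    """Realign fixations to the nearest text line when the vertical
    offset is small; leave larger deviations untouched."""
    corrected = []
    for x, y in fixations:
        nearest = min(line_ys, key=lambda ly: abs(ly - y))
        if abs(nearest - y) <= max_offset:
            corrected.append((x, nearest))  # small drift: align to the line
        else:
            corrected.append((x, y))        # too far off: do not adjust
    return corrected

line_ys = [100, 160, 220]  # baseline y-coordinate of each line of text
fixations = [(80, 108), (150, 155), (300, 260)]
print(snap_to_lines(fixations, line_ys))
# [(80, 100), (150, 160), (300, 260)]
```

The third fixation is 40 pixels from the nearest line and is therefore left as recorded; whether to exclude it is a separate decision, documented in your logbook.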
•• Defining interest areas. Whenever possible, interest areas should be defined
before data collection because this is more objective and saves work in the
long run (see Section 6.1). However, with some types of experiments (e.g.,
writing research, classroom-based or interaction research) you may not know
beforehand what the region of analysis will be. In either case, your interest
areas should be conceived as semantic or thematic units that relate directly to
your research questions. Do not be tempted to change your interest areas a
posteriori because you believe it would be favorable to your results (Holmqvist
et al., 2011). When you draw your interest areas, make sure to include a
margin (e.g., whitespace around a picture or above and below a text line; use
double-spacing). This will ensure that eye-movement data with a slight offset
will still be assigned to the proper target region.
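The margin principle can be made concrete with a small sketch: a tight bounding box is expanded by a fixed amount of whitespace before fixations are assigned to it. All coordinates, the 10-pixel margin, and the function names are invented for illustration.

```python
def make_ia(name, left, top, right, bottom, margin=10):
    """Define a rectangular interest area, expanded by a whitespace margin."""
    return {"name": name, "left": left - margin, "top": top - margin,
            "right": right + margin, "bottom": bottom + margin}

def assign_fixation(x, y, interest_areas):
    """Return the name of the first interest area containing (x, y), or None."""
    for ia in interest_areas:
        if ia["left"] <= x <= ia["right"] and ia["top"] <= y <= ia["bottom"]:
            return ia["name"]
    return None

# A word whose tight bounding box runs from (200, 150) to (260, 170) on screen.
ias = [make_ia("target_word", 200, 150, 260, 170, margin=10)]
print(assign_fixation(230, 145, ias))  # 'target_word' (slight vertical offset)
print(assign_fixation(230, 120, ias))  # None (well outside the margin)
```

With the margin in place, a fixation that lands a few pixels above the word is still credited to the intended region rather than falling between interest areas.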
•• Be aware that dynamic (moving) interest areas (such as an interlocu-
tor’s face in natural discourse, characters in movies, websites, written chat
applications, and anything where participants can scroll on the screen) are
time-consuming to draw. To a large extent, dynamic interest areas are still
coded manually today, an extremely laborious and monotonous process. This
situation will likely improve as new interest area detection
programs such as EyeAnt are developed (Anthony & Michel, 2016). In the
meantime, it is good to weigh the costs and benefits of a study design with
dynamically changing interest areas (see Section 9.3.1, research idea #10). In
some cases, a simple fix such as disabling the scrolling function on a website
can save many tens of hours of manual coding work afterwards.
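One generic way to cut down on the manual workload, sketched below under invented assumptions, is to hand-code a moving interest area only at sparse keyframes and interpolate its position linearly in between; this is an illustration of the idea, not the approach of any tool named above.

```python
def interpolate_ia(keyframes, t):
    """Estimate the (x, y) center of a moving interest area at time t
    from a sparse, manually coded list of (time, x, y) keyframes."""
    keyframes = sorted(keyframes)
    if t <= keyframes[0][0]:
        return keyframes[0][1:]   # before the first keyframe
    if t >= keyframes[-1][0]:
        return keyframes[-1][1:]  # after the last keyframe
    for (t0, x0, y0), (t1, x1, y1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)  # fraction of the way to next keyframe
            return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))

# A face that drifts from x=100 to x=300 over one second of video.
keyframes = [(0, 100, 200), (1000, 300, 200)]  # (ms, x, y)
print(interpolate_ia(keyframes, 500))  # (200.0, 200.0)
```

Interpolation only works when the motion between keyframes is smooth; abrupt movements still require denser manual coding.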
Lastly, much of the experiential knowledge needed for conducting eye-tracking
research comes from practice. Experienced researchers know that managing the
bits and bobs of eye-tracking research takes time, and it may indeed be a
never-ending process as researchers discover new horizons that require
innovative solutions. I expect that innovation will only increase as growing
numbers of L2 and bilingualism researchers incorporate eye tracking into their research and
Setting up an Eye-Tracking Lab 363
Notes
1 There are separate algorithms for detecting optic artifacts such as blinks and rarer eye
behavior such as smooth pursuit, which are not included in this discussion.
2 In the discussion that follows, we refer to the spatial accuracy and precision of a meas-
urement. This supplements considerations about temporal accuracy and precision that
relate to when an event is detected in time versus when it occurred (see Figure 8.18).
3 This idea was submitted by Xuehong (Stella) He, a PhD candidate in Second Language
Studies at Michigan State University.
4 Michel and Smith only studied lexical alignment with the other interlocutor’s text, not
one’s own text.
REFERENCES
Baltova, I. (1994). The impact of video on the comprehension skills of core French students.
Canadian Modern Language Review, 50(3), 507–531. doi:10.3138/cmlr.50.3.507
Bar, M. (2007). The proactive brain: Using analogies and associations to generate predictions.
Trends in Cognitive Sciences, 11(7), 280–289. doi:10.1016/j.tics.2007.05.005
Bar, M. (2009). The proactive brain: Memory for predictions. Philosophical Transactions of the
Royal Society B: Biological Sciences, 364(1521), 1235–1243. doi:10.1098/rstb.2008.0310
Barnes, G. R. (2011). Ocular pursuit movements. In S. P. Liversedge, I. Gilchrist, & S.
Everling (Eds.), The Oxford handbook of eye movements (pp. 115–132). Oxford University
Press. doi:10.1093/oxfordhb/9780199539789.013.0007
Barnett, V., & Lewis, T. (1994). Outliers in statistical data (3rd ed.). New York: Wiley.
Barr, D. J. (2008). Analyzing ‘visual world’ eyetracking data using multilevel logistic regression.
Journal of Memory and Language, 59(4), 457–474. doi:10.1016/j.jml.2007.09.002
Barr, D. J., Gann, T. M., & Pierce, R. S. (2011). Anticipatory baseline effects and information
integration in visual world studies. Acta Psychologica, 137(2), 201–207. doi:10.1016/j.
actpsy.2010.09.011
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for
confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3),
255–278. doi:10.1016/j.jml.2012.11.001
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models.
Retrieved from http://arxiv.org/abs/1506.04967
Bates, E., D’Amico, S., Jacobsen, T., Székely, A., Andonova, E., Devescovi, A., … Tzeng, O.
(2003). Timed picture naming in seven languages. Psychonomic Bulletin & Review, 10(2),
344–380. doi:10.3758/BF03196494
*Bax, S. (2013). The cognitive processing of candidates during reading tests: Evidence from
eye-tracking. Language Testing, 30(4), 441–465. doi:10.1177/0265532212473244
Bell, B. A., Morgan, G. B., Schoeneberger, J. A., Loudermilk, B. L., Kromrey, J. D., & Ferron,
J. M. (2010). Dancing the sample size limbo with mixed models: How low can you
go? SAS Global Forum Proceedings. Retrieved from http://support.sas.com/resources/
papers/proceedings10/197-2010.pdf
Bertera, J. H., & Rayner, K. (2000). Eye movements and the span of the effective stimulus
in visual search. Perception and Psychophysics, 62(3), 576–585. doi:10.3758/BF03212109
Bialystok, E. (2015). Bilingualism and the development of executive function: The role of
attention. Child Development Perspectives, 9(2), 117–121. doi:10.1111/cdep.12116
Binda, P., Cicchini, G. M., Burr, D. C., & Morrone, M. C. (2009). Spatiotemporal distortions
of visual perception at the time of saccades. Journal of Neuroscience, 29(42), 13147–13157.
doi:10.1523/JNEUROSCI.3723-09.2009
*Bisson, M., Van Heuven, W. J. B., Conklin, K., & Tunney, R. J. (2014). Processing of native
and foreign language subtitles in films: An eye tracking study. Applied Psycholinguistics,
35(2), 399–418. doi:10.1017/S0142716412000434
Blumenfeld, H. K., & Marian, V. (2007). Constraints on parallel activation in bilingual
spoken language processing: Examining proficiency and lexical status using eye-tracking.
Language and Cognitive Processes, 22(5), 633–660. doi:10.1080/01690960601000746
Blumenfeld, H. K., & Marian, V. (2011). Bilingualism influences inhibitory control in auditory
comprehension. Cognition, 118(2), 245–257. doi:10.1016/j.cognition.2010.10.012
Blythe, H. I. (2014). Developmental changes in eye movements and visual information
encoding associated with learning to read. Current Directions in Psychological Science,
23(3), 201–207. doi:10.1177/0963721414530145
Blythe, H. I., & Joseph, H. S. S. L. (2011). Children’s eye movements during reading.
In S. P. Liversedge, I. Gilchrist, & S. Everling (Eds.), The Oxford handbook of
eye movements (pp. 643–662). Oxford University Press. doi:10.1093/oxfor
dhb/9780199539789.013.0036
Blythe, H. I., Liversedge, S. P., Joseph, H. S., White, S. J., & Rayner, K. (2009). Visual
information capture during fixations in reading for children and adults. Vision Research,
49(12), 1583–1591. doi:10.1016/j.visres.2009.03.015
Boers, F., & Lindstromberg, S. (2009). Optimizing a lexical approach to instructed second language
acquisition. London, UK: Palgrave Macmillan. doi:10.1057/9780230245006_1
Boers, F., & Lindstromberg, S. (2012). Experimental and intervention studies on formulaic
sequences in a second language. Annual Review of Applied Linguistics, 32, 83–110.
doi:10.1017/S0267190512000050
Boersma, P., & Weenink, D. (2018). Praat: doing phonetics by computer (Version 6.0) [Computer
software]. Amsterdam, the Netherlands. Retrieved June 1, 2018 from http://www.
praat.org
Bojko, A. (2009). Informative or misleading? Heatmaps deconstructed. In J. A. Jacko
(Ed.), Human-computer interaction (pp. 30–39). Berlin: Springer. doi:10.1007/978-3-
642-02574-7_4
*Bolger, P., & Zapata, G. (2011). Semantic categories and context in L2 vocabulary learning.
Language Learning, 61(2), 614–646. doi:10.1111/j.1467-9922.2010.00624.x
Boston, M. F., Hale, J., Kliegl, R., Patil, U., & Vasishth, S. (2008). Parsing costs as predictors
of reading difficulty: An evaluation using the Potsdam Sentence Corpus. Journal of Eye
Movement Research, 2(1), 1–12. doi:10.16910/jemr.2.1.1
Bowles, M. A. (2010). The think-aloud controversy in second language research. New York:
Routledge.
*Boxell, O., & Felser, C. (2017). Sensitivity to parasitic gaps inside subject islands in native
and non-native sentence processing. Bilingualism: Language and Cognition, 20(3), 494–
511. doi:10.1017/S1366728915000942
Braze, D. (2018). Researcher contributed eye tracking tools. Retrieved from https://
github.com/davebraze/FDBeye/wiki/Researcher-Contributed-Eye-Tracking-Tools
Brône, G., & Oben, B. (Eds.). (2018). Eye-tracking in interaction: Studies on the role of eye gaze
in dialogue (Vol. 10). New York: John Benjamins.
Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models:
A tutorial. Journal of Cognition, 1(1), 9. doi:10.5334/joc.10
Bultena, S., Dijkstra, T., & van Hell, J. G. (2014). Cognate effects in sentence context depend
on word class, L2 proficiency, and task. The Quarterly Journal of Experimental Psychology,
67(6), 1214–1241. doi:10.1080/17470218.2013.853090
Burnat, K. (2015). Are visual peripheries forever young? Neural Plasticity, 2015, 307929.
doi:10.1155/2015/307929.
Cameron, A. C., & Trivedi, P. K. (1998). Regression analysis of count data. Cambridge, UK:
Cambridge University Press.
Canseco-Gonzalez, E., Brehm, L., Brick, C. A., Brown-Schmidt, S., Fischer, K., &
Wagner, K. (2010). Carpet or Cárcel: The effect of age of acquisition and language
mode on bilingual lexical access. Language and Cognitive Processes, 25, 669–705.
doi:10.1080/01690960903474912
Carrol, G., & Conklin, K. (2015). Eye-tracking multi-word units: Some methodological
questions. Journal of Eye Movement Research, 7(5), 1–11. doi:10.16910/jemr.7.5.5
*Carrol, G., & Conklin, K. (2017). Cross language lexical priming extends to formulaic
units: Evidence from eye-tracking suggests that this idea “has legs.” Bilingualism: Language
and Cognition, 20(2), 299–317. doi:10.1017/S1366728915000103
*Carrol, G., Conklin, K., & Gyllstad, H. (2016). Found in translation: The influence of
the L1 on the reading of idioms in a L2. Studies in Second Language Acquisition, 38(3),
403–443. doi:10.1017/S0272263115000492
Chambers, C. G., Tanenhaus, M. K., Eberhard, K. M., Filip, H., & Carlson, G. N. (2002).
Circumscribing referential domains during real-time language comprehension. Journal
of Memory and Language, 47(1), 30–49. doi:10.1006/jmla.2001.2832
*Chamorro, G., Sorace, A., & Sturt, P. (2016). What is the source of L1 attrition? The effect
of recent L1 re-exposure on Spanish speakers under L1 attrition. Bilingualism: Language
and Cognition, 19(3), 520–532. doi:10.1017/S1366728915000152
Chen, H. C., & Tang, C. K. (1998). The effective visual field in reading Chinese. In C. K.
Leong & K. Tamaoka (Eds.), Cognitive processing of the Chinese and Japanese languages (pp.
91–100). Dordrecht, the Netherlands: Springer. doi:10.1007/978-94-015-9161-4_5
Chepyshko, R. (2018). Locative verbs in L2 learning: A modular processing perspective (Doctoral
dissertation). Retrieved from ProQuest Dissertations and Theses Global. (10827321).
Choi, J. E. S., Vaswani, P. A., & Shadmehr, R. (2014). Vigor of movements and the cost
of time in decision making. Journal of Neuroscience, 34(4), 1212–1223. doi:10.1523/
JNEUROSCI.2798-13.2014
*Choi, S. (2017). Processing and learning of enhanced English collocations: An eye
movement study. Language Teaching Research, 21(3), 403–426. doi:10.1177/136216881
6653271
Choi, S. Y., & Koh, S. (2009). The perceptual span during reading Korean sentences. Korean
Journal of Cognitive Science, 20(4), 573–601. doi:10.19066/cogsci.2009.20.4.008
Choi, W., Lowder, M. W., Ferreira, F., & Henderson, J. M. (2015). Individual differences
in the perceptual span during reading: Evidence from the moving window technique.
Attention, Perception, & Psychophysics, 77(7), 2463–2475.
Chukharev-Hudilainen, E., Saricaoglu, A., Torrance, M., & Feng, H. (2019). Combined
deployable keystroke logging and eyetracking for investigating L2 writing fluency. Studies
in Second Language Acquisition, 41(3), 583–604. doi:10.1017/S027226311900007X
Cicchini, G. M., Binda, P., Burr, D. C., & Morrone, M. C. (2013). Transient spatiotopic
integration across saccadic eye movements mediates visual stability. Journal of
Neurophysiology, 109(4), 1117–1125. doi:10.1152/jn.00478.2012
*Cintrón-Valentín, M., & Ellis, N. C. (2015). Exploring the interface: Explicit focus-on-
form instruction and learned attentional biases in L2 Latin. Studies in Second Language
Acquisition, 37(2), 197–235. doi:10.1017/S0272263115000029
Cintrón-Valentín, M. C., & Ellis, N. C. (2016). Salience in second language acquisition:
Physical form, learner attention, and instructional focus. Frontiers in Psychology, 7, 1–21.
doi:10.3389/fpsyg.2016.01284
Clahsen, H. (2008). Behavioral methods for investigating morphological and syntactic
processing in children. In I. A. Sekerina, E. M. Fernández, & H. Clahsen (Eds.),
Developmental psycholinguistics: On-line methods in children’s language processing (pp. 1–27).
Amsterdam, the Netherlands/Philadelphia, PA: John Benjamins. Retrieved from http://
www.uni-potsdam.de/fileadmin/projects/prim/papers/methods07.pdf
*Clahsen, H., Balkhair, L., Schutter, J., & Cunnings, I. (2013). The time course of
morphological processing in a second language. Second Language Research, 29(1), 7–31.
doi:10.1177/0267658312464970
Clahsen, H., & Felser, C. (2006a). Continuity and shallow structures in language processing.
Applied Psycholinguistics, 27(1), 107–126. doi:10.1017/S0142716406060206
Clahsen, H., & Felser, C. (2006b). Grammatical processing in language learners. Applied
Psycholinguistics, 27(1), 3–42. doi:10.1017/S0142716406060024
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future
of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. doi:10.1017/
S0140525X12000477
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics
in psychological research. Journal of Verbal Learning and Verbal Behavior, 12(4), 335–359.
doi:10.1016/S0022-5371(73)80014-3
Clifton, C. J., & Staub, A. (2011). Syntactic influences on eye movements during
reading. In S. Liversedge, I. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye
movements (pp. 895–909). Oxford, UK: Oxford University Press. doi:10.1093/oxfor
dhb/9780199539789.013.0049
Clifton, C. J., Staub, A., & Rayner, K. (2007). Eye movements in reading words and
sentences. In R. P. G. Van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye
movements: A window on mind and brain (pp. 341–372). Oxford, UK: Elsevier.
COGAIN. (n.d.). The COGAIN Association. Retrieved from http://www.cogain.org/
home
Cohen, A. D. (2006). The coming of age of research on test-taking strategies. Language
Assessment Quarterly, 3(4), 307–331. doi:10.1080/15434300701333129
Cohen, A. L. (2013). Software for the automatic correction of recorded eye fixation locations
in reading experiments. Behavior Research Methods, 45(3), 679–683. doi:10.3758/
s13428-012-0280-3
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Conklin, K., & Pellicer-Sánchez, A. (2016). Using eye-tracking in applied linguistics
and second language research. Second Language Research, 32(3), 453–467.
doi:10.1177/0267658316637401
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-tracking: A guide for applied
linguistics research. Cambridge, UK: Cambridge University Press.
Cooper, R. (1974). The control of eye fixation by the meaning of spoken language: A
new methodology for the real-time investigation of speech perception, memory,
and language processing. Cognitive Psychology, 6(1), 84–107. doi:10.1016/0010-
0285(74)90005-X
*Cop, U., Dirix, N., Van Assche, E., Drieghe, D., & Duyck, W. (2017). Reading a book
in one or two languages? An eye movement study of cognate facilitation in L1
and L2 reading. Bilingualism: Language and Cognition, 20(4), 747–769. doi:10.1017/
S1366728916000213
Cop, U., Drieghe, D., & Duyck, W. (2015). Eye movement patterns in natural reading:
A comparison of monolingual and bilingual reading of a novel. PLOS ONE, 10(8),
e0134008. doi:10.1371/journal.pone.0134008
Cop, U., Keuleers, E., Drieghe, D., & Duyck, W. (2015). Frequency effects in monolingual
and bilingual natural reading. Psychonomic Bulletin & Review, 22(5), 1216–1234.
doi:10.3758/s13423-015-0819-2
Corbetta, M. (1998). Frontoparietal cortical networks for directing attention and the eye to
visual locations: Identical, independent, or overlapping neural systems? Proceedings of the
National Academy of Sciences of the United States of America, 95(3), 831–838. doi:10.1073/
pnas.95.3.831
Corbetta, M., & Shulman, G. L. (1998). Human cortical mechanisms of visual attention
during orienting and search. Philosophical Transactions of the Royal Society B: Biological
Sciences, 353(1373), 1353–1362. doi:10.1098/rstb.1998.0289
Cubilo, J., & Winke, P. (2013). Redefining the L2 listening construct within an integrated
writing task: Considering the impacts of visual-cue interpretation and note-taking.
Language Assessment Quarterly, 10(4), 371–397. doi:10.1080/15434303.2013.824972
Cuetos, F., & Mitchell, D. C. (1988). Cross-linguistic differences in parsing: Restrictions
on the use of the Late Closure strategy in Spanish. Cognition, 30(1), 73–105.
doi:10.1016/0010-0277(88)90004-2
Cunnings, I. (2012). An overview of mixed-effects statistical models for second language
researchers. Second Language Research, 28(3), 369–382. doi:10.1177/0267658312443651
Cunnings, I., & Finlayson, I. (2015). Mixed effects modeling and longitudinal data analysis.
In L. Plonsky (Ed.), Advancing quantitative methods in second language research (pp. 159–
181). New York: Routledge.
*Cunnings, I., Fotiadou, G., & Tsimpli, I. (2017). Anaphora resolution and reanalysis
during L2 sentence processing. Studies in Second Language Acquisition, 39(4), 621–652.
doi:10.1017/S0272263116000292
Dahan, D., Magnuson, J. S., & Tanenhaus, M. K. (2001). Time course of frequency effects in
spoken-word recognition: Evidence from eye movements. Cognitive Psychology, 42(4),
317–367. doi:10.1006/cogp.2001.0750
Dahan, D., Swingley, D., Tanenhaus, M. K., & Magnuson, J. S. (2000). Linguistic gender and
spoken-word recognition in French. Journal of Memory and Language, 42(4), 465–480.
doi:10.1006/jmla.1999.2688
Dahan, D., & Tanenhaus, M. K. (2005). Looking at the rope when looking for the snake:
Conceptually mediated eye movements during spoken-word recognition. Psychonomic
Bulletin & Review, 12(3), 453–459. doi:10.3758/BF03193787
Dahan, D., Tanenhaus, M. K., & Pier Salverda, A. (2007). The influence of visual processing
on phonetically driven saccades in the “visual world” paradigm. In R. Van Gompel,
M. Fischer, W. Murry, & R. Hill (Eds.), Eye movements: A window on mind and brain (pp.
471–486). Oxford, UK: Elsevier. doi:10.1016/B978-008044980-7/50023-9
Dambacher, M., & Kliegl, R. (2007). Synchronizing timelines: Relations between fixation
durations and N400 amplitudes during sentence reading. Brain Research, 1155(1), 147–
162. doi:10.1016/j.brainres.2007.04.027
De Beugher, S., Brône, G., & Goedemé, T. (2014). Automatic analysis of in-the-wild
mobile eye-tracking experiments using object, face and person detection. International
Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal,
625–633.
De Bot, K., Paribakht, T. S., & Wesche, M. B. (1997). Toward a lexical processing model for
the study of second language vocabulary acquisition: Evidence from ESL reading. Studies
in Second Language Acquisition, 19(3), 309–329. doi:10.1017/S0272263197003021
Defense Language Institute Foreign Language Center. (n.d.). Languages taught at DLIFLC
and duration of courses. Retrieved from http://www.dliflc.edu/home/about/
languages-at-dliflc/
De León Rodríguez, D., Buetler, K. A., Eggenberger, N., Preisig, B. C., Schumacher, R.,
Laganaro, M., … Müri, R. M. (2016). The modulation of reading strategies by language
opacity in early bilinguals: An eye movement study. Bilingualism: Language and Cognition,
19(3), 567–577. doi:10.1017/S1366728915000310
DeLong, K. A., Urbach, T. P., & Kutas, M. (2005). Probabilistic word pre-activation during
language comprehension inferred from electrical brain activity. Nature Neuroscience, 8(8),
1117–1121. doi:10.1038/nn1504
Deutsch, A., & Bentin, S. (2001). Syntactic and semantic factors in processing gender
agreement in Hebrew: Evidence from ERPs and eye movements. Journal of Memory and
Language, 45(2), 200–224. doi:10.1006/jmla.2000.2768
Deutsch, A., & Rayner, K. (1999). Initial fixation location effects in reading Hebrew words.
Language and Cognitive Processes, 14(4), 393–421. doi:10.1080/016909699386284
Diefendorf, A. R., & Dodge, R. (1908). An experimental study of the ocular reactions
of the insane from photographic records. Brain, 31(3), 451–489. doi:10.1093/
brain/31.3.451
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in
Psychology, 5, 1–17. doi:10.3389/fpsyg.2014.00781
*Dijkgraaf, A., Hartsuiker, R. J., & Duyck, W. (2017). Predicting upcoming information
in native-language and non-native-language auditory word recognition. Bilingualism:
Language and Cognition, 20(5), 917–930. doi:10.1017/S1366728916000547
Dimigen, O., Kliegl, R., & Sommer, W. (2012). Trans-saccadic parafoveal preview benefits
in fluent reading: A study with fixation-related brain potentials. NeuroImage, 62(1), 381–
393. doi:10.1016/j.neuroimage.2012.04.006
Dimigen, O., Sommer, W., Hohlfeld, A., Jacobs, A. M., & Kliegl, R. (2011). Coregistration of
eye movements and EEG in natural reading: Analyses and review. Journal of Experimental
Psychology: General, 140(4), 552–572. doi:10.1037/a0023885
Dink, J.W., & Ferguson, B. (2015). eyetrackingR: An R library for eye-tracking data analysis.
Retrieved from www.eyetrackingr.com
Dodge, R. (1903). Five types of eye movement in the horizontal meridian plane of the field
of regard. American Journal of Physiology - Legacy Content, 8(4), 307–329. doi:10.1152/
ajplegacy.1903.8.4.307
Dodge, R. (1904). The participation of the eye movements in the visual perception of
motion. Psychological Review, 11(1), 1–14. doi:10.1037/h0071641
Dodge, R., & Cline, T. S. (1901). The angle velocity of eye movements. Psychological Review,
8, 145–157. doi:10.1037/h0076100
Dolgunsöz, E., & Sarıçoban, A. (2016). CEFR and eye movement characteristics during
EFL reading: The case of intermediate readers. Journal of Language and Linguistic Studies,
12(2), 238–252.
Drasdo, N., & Fowler, C. W. (1974). Non-linear projection of the retinal image in a wide-
angle schematic eye. The British Journal of Ophthalmology, 58(8), 709.
Drieghe, D. (2008). Foveal processing and word skipping during reading. Psychonomic
Bulletin & Review, 15(4), 856–860. doi:10.3758/PBR.15.4.856
Drieghe, D., Rayner, K., & Pollatsek, A. (2008). Mislocated fixations can account for
parafoveal-on-foveal effects in eye movements during reading. The Quarterly Journal of
Experimental Psychology, 61(8), 1239–1249. doi:10.1080/17470210701467953
Duchowski, A. T. (2002). A breadth-first survey of eye-tracking applications. Behavior
Research Methods, Instruments, & Computers, 34(4), 455–470. doi:10.3758/BF03195475
Duchowski, A. T. (2007). Eye tracking methodology: Theory and practice. London, UK: Springer.
Duñabeitia, J. A., Avilés, A., Afonso, O., Scheepers, C., & Carreiras, M. (2009). Qualitative
differences in the representation of abstract versus concrete words: Evidence
from the visual-world paradigm. Cognition, 110(2), 284–292. doi:10.1016/j.
cognition.2008.11.012
Dussias, P. E. (2010). Uses of eye-tracking data in second language sentence processing research.
Annual Review of Applied Linguistics, 30, 149–166. doi:10.1017/S026719051000005X
*Dussias, P. E., & Sagarra, N. (2007). The effect of exposure on syntactic parsing in Spanish
– English bilinguals. Bilingualism: Language and Cognition, 10(1), 101–116. doi:10.1017/
S1366728906002847
*Dussias, P. E., Valdés Kroff, J. R., Guzzardo Tamargo, R. E., & Gerfen, C. (2013). When
gender and looking go hand in hand. Studies in Second Language Acquisition, 35(2), 353–
387. doi:10.1017/S0272263112000915
d’Ydewalle, G., & De Bruycker, W. (2007). Eye movements of children and
adults while reading television subtitles. European Psychologist, 12(3), 196–205.
doi:10.1027/1016-9040.12.3.196
d’Ydewalle, G., & Gielen, I. (1992). Attention allocation with overlapping sound, image,
and text. In K. Rayner (Ed.), Eye movements and visual cognition (pp. 415–427). New York:
Springer. doi:10.1007/978-1-4612-2852-3_25
d’Ydewalle, G., Praet, C., Verfaillie, K., & Van Rensbergen, J. (1991). Watching subtitled
television. Communication Research, 18(5), 650–666. doi:10.1177/009365091018005005
Eberhard, K. M., Spivey-Knowlton, M. J., Sedivy, J. C., & Tanenhaus, M. K. (1995). Eye
movements as a window into real-time spoken language comprehension in natural
contexts. Journal of Psycholinguistic Research, 24(6), 409–436. doi:10.1007/BF02143160
Educational Testing Service. (2005). TOEFL® iBT writing sample responses. Retrieved
March 30, 2017 from https://www.ets.org/Media/Tests/TOEFL/pdf/ibt_writing_
sample_responses.pdf
Eggert, T. (2007). Eye movement recordings: Methods. Developments in Ophthalmology, 40,
15–34. doi:10.1159/000100347
*Elgort, I., Brysbaert, M., Stevens, M., & Van Assche, E. (2018). Contextual word learning
during reading in a second language: An eye-movement study. Studies in Second Language
Acquisition, 40(2), 341–366. doi:10.1017/S0272263117000109
Ellis, N. C. (2006). Usage-based and form-focused language acquisition: The associative
learning of constructions, learned attention, and the limited L2 endstate. In P. Robinson
& N. C. Ellis (Eds.), Handbook of cognitive linguistics and second language acquisition (pp.
372–405). New York: Routledge.
*Ellis, N. C., Hafeez, K., Martin, K. I., Chen, L., Boland, J., & Sagarra, N. (2014). An
eye-tracking study of learned attention in second language acquisition. Applied
Psycholinguistics, 35(3), 547–579. doi:10.1017/S0142716412000501
Ellis, R. (2005). Measuring implicit and explicit knowledge of a second language: A
psychometric study. Studies in Second Language Acquisition, 27(2), 141–172. doi:10.1017/
S0272263105050096
Engbert, R. (2006). Microsaccades: A microcosm for research on oculomotor control,
attention, and visual perception. Progress in Brain Research, 154, 177–192. doi:10.1016/
S0079-6123(06)54009-9
Engbert, R., & Kliegl, R. (2011). Parallel graded attention models of reading.
In S. P. Liversedge, I. Gilchrist, & S. Everling (Eds.), The Oxford handbook of
eye movements (pp. 787–800). Oxford University Press. doi:10.1093/oxfor
dhb/9780199539789.013.0043
Engbert, R., Nuthmann, A., Richter, E. M., & Kliegl, R. (2005). SWIFT: A dynamical
model of saccade generation during reading. Psychological Review, 112(4), 777–813.
doi:10.1037/0033-295X.112.4.777
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.).
Cambridge, MA: The MIT Press.
Eser, I., Durrie, D. S., Schwendeman, F., & Stahl, J. E. (2008). Association between ocular
dominance and refraction. Journal of Refractive Surgery, 24(7), 685–689.
Eye Movements Researchers’ Association. (2012). Eye data quality. Retrieved from http://
www.eye-movements.org/eye_data_quality
*Felser, C., & Cunnings, I. (2012). Processing reflexives in a second language: The timing
of structural and discourse-level constraints. Applied Psycholinguistics, 33(3), 571–603.
doi:10.1017/S0142716411000488
*Felser, C., Cunnings, I., Batterham, C., & Clahsen, H. (2012). The timing of island effects
in nonnative sentence processing. Studies in Second Language Acquisition, 34(1), 67–98.
doi:10.1017/S0272263111000507
Felser, C., Roberts, L., Marinis, T., & Gross, R. (2003). The processing of ambiguous
sentences by first and second language learners of English. Applied Psycholinguistics,
24(3), 453–489. doi:10.1017/S0142716403000237
*Felser, C., Sato, M., & Bertenshaw, N. (2009).The on-line application of binding Principle
A in English as a second language. Bilingualism: Language and Cognition, 12(4), 485–502.
doi:10.1017/S1366728909990228
Fender, M. (2003). English word recognition and word integration skills of native Arabic-
and Japanese-speaking learners of English as a second language. Applied Psycholinguistics,
24(2), 289–315. doi:10.1017/S014271640300016X
Feng, G. (2006). Eye movements as time-series random variables: A stochastic model of eye
movement control in reading. Cognitive Systems Research, 7(1), 70–95. doi:10.1016/j.
cogsys.2005.07.004
Feng, G., Miller, K., Shu, H., & Zhang, H. (2009). Orthography and the development of
reading processes: An eye-movement study of Chinese and English. Child Development,
80(3), 720–735. doi:10.1111/j.1467-8624.2009.01293.x
Ferreira, F., & Clifton, J. R. (1986). The independence of syntactic processing. Journal of
Memory and Language, 25, 348–368. doi:10.1016/0749-596X(86)90006-9
Ferreira, F., Foucart, A., & Engelhardt, P. E. (2013). Language processing in the visual world:
Effects of preview, visual complexity, and prediction. Journal of Memory and Language,
69(3), 165–182. doi:10.1016/j.jml.2013.06.001
Ferreira, F., & Henderson, J. M. (1990). Use of verb information in syntactic parsing:
Evidence from eye movements and word-by-word self-paced reading. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 16(4), 555–568.
doi:10.1037/0278-7393.16.4.555
Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). London, UK: Sage.
Findlay, J. M. (2004). Eye scanning and visual search. In J. M. Henderson, & F. Ferreira
(Eds.), The interface of language, vision, and action: Eye movements and the visual world (pp.
135–160). Chicago, IL: Psychology Press.
Findlay, J. M., & Gilchrist, I. D. (2003). Active vision: The psychology of looking and seeing.
Oxford, UK: Oxford University Press.
*Flecken, M. (2011). Event conceptualization by early Dutch–German bilinguals: Insights
from linguistic and eye-tracking data. Bilingualism: Language and Cognition, 14(1), 61–77.
doi:10.1017/S1366728910000027
*Flecken, M., Carroll, M., Weimar, K., & Von Stutterheim, C. (2015). Driving along the road
or heading for the village? Conceptual differences underlying motion event encoding
in French, German, and French-German L2 users. The Modern Language Journal, 99(S1),
100–122. doi:10.1111/modl.12181
Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA:
The MIT Press.
Forster, K. I. (1970). Visual perception of rapidly presented word sequences of varying
complexity. Perception and Psychophysics, 8(4), 215–221. doi:10.3758/BF03210208
Foucart, A., & Frenck-Mestre, C. (2012). Can late L2 learners acquire new grammatical
features? Evidence from ERPs and eye-tracking. Journal of Memory and Language, 66(1),
226–248. doi:10.1016/j.jml.2011.07.007
Fox, M. C., Ericsson, K. A., & Best, R. (2011). Do procedures for verbal reporting of
thinking have to be reactive? A meta-analysis and recommendations for best reporting
methods. Psychological Bulletin, 137(2), 316–344. doi:10.1037/a0021663
Fraser, C.A. (1999). Lexical processing strategy use and vocabulary learning through reading.
Studies in Second Language Acquisition, 21(2), 225–241. doi:10.1017/S0272263199002041
Frazier, L. (1987). Sentence processing: A tutorial review. In M. Coltheart (Ed.), Attention and
performance XII. The psychology of reading (Vol. XII, pp. 559–586). Hillsdale, NJ: Lawrence
Erlbaum Associates. Retrieved from http://cnbc.cmu.edu/~plaut/IntroPDP/papers/
Frazier87.sentProcRev.pdf
Frenck-Mestre, C. (2005). Eye-movement recording as a tool for studying syntactic
processing in a second language: A review of methodologies and experimental findings.
Second Language Research, 21(2), 175–198. doi:10.1191/0267658305sr257oa
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews
Neuroscience, 11(2), 127–138. doi:10.1038/nrn2787
Fukkink, R. G. (2005). Deriving word meaning from written context: A process analysis.
Learning and Instruction, 15(1), 23–43. doi:10.1016/j.learninstruc.2004.12.002
Gánem-Gutiérrez, G. A., & Gilmore, A. (2018). Tracking the real-time evolution of a
writing event: Second language writers at different proficiency levels. Language Learning,
68(2), 469–506. doi:10.1111/lang.12280
Gass, S. M. (1997). Input, interaction, and the second language learner. Mahwah, NJ: Lawrence
Erlbaum Associates.
Gass, S. M., & Mackey, A. (2017). Stimulated recall methodology in applied linguistics and L2
research (2nd ed.). New York: Routledge.
Gelman, A., & Hill, J. (2007). Data analysis using regression and hierarchical/multilevel models.
Cambridge, UK: Cambridge University Press.
Gilchrist, I. D. (2011). Saccades. In S. P. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), The
Oxford handbook of eye movements (pp. 85–94). Oxford, UK: Oxford University Press.
doi:10.1093/oxfordhb/9780199539789.013.0005
Godfroid, A. (2010). Cognitive processes in Second Language Acquisition: The role of
noticing, attention and awareness in processing words in written L2 input (Unpublished
doctoral dissertation). University of Brussels, Belgium.
Godfroid, A. (2012). Eye tracking. In P. Robinson (Ed.), Routledge encyclopedia of second
language acquisition (pp. 234–236). New York: Routledge.
Godfroid, A. (2016). The effects of implicit instruction on implicit and explicit knowledge
development. Studies in Second Language Acquisition, 38(2), 177–215. doi:10.1017/
S0272263115000388
Godfroid, A. (2019). Investigating instructed second language acquisition using L2 learners’
eye-tracking data. In R. P. Leow (Ed.), The Routledge handbook of second language research
in classroom learning (pp. 44–57). New York: Routledge.
Godfroid, A. (in press). Implicit and explicit learning and knowledge. In H. Mohebbi & C.
Coombe (Eds.), Research questions in language education and applied linguistics. Springer.
*Godfroid, A., Ahn, J., Choi, I., Ballard, L., Cui, Y., Johnston, S., … Yoon, H.-J.
(2018). Incidental vocabulary learning in a natural reading context: An eye-
tracking study. Bilingualism: Language and Cognition, 21(3), 563–584. doi:10.1017/
S1366728917000219
Godfroid, A., Ahn, J., Rebuschat, P., & Dienes, Z. (in preparation). Development of explicit
knowledge from artificial language learning: Evidence from eye movements.
*Godfroid, A., Boers, F., & Housen, A. (2013). An eye for words: Gauging the role of
attention in incidental L2 vocabulary acquisition by means of eye-tracking. Studies in
Second Language Acquisition, 35(3), 483–517. doi:10.1017/S0272263113000119
*Godfroid, A., Loewen, S., Jung, S., Park, J.-H., Gass, S., & Ellis, R. (2015). Timed and
untimed grammaticality judgments measure distinct types of knowledge: Evidence
from eye-movement patterns. Studies in Second Language Acquisition, 37(2), 269–297.
doi:10.1017/S0272263114000850
Godfroid, A., & Schmidtke, J. (2013). What do eye movements tell us about awareness? A
triangulation of eye-movement data, verbal reports, and vocabulary learning scores. In
J. Bergsleithner, S. Frota, & J. K. Yoshioka (Eds.), Noticing and second language acquisition:
Studies in honor of Richard Schmidt (pp. 183–205). Honolulu, HI: University of Hawai‘i,
National Foreign Language Resource Center. Retrieved from http://sls.msu.edu/
files/5213/8229/7769/Godfroid__Schmidtke_2013.pdf
*Godfroid, A., & Spino, L. A. (2015). Reconceptualizing reactivity of think-alouds and
eye tracking: Absence of evidence is not evidence of absence. Language Learning, 65(4),
896–928. doi:10.1111/lang.12136
Godfroid, A., & Uggen, M. S. (2013). Attention to irregular verbs by beginning learners
of German. Studies in Second Language Acquisition, 35(2), 291–322. doi:10.1017/
S0272263112000897
Godfroid, A., & Winke, P. M. (2015). Investigating implicit and explicit processing using
L2 learners’ eye-movement data. In P. Rebuschat (Ed.), Implicit and explicit learning of
languages (pp. 325–348). Amsterdam, the Netherlands: John Benjamins.
Goo, J. (2010). Working memory and reactivity. Language Learning, 60(4), 712–752.
doi:10.1111/j.1467-9922.2010.00573.x
Gough, P. B. (1972). One second of reading. Visible Language, 6(4), 291–320.
Green, A. (1998). Verbal protocol analysis in language testing research: A handbook. New York:
Cambridge University Press.
Green, P., & MacLeod, C. J. (2016). SIMR: An R package for power analysis of generalized
linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498.
doi:10.1111/2041-210X.12504
Green, P., MacLeod, C. J., & Alday, P. (2016). Package ‘simr’. Retrieved from https://cran.r-project.org/web/packages/simr/simr.pdf
Gries, S. Th. (2013). Statistics for linguistics with R (2nd ed.). Berlin, Germany: Walter de
Gruyter.
Gries, S. Th. (2015). The most under-used statistical method in corpus linguistics:
Multi-level (and mixed-effects) models. Corpora, 10(1), 95–125. doi:10.3366/
cor.2015.0068
Griffin, Z. M., & Bock, K. (2000). What the eyes say about speaking. Psychological Science,
11(4), 274–279. doi:10.1111/1467-9280.00255
Grosbras, M. H., Laird, A. R., & Paus, T. (2005). Cortical regions involved in eye movements,
shifts of attention, and gaze perception. Human Brain Mapping, 25(1), 140–154.
doi:10.1002/hbm.20145
*Grüter, T., Lew-Williams, C., & Fernald, A. (2012). Grammatical gender in L2: A
production or a real-time processing problem? Second Language Research, 28(2), 191–
215. doi:10.1177/0267658312437990
Grüter, T., Rohde, H., & Schafer, A. J. (2014). The role of discourse-level expectations in non-
native speakers’ referential choices. In W. Orman & M. J. Valleau (Eds.), Proceedings of the
38th annual Boston University Conference on Language Development (pp. 179–191). Somerville,
MA: Cascadilla Press. Retrieved from https://par.nsf.gov/servlets/purl/10028988
Grüter, T., Rohde, H., & Schafer, A. J. (2017). Coreference and discourse coherence in L2:
The roles of grammatical aspect and referential form. Linguistic Approaches to Bilingualism,
7(2), 199–229. doi:10.1075/lab.15011.gru
Guan, C. Q., Liu, Y., Chan, D. H. L., Ye, F., & Perfetti, C. A. (2011). Writing strengthens
orthography and alphabetic-coding strengthens phonology in learning to read Chinese.
Journal of Educational Psychology, 103(3), 509–522. doi:10.1037/a0023730
Gullberg, M., & Holmqvist, K. (1999). Keeping an eye on gestures: Visual perception
of gestures in face-to-face communication. Pragmatics & Cognition, 7(1), 35–63.
doi:10.1075/pc.7.1.04gul
Gullberg, M., & Holmqvist, K. (2006). What speakers do and what addressees look at: Visual
attention to gestures in human interaction live and on video. Pragmatics & Cognition,
14(1), 53–82. doi:10.1075/pc.14.1.05gul
Hafed, Z. M., & Krauzlis, R. J. (2010). Microsaccadic suppression of visual bursts in the
primate superior colliculus. Journal of Neuroscience, 30(28), 9542–9547. doi:10.1523/
JNEUROSCI.1137-10.2010
Häikiö, T., Bertram, R., Hyönä, J., & Niemi, P. (2009). Development of the letter identity
span in reading: Evidence from the eye movement moving window paradigm. Journal of
Experimental Child Psychology, 102(2), 167–181. doi:10.1016/j.jecp.2008.04.002
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7th
ed.). Hoboken, NJ: Pearson Education Inc.
Hama, M., & Leow, R. P. (2010). Learning without awareness revisited: Extending Williams
(2005). Studies in Second Language Acquisition, 32, 465–491. doi:10.1017/S0272263110000045
Hattie, J. (1992). Measuring the effects of schooling. Australian Journal of Education, 36(1),
5–13. doi:10.1177/000494419203600102
Havik, E., Roberts, L., van Hout, R., Schreuder, R., & Haverkort, M. (2009). Processing
subject-object ambiguities in the L2:A self-paced reading study with German L2 learners
of Dutch. Language Learning, 59(1), 73–112. doi:10.1111/j.1467-9922.2009.00501.x
Hayes, T. R., & Henderson, J. M. (2017). Scan patterns during real-world scene viewing
predict individual differences in cognitive capacity. Journal of Vision, 17(5), 23, 1–17.
doi:10.1167/17.5.23
He, X., & Li, W. (2018, March). Working memory, inhibitory control, and learning L2
grammar with input-output activities: Evidence from eye movements. Paper presented at
the Annual Meeting of the American Association for Applied Linguistics, Chicago, IL.
Henderson, J. M., & Ferreira, F. (1990). Effects of foveal processing difficulty on the
perceptual span in reading: Implications for attention and eye movement control. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 16(3), 417–429. doi:10.1037/0278-7393.16.3.417
Henderson, J. M., & Ferreira, F. (2004). Scene perception for psycholinguists. In J. M.
Henderson & F. Ferreira (Eds.), The interface of language, vision, and action: Eye movements
and the visual world (pp. 1–58). New York: Psychology Press.
Henderson, J. M., & Hollingworth, A. (1999). High-level scene perception. Annual Review
of Psychology, 50(1), 243–271. doi:10.1146/annurev.psych.50.1.243
Henderson, J. M., & Luke, S. G. (2014). Stable individual differences in saccadic eye
movements during reading, pseudoreading, scene viewing, and scene search. Journal
of Experimental Psychology: Human Perception and Performance, 40(4), 1390–1400.
doi:10.1037/a0036330
Hering, C. (1879). Condensed materia medica (2nd ed.). New York: Boericke & Tafel.
Hilbe, J. M. (2007). Negative binomial regression. Cambridge, UK: Cambridge University Press.
Hintz, F., Meyer, A. S., & Huettig, F. (2017). Predictors of verb-mediated anticipatory eye
movements in the visual world. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 43(9), 1352–1374. doi:10.1037/xlm0000388
Hirotani, M., Frazier, L., & Rayner, K. (2006). Punctuation and intonation effects on clause
and sentence wrap-up: Evidence from eye movements. Journal of Memory and Language,
54(3), 425–443. doi:10.1016/j.jml.2005.12.001
Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Halszka, J., & van de Weijer, J.
(2011). Eye tracking: A comprehensive guide to methods and measures. Oxford, UK: Oxford
University Press.
Holmqvist, K., & Zemblys, R. (2016, June). Common predictors of accuracy, precision and
data loss in eye-trackers. Paper presented at the 7th Scandinavian Workshop on Applied
Eye Tracking, Turku, Finland.
Holmqvist, K., Zemblys, R., Dixon, D. C., Mulvey, F. B., Borah, J., & Pelz, J. B. (2015,
August). The effect of sample selection methods on data quality measures and on
predictors for data quality. Paper presented at the 18th European Conference on Eye
Movements, Vienna, Austria.
Holšánová, J. (2008). Discourse, vision, and cognition. Amsterdam, the Netherlands: Benjamins.
Hopp, H. (2009). The syntax–discourse interface in near-native L2 acquisition: Off-
line and on-line performance. Bilingualism: Language and Cognition, 12(4), 463–483.
doi:10.1017/S1366728909990253
*Hopp, H. (2013). Grammatical gender in adult L2 acquisition: Relations between
lexical and syntactic variability. Second Language Research, 29(1), 33–56.
doi:10.1177/0267658312461803
Hopp, H. (2014). Working memory effects in the L2 processing of ambiguous relative
clauses. Language Acquisition, 21(3), 250–278. doi:10.1080/10489223.2014.892943
Hopp, H. (2015). Semantics and morphosyntax in predictive L2 sentence processing.
International Review of Applied Linguistics in Language Teaching, 53(3), 277–306.
doi:10.1515/iral-2015-0014
*Hopp, H. (2016). Learning (not) to predict: Grammatical gender processing
in second language acquisition. Second Language Research, 32(2), 277–307.
doi:10.1177/0267658315624960
*Hopp, H., & Lemmerth, N. (2018). Lexical and syntactic congruency in L2 predictive
gender processing. Studies in Second Language Acquisition, 40(1), 171–199. doi:10.1017/
S0272263116000437
*Hopp, H., & León Arriaga, M. E. (2016). Structural and inherent case in the non-native
processing of Spanish: Constraints on inflectional variability. Second Language Research,
32(1), 75–108. doi:10.1177/0267658315605872
Hosseini, K. (2007). A thousand splendid suns. New York: Riverhead Books.
*Hoversten, L. J., & Traxler, M. J. (2016). A time course analysis of interlingual homograph
processing: Evidence from eye movements. Bilingualism: Language and Cognition, 19(2),
347–360. doi:10.1017/S1366728915000115
Huettig, F. (2015). Four central questions about prediction in language processing. Brain
Research, 1626, 118–135. doi:10.1016/j.brainres.2015.02.014
Huettig, F., & Altmann, G. T. M. (2004). The online processing of ambiguous and
unambiguous words in context: Evidence from head-mounted eye-tracking. In M.
Carreiras & C. Clifton Jr. (Eds.), The on-line study of sentence comprehension: Eyetracking,
ERP and beyond (pp. 187–208). New York: Psychology Press.
Huettig, F., & Altmann, G. T. M. (2005). Word meaning and the control of eye fixation:
semantic competitor effects and the visual world paradigm. Cognition, 96(1), B23–B32.
doi:10.1016/j.cognition.2004.10.003
Huettig, F., & Altmann, G. T. M. (2011). Looking at anything that is green when hearing
“frog”: How object surface colour and stored object colour knowledge influence
language-mediated overt attention. Quarterly Journal of Experimental Psychology, 64(1),
122–145. doi:10.1080/17470218.2010.481474
Huettig, F., & McQueen, J. M. (2007). The tug of war between phonological, semantic and
shape information in language-mediated visual search. Journal of Memory and Language,
57(4), 460–482. doi:10.1016/j.jml.2007.02.001
Huettig, F., Rommers, J., & Meyer, A. S. (2011). Using the visual world paradigm to study
language processing: A review and critical evaluation. Acta Psychologica, 137, 151–171.
doi:10.1016/j.actpsy.2010.11.003
Huey, E. B. (1908). The psychology and pedagogy of reading. New York: Macmillan.
Hutzler, F., Braun, M., Võ, M. L.-H., Engl, V., Hofmann, M., Dambacher, M., … Jacobs,
A. M. (2007). Welcome to the real world: Validating fixation-related brain potentials
for ecologically valid settings. Brain Research, 1172, 124–129. doi:10.1016/j.
brainres.2007.07.025
Hyönä, J., Lorch, R. F., & Kaakinen, J. K. (2002). Individual differences in reading to
summarize expository text: Evidence from eye fixation patterns. Journal of Educational
Psychology, 94(1), 44–55. doi:10.1037/0022-0663.94.1.44
Hyönä, J., & Nurminen, A. M. (2006). Do adult readers know how they read? Evidence
from eye movement patterns and verbal reports. British Journal of Psychology, 97(1), 31–
50. doi:10.1348/000712605X53678
Ikeda, M., & Saida, S. (1978). Span of recognition in reading. Vision Research, 18(1), 83–88.
doi:10.1016/0042-6989(78)90080-9
*Indrarathne, B., & Kormos, J. (2017). Attentional processing of input in explicit and
implicit conditions: An eye-tracking study. Studies in Second Language Acquisition, 39(3),
401–430. doi:10.1017/S027226311600019X
*Indrarathne, B., & Kormos, J. (2018). The role of working memory in processing L2
input: Insights from eye-tracking. Bilingualism: Language and Cognition, 21(2), 355–374.
doi:10.1017/S1366728917000098
Inhoff, A. W., & Radach, R. (1998). Definition and computation of oculomotor measures in
the study of cognitive processes. In G. Underwood (Ed.), Eye guidance in reading and scene
perception (pp. 29–53). New York: Elsevier. doi:10.1016/B978-008043361-5/50003-1
Irwin, D. E. (1998). Lexical processing during saccadic eye movements. Cognitive Psychology,
36(1), 1–27. doi:10.1006/cogp.1998.0682
Issa, B., & Morgan-Short, K. (2019). Effects of external and internal attentional manipulations
on second language grammar development: An eye-tracking study. Studies in Second
Language Acquisition, 41(2), 389–417. doi:10.1017/S027226311800013X
Issa, B., Morgan-Short, K., Villegas, B., & Raney, G. (2015). An eye-tracking study on the
role of attention and its relationship with motivation. EuroSLA Yearbook, 15, 114–142.
doi:10.1075/eurosla.15.05iss
*Ito, A., Corley, M., & Pickering, M. J. (2018). A cognitive load delays predictive eye
movements similarly during L1 and L2 comprehension. Bilingualism: Language and
Cognition, 21(2), 251–264. doi:10.1017/S1366728917000050
Ito, K., & Speer, S. R. (2008). Anticipatory effects of intonation: Eye movements during
instructed visual search. Journal of Memory and Language, 58(2), 541–573. doi:10.1016/j.
jml.2007.06.013
Izumi, S., Bigelow, M., Fujiwara, M., & Fearnow, S. (1999). Testing the output hypothesis:
Effects of output on noticing and second language acquisition. Studies in Second Language
Acquisition, 21(3), 421–452. doi:10.1017/S0272263199003034
Izumi, S., & Bigelow, M. (2000). Does output promote noticing and second language
acquisition? TESOL Quarterly, 34(2), 239–278. doi:10.2307/3587952
Jackson, C. N., & Bobb, S. C. (2009). The processing and comprehension of wh-questions
among second language speakers of German. Applied Psycholinguistics, 30(4), 603–636.
doi:10.1017/S014271640999004X
Jacobs, A. M. (2000). Five questions about cognitive models and some answers from three
models of reading. In A. Kennedy, R. Radach, D. Heller, & J. Pynte (Eds.), Reading as a
perceptual process (pp. 721–732). Oxford, UK: Elsevier.
Jacobson, E. (1930). Electrical measurements of neuromuscular states during mental
activities: 1. Imagination of movement involving skeletal muscle. American Journal of
Physiology, 95, 567–608. doi:10.1152/ajplegacy.1930.91.2.567
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or
not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446.
doi:10.1016/j.jml.2007.11.007
Jegerski, J. (2014). Self-paced reading. In J. Jegerski & B. VanPatten (Eds.), Research methods
in second language psycholinguistics (pp. 20–49). London, UK: Taylor & Francis.
Jeon, E. H. (2015). Multiple regression. In L. Plonsky (Ed.), Advancing quantitative methods in
second language research (pp. 131–158). New York: Routledge.
Jiang, N. (2012). Conducting reaction time research in second language research. London, UK:
Routledge. doi:10.4324/9780203146255
Johns Hopkins Medicine. (2014). Fast eye movements: A possible indicator of more impulsive
decision-making. Retrieved from https://www.hopkinsmedicine.org/news/media/
releases/fast_eye_movements_a_possible_indicator_of_more_impulsive_decision_making
Jordan, T. R., Almabruk, A. A. A., Gadalla, E. A., McGowan, V. A., White, S. J., Abedipour, L.,
& Paterson, K. B. (2014). Reading direction and the central perceptual span: Evidence
from Arabic and English. Psychonomic Bulletin & Review, 21(2), 505–511. doi:10.3758/
s13423-013-0510-4
Joseph, H. S. S. L., Wonnacott, E., Forbes, P., & Nation, K. (2014). Becoming a written
word: Eye movements reveal order of acquisition effects following incidental exposure
to new words during silent reading. Cognition, 133(1), 238–248. doi:10.1016/j.
cognition.2014.06.015
Ju, M., & Luce, P. A. (2004). Falling on sensitive ears: Constraints on bilingual lexical
activation. Psychological Science, 15(5), 314–318. doi:10.1111/j.0956-7976.2004.00675.x
Ju, M., & Luce, P. A. (2006). Representational specificity of within-category phonetic
variation in the long-term mental lexicon. Journal of Experimental Psychology: Human
Perception and Performance, 32(1), 120–138. doi:10.1037/0096-1523.32.1.120
Juffs, A., & Rodríguez, G. A. (2015). Second language sentence processing. New York: Routledge.
Juhasz, B. J. (2008). The processing of compound words in English: Effects of word length
on eye movements during reading. Language and Cognitive Processes, 23(7–8), 1057–
1088. doi:10.1080/01690960802144434
Juhasz, B. J., & Rayner, K. (2003). Investigating the effects of a set of intercorrelated variables
on eye fixation durations in reading. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 29(6), 1312–1318. doi:10.1037/0278-7393.29.6.1312
Juhasz, B. J., & Rayner, K. (2006). The role of age of acquisition and word frequency
in reading: Evidence from eye fixation durations. Visual Cognition, 13(7–8), 846–863.
doi:10.1080/13506280544000075
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to
comprehension. Psychological Review, 87(4), 329–354. doi:10.1037/0033-295X.87.4.329
Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading
comprehension. Journal of Experimental Psychology: General, 111(2), 228–238.
doi:10.1037/0096-3445.111.2.228
Kaakinen, J. K., & Hyönä, J. (2005). Perspective effects on expository text comprehension:
Evidence from think-aloud protocols, eyetracking, and recall. Discourse Processes, 40(3),
239–257. doi:10.1207/s15326950dp4003_4
Kaan, E. (2014). Predictive sentence processing in L2 and L1. Linguistic Approaches to
Bilingualism, 4(2), 257–282. doi:10.1075/lab.4.2.05kaa
Kamide, Y., Altmann, G. T. M., & Haywood, S. L. (2003). The time-course of prediction in
incremental sentence processing: Evidence from anticipatory eye movements. Journal of
Memory and Language, 49(1), 133–156. doi:10.1016/S0749-596X(03)00023-8
*Kaushanskaya, M., & Marian, V. (2007). Bilingual language processing and interference in
bilinguals: Evidence from eye tracking and picture naming. Language Learning, 57(1),
119–163. doi:10.1111/j.1467-9922.2007.00401.x
*Keating, G. D. (2009). Sensitivity to violations of gender agreement in native and
nonnative Spanish: An eye-movement investigation. Language Learning, 59(3), 503–535.
doi:10.1111/j.1467-9922.2009.00516.x
Keating, G. D. (2014). Eye-tracking with text. In J. Jegerski & B. VanPatten (Eds.),
Research methods in second language psycholinguistics (pp. 69–92). London, UK: Taylor
& Francis.
Keating, G. D., & Jegerski, J. (2015). Experimental designs in sentence processing. Studies in
Second Language Acquisition, 37(1), 1–32. doi:10.1017/S0272263114000187
Kennedy, A., & Pynte, J. (2005). Parafoveal-on-foveal effects in normal reading. Vision
Research, 45(2), 153–168. doi:10.1016/j.visres.2004.07.037
Kennedy, A., Pynte, J., & Ducrot, S. (2002). Parafoveal-on-foveal interactions in word
recognition. The Quarterly Journal of Experimental Psychology, 55(4), 1307–1337.
doi:10.1080/02724980244000071
Kerlinger, F. N., & Lee, H. B. (2000). Foundations of behavioral research. Orlando, FL: Harcourt
College Publishers.
Khalifa, H., & Weir, C. J. (2009). Examining reading: Research and practice in assessing second
language reading. Cambridge, UK: Cambridge University Press.
*Kim, E., Montrul, S., & Yoon, J. (2015). The on-line processing of binding principles in
second language acquisition: Evidence from eye tracking. Applied Psycholinguistics, 36(6),
1317–1374. doi:10.1017/S0142716414000307
Kliegl, R., Dambacher, M., Dimigen, O., & Sommer, W. (2014). Oculomotor control, brain
potentials, and timelines of word recognition during natural reading. In M. Horsley, M.
Eliot, B. A. Knight, & R. Reilly (Eds.), Current trends in eye tracking research (pp. 141–155).
Springer. doi:10.1007/978-3-319-02868-2_10
Kliegl, R., Grabner, E., Rolfs, M., & Engbert, R. (2004). Length, frequency, and predictability
effects of words on eye movements in reading. European Journal of Cognitive Psychology,
16(1–2), 262–284. doi:10.1080/09541440340000213
Kliegl, R., Nuthmann, A., & Engbert, R. (2006). Tracking the mind during reading: The
influence of past, present, and future words on fixation durations. Journal of Experimental
Psychology: General, 135(1), 12–35. doi:10.1037/0096-3445.135.1.12
*Kohlstedt, T., & Mani, N. (2018). The influence of increasing discourse context on L1 and
L2 spoken language processing. Bilingualism: Language and Cognition, 21(1), 121–136.
doi:10.1017/S1366728916001139
Kohsom, C., & Gobet, F. (1997). Adding spaces to Thai and English: Effects on reading. In
L. R. Gleitman & A. K. Joshi (Eds.), Proceedings of the twenty-second annual conference of
the Cognitive Science Society (pp. 388–393). Mahwah, NJ: Lawrence Erlbaum Associates.
Retrieved from https://mindmodeling.org/cogscihistorical/cogsci_22.pdf
Krauzlis, R. J. (2013). Eye movements. In L. R. Squire, D. Berg, F. E. Bloom, S. du Lac, A.
Ghosh, & N. C. Spitzer (Eds.), Fundamental neuroscience (4th ed., pp. 697–714). Waltham,
MA: Elsevier.
Kretzschmar, F., Bornkessel-Schlesewsky, I., & Schlesewsky, M. (2009). Parafoveal versus
foveal N400s dissociate spreading activation from contextual fit. NeuroReport, 20(18),
1613–1618. doi:10.1097/WNR.0b013e328332c4f4
Kreysa, H., & Pickering, M. J. (2011). Eye movements in dialogue. In S. P. Liversedge, I.
D. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye movements (pp. 943–959).
Oxford, UK: Oxford University Press.
Kroll, J. F., & Bialystok, E. (2013). Understanding the consequences of bilingualism for
language processing and cognition. Journal of Cognitive Psychology, 25(5), 497–514. doi:10.1080/20445911.2013.799170
Kroll, J. F., Dussias, P. E., Bice, K., & Perrotti, L. (2015). Bilingualism, mind, and brain. Annual
Review of Linguistics, 1(1), 377–394. doi:10.1146/annurev-linguist-030514-124937
Kroll, J. F., & Ma, F. (2017). The bilingual lexicon. In E. M. Fernández & H. S. Cairns (Eds.),
The handbook of psycholinguistics (pp. 294–319). Hoboken, NJ: Wiley.
Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis
based on the correlation between effect size and sample size. PLoS ONE, 9(9), e105825.
doi:10.1371/journal.pone.0105825
Kuperberg, G. R., & Jaeger, T. F. (2016). What do we mean by prediction in language
comprehension? Language, Cognition and Neuroscience, 31(1), 32–59. doi:10.1080/23273798.2015.1102299
Kuperman, V., & Van Dyke, J. A. (2011). Effects of individual differences in verbal skills on
eye-movement patterns during sentence reading. Journal of Memory and Language, 65(1),
42–73. doi:10.1016/j.jml.2011.03.002
Kurtzman, H. S., Crawford, L. F., & Nychis-Florence, C. (1991). Locating Wh-traces. In
R. C. Berwick, S. P. Abney, & C. Tenny (Eds.), Principle-based parsing (pp. 347–382).
Dordrecht, the Netherlands: Springer. doi:10.1007/978-94-011-3474-3_13
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models
(5th ed.). New York: McGraw-Hill Irwin.
Lachaud, C. M., & Renaud, O. (2011). A tutorial for analyzing human reaction times:
How to filter data, manage missing values, and choose a statistical model. Applied
Psycholinguistics, 32(2), 389–416. doi:10.1017/S0142716410000457
Lagrou, E., Hartsuiker, R. J., & Duyck, W. (2013). The influence of sentence context
and accented speech on lexical access in second-language auditory word
recognition. Bilingualism: Language and Cognition, 16(3), 508–517. doi:10.1017/
S1366728912000508
Lamare, M. (1892). Des mouvements des yeux dans la lecture [Eye movements in reading].
Bulletins et Mémoires de la Société Française d’Ophthalmologie, 10, 354–364.
Larsen-Freeman, D., & Long, M. H. (1991). An introduction to second language acquisition
research. New York: Routledge.
Larson-Hall, J. (2016). A guide to doing statistics in second language research using SPSS and R.
New York: Routledge.
Larson-Hall, J. (2017). Moving beyond the bar plot and the line graph to create informative
and attractive graphics. The Modern Language Journal, 101(1), 244–270. doi:10.1111/
modl.12386
Lau, E., & Grüter, T. (2015). Real-time processing of classifier information by L2 speakers of
Chinese. In E. Grillo & K. Jepson (Eds.), Proceedings of the 39th annual Boston University
Conference on Language Development (pp. 311–323). Somerville, MA: Cascadilla Press.
Lee, C. H., & Kalyuga, S. (2011). Effectiveness of different pinyin presentation formats
in learning Chinese characters: A cognitive load perspective. Language Learning, 61(4),
1099–1118. doi:10.1111/j.1467-9922.2011.00666.x
*Lee, S., & Winke, P. (2018). Young learners’ response processes when taking
computerized tasks for speaking assessment. Language Testing, 35(2), 239–269.
doi:10.1177/0265532217704009
Leeser, M. J., Brandl, A., & Weissglass, C. (2011). Task effects in second language sentence
processing research. In P. Trofimovich & K. McDonough (Eds.), Applying priming
methods to L2 learning, teaching, and research: Insights from psycholinguistics (pp. 179–198).
Amsterdam, the Netherlands: John Benjamins.
Legge, G. E., & Bigelow, C. A. (2011). Does print size matter for reading? A review
of findings from vision science and typography. Journal of Vision, 11(5), 8, 1–22.
doi:10.1167/11.5.8
Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: An ideal-observer model of reading.
Psychological Review, 104(3), 524–553. doi:10.1037/0033-295X.104.3.524
Leow, R. P. (1997). Attention, awareness, and foreign language behavior. Language Learning,
47(3), 467–505. doi:10.1111/0023-8333.00017
Leow, R. P. (1998). Toward operationalizing the process of attention in SLA: Evidence
for Tomlin and Villa’s (1994) fine grained analysis of attention. Applied Psycholinguistics,
19(1), 133–159. doi:10.1017/S0142716400010626
Leow, R. P. (2000). A study of the role of awareness in foreign language behavior. Studies in
Second Language Acquisition, 22(4), 557–584. doi:10.1017/S0272263100004046
Leow, R. P. (2015). Explicit learning in the L2 classroom: A student-centered approach. New York:
Routledge.
Leow, R. P., Grey, S., Marijuan, S., & Moorman, C. (2014). Concurrent data elicitation
procedures, processes, and the early stages of L2 learning: A critical overview. Second
Language Research, 30(2), 111–127. doi:10.1177/0267658313511979
Leow, R. P., Hsieh, H.-C., & Moreno, N. (2008). Attention to form and meaning revisited.
Language Learning, 58(3), 665–695. doi:10.1111/j.1467-9922.2008.00453.x
Leow, R. P., & Morgan-Short, K. (2004). To think aloud or not to think aloud: The issue
of reactivity in SLA research methodology. Studies in Second Language Acquisition, 26(1),
35–57. doi:10.1017/S0272263104261022
Lettvin, J. Y., Maturana, H. R., McCulloch, W. S., & Pitts, W. H. (1968). What the frog’s
eye tells the frog’s brain. In W. C. Corning & M. Balaban (Eds.), The mind: Biological
approaches to its functions (pp. 233–258). London, UK: John Wiley and Sons Inc.
Leung, C. Y., Sugiura, M., Abe, D., & Yoshikawa, L. (2014). The perceptual span in second
language reading: An eye-tracking study using a gaze-contingent moving window
paradigm. Open Journal of Modern Linguistics, 4(5), 585–594. doi:10.4236/ojml.2014.45051
Leung, J. H. C., & Williams, J. N. (2011). The implicit learning of mappings between forms
and contextually derived meanings. Studies in Second Language Acquisition, 33(1), 33–55.
doi:10.1017/S0272263110000525
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and
reversals. Soviet Physics Doklady, 10(8), 707–710.
Lew-Williams, C., & Fernald, A. (2007).Young children learning Spanish make rapid use of
grammatical gender in spoken word recognition. Psychological Science, 18(3), 193–198.
doi:10.1111/j.1467-9280.2007.01871.x
Lew-Williams, C., & Fernald, A. (2010). Real-time processing of gender-marked articles by
native and non-native Spanish speakers. Journal of Memory and Language, 63(4), 447–464.
doi:10.1016/j.jml.2010.07.003
Li, X., Liu, P., & Rayner, K. (2011). Eye movement guidance in Chinese reading: Is there
a preferred viewing location? Vision Research, 51(10), 1146–1156. doi:10.1016/j.
visres.2011.03.004
Liberman, A. M. (2005). How much more likely? The implications of odds ratios
for probabilities. American Journal of Evaluation, 26(2), 253–266. doi:10.1177/
1098214005275825
Lim, H., & Godfroid, A. (2015). Automatization in second language sentence processing:
A partial, conceptual replication of Hulstijn, Van Gelderen, and Schoonen’s 2009 study.
Applied Psycholinguistics, 36(5), 1247–1282. doi:10.1017/S0142716414000137
Lim, J. H., & Christianson, K. (2013). Second language sentence processing in reading for
comprehension and translation. Bilingualism: Language and Cognition, 16(3), 518–537.
doi:10.1017/S1366728912000351
*Lim, J. H., & Christianson, K. (2015). Second language sensitivity to agreement errors:
Evidence from eye movements during comprehension and translation. Applied
Psycholinguistics, 36(6), 1283–1315. doi:10.1017/S0142716414000290
Linck, J. A., & Cunnings, I. (2015). The utility and application of mixed-effects models in
second language research. Language Learning, 65(S1), 185–207. doi:10.1111/lang.12117
Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and
behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12),
1181–1209. doi:10.1037/0003-066X.48.12.1181
Liu, Y., Wang, M., & Perfetti, C. A. (2007). Threshold-style processing of Chinese characters
for adult second-language learners. Memory and Cognition, 35(3), 471–480. doi:10.3758/
BF03193287
Liversedge, S. P., & Findlay, J. M. (2000). Saccadic eye movements and cognition. Trends in
Cognitive Sciences, 4(1), 6–14. doi:10.1016/S1364-6613(99)01418-7
Liversedge, S. P., Gilchrist, I., & Everling, S. (2011). The Oxford handbook of eye movements.
Oxford, UK: Oxford University Press.
Loewen, S. (2015). Introduction to instructed second language acquisition. New York: Routledge.
Lotto, L., Dell’Acqua, R., & Job, R. (2001). Le figure PD/DPSS. Misure di accordo sul
nome, tipicità, familiarità, età di acquisizione e tempi di denominazione per 266 figure
[The PD/DPSS pictures: Measures of name agreement, typicality, familiarity, age of
acquisition, and naming times for 266 pictures]. Giornale Italiano di Psicologia, 28(1),
193–210. doi:10.1421/337
Lowell, R., & Morris, R. K. (2014). Word length effects on novel words: Evidence from
eye movements. Attention, Perception, & Psychophysics, 76(1), 179–189. doi:10.3758/
s13414-013-0556-4
Luck, S. J. (2014). An introduction to the event-related potential technique (2nd ed.). Cambridge,
MA: The MIT Press.
Luke, S. G., Henderson, J. M., & Ferreira, F. (2015). Children’s eye-movements during
reading reflect the quality of lexical representations: An individual differences approach.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(6), 1675–1683.
doi:10.1037/xlm0000133
Lupyan, G. (2016). The centrality of language in human cognition. Language Learning, 66(3),
516–553. doi:10.1111/lang.12155
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical
nature of syntactic ambiguity resolution. Psychological Review, 101(4), 676–703.
doi:10.1037/0033-295X.101.4.676
Mackey, A., & Gass, S. M. (2016). Second language research: Methodology and design (2nd ed.).
New York: Routledge.
MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative
review. Psychological Bulletin, 109(2), 163–203. doi:10.1037/0033-2909.109.2.163
*Marian, V., & Spivey, M. (2003a). Bilingual and monolingual processing of competing
lexical items. Applied Psycholinguistics, 24(2), 173–193. doi:10.1017/S0142716403000092
*Marian, V., & Spivey, M. (2003b). Competing activation in bilingual language processing:
Within- and between-language competition. Bilingualism: Language and Cognition, 6(2),
97–115. doi:10.1017/S1366728903001068
Marinis, T. (2010). Using on-line processing methods in language acquisition research. In
E. Blom & S. Unsworth (Eds.), Experimental methods in language acquisition research (pp.
139–162). Amsterdam, the Netherlands: John Benjamins.
Marinis, T., Roberts, L., Felser, C., & Clahsen, H. (2005). Gaps in second language
sentence processing. Studies in Second Language Acquisition, 27(1), 53–78. doi:10.1017/
S0272263105050035
Marslen-Wilson, W., & Tyler, L. K. (1987). Against modularity. In J. L. Garfield (Ed.),
Modularity in knowledge representation and natural language understanding (pp. 37–62).
Cambridge, MA: The MIT Press.
Martinez-Conde, S., & Macknik, S. L. (2007). Science in culture: Mind tricks. Nature,
448(7152), 414. doi:10.1038/448414a
Martinez-Conde, S., & Macknik, S. L. (2011). Microsaccades. In S. P. Liversedge, I. Gilchrist,
& S. Everling (Eds.), The Oxford handbook of eye movements (pp. 95–114). Oxford, UK:
Oxford University Press. doi:10.1093/oxfordhb/9780199539789.013.0006
Matin, E. (1974). Saccadic suppression: A review and an analysis. Psychological Bulletin,
81(12), 899–917. doi:10.1037/h0037368
Matin, E., Shao, K. C., & Boff, K. R. (1993). Saccadic overhead: Information-processing time
with and without saccades. Perception and Psychophysics, 53(4), 372–380. doi:10.3758/
BF03206780
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I
error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
doi:10.1016/j.jml.2017.01.001
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive
Psychology, 18(1), 1–86. doi:10.1016/0010-0285(86)90015-0
McClelland, J. L., & O’Regan, J. K. (1981). Expectations increase the benefit
derived from parafoveal visual information in reading words aloud. Journal
Morgan-Short, K., Sanz, C., Steinhauer, K., & Ullman, M. T. (2010). Second
language acquisition of gender agreement in explicit and implicit training
conditions: An event-related potential study. Language Learning, 60(1), 154–193.
doi:10.1111/j.1467-9922.2009.00554.x
Morgan-Short, K., Steinhauer, K., Sanz, C., & Ullman, M. T. (2012). Explicit and implicit
second language training differentially affect the achievement of native-like brain
activation patterns. Journal of Cognitive Neuroscience, 24(4), 933–947. doi:10.1162/jocn
Morgan-Short, K., & Tanner, D. (2014). Event-related potentials (ERPs). In J. Jegerski & B.
VanPatten (Eds.), Research methods in second language psycholinguistics (pp. 127–152). New
York: Routledge.
Morrison, R. E. (1984). Manipulation of stimulus onset delay in reading: Evidence for
parallel programming of saccades. Journal of Experimental Psychology: Human Perception
and Performance, 10(5), 667–682. doi:10.1037/0096-1523.10.5.667
Mueller, J. L. (2005). Electrophysiological correlates of second language processing. Second
Language Research, 21(2), 152–174. doi:10.1191/0267658305sr256oa
Mulvey, F., Pelz, J., Simpson, S., Cleveland, D., Wang, D., Latorella, K., … Hayhoe, M. (2018,
March). How reliable is eye movement data? Results of large-scale system comparison
and universal standards for measuring and reporting eye data quality. Paper presented at
the Annual Meeting of the American Association of Applied Linguistics, Chicago, IL.
*Muñoz, C. (2017). The role of age and proficiency in subtitle reading. An eye-tracking
study. System, 67, 77–86. doi:10.1016/j.system.2017.04.015
Murray, W. S. (2000). Sentence processing: Issues and measures. In A. Kennedy, R. Radach, D.
Heller, & J. Pynte (Eds.), Reading as a perceptual process (pp. 649–664). Oxford, UK: Elsevier.
Murray, W. S., Fischer, M. H., & Tatler, B. W. (2013). Serial and parallel processes in eye
movement control: Current controversies and future directions. The Quarterly Journal of
Experimental Psychology, 66(3), 417–428. doi:10.1080/17470218.2012.759979
Nassaji, H. (2003). Higher-level and lower-level text processing skills in advanced
ESL reading comprehension. The Modern Language Journal, 87(2), 261–276.
doi:10.1111/1540-4781.00189
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam, the Netherlands: John
Benjamins.
Nieuwenhuis, R., te Grotenhuis, H. F., & Pelzer, B. J. (2012). influence.ME: Tools for
detecting influential data in mixed effects models. The R-Journal, 4(2), 38–47. Retrieved
from http://hdl.handle.net/2066/103101
Nyström, M., Andersson, R., Holmqvist, K., & Van De Weijer, J. (2013). The influence of
calibration method and eye physiology on eyetracking data quality. Behavior Research
Methods, 45(1), 272–288. doi:10.3758/s13428-012-0247-4
O’Regan, J. K., & Levy-Schoen, A. (1987). Eye movement strategy and tactics in word
recognition and reading. In M. Coltheart (Ed.), Attention and performance XII (pp. 363–
383). Hillsdale, NJ: Lawrence Erlbaum Associates.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological
science. Science, 349(6251), aac4716. doi:10.1126/science.aac4716
Osterhout, L., McLaughlin, J., Kim, A., Greenwald, R., & Inoue, K. (2004). Sentences in
the brain: Event-related potentials as real-time reflections of sentence comprehension
and language learning. In M. Carreiras & C. Clifton (Eds.), The on-line study of sentence
comprehension: Eyetracking, ERPs, and beyond (pp. 271–308). New York: Psychology Press.
Paap, K. R. (2018). Bilingualism in cognitive science. In A. De Houwer & L. Ortega (Eds.),
The Cambridge handbook of bilingualism (pp. 435–465). Cambridge, UK: Cambridge
University Press. doi:10.1017/9781316831922.023
Paas, F., Renkl, A., & Sweller, J. (2004). Cognitive load theory: Instructional implications of
the interaction between information structures and cognitive architecture. Instructional
Science, 32, 1–8.
Panichi, M., Burr, D., Morrone, M. C., & Baldassi, S. (2012). Spatiotemporal dynamics of
perisaccadic remapping in humans revealed by classification images. Journal of Vision,
12(4), 11, 1–15. doi:10.1167/12.4.11
Papadopoulou, D., & Clahsen, H. (2003). Parsing strategies in L1 and L2 sentence processing.
Studies in Second Language Acquisition, 25(4), 501–528. doi:10.1017/S0272263103000214
Papadopoulou, D., Tsimpli, I., & Amvrazis, N. (2014). Self-paced listening. In J. Jegerski
& B. VanPatten (Eds.), Research methods in second language psycholinguistics (pp. 50–68).
London, UK: Taylor & Francis.
Papafragou, A., Hulbert, J., & Trueswell, J. (2008). Does language guide event perception?
Evidence from eye movements. Cognition, 108(1), 155–184. doi:10.1016/j.
cognition.2008.02.007
Paterson, K. B., McGowan, V. A., White, S. J., Malik, S., Abedipour, L., & Jordan, T. R. (2014).
Reading direction and the central perceptual span in Urdu and English. PLoS ONE,
9(2), e88358. doi:10.1371/journal.pone.0088358
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and
nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication
(pp. 191–227). New York: Routledge.
*Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading:
An eye-tracking study. Studies in Second Language Acquisition, 38(1), 97–130. doi:10.1017/
S0272263115000224
Perfetti, C. A., Liu, Y., & Tan, L. H. (2005). The lexical constituency model: Some implications
of research on Chinese for general theories of reading. Psychological Review, 112(1),
43–59. doi:10.1037/0033-295X.112.1.43
Peters, R. E., Grüter, T., & Borovsky, A. (2018). Vocabulary size and native speaker self-
identification influence flexibility in linguistic prediction among adult bilinguals.
Applied Psycholinguistics, 39(6), 1439–1469. doi:10.1017/S0142716418000383
*Philipp, A. M., & Huestegge, L. (2015). Language switching between sentences in reading:
Exogenous and endogenous effects on eye movements and comprehension. Bilingualism:
Language and Cognition, 18(4), 614–625. doi:10.1017/S1366728914000753
Phillips, C. (2006). The real-time status of island phenomena. Language, 82(4), 795–823.
doi:10.1353/lan.2006.0217
Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production
and comprehension. Behavioral and Brain Sciences, 36(4), 329–347. doi:10.1017/
S0140525X12001495
Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting
practices in quantitative L2 research. Studies in Second Language Acquisition, 35(4), 655–
687. doi:10.1017/S0272263113000399
Plonsky, L. (2014). Study quality in quantitative L2 research (1990–2010): A methodological
synthesis and call for reform. The Modern Language Journal, 98(1), 450–470.
doi:10.1111/j.1540-4781.2014.12058.x
Plonsky, L., & Derrick, D. J. (2016). A meta-analysis of reliability coefficients in second
language research. The Modern Language Journal, 100(2), 538–553.
Plonsky, L., & Ghanbar, H. (2018). Multiple regression in L2 research: A methodological
synthesis and guide to interpreting R2 values. The Modern Language Journal, 102(4),
713–731. doi:10.1111/modl.12509
Plonsky, L., Marsden, E., Crowther, D., Gass, S. M., & Spinner, P. (2019). A methodological
synthesis and meta-analysis of judgment tasks in second language research. Second
Language Research. doi:10.1177/0267658319828413
Plonsky, L., & Oswald, F. L. (2014). How big is “big”? Interpreting effect sizes in L2 research.
Language Learning, 64(4), 878–912. doi:10.1111/lang.12079
Plonsky, L., & Oswald, F. L. (2017). Multiple regression as a flexible alternative to ANOVA
in L2 research. Studies in Second Language Acquisition, 39(3), 579–592. doi:10.1017/
S0272263116000231
Polio, C., & Gass, S. (1997). Replication and reporting: A commentary. Studies in Second
Language Acquisition, 19(4), 499–508. doi:10.1017/S027226319700404X
Pollatsek, A., Bolozky, S., Well, A. D., & Rayner, K. (1981). Asymmetries in the perceptual span
for Israeli readers. Brain and Language, 14(1), 174–180. doi:10.1016/0093-934X(81)
90073-0
Pomplun, M., Reingold, E. M., & Shen, J. (2001). Investigating the visual span in comparative
search: The effects of task difficulty and divided attention. Cognition, 81(2), B57–B67.
doi:10.1016/S0010-0277(01)00123-8
Porte, G. (Ed.). (2012). Replication research in applied linguistics. Cambridge, UK: Cambridge
University Press.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology,
32(1), 3–25. doi:10.1080/00335558008248231
Posner, M. I., & Petersen, S. (1990). The attention system of the human brain. Annual
Review of Neuroscience, 13, 25–42. doi:10.1146/annurev.ne.13.030190.000325
Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the
detection of signals. Journal of Experimental Psychology: General, 109(2), 160–174.
doi:10.1037/0096-3445.109.2.160
Potter, M. C. (1984). Rapid serial visual presentation (RSVP): A method for studying
language processing. In D. Kieras & M. Just (Eds.), New methods in reading comprehension
research (pp. 91–118). Hillsdale, NJ: Erlbaum.
*Pozzan, L., & Trueswell, J. C. (2016). Second language processing and revision of garden-
path sentences: A visual word study. Bilingualism: Language and Cognition, 19(3), 636–643.
doi:10.1017/S1366728915000838
Pressley, M., & Afflerbach, P. (1995). Verbal protocols of reading: The nature of constructively
responsive reading. Hillsdale, NJ: Lawrence Erlbaum.
Pynte, J., & Kennedy, A. (2006). An influence over eye movements in reading exerted
from beyond the level of the word: Evidence from reading English and French. Vision
Research, 46(22), 3786–3801. doi:10.1016/j.visres.2006.07.004
Pynte, J., New, B., & Kennedy, A. (2008). A multiple regression analysis of syntactic and
semantic influences in reading normal text. Journal of Eye Movement Research, 2(1), 1–11.
doi:10.16910/jemr.2.1.4
Qi, D. S., & Lapkin, S. (2001). Exploring the role of noticing in a three-stage second
language writing task. Journal of Second Language Writing, 10(4), 277–303. doi:10.1016/
S1060-3743(01)00046-7
Radach, R., Inhoff, A. W., Glover, L., & Vorstius, C. (2013). Contextual constraint and N
+ 2 preview effects in reading. The Quarterly Journal of Experimental Psychology, 66(3),
619–633. doi:10.1080/17470218.2012.761256
Radach, R., & Kennedy, A. (2004). Theoretical perspectives on eye movements in reading:
Past controversies, current issues, and an agenda for future research. European Journal of
Cognitive Psychology, 16, 3–26. doi:10.1080/09541440340000295
Radach, R., & Kennedy, A. (2013). Eye movements in reading: Some theoretical context.
The Quarterly Journal of Experimental Psychology, 66(3), 429–452. doi:10.1080/1747021
8.2012.750676
Radach, R., Reilly, R., & Inhoff, A. (2007). Models of oculomotor control in reading. In
R. P. G. van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye movements:
A window on mind and brain (pp. 237–269). Amsterdam, the Netherlands: Elsevier.
doi:10.1016/B978-008044980-7/50013-6
Radach, R., Schmitten, C., Glover, L., & Huestegge, L. (2009). How children read
for comprehension: Eye movements in developing readers. In R. K. Wagner, C.
Schatschneider, & C. Phythian-Sence (Eds.), Beyond decoding:The biological and behavioral
foundations of reading comprehension (pp. 75–106). New York: Guilford Press.
Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin,
114(3), 510–532. doi:10.1037/0033-2909.114.3.510
Rayner, K. (n.d.) Keith Rayner (1943–2015). Retrieved from http://www.forevermissed.
com/keith-rayner/#about
Rayner, K. (1975).The perceptual span and peripheral cues in reading. Cognitive Psychology,
7(1), 65–81. doi:10.1016/0010-0285(75)90005-5
Rayner, K. (1979). Eye guidance in reading: Fixation locations within words. Perception, 8,
21–30. doi:10.1068/p080021
Rayner, K. (1986). Eye movements and the perceptual span in beginning and skilled readers.
Journal of Experimental Child Psychology, 41(2), 211–236. doi:10.1016/0022-0965(86)
90037-8
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of
research. Psychological Bulletin, 124(3), 372–422. doi:10.1037/0033-2909.124.3.372
Rayner, K. (2009). Eye movements and attention in reading, scene perception, and
visual search. The Quarterly Journal of Experimental Psychology, 62(8), 1457–1506.
doi:10.1080/17470210902816461
Rayner, K., Juhasz, B. J., & Pollatsek, A. (2005). Eye movements during reading. In M.
Snowling & C. Hulme (Eds.), The science of reading: A handbook (pp. 79–97). Oxford,
UK: Blackwell.
Rayner, K., & McConkie, G. W. (1976). What guides a reader’s eye movements? Vision
Research, 16(8), 829–837. doi:10.1016/0042-6989(76)90143-7
Rayner, K., & Morris, R. K. (1991). Comprehension processes in reading ambiguous
sentences: Reflections from eye movements. Advances in Psychology, 77, 175–198.
doi:10.1016/S0166-4115(08)61533-2
Rayner, K., & Pollatsek, A. (2006). Eye-movement control in reading. In M. J. Traxler & M.
A. Gernsbacher (Eds.), Handbook of psycholinguistics (pp. 613–657). New York: Elsevier.
doi:10.1016/B978-012369374-7/50017-1
Rayner, K., Pollatsek, A., Ashby, J., & Clifton Jr., C. (2012). Psychology of reading (2nd ed.).
New York: Psychology Press.
Rayner, K., Pollatsek, A., Drieghe, D., Slattery, T. J., & Reichle, E. D. (2007). Tracking
the mind during reading via eye movements: Comments on Kliegl, Nuthmann,
and Engbert (2006). Journal of Experimental Psychology: General, 136(3), 520–529.
doi:10.1037/0096-3445.136.3.520
Rayner, K., Schotter, E. R., Masson, M. E., Potter, M. C., & Treiman, R. (2016). So much
to read, so little time: How do we read, and can speed reading help? Psychological Science
in the Public Interest, 17(1), 4–34. doi:10.1177/1529100615623267
Rayner, K., Warren, T., Juhasz, B. J., & Liversedge, S. P. (2004). The effect of plausibility
on eye movements in reading. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 30(6), 1290–1301. doi:10.1037/0278-7393.30.6.1290
Rayner, K., & Well, A. D. (1996). Effects of contextual constraint on eye movements
in reading: A further examination. Psychonomic Bulletin & Review, 3(4), 504–509.
doi:10.3758/BF03214555
Rayner, K., Well, A. D., Pollatsek, A., & Bertera, J. H. (1982). The availability of useful
information to the right of fixation in reading. Perception and Psychophysics, 31(6), 537–
550. doi:10.3758/BF03204186
Rebuschat, P. (2008). Implicit learning of natural language syntax (Unpublished doctoral
dissertation). University of Cambridge. doi:10.17863/CAM.15883
Rebuschat, P., & Williams, J. N. (2012). Implicit and explicit knowledge in second language
acquisition. Applied Psycholinguistics, 33(4), 829–856. doi:10.1017/S0142716411000580
Reichle, E. D. (2011). Serial-attention models of reading. In S. P. Liversedge, I. Gilchrist,
& S. Everling (Eds.), The Oxford handbook of eye movements (pp. 767–786). Oxford, UK:
Oxford University Press. doi:10.1093/oxfordhb/9780199539789.013.0042
Reichle, E. D., Liversedge, S. P., Drieghe, D., Blythe, H. I., Joseph, H. S. S. L., White, S. J.,
& Rayner, K. (2013). Using E-Z Reader to examine the concurrent development
of eye-movement control and reading skill. Developmental Review, 33(2), 110–149.
doi:10.1016/j.dr.2013.03.001
Reichle, E. D., Pollatsek, A., & Rayner, K. (2006). E-Z Reader: A cognitive-control, serial-
attention model of eye-movement behavior during reading. Cognitive Systems Research,
7(1), 4–22. doi:10.1016/j.cogsys.2005.07.002
Reichle, E. D., Pollatsek, A., & Rayner, K. (2012). Using E-Z Reader to simulate eye
movements in nonreading tasks: A unified framework for understanding the eye–mind
link. Psychological Review, 119(1), 155–185. doi:10.1037/a0026473
Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading:
Accounting for initial fixation locations and refixations within the E-Z Reader model.
Vision Research, 39(26), 4403–4411. doi:10.1016/S0042-6989(99)00152-2
Reichle, E. D., Rayner, K., & Pollatsek, A. (2003). The E-Z Reader model of eye-movement
control in reading: Comparisons to other models. Behavioral and Brain Sciences, 26,
445–526. doi:10.1017/S0140525X03000104
Reichle, E. D., Warren, T., & McConnell, K. (2009). Using E-Z Reader to model the effects
of higher level language processing on eye movements during reading. Psychonomic
Bulletin & Review, 16(1), 1–21. doi:10.3758/PBR.16.1.1
Reilly, R. G., & Radach, R. (2006). Some empirical tests of an interactive activation
model of eye movement control in reading. Cognitive Systems Research, 7(1), 34–55.
doi:10.1016/j.cogsys.2005.07.006
Révész, A. (2014). Towards a fuller assessment of cognitive models of task-based learning:
Investigating task-generated cognitive demands and processes. Applied Linguistics, 35(1),
87–92. doi:10.1093/applin/amt039
Révész, A., Michel, M., & Lee, M. (2019). Exploring second language writers’ pausing and
revision behaviors: A mixed-methods study. Studies in Second Language Acquisition, 41(3),
605–631. doi:10.1017/S027226311900024X
*Révész, A., Sachs, R., & Hama, M. (2014). The effects of task complexity and input
frequency on the acquisition of the past counterfactual construction through recasts.
Language Learning, 64(3), 615–650. doi:10.1111/lang.12061
Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One hundred years of social
psychology quantitatively described. Review of General Psychology, 7(4), 331–363.
doi:10.1037/1089-2680.7.4.331
Rizzolatti, G., Riggio, L., Dascola, I., & Umiltà, C. (1987). Reorienting attention across the
horizontal and vertical meridians: Evidence in favor of a premotor theory of attention.
Neuropsychologia, 25(1), 31–40. doi:10.1016/0028-3932(87)90041-8
Rizzolatti, G., Riggio, L., & Sheliga, B. M. (1994). Space and selective attention. In C.
Umiltà & M. Moscovitch (Eds.), Attention and performance XV: Conscious and nonconscious
information processing (Vol. 15, pp. 232–265). Cambridge, MA: The MIT Press.
*Roberts, L., Gullberg, M., & Indefrey, P. (2008). Online pronoun resolution in L2
discourse: L1 influence and general learner effects. Studies in Second Language Acquisition,
30(3), 333–357. doi:10.1017/S0272263108080480
Roberts, L., & Siyanova-Chanturia, A. (2013). Using eye-tracking to investigate topics in
L2 acquisition and L2 processing. Studies in Second Language Acquisition, 35(2), 213–235.
doi:10.1017/S0272263112000861
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring
interactions in a componential framework. Applied Linguistics, 22(1), 27–57. doi:10.1093/
applin/22.1.27
Robinson, P. (2011). Task-based language learning: A review of issues. Language Learning,
61(s1), 1–36. doi:10.1111/j.1467-9922.2011.00641.x
Robinson, P., & Ellis, N. C. (Eds.). (2008). Handbook of cognitive linguistics and second language
acquisition. New York: Routledge.
*Rodríguez, D. D. L., Buetler, K. A., Eggenberger, N., Preisig, B. C., Schumacher, R.,
Laganaro, M., … Müri, R. M. (2016). The modulation of reading strategies by language
opacity in early bilinguals: An eye movement study. Bilingualism: Language and Cognition,
19(3), 567–577. doi:10.1017/S1366728915000310
Rosa, E., & Leow, R. P. (2004). Awareness, different learning conditions, and second
language development. Applied Psycholinguistics, 25(2), 269–292. doi:10.1017/
S0142716404001134
Rosa, E., & O’Neill, M. D. (1999). Explicitness, intake, and the issue of awareness. Studies in
Second Language Acquisition, 21(4), 511–556. doi:10.1017/S0272263199004015
Rossion, B., & Pourtois, G. (2004). Revisiting Snodgrass and Vanderwart’s object pictorial
set: The role of surface detail in basic-level object recognition. Perception, 33(2), 217–
236. doi:10.1068/p5117
Rothman, J. (2009). Understanding the nature and outcomes of early bilingualism:
Romance languages as heritage languages. International Journal of Bilingualism, 13(2),
155–163. doi:10.1177/1367006909339814
Rubin, J. (1995). The contribution of video to the development of competence in listening.
In D. J. Mendelsohn & J. Rubin (Eds.), A guide for the teaching of second language listening
(pp. 151–165). San Diego, CA: Dominie Press.
Runner, J. T., Sussman, R. S., & Tanenhaus, M. K. (2003). Assignment of reference to
reflexives and pronouns in picture noun phrases: Evidence from eye movements.
Cognition, 89(1), B1–B13. doi:10.1016/S0010-0277(03)00065-9
Runner, J. T., Sussman, R. S., & Tanenhaus, M. K. (2006). Processing reflexives and
pronouns in picture noun phrases. Cognitive Science, 30(2), 193–241. doi:10.1207/
s15516709cog0000_58
Sachs, R., & Polio, C. (2007). Learners’ uses of two types of written feedback on a L2
writing revision task. Studies in Second Language Acquisition, 29(1), 67–100. doi:10.1017/
S0272263107070039
Sachs, R., & Suh, B. R. (2007). Textually enhanced recasts, learner awareness, and L2
outcomes in synchronous computer-mediated interaction. In A. Mackey (Ed.),
Conversational interaction in second language acquisition: A collection of empirical studies (pp.
197–227). Oxford, UK: Oxford University Press.
*Sagarra, N., & Ellis, N. C. (2013). From seeing adverbs to seeing verbal morphology:
Language experience and adult acquisition of L2 tense. Studies in Second Language
Acquisition, 35(2), 261–290. doi:10.1017/S0272263112000885
Salverda, A. P., Brown, M., & Tanenhaus, M. K. (2011). A goal-based perspective on eye
movements in visual world studies. Acta Psychologica, 137(2), 172–180. doi:10.1016/j.
actpsy.2010.09.010
Salvucci, D. D. (2001). An integrated model of eye movements and visual encoding.
Cognitive Systems Research, 1(4), 201–220. doi:10.1016/S1389-0417(00)00015-2
Sanz, C., Morales-Front, A., Zalbidea, J., & Zárate-Sández, G. (2016). Always in motion
the future is: Doctoral students’ use of technology for SLA research. In R. P. Leow, L.
Cerezo, & M. Baralt (Eds.), A psycholinguistic approach to technology and language learning
(pp. 49–68). Berlin, Germany: De Gruyter Mouton.
Saslow, M. G. (1967). Effects of components of displacement-step stimuli upon latency for
saccadic eye movement. Journal of the Optical Society of America, 57(8), 1024–1029.
doi:10.1364/JOSA.57.001024
Schmidt, R. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11(2), 129–158. doi:10.1093/applin/11.2.129
Schmidt, R., & Frota, S. (1986). Developing basic conversational ability in a second
language: A case study of an adult learner of Portuguese. In R. Day (Ed.), Talking
to learn: Conversation in second language acquisition (pp. 237–326). Rowley, MA:
Newbury.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. Basingstoke, UK: Palgrave Macmillan.
Schott, E. (1922). Über die Registrierung des Nystagmus und anderer Augenbewegungen
vermittels des Saitengalvanometers [On the recording of nystagmus and other eye
movements by means of the string galvanometer]. Deutsches Archiv für Klinische Medizin,
140, 79–90.
Sedivy, J. C. (2003). Pragmatic versus form-based accounts of referential contrast: Evidence
for effects of informativity expectations. Journal of Psycholinguistic Research, 32(1), 3–23.
doi:10.1023/A:1021928914454
Sedivy, J. C., Tanenhaus, M., Chambers, C. G., & Carlson, G. N. (1999). Achieving
incremental semantic interpretation through contextual representation. Cognition,
71(2), 109–147. doi:10.1016/S0010-0277(99)00025-6
Segalowitz, N. (2010). Cognitive bases of second language fluency. New York: Routledge.
*Sekerina, I. A., & Sauermann, A. (2015). Visual attention and quantifier-spreading
in heritage Russian bilinguals. Second Language Research, 31(1), 75–104.
doi:10.1177/0267658314537292
*Sekerina, I. A., & Trueswell, J. C. (2011). Processing of contrastiveness by heritage
Russian bilinguals. Bilingualism: Language and Cognition, 14(3), 280–300. doi:10.1017/
S1366728910000337
Sereno, S. C., & Rayner, K. (2003). Measuring word recognition in reading: Eye movements
and event-related potentials. Trends in Cognitive Sciences, 7(11), 489–493. doi:10.1016/j.
tics.2003.09.010
Sereno, S. C., Rayner, K., & Posner, M. I. (1998). Establishing a time-line of word
recognition: Evidence from eye movements and event-related potentials. NeuroReport,
9(10), 2195–2200. doi:10.1097/00001756-199807130-00009
Severens, E., Van Lommel, S., Ratinckx, E., & Hartsuiker, R. J. (2005). Timed picture naming
norms for 590 pictures in Dutch. Acta Psychologica, 119(2), 159–187. doi:10.1016/j.
actpsy.2005.01.002
Sheliga, B. M., Craighero, L., Riggio, L., & Rizzolatti, G. (1997). Effects of spatial attention
on directional manual and ocular responses. Experimental Brain Research, 114(2), 339–
351. doi:10.1007/PL00005642
Shen, D., Liversedge, S. P., Tian, J., Zang, C., Cui, L., Bai, X., … Rayner, K. (2012). Eye
movements of second language learners when reading spaced and unspaced Chinese
text. Journal of Experimental Psychology: Applied, 18(2), 192–202. doi:10.1037/a0027485
Shen, H. H. (2014). Chinese L2 literacy debates and beginner reading in the United States.
In M. Bigelow & J. Ennser-Kananen (Eds.), Routledge handbook of educational linguistics
(pp. 276–288). New York: Routledge.
*Shintani, N., & Ellis, R. (2013). The comparative effect of direct written corrective
feedback and metalinguistic explanation on learners’ explicit and implicit knowledge
of the English indefinite article. Journal of Second Language Writing, 22(3), 286–306.
doi:10.1016/j.jslw.2013.03.011
Simard, D., & Wong, W. (2001). Alertness, orientation, and detection: The conceptualization
of attentional functions in SLA. Studies in Second Language Acquisition, 23(1), 103–124.
doi:10.1017/S0272263101001048
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford, UK: Oxford University Press.
*Singh, N., & Mishra, R. K. (2012). Does language proficiency modulate oculomotor
control? Evidence from Hindi–English bilinguals. Bilingualism: Language and Cognition,
15(4), 771–781. doi:10.1017/S1366728912000065
*Siyanova-Chanturia, A., Conklin, K., & Schmitt, N. (2011). Adding more fuel to the fire:
An eye-tracking study of idiom processing by native and non-native speakers. Second
Language Research, 27(2), 251–272. doi:10.1177/0267658310382068
Siyanova-Chanturia, A., Conklin, K., & van Heuven, W. J. B. (2011). Seeing a phrase “time
and again” matters: The role of phrasal frequency in the processing of multiword
sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(3),
776–784. doi:10.1037/a0022531
Skehan, P. (1998). Task-based instruction. Annual Review of Applied Linguistics, 18, 268–286.
doi:10.1017/S0267190500003585
Skehan, P. (2009). Modelling second language performance: Integrating complexity,
accuracy, fluency, and lexis. Applied Linguistics, 30(4), 510–532. doi:10.1093/applin/
amp047
Smith, B. (2005). The relationship between negotiated interaction, learner uptake, and
lexical acquisition in task-based computer-mediated communication. TESOL Quarterly,
39(1), 33–58. doi:10.2307/3588451
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced
multilevel modeling (2nd ed.). London, UK: Sage.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name
agreement, image agreement, familiarity, and visual complexity. Journal of Experimental
Psychology: Human Learning and Memory, 6(2), 174–215. doi:10.1037/0278-7393.6.2.174
*Sonbul, S. (2015). Fatal mistake, awful mistake, or extreme mistake? Frequency effects on
off-line/on-line collocational processing. Bilingualism: Language and Cognition, 18(3),
419–437. doi:10.1017/S1366728914000674
References 395
Song, M.-J., & Suh, B.-R. (2008). The effects of output task types on noticing and learning
of the English past counterfactual conditional. System, 36(2), 295–312. doi:10.1016/j.
system.2007.09.006
Sorace, A. (2005). Syntactic optionality at interfaces. In L. Cornips & K. Corrigan (Eds.),
Syntax and variation: Reconciling the biological and the social (pp. 46–111). Amsterdam, the
Netherlands: John Benjamins.
Sorace, A. (2011). Pinning down the concept of “interface” in bilingualism. Linguistic
Approaches to Bilingualism, 1(1), 1–33. doi:10.1075/lab.1.1.01sor
Spector, R. H. (1990). Visual fields. In H. K. Walker, W. D. Hall, & J. W. Hurst (Eds.), Clinical
methods: The history, physical, and laboratory examinations (3rd ed.) (pp. 565–572). Boston,
MA: Butterworths.
Spinner, P., & Gass, S. M. (2019). Using judgments in second language acquisition research.
New York: Routledge.
*Spinner, P., Gass, S. M., & Behney, J. (2013). Ecological validity in eye-tracking. Studies in
Second Language Acquisition, 35(2), 389–415. doi:10.1017/S0272263112000927
Spivey, M. J., & Marian, V. (1999). Cross talk between native and second languages:
Partial activation of an irrelevant lexicon. Psychological Science, 10(3), 281–284.
doi:10.1111/1467-9280.00151
Spivey, M., & Cardon, C. (2015). Methods for studying adult bilingualism. In J. W. Schwieter
(Ed.), The Cambridge handbook of bilingual processing (pp. 108–132). Cambridge, UK:
Cambridge University Press. doi:10.1017/CBO9781107447257.004
Springob, C. (2015). Why is it easier to see a star if you look slightly to the side? Ask an
astronomer. Retrieved from http://curious.astro.cornell.edu/physics/81-the-universe/
stars-and-star-clusters/stargazing/373-why-is-it-easier-to-see-a-star-if-you-look-
slightly-to-the-side-intermediate
SR Research. (2017). EyeLink Data Viewer 3.1.97 [Computer software]. Mississauga, ON:
SR Research Ltd.
Starr, M. S., & Rayner, K. (2001). Eye movements during reading: Some current controversies.
Trends in Cognitive Sciences, 5(4), 156–163. doi:10.1016/S1364-6613(00)01619-3
Steinhauer, K. (2014). Event-related potentials (ERPs) in second language research: A brief
introduction to the technique, a selected review, and an invitation to reconsider critical
periods in L2. Applied Linguistics, 35(4), 393–417. doi:10.1093/applin/amu028
Steinhauer, K., & Drury, J. E. (2012). On the early left-anterior negativity (ELAN) in syntax
studies. Brain and Language, 120(2), 135–162. doi:10.1016/j.bandl.2011.07.001
Stephane, A. L. (2011). Eye tracking from a human factors perspective. In G. A. Boy (Ed.),
The handbook of human-machine interaction: A human-centered design approach (pp. 339–364).
New York: CRC Press.
Stevenson, H. W., Lee, S.-Y., Chen, C., Stigler, J. W., Hsu, C.-C., Kitamura, S., & Hatano,
G. (1990). Contexts of achievement: A study of American, Chinese, and Japanese
children. Monographs of the Society for Research in Child Development, 55(1/2), i-119.
doi:10.2307/1166090
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental
Psychology, 18, 643–662. doi:10.1037/h0054651
Styles, E. A. (2006). The psychology of attention. New York: Psychology Press.
Sussman, R. S. (2006). Processing and representation of verbs: Insights from instruments
(Unpublished doctoral dissertation). University of Rochester.
*Suvorov, R. (2015). The use of eye tracking in research on video-based second language
(L2) listening assessment: A comparison of context videos and content videos. Language
Testing, 32(4), 463–483. doi:10.1177/0265532214562099
Tokowicz, N., & MacWhinney, B. (2005). Implicit and explicit measures of sensitivity to
violations in second language grammar: An event-related potential investigation. Studies
in Second Language Acquisition, 27(2), 173–204. doi:10.1017/S0272263105050102
Tomlin, R. S., & Villa, V. (1994). Attention in cognitive science and second language
acquisition. Studies in Second Language Acquisition, 16(2), 183–203. doi:10.1017/
S0272263100012870
*Tremblay, A. (2011). Learning to parse liaison-initial words: An eye-tracking study.
Bilingualism: Language and Cognition, 14(3), 257–279. doi:10.1017/S1366728910000271
*Trenkic, D., Mirković, J., & Altmann, G. T. (2014). Real-time grammar processing by native
and non-native speakers: Constructions unique to the second language. Bilingualism:
Language and Cognition, 17(2), 237–257. doi:10.1017/S1366728913000321
Trueswell, J. C., Sekerina, I., Hill, N. M., & Logrip, M. L. (1999). The kindergarten-path
effect: Studying on-line sentence processing in young children. Cognition, 73(2), 89–134.
doi:10.1016/S0010-0277(99)00032-3
Trueswell, J. C., Tanenhaus, M. K., & Kello, C. (1993). Verb-specific constraints in
sentence processing: Separating effects of lexical preference from garden-paths.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(3), 528–553.
doi:10.1037/0278-7393.19.3.528
Tsai, J. L., & McConkie, G. W. (2003). Where do Chinese readers send their eyes? In R.
Radach, J. Hyönä, & H. Deubel (Eds.), The mind’s eye: Cognitive and applied aspects of eye
movement research (pp. 159–176). New York: Elsevier.
Ullman, M. T. (2005). A cognitive neuroscience perspective on second language acquisition:
The declarative/procedural model. In C. Sanz (Ed.), Mind and context in adult second
language acquisition: Methods, theory and practice (pp. 141–178). Washington, DC:
Georgetown University Press.
Vainio, S., Hyönä, J., & Pajunen, A. (2009). Lexical predictability exerts robust effects on
fixation duration, but not on initial landing position during reading. Experimental
Psychology, 56(1), 66–74. doi:10.1027/1618-3169.56.1.66
*Vainio, S., Pajunen, A., & Hyönä, J. (2016). Processing modifier–head agreement in
L1 and L2 Finnish: An eye-tracking study. Second Language Research, 32(1), 3–24.
doi:10.1177/0267658315592201
Valian, V. (2015). Bilingualism and cognition. Bilingualism: Language and Cognition, 18(1),
3–24. doi:10.1017/S1366728914000522
Van Assche, E., Drieghe, D., Duyck, W., Welvaert, M., & Hartsuiker, R. J. (2011). The
influence of semantic constraints on bilingual word recognition during sentence
reading. Journal of Memory and Language, 64(1), 88–107. doi:10.1016/j.jml.2010.08.006
*Van Assche, E., Duyck, W., & Brysbaert, M. (2013). Verb processing by bilinguals in
sentence contexts: The effect of cognate status and verb tense. Studies in Second Language
Acquisition, 35(2), 237–259. doi:10.1017/S0272263112000873
Vanderplank, R. (2016). Captioned media in foreign language learning and teaching: Subtitles for
the deaf and hard-of-hearing as tools for language learning. London, UK: Palgrave Macmillan.
doi:10.1057/978-1-137-50045-8
Van Hell, J. G., & Tanner, D. (2012). Second language proficiency and cross-language lexical
activation. Language Learning, 62(s2), 148–171. doi:10.1111/j.1467-9922.2012.00710.x
Van Hell, J. G., & Tokowicz, N. (2010). Event-related brain potentials and second language
learning: Syntactic processing in late L2 learners at different L2 proficiency levels.
Second Language Research, 26(1), 43–74. doi:10.1177/0267658309337637
Van Merriënboer, J. J., & Sweller, J. (2005). Cognitive load theory and complex learning:
Recent developments and future directions. Educational Psychology Review, 17(2), 147–
177. doi:10.1007/s10648-005-3951-0
VanPatten, B., & Williams, J. (2002). Research criteria for tenure in second language
acquisition: Results from a survey of the field (Unpublished manuscript). University of
Illinois at Chicago.
Van Wermeskerken, M., Litchfield, D., & van Gog, T. (2018). What am I looking at?
Interpreting dynamic and static gaze displays. Cognitive Science, 42(1), 220–252.
doi:10.1111/cogs.12484
Veldre, A., & Andrews, S. (2014). Lexical quality and eye movements: Individual differences
in the perceptual span of skilled adult readers. The Quarterly Journal of Experimental
Psychology, 67(4), 703–727. doi:10.1080/17470218.2013.826258
Veldre, A., & Andrews, S. (2015). Parafoveal lexical activation depends on skilled reading
proficiency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(2),
586–595. doi:10.1037/xlm0000039
Vilaró, A., Duchowski, A. T., Orero, P., Grindinger, T., Tetreault, S., & di Giovanni,
E. (2012). How sound is the Pear Tree Story? Testing the effect of varying audio
stimuli on visual attention distribution. Perspectives, 20(1), 55–65. doi:10.1080/0907
676X.2011.632682
Vitu, F. (1991). The existence of a center of gravity effect during reading. Vision Research,
31(7–8), 1289–1313. doi:10.1016/0042-6989(91)90052-7
Vitu, F., O’Regan, J. K., Inhoff, A. W., & Topolski, R. (1995). Mindless reading: Eye-
movement characteristics are similar in scanning letter strings and reading texts.
Perception and Psychophysics, 57(3), 352–364. doi:10.3758/BF03213060
Vitu, F., O’Regan, J. K., & Mittau, M. (1990). Optimal landing position in reading isolated
words and continuous text. Perception and Psychophysics, 47(6), 583–600. doi:10.3758/
BF03203111
Von der Malsburg, T., & Angele, B. (2017). False positives and other statistical errors in
standard analyses of eye movements in reading. Journal of Memory and Language, 94,
119–133. doi:10.1016/j.jml.2016.10.003
Von der Malsburg, T., & Vasishth, S. (2011). What is the scanpath signature of syntactic
reanalysis? Journal of Memory and Language, 65(2), 109–127. doi:10.1016/j.
jml.2011.02.004
Vonk, W., & Cozijn, R. (2003). On the treatment of saccades and regressions in eye
movement measures of reading time. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The
mind’s eye: Cognitive and applied aspects of eye movement research (pp. 291–312). Amsterdam,
the Netherlands: North-Holland.
Wade, N. J. (2007). Scanning the seen: Vision and the origins of eye movement research.
In R. P. G. Van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye
movements: A window on mind and brain (pp. 31–63). Oxford, UK: Elsevier. doi:10.1016/
B978-008044980-7/50004-5
Wade, N. J., & Tatler, B. W. (2005). The moving tablet of the eye. Oxford, UK: Oxford University
Press. doi:10.1093/acprof:oso/9780198566175.001.0001
Wagner, E. (2008). Video listening tests: What are they measuring? Language Assessment
Quarterly, 5(3), 218–243. doi:10.1080/15434300802213015
Wang, D., Mulvey, F. B., Pelz, J. B., & Holmqvist, K. (2017). A study of artificial eyes for the
measurement of precision in eye-trackers. Behavior Research Methods, 49(3), 947–959.
doi:10.3758/s13428-016-0755-8
Wang, M., Perfetti, C. A., & Liu, Y. (2003). Alphabetic readers quickly acquire orthographic
structure in learning to read Chinese. Scientific Studies of Reading, 7(2), 183–208.
doi:10.1207/S1532799XSSR0702_4
Wedel, M., & Pieters, R. (2008). Eye tracking for visual marketing. Foundations and Trends
in Marketing, 1(4), 231–320. doi:10.1561/1700000011
Wengelin, A., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., &
Johansson, R. (2009). Combined eyetracking and keystroke-logging methods for
studying cognitive processes in text production. Behavior Research Methods, 41(2), 337–
351. doi:10.3758/BRM.41.2.337
Whelan, R. (2008). Effective analysis of reaction time data. The Psychological Record, 58(3),
475–482. doi:10.1007/BF03395630
White, S. J. (2008). Eye movement control during reading: Effects of word frequency
and orthographic familiarity. Journal of Experimental Psychology: Human Perception and
Performance, 34(1), 205–223. doi:10.1037/0096-1523.34.1.205
Whitford, V., & Titone, D. (2015). Second-language experience modulates eye movements
during first- and second-language sentence reading: Evidence from a gaze-contingent
moving window paradigm. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 41(4), 1118–1129. doi:10.1037/xlm0000093
Wicha, N. Y. Y., Moreno, E. M., & Kutas, M. (2004). Anticipating words and their gender:
An event-related brain potential study of semantic integration, gender expectancy, and
gender agreement in Spanish sentence reading. Journal of Cognitive Neuroscience, 16(7),
1272–1288. doi:10.1162/0898929041920487
Wilcox, R. (2012). Introduction to robust estimation and hypothesis testing (3rd ed.). Amsterdam,
the Netherlands: Elsevier.
Williams, J. N. (2009). Implicit learning in second language acquisition. In W. C. Ritchie &
T. K. Bhatia (Eds.), The new handbook of second language acquisition (pp. 319–353). Bingley,
UK: Emerald Publishing.
Williams, R., & Morris, R. (2004). Eye movements, word familiarity, and vocabulary
acquisition. European Journal of Cognitive Psychology, 16(1–2), 312–339.
doi:10.1080/09541440340000196
Wilson, M. P., & Garnsey, S. M. (2009). Making simple sentences hard: Verb bias effects
in simple direct object sentences. Journal of Memory and Language, 60(3), 368–392.
doi:10.1016/j.jml.2008.09.005
*Winke, P. (2013). The effects of input enhancement on grammar learning and
comprehension. Studies in Second Language Acquisition, 35(2), 323–352. doi:10.1017/
S0272263112000903
*Winke, P., Gass, S., & Sydorenko, T. (2013). Factors influencing the use of captions by
foreign language learners: An eye-tracking study. The Modern Language Journal, 97(1),
254–275. doi:10.1111/j.1540-4781.2013.01432.x
Winke, P., Godfroid, A., & Gass, S. M. (2013). Introduction to the special issue. Studies in
Second Language Acquisition, 35(2), 205–212. doi:10.1017/S027226311200085X
Winskel, H., Radach, R., & Luksaneeyanawin, S. (2009). Eye movements when reading
spaced and unspaced Thai and English: A comparison of Thai–English bilinguals and
English monolinguals. Journal of Memory and Language, 61(3), 339–351. doi:10.1016/j.
jml.2009.07.002
Wray, A. (2002). Formulaic language and the lexicon. Cambridge, UK: Cambridge University
Press.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford, UK: Oxford University
Press.
Wright, R. D., & Ward, L. M. (2008). Orienting of attention. New York: Oxford University
Press.
Yan, M., Kliegl, R., Richter, E. M., Nuthmann, A., & Shu, H. (2010). Flexible saccade-
target selection in Chinese reading. The Quarterly Journal of Experimental Psychology,
63(4), 705–725. doi:10.1080/17470210903114858
Yang, H. M., & McConkie, G. W. (1999). Reading Chinese: Some basic eye-movement
characteristics. In J. Wang, H. C. Chen, R. Radach, & A. Inhoff (Eds.), Reading Chinese
script: A cognitive analysis (pp. 207–222). London, UK: Lawrence Erlbaum.
Yang, S. (2006). An oculomotor-based model of eye movements in reading: The
competition/interaction model. Cognitive Systems Research, 7(1), 56–69. doi:10.1016/j.
cogsys.2005.07.005
Yatabe, K., Pickering, M. J., & McDonald, S. A. (2009). Lexical processing during saccades
in text comprehension. Psychonomic Bulletin & Review, 16(1), 62–66. doi:10.3758/
PBR.16.1.62
*Yi, W., Lu, S., & Ma, G. (2017). Frequency, contingency and online processing of
multiword sequences: An eye-tracking study. Second Language Research, 33(4), 519–549.
doi:10.1177/0267658317708009
Young, L. R., & Sheena, D. (1975). Survey of eye movement recording methods. Behavior
Research Methods & Instrumentation, 7(5), 397–429. doi:10.3758/BF03201553
Zang, C., Liang, F., Bai, X., Yan, G., & Liversedge, S. P. (2013). Interword spacing and landing
position effects during Chinese reading in children and adults. Journal of Experimental
Psychology: Human Perception and Performance, 39(3), 720–734. doi:10.1037/a0030097
Zhu, Z., & Ji, Q. (2007). Novel eye gaze tracking techniques under natural head movement.
IEEE Transactions on Biomedical Engineering, 54(12), 2246–2260. doi:10.1109/
TBME.2007.895750
Zlatev, J., & Blomberg, J. (2015). Language may indeed influence thought. Frontiers in
Psychology, 6, 1631. doi:10.3389/fpsyg.2015.01631
*Zufferey, S., Mak, W., Degand, L., & Sanders, T. (2015). Advanced learners’ comprehension
of discourse connectives: The role of L1 transfer across on-line and off-line tasks. Second
Language Research, 31(3), 389–411. doi:10.1177/0267658315573349
INDEX OF NAMES
Italic page numbers indicate figures. Bold page numbers indicate tables.
Aaronson, D. 6
Alanen, R. 3
Allopenna, P. D. 11, 89, 91, 92, 94, 96, 100
Alsadoon, R. 71, 77, 138, 141–142, 220, 222, 225
Altmann, G. T. M. 49, 90–91, 93, 94–95, 96, 105, 107, 144, 168, 182, 191–192, 347
Andersson, R. 229, 324–326
Andringa, S. 78, 98, 112–114, 116, 124, 165, 219, 344
Anthony, L. 355, 362
Baayen, R. H. 264, 266–269, 276, 278
Baccino, T. 15
Bachman, L. 61
Baddeley, A. D. 81
Bahill, A. T. 34
Bai, X. 40
Balling, L. W. 72, 75, 221, 340
Balota, D. A. 44
Baltova, I. 349
Bar, M. 105
Barnes, G. R. 36
Barnett, V. 260, 265–266
Barr, D. J. 277–278, 280–281, 283, 288, 294, 298, 302
Bates, D. 188, 190, 309n6
Bates, E. 309n6
Bax, S. 82, 175, 211, 227, 233, 238, 241
Bell, B. A. 277
Bertera, J. H. 38
Bialystok, E. 102
Binda, P. 34–35
Bisson, M. 63, 80–82, 134, 211, 213, 226
Blumenfeld, H. K. 91, 100
Blythe, H. I. 34, 341
Boers, F. 73, 75
Boersma, P. 198
Bojko, A. 237
Bolger, P. 64, 98, 114–116
Boston, M. F. 47
Bowles, M. A. 1, 3–4, 12
Boxell, O. 65, 68–69, 225
Braze, D. 252
Brône, G. 11, 123
Brysbaert, M. 153–154, 157n4
Bultena, S. 49
Burnat, K. 26
Cameron, A. C. 307
Canseco-Gonzalez, E. 49
Carrol, G. 71, 74–75, 104, 211, 221, 223
Chambers, C. G. 144, 347–348
Chamorro, G. 65, 223
Chen, H. C. 40, 43
Chepyshko, R. 290, 293, 295, 299–300, 302–303
Choi, J. E. S. 33
Choi, S. 77, 87n5
Choi, S. Y. 41, 43
Choi, W. 38–39
binning 290; see also time bins
blinks, in eye-tracking records 255–256
Brocanto2 study 14–15
cameras 122, 159, 312, 316–317, 319, 322, 329, 331–332, 333, 349, 355, 360; see also eye trackers
Canada: bilingualism in 42
Canadian Modern Language Review journal 63, 97
captions see subtitles
categorical variables 127, 128, 132, 136, 286
children: age/proficiency study 82–83; data from 30; Finnish study of 43; saccades 30, 34; and subtitles 82; and test-taking 84, 85; TOEFL test study 84
Chinese language: studies involving 40, 341–343, 345, 347, 351, 352; translated 74
chinrests 361
Cloze test 47
cognates 75, 177–178
cognitive processing: and eye gazes 22
competitors 2, 91, 101, 115, 185, 192; competition effects 91, 98, 100, 101, 103; see also visual world paradigm
computers: in eye-tracking labs 12, 144, 159, 176–177, 334, 335
concurrent data collection methods 23; see also online investigations; online methodologies
concurrent verbalizations 12
confounds 46, 127–128, 182, 186, 188, 203
content/container verbs 293, 294–295, 302, 305, 306
contextual constraint 31, 46, 49, 178; see also word predictability
counterbalancing 136, 137, 138, 155–156; and item lists 126, 135, 136, 155; see also research study designing
covert attention 38, 55; and eye movements 21
covert orienting 19–21
cross-linguistic research 41
data cleaning/analysis 251–252, 361–363; data transformation 263–264, 265, 266, 270; drift correction 257, 258–259; four-stage procedure 260, 261, 307; growth curve analysis 288, 290; individual records/trials 253, 254–256, 257; logarithms 264, 265; logarithmic transformation 249n2, 250n3, 263, 264, 266, 274, 307; outliers 260, 261, 265, 267, 269, 270, 271; residuals 268; sensitivity analysis 269; software for 252–253, 257, 261, 290; statistical tests 271–272, 308; winsorization 266, 268
data collection: and areas of interest 159; logbook 359, 362; trials of eye-tracking data 356–360
data collection methods 2; empirical data 31; eye tracking versus self-paced reading 69; offline methodologies 2; online methodologies 2; qualitative 3; quantitative 3; real-time 86; see also research study designing
data quality 23, 140, 177, 252, 253, 256, 320, 327, 329, 330, 333, 336, 348, 359
databases: and images 188–189, 190
degrees of visual angle (°) 26, 27, 30, 176; Courier font 29, 30; and different tasks 32; formula 28; and saccades 33; see also visual angles
dependency paradigms 65
dependency studies 68–69
dependent variables 127, 140, 145, 179, 238, 248, 272, 291, 292, 307; bilingualism/SLA studies 214, 221, 226; and growth curves 293; scanpaths 246; visual world studies 206, 208–209, 212, 275
different-gender trials 108–109, 109, 110, 125n6
discourse-level phenomena: and SPR 9, 112–113
dispersion-based algorithms 323, 326, 328
doublets 128, 129, 132
drift 172–173, 257, 258
E-Z Reader model 51, 53, 54, 55, 56
early processing: and eye-tracking 12; measures of 216–220, 217, 221–224
ecological validity 12, 17, 122, 144, 174, 193, 321, 361
EEG recordings 13–14; and eye movements 15; FRPs 16
effective field of view 38
Esperanto 114
event conceptualization 121
event-related potentials (ERPs) 2, 13, 14–17, 18, 103; advantages of 14; compared with eye-tracking 14; and the eye-tracking perspective 62; and L2 proficiency 15; and predictability 16, 104; and processing 14; and reading speeds 15; waveforms from 13, 14;
target words: processing of 73; in saccade studies 35; in visual world studies 91, 166–167, 183; see also spillover region
task-based teaching/learning 79
temporal sampling errors 324, 325, 326
TESOL Quarterly journal 63–64, 97
test taking behavior 3, 84, 85
test validity 84; and eye tracking 83
text-based studies 65, 97, 98, 206; defining 63; eye-tracking measures 206, 207, 209; and fonts 174–175; grammar 65–71; primary tasks for 143; vocabulary acquisition 71–76
text-based study guidelines 181; artistic factors 174–175, 176, 177; double spacing 172, 362; linguistic constraints 177–178; spatial constraints 171–172, 173, 174; see also research study designing
text-reading experiments: and research ideas 339, 340
text skimming: eye tracking records of 253, 254, 255
text-to-speech programs 196
think-aloud protocols 2–3, 23n1; awareness vs. processing depth 4; equipment for 12; and reactivity 4; and short-term memory 3; and SPR 12; studies of 12–13
thinking aloud: defined 2–3; and reading comprehension 4, 5, 6; and SLA research 3, 17, 23n1
time bins 288, 296, 298; see also binning
time-course analysis 287, 288, 302–304, 305, 305; anticipatory baseline effects 302; choosing time terms 299, 300–301, 302; data visualization 291, 293; growth curve analysis 288, 290–291, 293, 295, 302; logistic/quasi-logistic regressions 296, 297–298, 299, 302–303, 304–305; reporting 306
time stamps 199–200, 229, 290
time windows: and fixations 198, 199, 200, 201, 202
TRACE model 92, 93
track loss 253–257, 308n1
transparency: in research 196–197
trials 86n2, 138, 139, 140–142; and areas of interest 159; critical trials 70, 70, 131, 138–141, 213, 344; distractor trials 166, 168, 184–186, 185; experimental trials 140, 157n1, 193; filler trials 138, 140, 147; and fixation crosses 193–195; practice trials 19, 138, 140, 177
triplets 128, 129, 132, 180
tuning hypothesis 69
unconscious processes: methodologies capturing 3; see also implicit learning
unspaced languages: and eye movements 40
valid-cue trials 20
validity 4, 12, 17, 58, 83–84, 85, 122, 127, 130, 144, 146, 174, 193, 226–227, 257, 260–261, 264, 321, 327–328, 361; see also ecological validity; internal validity; test validity
velocity-based algorithms 323
visual acuity 24, 25, 26, 33, 39
visual angles 26, 27–28; fonts 174; formula 27, 28; see also degrees of visual angle (°)
visual field 24, 26, 27, 39; see also fovea; parafovea; periphery
visual lobes 38
visual spans see perceptual spans
visual world paradigm 11–12, 17, 21, 88, 92, 94, 97, 112, 125n5, 143, 181, 206; carrier phrase 200, 201, 202; competitors 91; distractors 166–167, 168; dynamic interest areas 170; entry-level ideas 343, 344, 345; experiment designing 186; and eye movements 89, 103, 107; eye tracking 97, 123; fixation crosses 193, 194–195; image use in 165, 166–170; and instruction effects 113–114, 115, 116; intermediate ideas 347–348; looking-while-listening 21, 142–145, 193; morphosyntactic predictions 108, 109, 110, 111, 112; oral production 120, 121–122, 123; predictions 103, 104, 105–107, 112, 123; previews 191–195; research 98–99; semantic predictions 107; Stroop tasks 99, 102, 229–230, 231; word recognition 99–100, 101–102, 103; see also spoken language research
vocabulary 71; and images 189; see also bilingual lexicon
vocabulary acquisition 3, 71–76, 135; questions about 76; and reading 72; see also incidental vocabulary acquisition; intentional vocabulary learning
vocabulary research study set-up 159, 160, 161